Chinese Mandarin Lip Reading (CMLR) Dataset


The CMLR dataset was collected by the Visual Intelligence and Pattern Analysis (VIPA) group of Zhejiang University. It is designed to facilitate research on visual speech recognition, also referred to as automatic lip reading.

The dataset consists of 102,072 spoken sentences from 11 speakers, recorded between June 2009 and June 2018 from the national news program “News Broadcast”. Each sentence is at most 29 Chinese characters long and contains no English letters, Arabic numerals, or rare punctuation. The alignment boundaries of each word (in seconds) are also provided with each sentence. The dataset statistics are given in the table below.
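The sentence constraints described above can be sketched as a simple check. This is only an illustration, not part of any official CMLR tooling: the function name and regular expressions are hypothetical, "Chinese character" is assumed to mean a code point in the basic CJK Unified Ideographs range, and the "rare punctuation" rule is not checked because the description does not define it.

```python
import re

MAX_CHARS = 29  # maximum sentence length stated in the dataset description

# Hypothetical patterns for illustration only.
ASCII_LETTER_OR_DIGIT = re.compile(r"[A-Za-z0-9]")  # English letters, Arabic numerals
CJK_IDEOGRAPH = re.compile(r"[\u4e00-\u9fff]")      # assumed meaning of "Chinese character"


def satisfies_cmlr_constraints(sentence: str) -> bool:
    """Return True if the sentence has no English letters or Arabic numerals
    and contains at most MAX_CHARS Chinese characters."""
    if ASCII_LETTER_OR_DIGIT.search(sentence):
        return False
    n_chinese = len(CJK_IDEOGRAPH.findall(sentence))
    return n_chinese <= MAX_CHARS


print(satisfies_cmlr_constraints("今天天气很好"))  # short, Chinese only
print(satisfies_cmlr_constraints("2018年6月"))     # contains Arabic numerals
print(satisfies_cmlr_constraints("好" * 30))       # 30 characters, too long
```

A check like this may be useful when preparing additional text in the same style as the CMLR transcripts, for example for language-model training.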

Set          # sentences   # phrases   # characters
Train             71,448      22,959          3,360
Validation        10,206      10,898          2,540
Test              20,418      14,478          2,834
All              102,072      25,633          3,517
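As a quick sanity check on the sentence counts in the table, the short sketch below confirms that the three splits add up to the stated total and computes each split's share. The variable and function names are illustrative; only the numbers come from the table.

```python
# Sentence counts copied from the statistics table.
SPLITS = {"Train": 71_448, "Validation": 10_206, "Test": 20_418}
TOTAL_SENTENCES = 102_072  # the "All" row


def split_fractions(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of sentences."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


# The three splits should account for every sentence in the dataset.
assert sum(SPLITS.values()) == TOTAL_SENTENCES

for name, fraction in split_fractions(SPLITS).items():
    print(f"{name}: {fraction:.1%}")
```

The shares come out at roughly 70%, 10%, and 20% for the train, validation, and test sets. The phrase and character columns do not add up in the same way, since the "All" row is smaller than the sum of the three splits in both columns.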


The CMLR dataset is available to universities and research institutes for research purposes only. Before using the CMLR dataset, please refer to the following papers:

[1] Ya Zhao, Rui Xu, and Mingli Song. A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading. ACM International Conference on Multimedia in Asia, 2019.

[2] Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, and Mingli Song. Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers. The Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020.

Download Link: (Extraction code: emqx)