The CMLR dataset was collected by the Visual Intelligence and Pattern Analysis (VIPA)
group at Zhejiang University. It was designed to facilitate research on visual speech recognition, also known as
automatic lip reading.
The dataset consists of 102,072 spoken sentences from 11 speakers, recorded between June 2009 and June 2018
from the national news program “News Broadcast”. Each sentence is at most 29 Chinese characters long and contains
no English letters, Arabic numerals, or rare punctuation marks. The alignment boundary of each word (in
seconds) is also provided for each sentence. The dataset statistics are given in the table below.
Set          # sentences   # phrases   # characters
Train             71,448      22,959          3,360
Validation        10,206      10,898          2,540
Test              20,418      14,478          2,834
All              102,072      25,633          3,517
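As a quick sanity check on the statistics, the sketch below recomputes the total sentence count and the share of each split from the numbers in the table. The names sentence_counts and split_shares are illustrative only and are not part of any CMLR tooling. Only the sentence column is summed: the phrase and character counts do not add up across splits, which suggests they count distinct items per split.

```python
# Sentence counts per split, copied from the statistics table.
sentence_counts = {"Train": 71_448, "Validation": 10_206, "Test": 20_418}


def split_shares(counts):
    """Return each split's fraction of the total number of sentences."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(sentence_counts.values())
shares = split_shares(sentence_counts)

print(f"Total sentences: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The three splits sum to the 102,072 sentences reported for the whole dataset, and the shares come out at roughly 70%, 10%, and 20% for train, validation, and test.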
Downloads
The CMLR dataset is available to universities and research institutes for research
purposes only. Before using the CMLR dataset, please refer to the following papers:
[1] Ya Zhao, Rui Xu, and Mingli Song. A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading.
ACM International Conference on Multimedia in Asia, 2019.
[2] Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, and Mingli Song. Hearing Lips: Improving Lip Reading
by Distilling Speech Recognizers. The Thirty-Fourth AAAI Conference on Artificial Intelligence.