Part-based Lipreading for Audio-Visual Speech Recognition

Lipreading is an important component of audio-visual speech recognition. However, the lips are usually modeled as a whole, which ignores the fact that each part of the lips captures different characteristics of the mouth, so a single overall model cannot fit every part well. Moreover, features extracted from the whole lip vary considerably across speakers, so training databases typically need to contain as many speakers as possible. In this paper, a part-based lipreading (PBL) method is proposed to address both the mismatch between an overall lip model and the separate parts of the lips and the excessive dependence of models on the speakers in the training set. PBL models the lips part by part and predicts jointly: it applies a uniform partition strategy to the convolutional features and generates several part-level sub-results that are combined for the final prediction. Experiments are performed on a large publicly available dataset (LRW) and on a subset of it (p-LRW, 65 words) chosen to simulate the progressive instructions encountered in robot working scenarios. The word accuracy of PBL reaches 82.8% on LRW and 88.9% on p-LRW. Finally, an end-to-end audio-visual speech recognition system using PBL is built and achieves 98.3% word accuracy on LRW.
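The abstract describes a uniform partition of the convolutional feature map into parts, each producing a part-level sub-result that is fused into the joint prediction. Below is a minimal PyTorch sketch of that idea; the number of parts, the horizontal-stripe partition, and fusion by averaging logits are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class PartBasedHead(nn.Module):
    """Sketch of a part-based classification head: the feature map is
    uniformly split into horizontal stripes, each stripe is pooled and
    classified separately, and the part-level logits are averaged to
    form the joint prediction. Part count and averaging are assumed."""

    def __init__(self, in_channels: int, num_classes: int, num_parts: int = 4):
        super().__init__()
        self.num_parts = num_parts
        # Pool each of the num_parts stripes to a single feature vector.
        self.pool = nn.AdaptiveAvgPool2d((num_parts, 1))
        # One independent classifier per lip part.
        self.classifiers = nn.ModuleList(
            nn.Linear(in_channels, num_classes) for _ in range(num_parts)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, height, width) from any 2D backbone.
        pooled = self.pool(feats).squeeze(-1)        # (batch, channels, num_parts)
        part_logits = [
            clf(pooled[:, :, i]) for i, clf in enumerate(self.classifiers)
        ]                                            # num_parts x (batch, num_classes)
        return torch.stack(part_logits).mean(dim=0)  # joint prediction

# Usage: 500 LRW word classes from a hypothetical backbone's feature map.
head = PartBasedHead(in_channels=512, num_classes=500, num_parts=4)
feats = torch.randn(2, 512, 7, 7)                    # dummy feature map
logits = head(feats)                                 # (2, 500)
```

The per-part classifiers let each stripe specialize in its own region of the mouth, while the averaged logits keep the final decision joint across all parts, matching the "models partly, predicts jointly" framing in the abstract.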


Datasets

LRW
Results from the Paper


Task                             Dataset  Model  Metric          Value  Global Rank
Audio-Visual Speech Recognition  LRW      PBL    Top-1 Accuracy  98.3   #3

Methods


No methods listed for this paper.