
MowCowWhoHow III

(2,103 posts)
Sat Nov 5, 2016, 12:48 PM

DeepMind Lipreading AI achieves 93% accuracy

Abstract: Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first lipreading model to operate at sentence-level, using a single end-to-end speaker-independent deep model to simultaneously learn spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 93.4% accuracy, outperforming experienced human lipreaders and the previous 79.6% state-of-the-art accuracy.

TL;DR: LipNet is the first lipreading model to operate at sentence-level, using a single end-to-end speaker-independent deep model to simultaneously learn spatiotemporal visual features and a sequence model.

http://openreview.net/forum?id=BkjLkSqxg
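
For anyone curious what the "connectionist temporal classification" part of the abstract means in practice, here is a minimal sketch of the standard CTC collapse rule used when reading text off per-frame predictions: merge consecutive repeated labels, then drop the blank symbol. This is not the LipNet authors' code. The function name `ctc_greedy_collapse`, the `_` blank symbol, and the sample frame strings are all made up for illustration, and it covers only the decoding-side rule, not the training loss.

```python
from itertools import groupby

# Hypothetical blank symbol; real systems reserve a dedicated class index for it.
BLANK = "_"


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Turn one predicted label per video frame into output text.

    Step 1: merge runs of identical consecutive labels into one label.
    Step 2: remove blank labels.
    A blank between two identical letters keeps them as separate letters.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# Illustrative per-frame predictions (one character per frame).
print(ctc_greedy_collapse("bb_ii_nn"))
print(ctc_greedy_collapse("s_oo_o_n"))
```

The blank is what lets a model emit a variable-length sentence from a variable-length video without frame-by-frame alignment labels: many different frame sequences collapse to the same text.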


DeepMind Lipreading AI achieves 93% accuracy (Original Post) MowCowWhoHow III Nov 2016 OP
That's a bit worrisome for future privacy considering how many cameras there are. Foggyhill Nov 2016 #1
Next challenge: automating the production of Bad Lipreading videos ! nt eppur_se_muova Nov 2016 #2
Mandatory video: DetlefK Nov 2016 #3
I have a foolproof solution to beat the lip reading AI!!! Javaman Nov 2016 #4

Foggyhill

(1,060 posts)
1. That's a bit worrisome for future privacy considering how many cameras there are.
Sat Nov 5, 2016, 12:59 PM

In London they are EVERYWHERE.