Science
DeepMind Lipreading AI achieves 93% accuracy
Abstract: Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first lipreading model to operate at sentence-level, using a single end-to-end speaker-independent deep model to simultaneously learn spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 93.4% accuracy, outperforming experienced human lipreaders and the previous 79.6% state-of-the-art accuracy.
TL;DR: LipNet is the first lipreading model to operate at sentence-level, using a single end-to-end speaker-independent deep model to simultaneously learn spatiotemporal visual features and a sequence model.
http://openreview.net/forum?id=BkjLkSqxg
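For anyone wondering how a variable-length run of video frames turns into text, the abstract names the connectionist temporal classification (CTC) loss as the piece that handles it. Here is a rough sketch of the standard CTC collapsing rule applied to a greedy per-frame guess: merge consecutive repeats, then drop the blank symbol. This is not LipNet's code; the function name `ctc_greedy_decode`, the `_` blank marker, and the sample frame labels are all made up for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real models reserve a class index for it


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse per-frame labels the CTC way: merge repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive labels
    collapsed = (label for label, _ in groupby(frame_labels))
    return "".join(label for label in collapsed if label != blank)


if __name__ == "__main__":
    # Eight made-up frame predictions collapse to a three-letter word.
    print(ctc_greedy_decode("bb_ii_nn"))  # bin
    # A blank between two runs of the same letter keeps a doubled letter.
    print(ctc_greedy_decode("hh_e_ll_lo"))  # hello
```

The blank is what lets the model emit a genuinely doubled letter: without a blank between them, two runs of "l" would merge into one.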
Original post by MowCowWhoHow III, Nov 2016
Foggyhill (1,060 posts), Nov 2016
1. That's a bit worrisome for future privacy considering how many cameras there are.
In London they are EVERYWHERE.
eppur_se_muova (36,299 posts)
2. Next challenge: automating the production of Bad Lipreading videos! nt
DetlefK (16,423 posts)
3. Mandatory video:
Javaman (62,534 posts)
4. I have a foolproof solution to beat the lip reading AI!!!