DOI: 10.1101/471581 · Nov 18, 2018 · Paper

How predictive can be predictions in the neurocognitive processing of auditory and audiovisual speech? A deep learning study.

bioRxiv: the Preprint Server for Biology
Thomas Hueber, Jean-Luc Schwartz


Sensory processing is increasingly conceived within a predictive framework, in which neurons are thought to constantly process the error signal resulting from the comparison between expected and observed stimuli. Surprisingly, few data exist on how much can actually be predicted in real sensory scenes. Here, we focus on the sensory processing of auditory and audiovisual speech. We propose a set of computational models based on artificial neural networks (combining deep feed-forward and convolutional networks), trained to predict future audio observations, at horizons ranging from 25 ms to 250 ms, from past audio or audiovisual observations (i.e., including lip movements). Experiments are conducted on the multispeaker NTCD-TIMIT audiovisual speech database. Predictions are efficient over a short temporal range (25-50 ms), explaining 40 to 60% of the variance of the incoming stimulus, which could potentially save up to 2/3 of the processing power. Beyond that range, prediction performance decreases quickly and vanishes after 100 ms. Adding information on the lips slightly improves predictions, with a 5 to 10% increase in explained variance. Interestingly, the visual gain vanishes more slowly, and it is maximal for a delay of 75 ms between image and predicted sound.
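The prediction setup summarized above can be illustrated with a toy sketch. It is not the authors' model: instead of deep feed-forward and convolutional networks trained on NTCD-TIMIT, it fits a linear least-squares predictor on synthetic signals, and the helper names (`make_toy_streams`, `build_pairs`, `score`), the AR(2) "audio" stream, the noisy "lip" stream that leads the audio by 2 frames, and the frame-based horizons are all hypothetical. It only shows the mechanics: predict the audio frame at t + h from frames up to t, take the error signal (observed minus predicted), and report the share of variance it removes, with and without a visual input channel.

```python
import random


def make_toy_streams(n_frames, visual_lead=2, seed=0):
    """Hypothetical stand-ins for audio and lip features (not NTCD-TIMIT data).

    audio: a noisy, stationary AR(2) process (smooth and partly predictable).
    visual: a noisy copy of the audio shifted `visual_lead` frames earlier,
    a toy assumption that lips move slightly before the sound they produce.
    """
    rng = random.Random(seed)
    audio = [0.0, 0.0]
    for _ in range(n_frames + visual_lead - 2):
        audio.append(1.6 * audio[-1] - 0.8 * audio[-2] + rng.gauss(0.0, 1.0))
    visual = [a + rng.gauss(0.0, 0.5) for a in audio[visual_lead:]]
    return audio[:n_frames], visual[:n_frames]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(X, y, ridge=1e-6):
    """Least-squares weights via (X^T X + ridge*I) w = X^T y."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def build_pairs(audio, visual, context, horizon, use_visual):
    """Pair the last `context` frames (up to frame t) with audio frame t + horizon."""
    X, y = [], []
    for t in range(context - 1, len(audio) - horizon):
        row = [1.0] + audio[t - context + 1 : t + 1]  # bias term + past audio
        if use_visual:
            row += visual[t - context + 1 : t + 1]  # past lip frames
        X.append(row)
        y.append(audio[t + horizon])
    return X, y


def explained_variance(y, y_hat):
    """1 - Var(prediction error) / Var(target)."""
    n = len(y)
    err = [a - b for a, b in zip(y, y_hat)]
    mean_e, mean_y = sum(err) / n, sum(y) / n
    var_e = sum((e - mean_e) ** 2 for e in err) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    return 1.0 - var_e / var_y


def score(audio, visual, horizon, use_visual, context=4, split=0.7):
    """Fit on the first part of the stream, evaluate on the held-out rest."""
    X, y = build_pairs(audio, visual, context, horizon, use_visual)
    cut = int(len(y) * split)
    w = fit_linear(X[:cut], y[:cut])
    y_hat = [sum(wi * xi for wi, xi in zip(w, row)) for row in X[cut:]]
    return explained_variance(y[cut:], y_hat)


if __name__ == "__main__":
    audio, visual = make_toy_streams(3000)
    # If one frame is assumed to last 25 ms, horizons 1..10 span 25..250 ms.
    for h in (1, 2, 4, 10):
        ev_a = score(audio, visual, h, use_visual=False)
        ev_av = score(audio, visual, h, use_visual=True)
        print(f"horizon {h:2d}: audio-only EV={ev_a:.2f}, audiovisual EV={ev_av:.2f}")
```

In this toy, explained variance falls as the horizon grows, and the visual channel helps because it was built to lead the audio; neither trend says anything about real speech, which is what the study measures.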
