
Project

Reinforcement learning of articulatory gestures of speech, and their usability in speech recognition (FWOTM1047)

Modern automatic speech recognition systems still do not match
human speech recognition abilities, even though they are trained on
a superhuman amount of speech material. Human children learn to
recognize speech perfectly from a fraction of that speech input.
Humans learn not only to recognize but also to produce speech
sounds, and the motor areas of the brain responsible for speech
production are also involved in speech perception. Previous research
has shown that the production modality of speech, i.e. physical
speech articulation, can be useful as a complementary
representation when recognizing speech. However, existing
research in this field relies on measured articulatory data or
phonetically programmed speech synthesizers. I hypothesize that if
speech articulation is learned autonomously by artificial
intelligence, via a modern deep Q-learning agent without human
programming of articulatory gestures or measured articulatory data,
the learned articulations will be more natural and will allow for more
robust speech representations than those seen in previous research.
During my senior postdoctoral fellowship, I will implement a learning
agent whose goal is to learn to imitate the articulations of real
recorded human speech input. To do so, the agent must learn auditory
speech representations that make articulatory reproduction of the
heard speech sounds possible. These speech representations are
expected to be robust and beneficial for traditional speech
recognition purposes as well.
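To make the planned setup concrete, the sketch below shows a minimal deep Q-learning loop of the kind described above: an agent selects discrete articulatory gestures and is rewarded for acoustic similarity between the synthesized output and a target recording. It is purely illustrative; the dummy environment, feature dimensions, gesture inventory, and reward function are placeholder assumptions standing in for a real articulatory synthesizer and recorded human speech, and are not part of the project description.

```python
# Illustrative sketch: deep Q-learning for articulatory imitation.
# The environment is a stand-in for an articulatory speech synthesizer:
# the agent picks a discrete gesture, the synthesizer produces an acoustic
# frame, and the reward is the negative distance to the corresponding
# frame of the target (recorded) utterance.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

N_ACOUSTIC = 40   # acoustic features per frame (assumption)
N_GESTURES = 16   # size of a discretised gesture inventory (assumption)

class DummyArticulatoryEnv:
    """Placeholder for a real synthesizer plus a target utterance."""
    def __init__(self, n_frames=50):
        self.target = torch.randn(n_frames, N_ACOUSTIC)  # fake "recording"
        self.t = 0

    def reset(self):
        self.t = 0
        return self.target[self.t]  # the agent hears the next target frame

    def step(self, gesture):
        # A real system would synthesize audio from the gesture here;
        # we fake the produced frame and score it against the target.
        produced = torch.randn(N_ACOUSTIC) + gesture * 0.01
        reward = -F.mse_loss(produced, self.target[self.t]).item()
        self.t += 1
        done = self.t >= len(self.target)
        obs = self.target[self.t] if not done else torch.zeros(N_ACOUSTIC)
        return obs, reward, done

# Q-network: acoustic frame in, Q-value per articulatory gesture out.
q_net = nn.Sequential(nn.Linear(N_ACOUSTIC, 128), nn.ReLU(),
                      nn.Linear(128, N_GESTURES))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)
gamma, epsilon = 0.99, 0.1

env = DummyArticulatoryEnv()
for episode in range(200):
    obs, done = env.reset(), False
    while not done:
        # Epsilon-greedy choice of articulatory gesture.
        if random.random() < epsilon:
            action = random.randrange(N_GESTURES)
        else:
            with torch.no_grad():
                action = int(q_net(obs).argmax())
        next_obs, reward, done = env.step(action)
        replay.append((obs, action, reward, next_obs, done))
        obs = next_obs

        # One-step temporal-difference update on a sampled mini-batch.
        if len(replay) >= 64:
            batch = random.sample(replay, 64)
            o, a, r, o2, d = zip(*batch)
            o, o2 = torch.stack(o), torch.stack(o2)
            a = torch.tensor(a)
            r = torch.tensor(r)
            d = torch.tensor(d, dtype=torch.float32)
            q = q_net(o).gather(1, a.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = r + gamma * (1 - d) * q_net(o2).max(dim=1).values
            loss = F.mse_loss(q, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

In the actual project, the dummy environment would be replaced by an articulatory synthesizer and recorded human utterances, and the raw acoustic frames by the auditory speech representations the agent learns.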
Date: 1 Oct 2021 → Today
Keywords: Speech acquisition, articulatory modeling, speech perception
Disciplines: Psycholinguistics and neurolinguistics, Language acquisition, Machine learning and decision making, Signal processing not elsewhere classified, Audio and speech computing