Automatic Speech Recognition (ASR) systems are becoming increasingly relevant in everyday household use, as intelligent virtual assistants - such as Alexa and Siri - must be able to quickly understand and respond to their owners' verbal requests. Such systems require large amounts of training data to reach high accuracy, so the demand for larger accented-speech datasets is greater than ever. To that end, we have worked on several projects whose purpose is to generate high-quality accented speech corpora synthetically, using deep learning. If successful, our approach would eliminate the need to hire voice talent, buy professional recording equipment, and book studio time, giving easy access to virtually unlimited volumes of new, unseen accented speech at a fraction of the current cost.
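To give a rough picture of what such a synthetic corpus-generation pipeline can look like, here is a minimal sketch that synthesizes a few utterances per accent and records them in a manifest file. The `synthesize` function is a hypothetical placeholder standing in for an accented neural TTS model, and the accent tags and manifest layout are assumptions made for the example, not a description of our actual pipeline.

```python
import csv
import numpy as np
import soundfile as sf

SAMPLE_RATE = 16_000

def synthesize(text: str, accent: str) -> np.ndarray:
    """Hypothetical stand-in for an accented neural TTS model.

    A real implementation would condition a multi-speaker /
    multi-accent TTS system on the accent tag; here we return
    one second of silence so the script runs end to end."""
    return np.zeros(SAMPLE_RATE, dtype=np.float32)

transcripts = ["turn on the kitchen lights", "set a timer for ten minutes"]
accents = ["indian_english", "french_english"]  # hypothetical accent tags

# Write each synthesized utterance to disk and log it in a CSV manifest,
# pairing audio path, transcript, and accent for downstream ASR training.
with open("manifest.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["path", "transcript", "accent"])
    for accent in accents:
        for i, text in enumerate(transcripts):
            audio = synthesize(text, accent)
            path = f"{accent}_{i:04d}.wav"
            sf.write(path, audio, SAMPLE_RATE)
            writer.writerow([path, text, accent])
```

Keeping the transcript alongside each generated file is the point of the exercise: because the text is known before synthesis, the corpus comes perfectly labeled at no extra cost.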
We have worked on several projects whose purpose is to improve the robustness of Automatic Speech Recognition (ASR) systems for non-native speakers. The variability of voices and accents poses a significant challenge to state-of-the-art speech systems. We strive to increase the variance of our datasets by augmenting them with new samples, using various methods such as:
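One standard augmentation method of this kind is noise injection: mixing background noise into a clean utterance at a controlled signal-to-noise ratio. The NumPy sketch below is only an illustration of the technique, with synthetic arrays standing in for real recordings.

```python
import numpy as np

def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech at a target signal-to-noise ratio (in dB)."""
    # Tile or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-10
    # Scale the noise so that 10 * log10(speech_power / scaled_noise_power)
    # equals the requested SNR.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16_000).astype(np.float32)  # stand-in for real audio
noise = rng.standard_normal(4_000).astype(np.float32)    # stand-in for a noise clip
augmented = add_noise(speech, noise, snr_db=10.0)
```

Sampling the SNR randomly per utterance (say, between 5 and 20 dB) yields many distinct training samples from a single clean recording, which is exactly the variance increase described above.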
The past few years have seen exciting developments in the use of deep neural networks to synthesize natural-sounding human speech. These powerful state-of-the-art deep learning architectures allow us to obtain high-quality synthesized speech that was unachievable in the past. Some of the methods we are currently investigating are:
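Whatever the specific method, most modern neural TTS systems share a two-stage structure: an acoustic model predicts a mel spectrogram from text, and a neural vocoder turns the spectrogram into a waveform. The sketch below shows that structure using the pretrained Tacotron2 + WaveRNN pipeline bundled with torchaudio (it assumes torchaudio >= 0.10); it illustrates the general approach rather than our own models.

```python
import torch
import torchaudio

# Pretrained character-based Tacotron2 + WaveRNN pipeline from torchaudio.
bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
processor = bundle.get_text_processor()  # text -> token ids
tacotron2 = bundle.get_tacotron2()       # tokens -> mel spectrogram
vocoder = bundle.get_vocoder()           # mel spectrogram -> waveform

text = "Please set an alarm for seven in the morning."
with torch.inference_mode():
    tokens, lengths = processor(text)
    spec, spec_lengths, _ = tacotron2.infer(tokens, lengths)
    waveforms, _ = vocoder(spec, spec_lengths)

# Save the first (and only) utterance in the batch as a mono WAV file.
torchaudio.save("synthesized.wav", waveforms[0:1].cpu(), vocoder.sample_rate)
```

The separation into acoustic model and vocoder is what makes accent control tractable: conditioning the first stage on a speaker or accent embedding changes the pronunciation while the vocoder stays unchanged.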
Using the aforementioned tools and techniques, we are building systems with state-of-the-art performance that can: