Unit selection speech synthesis for text-to-speech systems
Date of Issue: 2017-04-25
School of Computer Science and Engineering
Institute for Infocomm Research (I2R)
Speech is the vocal form of communication, used to express one’s emotions, thoughts and feelings. Research in the field of speech generation has been ongoing for several decades, and it has made significant progress, as shown by the introduction of systems like Siri, Alexa and Google Assistant. With the rise of conversational interaction between humans and computers, it becomes crucial to make speech technology realistic, reliable and intelligent enough to be useful to the masses. Several techniques have been developed and explored, which have helped incorporate these systems into our everyday lives, for example as automated responses on the telephone, announcements at train or metro stations, or as an aid to those who are visually impaired or who have lost their ability to speak. Despite the complexities and challenges involved, this field has unsurprisingly received a lot of attention and resources during the last few decades, with the main goal of creating systems that mimic human understanding of speech. This report focuses on the concatenative synthesis approach to building a text-to-speech system while maintaining speech intelligibility and quality at appropriate levels.
Final Year Project (FYP)
Nanyang Technological University