Abstract: Every speaker wants to sound fluent and confident, but delivering speech professionally takes a great deal of experience and practice. In this project, we propose a speech stream manipulation system that helps non-professional speakers produce fluent, professional-sounding speech, which in turn contributes to better listener engagement and comprehension. We achieve this by manipulating the disfluencies in human speech: filler words such as "uh" and "um", and awkward long silences. Given any unrehearsed speech, we segment and silence the filled pauses, then adjust the duration of the imposed silences and of other long (disfluent) pauses using a predictive model learned from a dataset of professional speech. Finally, we output an audio stream in which the speaker is expected to sound more fluent, confident and practiced than in the original recording. According to our quantitative evaluation, we significantly increase the fluency of speech by reducing the rate of pauses and fillers.
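The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes an upstream detector has already labeled the recording as a sequence of "speech", "filler" and "silence" segments with durations in seconds. The function name `smooth_pauses`, the segment representation, and the fixed 0.5 second cap are all hypothetical; the cap is only a stand-in for the predictive model learned from professional speech, which is not reproduced here.

```python
from typing import Callable, List, Tuple

# (kind, duration in seconds); kind is "speech", "filler", or "silence".
# This representation is an assumption made for illustration.
Segment = Tuple[str, float]


def smooth_pauses(
    segments: List[Segment],
    target_pause: Callable[[float], float] = lambda d: min(d, 0.5),
) -> List[Segment]:
    """Silence fillers, merge neighbouring pauses, then resize each pause.

    `target_pause` maps an observed pause duration to a new duration.
    The default (cap at 0.5 s) is a placeholder, not the learned model.
    """
    # Step 1: treat every filled pause ("uh", "um") as silence.
    relabeled = [
        ("silence" if kind == "filler" else kind, dur) for kind, dur in segments
    ]

    # Step 2: merge adjacent silences so each pause is a single segment.
    merged: List[Segment] = []
    for kind, dur in relabeled:
        if merged and kind == "silence" and merged[-1][0] == "silence":
            merged[-1] = ("silence", merged[-1][1] + dur)
        else:
            merged.append((kind, dur))

    # Step 3: adjust the duration of every pause; speech is left untouched.
    return [
        (kind, target_pause(dur) if kind == "silence" else dur)
        for kind, dur in merged
    ]


example = [
    ("speech", 2.0),
    ("filler", 0.4),
    ("silence", 1.2),
    ("speech", 3.0),
    ("silence", 0.2),
]
result = smooth_pauses(example)
```

In this sketch the filler and the long silence after it collapse into one pause that is shortened to the cap, while the short natural pause at the end is kept unchanged. Swapping `target_pause` for a learned predictor would be the natural place to plug in a model.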
These are some example outputs of our system:
Either choose one of the following samples or upload a speech recording. For best results, record with minimal background noise. After uploading, please wait a moment while the audio is processed!
Some people have unintentional habits when speaking in public that may affect how listeners receive their speech. We want to help in such situations with computational tools like this one, built from publicly available datasets. To that end, we are asking for your unrehearsed speech.
This tool is entirely for your benefit. Your data will not be stored or shared publicly. The tool transfers the recorded audio to the server, but your recording is deleted as soon as you close the session. Feel free to try it!
If you have any questions about the tool, please feel free to contact: