Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT Model by me, Manish Sharma and Rob Clark is an attempt to marry the two worlds of Natural Language Understanding (NLU) and Text-To-Speech (TTS).
The idea is that the prosody of synthetic speech improves if a BERT model is involved, as BERT models incorporate syntactic and semantic (world) knowledge.
StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes by Manish Sharma, me and Rob Clark is about distilling Parallel WaveNet models.
Parallel WaveNet student models are typically distilled using the original dataset the teacher WaveNet model was trained on.
This doesn't work all that well if that dataset is relatively small, and the idea of this paper is to add additional synthesized speech samples (generated by the teacher model) to the dataset used for distilling the student model. Nice and simple, and it works!
This paper describes the variational auto-encoder (VAE) network currently used for text-to-speech (TTS) synthesis in the Google Assistant for the most frequently used voices.
Enjoy reading it!
On Friday, December 15, 2017, I successfully defended my thesis, Text Understanding for Computers, at the Agnietenkapel in Amsterdam.
Many thanks to my committee members: prof. dr. Krisztian Balog (University of Stavanger), prof. dr. Antal van den Bosch (Radboud University, Meertens Instituut), prof. dr. Franciska de Jong (Utrecht University), dr. Evangelos Kanoulas (University of Amsterdam), dr. Christof Monz (University of Amsterdam), prof. dr. Khalil Sima'an (University of Amsterdam), dr. Aleksandr Chuklin (Google Research) and dr. Claudia Hauff (Delft University of Technology). Also, many thanks to my co-promotor Joris van Eijnatten (Utrecht University), and most of all, to my supervisor Maarten de Rijke.
Here is a PDF of the book.
Stay tuned for the PDF...
The final slides are now available on nn4ir.com.
Here is the pre-print on arXiv.
Here is the link to the interview on the New Scientist website.
BTW, I also designed the logo... ;-)
Camera-ready PDFs will follow shortly.
The algorithms I developed for monitoring changes in vocabulary over time will be implemented in a tool that discloses a corpus of digitized historical Dutch newspapers (covering the last four centuries), used by digital humanities scholars.