I am really looking forward to giving a talk at the research group I was part of during my PhD. I wouldn't be surprised, by the way, if the audience is not completely up to speed on all the ins and outs of speech synthesis/TTS (I certainly wasn't back when I was still there). So it is an interesting challenge for me to come up with a nice talk anyway!
Slides coming soon...
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks by Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, myself, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu and Rob Clark was accepted to INTERSPEECH 2022.
This paper is about transferring the accent of one speaker to another speaker, who does not have that accent, while preserving the speaker characteristics of the target speaker.
High-quality accent-transfer models are available, but they are typically expensive to run, and they can have reliability issues.
Other models may be more efficient and reliable, but they might not be as good at accent transfer.
This paper shows how to use speech data generated by the high-quality but expensive model to train an efficient and reliable model.
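The recipe above can be sketched in a few lines. This is a minimal illustration of data-level distillation, assuming hypothetical model interfaces (`expensive_tts`, `train_step`); it is not the paper's actual code.

```python
def synthesize_corpus(expensive_tts, texts, accent, speaker):
    """Run the high-quality but slow accent-transfer model offline to
    produce (text, audio) pairs: the target speaker's voice, the
    desired accent."""
    return [(t, expensive_tts(t, accent=accent, speaker=speaker)) for t in texts]


def train_efficient_model(init_model, corpus, train_step):
    """Train the fast, reliable model on the synthetic corpus as if it
    were ordinary recorded data."""
    model = init_model
    for text, audio in corpus:
        model = train_step(model, text, audio)
    return model
```

The point is that the expensive model is only run once, offline, to create training data; at serving time only the efficient model is needed.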
by me, Manish Sharma and Rob Clark is an attempt to marry the two worlds of Natural Language Understanding (NLU) and Text-To-Speech.
The idea is that the prosody of synthetic speech improves if a BERT model is involved, as BERT models incorporate syntactic and semantic (world) knowledge.
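One simple way to picture this (purely illustrative, not the paper's actual architecture) is conditioning the TTS encoder on BERT features by concatenating them with the linguistic features per token:

```python
def combine_features(linguistic_embeddings, bert_embeddings):
    """Concatenate per-token linguistic features with BERT features,
    so the prosody model can use syntactic/semantic context.
    Assumes both lists are aligned to the same token sequence."""
    assert len(linguistic_embeddings) == len(bert_embeddings)
    return [lin + bert for lin, bert in zip(linguistic_embeddings, bert_embeddings)]
```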
StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes by Manish Sharma, me and Rob Clark is about distilling Parallel WaveNet models.
Parallel WaveNet student models are typically distilled using the original dataset the teacher WaveNet model was trained on.
This doesn't work all that well if that dataset is relatively small, and the idea of this paper is to add additional speech samples, synthesized by the teacher model, to the dataset used for distilling the student model. Nice and simple, and it works!
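The augmentation step can be sketched as follows (a toy illustration with made-up names, not the paper's code): synthesize extra waveforms with the teacher and append them to the small original dataset before distilling the student.

```python
def build_distillation_set(original_pairs, teacher_synthesize, extra_texts):
    """Return the augmented (text, waveform) dataset used to distill
    the Parallel WaveNet student from the teacher."""
    synthetic_pairs = [(t, teacher_synthesize(t)) for t in extra_texts]
    return list(original_pairs) + synthetic_pairs
```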
The blog post is based on our SSW10 paper.
This paper describes the variational autoencoder (VAE) network currently used for text-to-speech (TTS) synthesis of the most frequently used voices in the Google Assistant.
The talk will be about my work on byte-level machine reading models.
The slides are over here.
In my first ever blog post, published on Medium, I try to explain how byte-level models work and how they compare to character-level and word-level NLP models.
Enjoy reading it!
The slides can be downloaded as one file over here, but are also available as separate slide decks per session from the NN4IR website.
Lastly, we also wrote this overview paper.
The slides are available as one file over here, or per session from the NN4IR website.
Additionally, here is the overview paper.
On Friday, December 15, 2017, I successfully defended my thesis, Text Understanding for Computers, at the Agnietenkapel in Amsterdam.
Many thanks to my committee members: prof. dr. Krisztian Balog (University of Stavanger), prof. dr. Antal van den Bosch (Radboud University, Meertens Instituut), prof. dr. Franciska de Jong (Utrecht University), dr. Evangelos Kanoulas (University of Amsterdam), dr. Christof Monz (University of Amsterdam), prof. dr. Khalil Sima'an (University of Amsterdam), dr. Aleksandr Chuklin (Google Research) and dr. Claudia Hauff (Delft University of Technology). Also, many thanks to my co-promotor Joris van Eijnatten (Utrecht University), and most of all, to my supervisor Maarten de Rijke.
Here is a PDF of the book.
Stay tuned for the PDF...
Please read the excellent blogpost on the ACM website. And thanks everyone for tweeting.
The final slides are now available on nn4ir.com.
Here is the pre-print on arXiv.
Here is the link to the interview on the New Scientist website.
BTW, I also designed the logo... ;-)
Short Text Similarity with Word Embeddings
Many, many thanks to all annotators who contributed their time and effort!!!
Camera-ready PDFs will follow shortly.
The algorithms I developed for monitoring changes in vocabulary over time will be implemented in a tool that provides access to a corpus of digitized historical Dutch newspapers (covering the last four centuries), used by digital humanities scholars.