A Study of Raters' Sensitivity to Inter-sentence Pause Durations in American English Speech by Paul Owoicho, Josh Camp, and me was accepted for Speech Prosody 2024 (SP2024).
This paper is based on work Paul did as part of an internship in our team at Google in London.
The paper is about the length of pauses between sentences. Speech synthesis is typically done sentence by sentence, so one has to decide how much silence to put in between these sentences. What we discovered during the internship is that people do not seem to be very sensitive to differences in the lengths of these pauses between sentences, unless the difference is huge.
This is good news in a way (as in: you might not have to care about this much), but on the other hand it is a pity if you are working on models predicting the lengths of these pauses, as any improvement your model makes is unlikely to be picked up by raters using the current evaluation methods.
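To make the setting concrete, here is a minimal sketch (my illustration, not the paper's code) of how long-form audio is stitched together from per-sentence synthesis, with the inter-sentence pause as an explicit choice; the sample rate and function names are assumptions:

```python
# Illustrative sketch: joining per-sentence waveforms with a configurable
# inter-sentence pause. 24 kHz mono float audio is assumed for this example.
import numpy as np

SAMPLE_RATE = 24_000  # Hz; hypothetical value for this sketch

def join_sentences(waveforms, pause_s=0.5):
    """Concatenate sentence waveforms, inserting pause_s seconds of silence."""
    silence = np.zeros(int(pause_s * SAMPLE_RATE), dtype=np.float32)
    pieces = []
    for i, wav in enumerate(waveforms):
        if i > 0:
            pieces.append(silence)  # the pause whose length raters barely notice
        pieces.append(wav)
    return np.concatenate(pieces)

# Two fake one-second "sentences", joined with a 0.5 s pause:
s1 = np.ones(SAMPLE_RATE, dtype=np.float32)
s2 = np.ones(SAMPLE_RATE, dtype=np.float32)
audio = join_sentences([s1, s2], pause_s=0.5)
print(len(audio) / SAMPLE_RATE)  # 2.5 seconds
```

The paper's finding is essentially that varying `pause_s` within a reasonable range has little effect on listener ratings.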
I am proud that the first ever internship I supervised lead to interesting findings, and a nice publication too. Way to go Paul!
I am one of the first ever recipients of a new ACL peer review award.
I am very honoured by this! Many thanks to the ACL'23 organisation.
I am honoured to have received an ICASSP 2023 Outstanding Reviewer Award. Many thanks to the ICASSP 2023 organisation!
MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors was accepted for INTERSPEECH 2023, in Dublin.
This is a very interesting paper I think. Everyone doing research in TTS has to evaluate their TTS systems. One decision that will always pop up is: what method do we choose? Do we go for an MOS test, testing the system by itself, or do we go for a side-by-side comparison, comparing the new system to another one (or to recorded speech)?
How to choose one over the other? Does it matter? Is one more robust or more sensitive than the other?
If these considerations have ever occurred to you... read the paper ;-)
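As a rough flavour of the "clustered standard errors" part of the title (this is my own toy illustration, not the paper's implementation), ratings from the same rater are correlated, so a standard error for a mean MOS score can account for that by summing residuals within each rater cluster:

```python
# Toy CR0-style cluster-robust standard error for a mean MOS score,
# clustering ratings by rater. Illustrative only; not the paper's code.
from collections import defaultdict
from math import sqrt

def clustered_se(scores, raters):
    """Clustered standard error of the mean of scores,
    where raters[i] identifies the cluster of scores[i]."""
    n = len(scores)
    mean = sum(scores) / n
    cluster_resid = defaultdict(float)
    for s, r in zip(scores, raters):
        cluster_resid[r] += s - mean  # sum residuals within each rater
    # Variance of the mean: squared within-cluster residual sums over n^2
    var = sum(v * v for v in cluster_resid.values()) / (n * n)
    return sqrt(var)

scores = [4, 5, 4, 3, 5, 4]
raters = ["a", "a", "b", "b", "c", "c"]
print(round(clustered_se(scores, raters), 3))  # 0.272
```

If every rating came from a different rater, this reduces (up to a small finite-sample factor) to the usual standard error of the mean.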
I'll be giving a talk at SEA (Search Engines Amsterdam), which is run by IRLab Amsterdam. The talk will be about "Improving Speech Synthesis by Leveraging Pretrained Language Models".
I am really looking forward to giving a talk at the research group I was part of when I did my PhD. By the way, I wouldn't be surprised if the audience is not completely up to speed with all the ins and outs of speech synthesis/TTS (I certainly wasn't back when I was still there). So it is an interesting challenge for me to come up with a nice talk anyway!
Slides coming soon...
Paper accepted at INTERSPEECH 2022! Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks by Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, myself, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu and Rob Clark was accepted to INTERSPEECH 2022.
This paper is about transferring the accent of one speaker to another speaker, who does not have that accent, while preserving the speaker characteristics of the target speaker.
High quality transfer models are available, but they are typically expensive to run, and they can have reliability issues.
Other models may be more efficient and reliable, but they might not be as good at accent transfer.
This paper shows how to use speech data generated by the high quality, but expensive model, to train an efficient and reliable model.
US Patent 11,295,725 Self-training WaveNet for text-to-speech
by Manish Sharma, me and Rob Clark was published.
US Patent 16,867,427 Speech Synthesis Prosody Using A BERT Model by me, Manish Sharma, Rob Clark and Aliaksei Severyn was published.
Two papers were accepted to INTERSPEECH 2020!
Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT Model by me, Manish Sharma and Rob Clark is an attempt to marry the two worlds of Natural Language Understanding (NLU) and Text-To-Speech.
The idea is that the prosody of synthetic speech improves if a BERT model is involved, as BERT models incorporate syntactic and semantic (world) knowledge.
StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes by Manish Sharma, me and Rob Clark is about distilling Parallel WaveNet models.
Parallel WaveNet student models are typically distilled using the original dataset the teacher WaveNet model was trained on.
This doesn't work all that well if that dataset is relatively small, and the idea of this paper is to add additional synthesized speech samples (generated by the teacher model) to the dataset used for distilling the student model. Nice and simple, and it works!
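The recipe above can be sketched in a few lines; this is a toy illustration of the idea only (the teacher stand-in and all names are made up, not the paper's code):

```python
# Illustrative sketch of the StrawNet recipe: augment a small real dataset
# with teacher-synthesized audio before distilling the student model.
# All names here are hypothetical stand-ins.

def synthesize_extra(teacher, texts):
    """Have the trained teacher WaveNet generate additional audio samples."""
    return [teacher(t) for t in texts]

def distillation_dataset(real_audio, teacher, extra_texts):
    """Original recordings plus teacher-generated samples."""
    return real_audio + synthesize_extra(teacher, extra_texts)

# Toy stand-ins: the "teacher" just tags its inputs.
teacher = lambda text: f"<wav:{text}>"
real_audio = ["<wav:recorded-1>", "<wav:recorded-2>"]
dataset = distillation_dataset(real_audio, teacher, ["extra-1", "extra-2", "extra-3"])
print(len(dataset))  # 5: 2 real + 3 synthesized
```

The student is then distilled on `dataset` instead of only the (too small) set of real recordings.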
The full paper, Frugal Paradigm Completion, by Alex Erdmann, me, Markus Becker and Christian Schallhart, about automatic completion of morphological paradigms (e.g., all forms of a verb or a noun) is accepted to the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).
This work is based on Alex's internship in our TTS team in London last year.
I wrote a post on the Google AI blog about our recent work on evaluating long-form text-to-speech, as I thought there were some takeaways that might be of interest to a broader audience.
The blog post is based on our SSW10 paper.
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs, by Rob Clark, Hanna Silen, myself and Ralph Leith, was accepted to SSW10, the 10th ISCA Speech Synthesis Workshop, to be held 20-22 September, 2019, Vienna, Austria. The workshop is a satellite event of the INTERSPEECH 2019 conference in Graz, Austria.
Personal Knowledge Graphs: A Research Agenda, by Krisztian Balog and myself, was accepted to ICTIR 2019, the 9th International Conference on the Theory of Information Retrieval, to be held October 2-5, 2019 in Santa Clara, California.
I accepted the invitation to join the Program Committee of the 28th ACM International Conference on Information and Knowledge Management (CIKM), November 3rd-7th, 2019 in Beijing.
The very first text-to-speech paper I contributed to, CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, by Vincent Wan, Chun-an Chan, myself, Jakub Vit and Rob Clark is accepted to ICML 2019, in Long Beach, California.
This paper describes the variational auto-encoder (VAE) network used currently for text-to-speech (TTS) synthesis in the Google Assistant for the most frequently used voices.
I'll be giving a talk at the Alan Turing Institute in London, hosted by the South England Natural Language Processing Meetup.
The talk will be about my work on byte-level machine reading models.
The slides are over here.
I accepted the invitation to join the Program Committee for the 1st International Workshop on Computational Approaches to Historical Language Change.
LangChange 2019 will be held in Florence, Italy in conjunction with ACL 2019, on July 28 - August 2, 2019.
Many people work on sequence-to-sequence models that read characters. Character-level RNNs are great, but, as we showed in our AAAI paper, models reading bytes can be better still.
In my first ever blogpost, published on Medium, I try to explain how byte-level models work and how they compare to character-level and word-level NLP models.
Enjoy reading it!
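The core distinction is easy to see in a couple of lines; this small demo (my illustration, not from the paper or the blogpost) shows that the same string yields different sequence lengths at the character and byte level, and that bytes always fit a fixed 256-symbol vocabulary:

```python
# Character-level vs byte-level input for the same string.
text = "héllo"  # 5 characters
chars = list(text)
utf8 = list(text.encode("utf-8"))

print(len(chars))       # 5
print(len(utf8))        # 6 -- 'é' takes two bytes in UTF-8
print(max(utf8) < 256)  # True: every byte fits a 256-entry vocabulary
```

A byte-level model thus reads slightly longer sequences, but never needs a language-specific character inventory.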
It was a pleasure to be invited to give a presentation at the Dive into New Deep Learning Models for Natural Language Processing Meetup about our Byte-level Machine Reading across Morphologically Varied Languages paper.
For the third, and last, time NN4IR, this time at ECIR 2018, in Grenoble.
Many thanks to my co-presenters, Christophe Van Gysel, Maarten de Rijke and Bhaskar Mitra, and also, of course, to those who couldn't make it, but who put a lot of effort into the slides: Hosein Azarbonyad, Alexey Borisov and Mostafa Dehghani!
The slides can be downloaded as one file over here, but are also available as separate slide decks per session from the NN4IR website.
Lastly, we also wrote this overview paper.
The NN4IR tutorial at WSDM 2018 in Los Angeles, California, was a success.
Due to (totally pointless) visa issues, Mostafa Dehghani and Maarten de Rijke couldn't be there.
Many, many thanks, however, to Hosein Azarbonyad for stepping in and helping us out!
And of course, many thanks to Alexey Borisov and Christophe Van Gysel as well!
The slides are available as one file over here, or per session from the NN4IR website.
Additionally, here is the overview paper.
AAAI-18, in New Orleans, was great!
I graduated...!!
Friday December 15 2017 I successfully defended my thesis, Text Understanding for Computers, at the Agnietenkapel in Amsterdam.
Many thanks to my committee members: prof. dr. Krisztian Balog (University of Stavanger), prof. dr. Antal van den Bosch (Radboud University, Meertens Instituut), prof. dr. Franciska de Jong (Utrecht University), dr. Evangelos Kanoulas (University of Amsterdam), dr. Christof Monz (University of Amsterdam), prof. dr. Khalil Sima'an (University of Amsterdam), dr. Aleksandr Chuklin (Google Research) and dr. Claudia Hauff (Delft University of Technology). Also, many thanks to my co-promotor Joris van Eijnatten (Utrecht University), and most of all, to my supervisor Maarten de Rijke.
Here is a PDF of the book.
After having done the NN4IR tutorial at SIGIR 2017, we will present (a much updated version of) the tutorial again at WSDM 2018. The team will be nearly the same: Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, and me.
My paper "Byte-level Machine Reading across Morphologically Varied Languages" with Llion Jones and Daniel Hewlett of Google Research is accepted to the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), in New Orleans.
Stay tuned for the PDF...
I am honoured! I received an Outstanding Paper Reviewer Award at the 26th ACM International Conference on Information and Knowledge Management CIKM 2017.
Many thanks again to my co-presenters Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke and Bhaskar Mitra, and to everyone attending, for making the NN4IR tutorial at SIGIR 2017 in Tokyo a great success, in a packed room.
Please read the excellent blogpost on the ACM website. And thanks everyone for tweeting.
The final slides are now available on nn4ir.com.
I was asked to join the program committee of the 2018 edition of The Web Conference (27th edition of the former WWW conference), in Lyon, France.
Together with Mostafa Dehghani (UvA), Jaap Kamps (UvA), Scott Roy (Google) and Ryen White (Microsoft Research), I joined the program committee of SCAI'17 — Search-Oriented Conversational AI, held October 1 in Amsterdam, and co-located with ICTIR'17.
My paper Attentive Memory Networks: Efficient Machine Reading for Conversational Search, with Maarten de Rijke is accepted for the 1st International Workshop on Conversational Approaches to Information Retrieval (CAIR'17) in collaboration with SIGdial at SIGIR 2017 in Tokyo, Japan.
I will be giving a SIGIR 2017 tutorial in Tokyo, Japan, on neural networks for Information Retrieval (NN4IR), together with Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke and Bhaskar Mitra.
I am honoured to be invited to give a talk at the 14th SIKS/Twente Seminar on Searching and Ranking, Text as social and cultural data.
This symposium is organized together with the PhD defense of Dong Nguyen.
I have been invited to join the Program Committee of KDD 2017, a premier interdisciplinary conference bringing together researchers and practitioners from data science, data mining, knowledge discovery, large-scale data analytics, and big data, which is held in Halifax, Nova Scotia, Canada, August 13-17, 2017.
Spring and summer in California again!!! I am going to do a second internship at Google Research in Mountain View, April until July. I'll be working with Dana Movshovitz-Attias.
Wonderful! The full paper Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity, by Hosein Azarbonyad, Mostafa Dehghani, me, Maarten Marx, Jaap Kamps and Maarten de Rijke is accepted for the 39th European Conference on Information Retrieval (ECIR 2017) in Aberdeen!
I gave a talk about Siamese CBOW at SEA, Search Engines Amsterdam, a series of monthly talks, where academia and industry meet. Here are the slides I used.
Organising BNAIC 2016 was a lot of fun. I debuted as a session chair in the Natural Language Processing session. I was also the Demo chair of the organising committee. We had a very nice demo session, I think, with "Autonomous Robot Soccer Matches" by Caitlin Lagrand et al. as BNAIC SKBS Demo Award winner.
This is the official version of Siamese CBOW: Optimizing Word Embeddings for Sentence Representations, the full paper I wrote with Alexey Borisov and Maarten de Rijke, which I presented last week at ACL 2016 in Berlin.
Our workshop paper Design and implementation of ShiCo: Visualising shifting concepts over time, written together with Carlos Martinez-Ortiz, Melvin Wevers, Pim Huijnen, Jaap Verheul and Joris van Eijnatten is accepted to the HistoInformatics2016 workshop held in conjunction with the Digital Humanities 2016 conference.
Great stuff!! My full paper Siamese CBOW: Optimizing Word Embeddings for Sentence Representations that I wrote together with Alexey Borisov and Maarten de Rijke is accepted for ACL 2016, which is held in Berlin.
Here is the pre-print on arXiv.
I am quite thrilled and honoured by this... I was interviewed by the New Scientist.
Here is the link to the interview on the New Scientist website.
Our demo paper "ShiCo: A Visualization tool for Shifting Concepts Through Time" that I wrote together with Carlos Martinez-Ortiz, Melvin Wevers, Pim Huijnen, Jaap Verheul and Joris van Eijnatten is accepted for DHBenelux 2016.
I'll be the demo chair for BNAIC 2016, the Annual Benelux Conference on Artificial Intelligence.
BTW, I also designed the logo... ;-)
Summer in California!!! I am going to do an internship at Google Research in Mountain View, May until August. I'll be working with Mat Kelcey.
The abstract of my CIKM'15 paper Short Text Similarity with Word Embeddings was accepted for the Dutch-Belgian Information Retrieval workshop (DIR2015) in Amsterdam, Holland.
I gave two talks at CIKM'15 in Melbourne. Here are the slides:
My research about sentence semantics and changes in word usage through time made it to the UvA homepage.
I went to the Google NLP PhD Summit in Zurich and it was great! I met a lot of very interesting people and had a lot of nice discussions.
Today, Agnes van Belle, an AI master student I supervised, graduated.
She wrote a nice thesis called Historical Document Retrieval with Corpus-derived Rewrite Rules.
Here is the final version of my CIKM 2015 paper Short Text Similarity with Word Embeddings with Maarten de Rijke.
Here is the final version of my CIKM 2015 paper Ad Hoc Monitoring of Vocabulary Shifts over Time with Melvin Wevers, Pim Huijnen and Maarten de Rijke.
The dataset we made for the CIKM 2015 paper "Ad Hoc Monitoring of Vocabulary Shifts over Time" with Melvin Wevers, Pim Huijnen and Maarten de Rijke is now publicly available.
Many, many thanks to all annotators who contributed their time and effort!!!
Yes! Yes! Nice! Nice! Both full paper submissions to CIKM 2015 are accepted.
I am going to Melbourne!
These are the papers:
Camera-ready PDFs will follow shortly.
The IPM paper Evaluating Document Filtering Systems over Time with Krisztian Balog and Maarten de Rijke is online. Here is the official link and it can also be downloaded here.
The NLeSc PathFinder grant proposal that I co-wrote is accepted. In the proposal we describe a system for monitoring shifts in vocabulary over time. The algorithms I developed to monitor changes in vocabulary over time will be implemented in a tool that discloses a corpus of digitized historical Dutch newspapers (covering the last four centuries) used by digital humanities scholars.
Nice! My paper called Evaluating Document Filtering Systems over Time with Krisztian Balog and Maarten de Rijke was accepted for the IPM special issue on Time and IR. PDF will follow soon.
The abstract Concepts Through Time: Tracing Concepts In Dutch Newspapers Discourse (1890-1990) Using Word Embeddings which I co-wrote with Melvin Wevers and Pim Huijnen is accepted for Digital Humanities 2015 (DH2015) in Sydney, Australia.
Here are some very simple and high-level slides on word2vec that I made for a reading group in our group. Nothing special, just what it is (not) and what it is used for.
Last year, I participated in the Cumulative Citation Recommendation task (CCR) of the Knowledge Base Acceleration (KBA) track of the Text REtrieval Conference, TREC 2013. This is the notebook paper describing our approach.
Today I presented my work on "Time-Aware Chi-squared for Document Filtering over Time" at CLIN24 in Leiden. This is largely the same presentation I held earlier at the TAIA workshop at SIGIR 2013 in Dublin and at TREC 2013 in Gaithersburg.