A Study of Raters' Sensitivity to Inter-sentence Pause Durations in American English Speech by Paul Owoicho, Josh Camp, and me has been accepted for Speech Prosody 2024 (SP2024).

This paper is based on work Paul did as part of an internship in our team at Google in London.

The paper is about the length of pauses between sentences. Speech synthesis is typically done sentence by sentence, so one has to make a decision as to how much silence to put in between these sentences. What we discovered during the internship is that people do not seem to be very sensitive to different lengths for these pauses between sentences, unless the difference is huge.
This is good news in a way (as in: you might not have to care about this much), but on the other hand it is a pity if you are working on models predicting the lengths of these pauses, as any improvement your model makes is unlikely to be picked up by raters using the current evaluation methods.
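
To make the decision concrete, here is a minimal sketch of stitching sentence-level audio together with a fixed pause in between. The function name, the 16 kHz sample rate and the 0.4 second default pause are assumptions for illustration only; they are not values from the paper.

```python
from typing import List, Sequence


def join_with_pauses(
    sentences: Sequence[Sequence[float]],
    pause_seconds: float = 0.4,  # assumed value, for illustration only
    sample_rate: int = 16000,    # assumed sample rate
) -> List[float]:
    """Concatenate per-sentence waveforms, inserting silence between them."""
    silence = [0.0] * int(round(pause_seconds * sample_rate))
    output: List[float] = []
    for index, waveform in enumerate(sentences):
        if index > 0:
            # Pause only between sentences, not before the first or after the last.
            output.extend(silence)
        output.extend(waveform)
    return output
```

A model that predicts pause lengths would replace the fixed pause_seconds with one value per sentence boundary.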

I am proud that the first ever internship I supervised led to interesting findings, and a nice publication too. Way to go Paul!

US Patent 17/659,840 Key Frame Networks by me, Toby Hawker and Rob Clark has been published.
I am one of the first ever recipients of a new ACL peer review award. I am very honoured by this! Many thanks to the ACL'23 organisation.
I am honoured to have received an ICASSP 2023 Outstanding Reviewer Award. Many thanks to the ICASSP 2023 organisation!

MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors was accepted for INTERSPEECH 2023, in Dublin.

This is a very interesting paper I think. Everyone doing research in TTS has to evaluate their TTS systems. One decision that will always pop up is: what method do we choose? Do we go for an MOS test, testing the system by itself, or do we go for a side-by-side comparison, comparing the new system to another one (or to recorded speech)?

How to choose one over the other? Does it matter? Is one more robust or more sensitive than the other?

If these considerations have ever occurred to you... read the paper ;-)

I'll be giving a talk at SEA (Search Engines Amsterdam), which is run by IRLab Amsterdam. The talk will be about "Improving Speech Synthesis by Leveraging Pretrained Language Models".

I am really looking forward to giving a talk at the research group I was part of when I did my PhD. I wouldn't be surprised, btw, if the audience is not completely up to speed with all the ins and outs of speech synthesis/TTS (I certainly wasn't back when I was still there). So, it is an interesting challenge for me to come up with a nice talk anyway!

Slides coming soon...

Paper accepted at INTERSPEECH 2022!

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks by Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, myself, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu and Rob Clark was accepted to INTERSPEECH 2022.

This paper is about transferring the accent of one speaker to another speaker, who does not have that accent, while preserving the speaker characteristics of the target speaker. High quality transfer models are available, but they are typically expensive to run, and they can have reliability issues. Other models may be more efficient and reliable, but they might not be as good at accent transfer. This paper shows how to use speech data generated by the high quality, but expensive model, to train an efficient and reliable model.

US Patent 11,295,725 Self-training WaveNet for text-to-speech by Manish Sharma, me and Rob Clark was published.
US Patent 16,867,427 Speech Synthesis Prosody Using A BERT Model by me, Manish Sharma, Rob Clark and Aliaksei Severyn was published.
Two papers were accepted to INTERSPEECH 2020!

Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT Model by me, Manish Sharma and Rob Clark is an attempt to marry the two worlds of Natural Language Understanding (NLU) and Text-To-Speech. The idea is that the prosody of synthetic speech improves if a BERT model is involved, as BERT models incorporate syntactic and semantic (world) knowledge.

StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes by Manish Sharma, me and Rob Clark is about distilling Parallel WaveNet models. Parallel WaveNet student models are typically distilled using the original dataset the teacher WaveNet model was trained on. This doesn't work all that well if that dataset is relatively small, and the idea of this paper is to add synthesized speech samples (generated by the teacher model) to the dataset used for distilling the student model. Nice and simple, and it works!

The full paper, Frugal Paradigm Completion, by Alex Erdmann, me, Markus Becker and Christian Schallhart, about automatic completion of morphological paradigms (e.g., all forms of a verb or a noun) is accepted to The 58th annual meeting of the Association for Computational Linguistics (ACL 2020). This work is based on Alex's internship in our TTS team in London last year.
I wrote a post on the Google AI blog about our recent work on evaluating long-form text-to-speech, as I thought there were some takeaways that might be of interest to a broader audience.

The blog post is based on our SSW10 paper.

Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs, by Rob Clark, Hanna Silen, myself and Ralph Leith, was accepted to SSW10, the 10th ISCA Speech Synthesis Workshop, to be held 20-22 September, 2019, Vienna, Austria. The workshop is a satellite event of the INTERSPEECH 2019 conference in Graz, Austria.
Personal Knowledge Graphs: A Research Agenda, by Krisztian Balog and myself, was accepted to ICTIR 2019, the 9th International Conference on the Theory of Information Retrieval, to be held October 2-5, 2019 in Santa Clara, California.
I accepted the invitation to join the Program Committee of the 28th ACM International Conference on Information and Knowledge Management (CIKM), November 3rd-7th, 2019 in Beijing.
The very first text-to-speech paper I contributed to, CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, by Vincent Wan, Chun-an Chan, myself, Jakub Vit and Rob Clark is accepted to ICML 2019, in Los Angeles, California.

This paper describes the variational auto-encoder (VAE) network used currently for text-to-speech (TTS) synthesis in the Google Assistant for the most frequently used voices.

I'll be giving a talk at the Alan Turing Institute in London, hosted by the South England Natural Language Processing Meetup.

The talk will be about my work on byte-level machine reading models.

The slides are over here.

I accepted the invitation to join the Program Committee for the 1st International Workshop on Computational Approaches to Historical Language Change. LangChange 2019 will be held in Florence, Italy in conjunction with ACL 2019, on July 28 - August 2, 2019.
Many people work on sequence-to-sequence models that read characters. Character-level RNNs are great, but, as we showed in our AAAI paper, models reading bytes can be better still.
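
As a small illustration of the difference in input units, the sketch below turns the same word into a character sequence and into a byte sequence. It assumes UTF-8 encoding and made-up helper names; the exact input pipeline used in the paper may differ.

```python
from typing import List


def as_characters(text: str) -> List[str]:
    """Input units for a character-level model: one symbol per character."""
    return list(text)


def as_bytes(text: str) -> List[int]:
    """Input units for a byte-level model: one integer (0-255) per UTF-8 byte."""
    return list(text.encode("utf-8"))


word = "na\u00efve"  # "naive" written with a diaeresis on the i
print(as_characters(word))  # 5 characters
print(as_bytes(word))       # 6 bytes: the accented character takes two bytes in UTF-8
```

Whatever the language or script, a byte-level model never sees more than 256 distinct input symbols, whereas the character inventory grows with every new script.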

In my first ever blogpost, published on Medium, I try to explain how byte-level models work, how they compare to character-level NLP models, and to word-level models.

Enjoy reading it!

I've got a job at Google in London!
It is a research position in the Text-To-Speech group of Rob Clark. The research will be about text-to-speech and natural language understanding. In particular, how the latter can help improve the former.

I am really looking forward to this!!!

For the third, and last, time NN4IR, this time at ECIR 2018, in Grenoble. Many thanks to my co-presenters, Christophe Van Gysel, Maarten de Rijke and Bhaskar Mitra, and also, of course, to the ones who couldn't make it, but who did put a lot of effort into the slides, Hosein Azarbonyad, Alexey Borisov and Mostafa Dehghani!

The slides can be downloaded as one file over here, but are also available as separate slide decks per session from the NN4IR website. Lastly, we also wrote this overview paper.

The NN4IR tutorial at WSDM 2018 in Los Angeles, California, was a success. Due to (totally pointless) visa issues, Mostafa Dehghani and Maarten de Rijke couldn't be there. Many, many thanks, however, to Hosein Azarbonyad for stepping in and helping us out! And of course, many thanks to Alexey Borisov and Christophe Van Gysel as well!

The slides are available as one file over here, or per session from the NN4IR website. Additionally, here is the overview paper.

AAAI-18, in New Orleans, was great!
I presented the work I did with Llion Jones and Daniel Hewlett during my first internship at Google Research. Here is the PDF of the paper, which is called Byte-level Machine Reading across Morphologically Varied Languages.
I graduated...!!

On Friday, December 15, 2017, I successfully defended my thesis, Text Understanding for Computers, at the Agnietenkapel in Amsterdam.

Many thanks to my committee members: prof. dr. Krisztian Balog (University of Stavanger), prof. dr. Antal van den Bosch (Radboud University, Meertens Instituut), prof. dr. Franciska de Jong (Utrecht University), dr. Evangelos Kanoulas (University of Amsterdam), dr. Christof Monz (University of Amsterdam), prof. dr. Khalil Sima'an (University of Amsterdam), dr. Aleksandr Chuklin (Google Research) and dr. Claudia Hauff (Delft University of Technology). Also, many thanks to my co-promotor Joris van Eijnatten (Utrecht University), and most of all, to my supervisor Maarten de Rijke.

Here is a PDF of the book.


After having done the NN4IR tutorial at SIGIR 2017, we will present (a much updated version of) the tutorial again at WSDM 2018. The team will be nearly the same: Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, and me.
My paper "Byte-level Machine Reading across Morphologically Varied Languages" with Llion Jones and Daniel Hewlett of Google Research is accepted to the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), in New Orleans.
This paper is based on the research I did during my internship at Google Research in Mountain View, California.

Stay tuned for the PDF...

I am honoured! I received an Outstanding Paper Reviewer Award at the 26th ACM International Conference on Information and Knowledge Management CIKM 2017.
See this tweet over here for proof!
Many thanks again to my co-presenters Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke and Bhaskar Mitra, and to everyone attending, for making the NN4IR tutorial at SIGIR 2017 in Tokyo a great success, in a packed room.

Please read the excellent blogpost on the ACM website. And thanks everyone for tweeting.

The final slides are now available on nn4ir.com.

I was asked to join the program committee of the 2018 edition of The Web Conference (27th edition of the former WWW conference), in Lyon, France.
Yes, that is right, WWW 2018 is rebranded as The Web Conference this year.
Together with Mostafa Dehghani (UvA), Jaap Kamps (UvA), Scott Roy (Google) and Ryen White (Microsoft Research), I joined the program committee of SCAI'17 — Search-Oriented Conversational AI, held October 1 in Amsterdam, and co-located with ICTIR'17.
I will be giving a SIGIR 2017 tutorial in Tokyo, Japan, on neural networks for Information Retrieval (NN4IR), together with Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke and Bhaskar Mitra.
More info in this overview paper and on the NN4IR website.
I am honoured to be invited to give a talk at the 14th SIKS/Twente Seminar on Searching and Ranking, Text as social and cultural data. This symposium is organized together with the PhD defense of Dong Nguyen.
I have been invited to join the Program Committee of KDD 2017, a premier interdisciplinary conference bringing together researchers and practitioners from data science, data mining, knowledge discovery, large-scale data analytics, and big data, which is held in Halifax, Nova Scotia, Canada, August 13 - 17, 2017.
Spring and summer in California again!!! I am going to do a second internship at Google Research in Mountain View, April until July. I'll be working with Dana Movshovitz-Attias.

Wonderful! The full paper Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity, by Hosein Azarbonyad, Mostafa Dehghani, me, Maarten Marx, Jaap Kamps and Maarten de Rijke is accepted for the 39th European Conference on Information Retrieval (ECIR 2017) in Aberdeen!
I gave a talk about Siamese CBOW at SEA, Search Engines Amsterdam, a series of monthly talks, where academia and industry meet. Here are the slides I used.
Organising BNAIC 2016 was a lot of fun. I debuted as a session chair, in the Natural Language Processing session. I was also the Demo chair of the organising committee. We had a very nice demo session, I think, with "Autonomous Robot Soccer Matches" by Caitlin Lagrand et al. as BNAIC SKBS Demo Award winner.
This is the official version of Siamese CBOW: Optimizing Word Embeddings for Sentence Representations, the full paper I wrote with Alexey Borisov and Maarten de Rijke, which I presented last week at ACL 2016 in Berlin.
Our workshop paper Design and implementation of ShiCo: Visualising shifting concepts over time, written together with Carlos Martinez-Ortiz, Melvin Wevers, Pim Huijnen, Jaap Verheul and Joris van Eijnatten is accepted to the HistoInformatics2016 workshop held in conjunction with the Digital Humanities 2016 conference.
PDF will follow shortly.
Great stuff!! My full paper Siamese CBOW: Optimizing Word Embeddings for Sentence Representations that I wrote together with Alexey Borisov and Maarten de Rijke is accepted for ACL 2016, which is held in Berlin.

Siamese CBOW: Optimizing Word Embeddings for Sentence Representations

We present the Siamese Continuous Bag of Words (Siamese CBOW) model, a neural network for efficient estimation of high-quality sentence embeddings. Averaging the embeddings of words in a sentence has proven to be a surprisingly successful and efficient way of obtaining sentence embeddings. However, word embeddings trained with the methods currently available are not optimized for the task of sentence representation, and, thus, likely to be suboptimal. Siamese CBOW handles this problem by training word embeddings directly for the purpose of being averaged. The underlying neural network learns word embeddings by predicting, from a sentence representation, its surrounding sentences. We show the robustness of the Siamese CBOW model by evaluating it on 20 datasets stemming from a wide variety of sources.
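
The averaging step that the abstract builds on can be sketched as follows. This only shows how a sentence embedding is obtained from word embeddings and how two such embeddings can be compared; it does not show the Siamese CBOW training itself. The function names and the whitespace tokenisation are assumptions for illustration.

```python
import math
from typing import Dict, List


def sentence_embedding(sentence: str, embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the embeddings of the known words in a sentence."""
    # Naive whitespace tokenisation, assumed here for simplicity.
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("no known words in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With a toy table such as {"cat": [1.0, 0.0], "dog": [0.8, 0.2]}, sentence_embedding("cat dog", table) returns the element-wise mean of the two vectors.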

Here is the pre-print on arXiv.

I am quite thrilled and honoured by this... I was interviewed by the New Scientist.
The interview is titled Will computers ever be able to understand language? (in Dutch). It is about my research on sentence similarity and also a bit about the state of affairs of natural language processing in general.

Here is the link to the interview on the New Scientist website.

Our demo paper "ShiCo: A Visualization tool for Shifting Concepts Through Time" that I wrote together with Carlos Martinez-Ortiz, Melvin Wevers, Pim Huijnen, Jaap Verheul and Joris van Eijnatten is accepted for DHBenelux 2016.
This is particularly nice, I think, because this is follow-up work to our CIKM paper Ad Hoc Monitoring of Vocabulary Shifts over Time.
I'll be the demo chair for BNAIC 2016, the Annual Benelux Conference on Artificial Intelligence.
The conference is jointly organized by the University of Amsterdam and the Vrije Universiteit Amsterdam, under the auspices of the Benelux Association for Artificial Intelligence (BNVKI) and the Dutch Research School for Information and Knowledge Systems (SIKS) and will be held in Amsterdam, Thursday 10 and Friday 11 November, 2016.

BTW, I also designed the logo... ;-)

Summer in California!!! I am going to do an internship at Google Research in Mountain View, May until August. I'll be working with Mat Kelcey.

The abstract of my CIKM'15 paper Short Text Similarity with Word Embeddings was accepted for the Dutch-Belgian Information Retrieval workshop (DIR2015) in Amsterdam, Holland.
I gave two talks at CIKM'15 in Melbourne. Here are the slides:

Short Text Similarity with Word Embeddings

Ad Hoc Monitoring of Vocabulary Shifts over Time

My research about sentence semantics and changes in word usage through time made it to the UvA homepage.

I went to the Google NLP PhD Summit in Zurich and it was great! I met a lot of very interesting people and had a lot of nice discussions.
Here is a link to the poster I presented.
Cool! I will be going to the Google NLP PhD Summit in Zurich in September.


Today, Agnes van Belle, an AI master student I supervised, graduated. She wrote a nice thesis called Historical Document Retrieval with Corpus-derived Rewrite Rules.
Spelling changes quite often occur gradually (even if they are government-imposed) and in the thesis it is shown that we can exploit the continuum of gradual changes when doing query expansion for historical document retrieval.

Here is the final version of my CIKM 2015 paper Short Text Similarity with Word Embeddings with Maarten de Rijke.

Here is the final version of my CIKM 2015 paper Ad Hoc Monitoring of Vocabulary Shifts over Time with Melvin Wevers, Pim Huijnen and Maarten de Rijke.

The dataset we made for the CIKM 2015 paper "Ad Hoc Monitoring of Vocabulary Shifts over Time" with Melvin Wevers, Pim Huijnen and Maarten de Rijke is now publicly available.
Go and get it here.

Many, many thanks to all annotators who contributed their time and effort!!!

Yes! Yes! Nice! Nice! Both full paper submissions to CIKM 2015 are accepted. I am going to Melbourne! These are the papers:

Short Text Similarity with Word Embeddings, with Maarten de Rijke.
Short Text Similarity with Word Embeddings

Determining semantic similarity between texts is important in many tasks in information retrieval such as search, query suggestion, automatic summarization and image finding. Many approaches have been suggested, based on lexical matching, handcrafted patterns, syntactic parse trees, external sources of structured semantic knowledge and distributional semantics. However, lexical features, like string matching, do not capture semantic similarity beyond a trivial level. Furthermore, handcrafted patterns and external sources of structured semantic knowledge cannot be assumed to be available in all circumstances and for all domains. Lastly, approaches depending on parse trees are restricted to syntactically well-formed texts, typically of one sentence in length.
We investigate whether determining short text similarity is possible using only semantic features — where by semantic we mean, pertaining to a representation of meaning — rather than relying on similarity in lexical or syntactic representations. We use word embeddings, vector representations of terms, computed from unlabelled data, that represent terms in a semantic space in which proximity of vectors can be interpreted as semantic similarity.
We propose to go from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings. A novel feature of our approach is that an arbitrary number of word embedding sets can be incorporated. We derive multiple types of meta-features from the comparison of the word vectors for short text pairs, and from the vector means of their respective word embeddings. The features representing labelled short text pairs are used to train a supervised learning algorithm. We use the trained model at testing time to predict the semantic similarity of new, unlabelled pairs of short texts.
We show on a publicly available evaluation set commonly used for the task of semantic similarity that our method outperforms baseline methods that work under the same conditions.

Ad Hoc Monitoring of Vocabulary Shifts over Time with Melvin Wevers, Pim Huijnen and Maarten de Rijke.
Ad Hoc Monitoring of Vocabulary Shifts over Time

Word meanings change over time. Detecting shifts in meaning for particular words has been the focus of much research recently. We address the complementary problem of monitoring shifts in vocabulary over time. That is, given a small seed set of words, we are interested in monitoring which terms are used over time to refer to the underlying concept denoted by the seed words.
In this paper, we propose an algorithm for monitoring shifts in vocabulary over time, given a small set of seed terms. We use distributional semantic methods to infer a series of semantic spaces over time from a large body of time-stamped unstructured textual documents. We construct semantic networks of terms based on their representation in those semantic spaces and use graph-based measures to calculate saliency of terms. Based on these graph-based measures we produce ranked lists of terms that represent the concept underlying the initial seed terms over time as final output.
As the task of monitoring shifting vocabularies over time for an ad hoc set of seed words is, to the best of our knowledge, a new one, we construct our own evaluation set. Our main contributions are the introduction of the task of ad hoc monitoring of vocabulary shifts over time, the description of an algorithm for tracking shifting vocabularies over time given a small set of seed words, and a systematic evaluation of results over a substantial period of time (over four decades). Additionally, we make our newly constructed evaluation set publicly available.
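
A rough sketch of the pipeline described in the abstract, for a single time period: build a network of terms from similarity scores, then rank the terms around the seed words with a graph-based measure. The input format, the threshold and the use of weighted degree as the saliency measure are all assumptions for illustration; they stand in for whatever the paper actually uses.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: similarity scores between term pairs, as they might come
# out of a semantic space trained on the documents of one time period.
Similarities = Dict[Tuple[str, str], float]


def rank_terms(seeds: List[str], similarities: Similarities, threshold: float = 0.5) -> List[str]:
    """Rank the seed terms and their neighbours by weighted degree."""
    # Build an undirected graph, keeping only sufficiently similar pairs.
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (a, b), score in similarities.items():
        if score >= threshold:
            graph[a][b] = score
            graph[b][a] = score
    # Candidates are the seeds plus their direct neighbours in the graph.
    candidates = set(seeds)
    for seed in seeds:
        candidates.update(graph.get(seed, {}))
    # Saliency (assumed measure): summed edge weight within the candidate set.
    saliency = {
        term: sum(w for n, w in graph.get(term, {}).items() if n in candidates)
        for term in candidates
    }
    return sorted(candidates, key=lambda t: (-saliency[t], t))
```

Running this once per time period, each with its own similarity scores, gives one ranked list per period, which is the kind of output the abstract describes.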

Camera-ready PDFs will follow shortly.

The IPM paper Evaluating Document Filtering Systems over Time with Krisztian Balog and Maarten de Rijke is online. Here is the official link and it can also be downloaded here.
The NLeSc PathFinder grant proposal that I co-wrote is accepted. In the proposal we describe a system for monitoring shifts in vocabulary over time.
For example, in the 1950s people used to say automobile where nowadays everyone would use the word car. It's the same concept, but the vocabulary has changed. Another nice example is the word propaganda in Dutch. In the 1950s, this used to refer to commercial activities like advertising, where nowadays, in Dutch, one would use the word reclame.

The algorithms I developed to monitor changes in vocabulary over time will be implemented in a tool that provides access to a corpus of digitized historical Dutch newspapers (covering the last four centuries) used by digital humanities scholars.

Nice! My paper called Evaluating Document Filtering Systems over Time with Krisztian Balog and Maarten de Rijke was accepted for the IPM special issue on Time and IR. PDF will follow soon.
The abstract Concepts Through Time: Tracing Concepts In Dutch Newspapers Discourse (1890-1990) Using Word Embeddings which I co-wrote with Melvin Wevers and Pim Huijnen is accepted for Digital Humanities 2015 (DH2015) in Sydney, Australia.
Here are some very simple and high-level slides on word2vec that I made for a reading group in our group. Nothing special, just what it is (not) and what it is used for.

Last year, I participated in the Cumulative Citation Recommendation task (CCR) of the Knowledge Base Acceleration (KBA) track of the Text REtrieval Conference, TREC 2013. This is the notebook paper describing our approach.
Today I presented my work on "Time-Aware Chi-squared for Document Filtering over Time" at CLIN24 in Leiden. This is largely the same presentation I gave earlier at the TAIA workshop at SIGIR 2013 in Dublin and at TREC 2013 in Gaithersburg.
Just in case anyone is interested, here are the slides.
Presented my poster at ICT.OPEN 2013.


Nice! My abstract for CLIN24, called "Time-Aware Chi-squared for Document Filtering over Time" is accepted for presentation.