Massively Multilingual NMT

Link to paper: https://arxiv.org/abs/1903.00089

Neural Machine Translation (NMT) is the task of using neural networks to translate text from one language to another. Google Translate is famously good at this, and is where the latter two authors of this paper work. NMT is difficult largely due to the plethora of ways humans choose to communicate with one another. Dialects, jargon, slang; all of these feed into the challenge of performing NMT, even before one considers the sheer number of languages that are spoken.
Read more →

Improving tf-idf weighted document vector embedding

Link to paper: https://arxiv.org/abs/1902.09875

In this work, Schmidt provides a simple and coherent derivation of what he calls an "optimal embedding" for a document. Optimality is defined so that the embedding maximizes the similarity between related documents in downstream tasks. For example, as an employee at TripAdvisor, he would like to group together reviews of the same location. Schmidt approaches the calculation of an optimal embedding through a weighted sum of word2vec skip-gram embeddings.
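
To make the general idea concrete, here is a minimal sketch of a tf-idf weighted combination of pre-trained skip-gram vectors. It is not the paper's exact derivation or weighting scheme; the word vectors and tokenized corpus below are toy placeholders standing in for a trained word2vec model and real reviews.

```python
import math
from collections import Counter

import numpy as np

# Toy pre-trained skip-gram vectors; in practice these would come from a
# trained word2vec model. The values here are placeholders.
WORD_VECTORS = {
    "great": np.array([0.9, 0.1, 0.0]),
    "view":  np.array([0.2, 0.8, 0.1]),
    "room":  np.array([0.1, 0.7, 0.3]),
    "the":   np.array([0.3, 0.3, 0.3]),
}

def idf(corpus):
    """Inverse document frequency of each token over a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}

def embed_document(doc, idf_weights, vectors):
    """tf-idf weighted sum of word vectors for a single tokenized document."""
    dim = len(next(iter(vectors.values())))
    doc_vec = np.zeros(dim)
    for tok, tf in Counter(doc).items():
        if tok in vectors:  # skip out-of-vocabulary tokens
            doc_vec += tf * idf_weights.get(tok, 0.0) * vectors[tok]
    return doc_vec

# Three toy "reviews"; cosine similarity between the resulting document
# vectors could then be used to group reviews of the same location.
corpus = [
    ["great", "view", "from", "the", "room"],
    ["the", "room", "had", "a", "great", "view"],
    ["terrible", "food"],
]
idf_weights = idf(corpus)
doc_vecs = [embed_document(doc, idf_weights, WORD_VECTORS) for doc in corpus]
```

Because the idf term shrinks the weight of words that appear in most documents, common function words contribute little to the document vector, while distinctive content words dominate it.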
Read more →