Last month, Amazon Alexa AI science leader Young-Bum Kim wrote a blog post about a paper he’s presenting at the annual conference of the Association for Computational Linguistics (ACL). The paper describes the first part of the two-part system that will enable Alexa to select, out of thousands of skills, the one best suited to a particular customer request. He described the second part in a paper presented at a meeting of the North American chapter of the ACL, which was under way at the time.
ACL begins in Melbourne, Australia, on July 15, and in addition to Kim, two other Amazon AI researchers are coauthors of papers accepted to the conference. Anima Anandkumar, a principal scientist in Amazon’s Rekognition and Video group, and colleagues at Cornell University will present a new technique for producing word embeddings, which attempt to mathematically capture words’ semantic similarities. And Alessandro Moschitti, a principal scientist in Amazon’s Machine Translation/Natural Language Processing group, is a coauthor of a pair of papers that use innovative machine-learning systems for classifying discussion-forum threads as a springboard for addressing some more general problems.
Word embeddings are essential to the design of most natural-language-understanding (NLU) systems. Most commonly, they use statistics about words’ co-occurrence with other words to represent each word as a point in a geometric space, such that semantically related words are grouped together.
But researchers are constantly trying to come up with better word embeddings, and two fruitful recent attempts have been FastText and probabilistic embeddings. FastText is an embedding that analyzes the substrings of characters from which words are composed. Consequently, it can make educated guesses about the meanings of words it hasn’t seen before (for instance, “circumnavigation”) if they contain substrings it has seen (such as “circum” and “navigation”). Probabilistic embeddings represent words’ semantic properties as probability distributions, which can help NLU systems handle words with multiple senses. The embedding for the word “rock,” for instance, could be a distribution with multiple peaks, one near words like “pop” and “jazz” and one near words like “stone” and “basalt.”
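To make the substring idea concrete, here is a minimal Python sketch of character n-gram extraction in the style FastText is known for. The function names, the 3-to-6 length range, and the `<` and `>` boundary markers are illustrative assumptions here, not details taken from this post.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return every character n-gram of the word, shortest first."""
    marked = f"<{word}>"  # '<' and '>' mark the word's start and end (assumed convention)
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def shared_ngrams(unseen, known_words, min_n=3, max_n=6):
    """List the n-grams of an unseen word that also occur in known words."""
    vocab = set()
    for w in known_words:
        vocab.update(char_ngrams(w, min_n, max_n))
    return [g for g in char_ngrams(unseen, min_n, max_n) if g in vocab]


# Hypothetical training vocabulary; "circumnavigation" itself is absent.
known = ["circumference", "navigation"]
overlap = shared_ngrams("circumnavigation", known)
```

In this toy setup, `overlap` contains pieces such as "circum" and "navig", which is the raw material a substring-based embedding could use to guess at the unseen word's meaning.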
In the ACL paper “Probabilistic FastText for Multi-Sense Word Embeddings,” Anandkumar and her colleagues combine these two approaches, in a way that remains computationally efficient despite its expanded expressive power. The first author on the paper is Ben Athiwaratkun, a Cornell PhD student in statistics who was an intern at Amazon when the work was done. He’s joined by Anandkumar and by his thesis advisor, Andrew Gordon Wilson, an assistant professor of operations research and information engineering at Cornell.
The researchers’ new embedding represents words as combinations of distributions, but the peaks of the distributions are determined by FastText-style substring analyses. In tests, the embedding, dubbed Probabilistic FastText, outperformed both traditional FastText and existing probabilistic embeddings by 3 to 4 percent. It can also perform tasks that traditional FastText cannot, such as disambiguating word senses.
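As a rough illustration of a word with more than one peak, the Python sketch below stores several peak locations per word and builds one of them by averaging substring vectors. It is a simplification under stated assumptions: it tracks only where the peaks sit, not full probability distributions, and the toy 2-D values, the helper names, and the max-over-peaks similarity are all invented for the example rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def mean_vector(vectors):
    """Coordinate-wise average of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def max_peak_similarity(peaks_a, peaks_b):
    """Score two words by their best-matching pair of peaks (assumed scoring rule)."""
    return max(cosine(u, v) for u in peaks_a for v in peaks_b)


# Toy values: axis 0 loosely means "mineral", axis 1 loosely means "music".
subword_vectors = {
    "<ro": [0.9, 0.1], "roc": [1.0, 0.0], "ock": [0.8, 0.2], "ck>": [0.9, 0.1],
}
rock = [
    mean_vector(list(subword_vectors.values())),  # peak located from substring vectors
    [0.0, 1.0],                                   # second peak, stored for the word itself
]
stone = [[1.0, 0.1]]
jazz = [[0.1, 1.0]]

rock_vs_stone = max_peak_similarity(rock, stone)
rock_vs_jazz = max_peak_similarity(rock, jazz)
single_peak_rock_vs_jazz = cosine(rock[0], jazz[0])
```

With two peaks, "rock" scores as close to both "stone" and "jazz"; with only the first peak, it sits far from "jazz", which is the limitation multi-sense embeddings are meant to address.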
Most recent advances in NLU have come courtesy of deep neural networks, densely interconnected networks of information processors that collectively learn to perform computational tasks. But there are a few problems on which older NLU systems still outperform neural networks; determining whether two different sentences address the same topic is one of them. That’s because syntactic parsing — analyzing sentences’ grammatical structure — makes sentence similarity easier to assess. Despite recent research that has narrowed the gap, earlier machine-learning systems, such as support vector machines, remain better able to exploit syntactic parsing than neural nets.
In principle, neural nets should be able to learn syntactic parsing. But training them to perform such complicated operations requires a huge amount of data — more than has been practical to annotate by hand. So in a procedure they describe in their ACL paper “Injecting Relational Structural Representation in Neural Networks for Question Similarity,” Moschitti and colleagues at the University of Trento automated the annotation process.
First, they used about 10,000 hand-annotated examples to train a support vector machine (SVM) with built-in syntactic parsing to assess pairs of sentences for topic similarity. They next used the SVM to automatically annotate an additional 375,000 sentence pairs, which they used to train a neural net. Finally, they fine-tuned the neural net with another 10,000 hand-annotated examples. After fine-tuning, the neural net actually outperformed the SVM by 5 percent.
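The three stages can be sketched in a few lines of Python. Everything in the sketch is a stand-in chosen so that it runs with the standard library alone: a nearest-centroid classifier plays the role of the SVM, a logistic-regression model plays the role of the neural net, the 2-D points replace sentence pairs, and the set sizes (200, 5,000, 200) are scaled-down assumptions, not the figures above.

```python
import math
import random


def true_label(x):
    """Stand-in for a human annotator: 1 = same topic, 0 = different topic."""
    return 1 if x[0] + x[1] > 0 else 0


def sample(n, rng):
    """Draw n toy 2-D feature vectors (stand-ins for sentence pairs)."""
    return [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]


def train_teacher(xs, ys):
    """Nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in (0, 1):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def teacher_predict(centroids, x):
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return 1 if dist(centroids[1]) < dist(centroids[0]) else 0


def train_student(xs, ys, model=None, epochs=5, lr=0.1):
    """Logistic regression by SGD; pass `model` to continue training (fine-tune)."""
    w, b = (list(model[0]), model[1]) if model else ([0.0, 0.0], 0.0)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            grad = p - y
            w[0] -= lr * grad * x[0]
            w[1] -= lr * grad * x[1]
            b -= lr * grad
    return w, b


def student_predict(model, x):
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


rng = random.Random(0)
# Stage 1: train the teacher on a small hand-labeled set.
hand_a = sample(200, rng)
teacher = train_teacher(hand_a, [true_label(x) for x in hand_a])
# Stage 2: let the teacher label a much larger set, then train the student on it.
unlabeled = sample(5000, rng)
auto_labels = [teacher_predict(teacher, x) for x in unlabeled]
student = train_student(unlabeled, auto_labels)
# Stage 3: fine-tune the student on a second small hand-labeled set.
hand_b = sample(200, rng)
student = train_student(hand_b, [true_label(x) for x in hand_b], model=student)

held_out = sample(500, rng)
accuracy = sum(student_predict(student, x) == true_label(x) for x in held_out) / len(held_out)
```

The point of the structure is that hand labels are spent only in stages 1 and 3, while the bulk of the student's training data comes for free from the teacher.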
Moschitti and colleagues at the Qatar Computing Research Institute also have a paper accepted to ACL’s system demonstration track. The system they will demonstrate breaks up the computations performed by SVMs trained to assess sentence similarity and distributes them across computers, improving speed and scalability while still achieving state-of-the-art performance.
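The post does not say how the demo system divides the work, so the following Python sketch shows only one generic possibility. A kernel SVM's score is a sum over its support vectors, and that sum can be split into shards whose partial results are added at the end. The RBF kernel, the thread pool standing in for separate computers, and all names and numbers are assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rbf_kernel(u, v, gamma=0.5):
    """Radial-basis-function kernel (a stand-in for whatever kernel the SVM uses)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def partial_score(shard, x):
    """Sum of coefficient * kernel over one shard of (coefficient, support_vector) pairs."""
    return sum(coef * rbf_kernel(sv, x) for coef, sv in shard)


def split(items, n_shards):
    """Deal the items round-robin into n_shards lists."""
    return [items[i::n_shards] for i in range(n_shards)]


def sharded_decision(support, bias, x, n_shards=3):
    """Compute the SVM score with each shard handled by its own worker."""
    shards = split(support, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:  # threads stand in for machines
        partials = list(pool.map(lambda shard: partial_score(shard, x), shards))
    return sum(partials) + bias


# Hypothetical trained model: signed coefficients paired with support vectors.
support = [
    (0.7, [0.0, 1.0]), (-0.4, [1.0, 0.0]), (0.9, [1.0, 1.0]),
    (-1.2, [-1.0, 0.5]), (0.3, [0.5, -0.5]),
]
bias = -0.1
query = [0.2, 0.8]

serial = partial_score(support, query) + bias
sharded = sharded_decision(support, bias, query)
```

Because addition is the only step that combines the shards, the sharded score matches the single-machine score up to floating-point rounding, so accuracy is unaffected by the split.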
Top Photo: Anima Anandkumar, a principal scientist in Amazon’s Rekognition and Video group, and colleagues will present a new technique for producing word embeddings at ACL 2018.