Where computer science and linguistics meet
At the annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL), which begins June 1, 28 Amazon artificial-intelligence researchers will co-present 12 papers, addressing topics such as how to develop new machine-learning applications when training data are scarce, how to select the right Alexa skill to execute a given request, and how to identify fake news.
In addition, Daniel Marcu, Amazon director of machine translation and natural-language processing, will deliver one of two keynote addresses on the second day of the conference. Marcu will relay lessons he learned in the course of converting cutting-edge natural-language-processing and machine-learning projects into marketable products, starting with the machine translation company he cofounded in 2002 and culminating with his work at Amazon.
NAACL is one of the premier conferences for computational linguists, and Amazon is one of the leading corporate sponsors of this year’s 16th annual conference, which is being held in New Orleans. As the diversity of its paper submissions suggests, Amazon employs natural-language processing and machine-learning technologies across the breadth of its business to enhance customer experiences. Here’s a quick overview of the Amazon researchers’ papers:
Using machine translations of existing, annotated training data to rapidly develop natural-language-understanding (NLU) models in new languages.
The researchers show that models trained using high volumes of translated data outperform both those trained on “formal grammars” hand-crafted by language specialists and those trained on available sets of hand-annotated data (with approximately 10,000 utterances). They also show that techniques for automatically identifying high-quality translations improve the models’ accuracy.
Rapidly developing new NLU applications by transferring capabilities from machine-learning models trained on large data sets to those for which little data is available.
The researchers’ approach involves “neural networks,” networks of simple processing units arranged into layers; each layer passes data to the one above it for further processing. The researchers first train a network on a range of tasks for which large stores of annotated data are available, then reset the top layers and re-train the network on the sparser data available for a particular task.
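The reset-and-retrain idea can be sketched in a few lines. Everything below is a toy stand-in, not the paper's model: a "network" is just a list of weight matrices, and `transfer` keeps the lower layers learned on the data-rich tasks while reinitializing the top layers for retraining on the sparse target data.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Random weight matrix for one fully connected layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out))

def build_network(sizes):
    """A 'network' here is just a list of layer weight matrices."""
    return [init_layer(a, b) for a, b in zip(sizes, sizes[1:])]

def transfer(source_net, n_reset):
    """Keep the lower layers from the source tasks; reinitialize the
    top `n_reset` layers so they can be retrained on the sparse
    target-task data (assumes n_reset >= 1)."""
    kept = [w.copy() for w in source_net[:-n_reset]]
    reset = [init_layer(*w.shape) for w in source_net[-n_reset:]]
    return kept + reset

# Pretend this network was trained on data-rich source tasks.
source = build_network([300, 128, 64, 10])
target = transfer(source, n_reset=1)
```

The lower layers arrive with useful general-purpose features already learned, so the retraining step only has to fit the small task-specific data set with the freshly reset top layer.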
Selecting the best Alexa skill out of thousands to execute a given request, without requiring the user to specify the skill by name.
The researchers tackle this daunting computational challenge by splitting it into two parts: First, their system generates a short list of candidate skills, based on both the content of the request and a learned clustering of skills with similar purposes; then it re-ranks that list using richer information, such as semantic interpretation of the request, the user’s preferences and history, and overall skill popularity.
Paper: “A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding”
Authors: Young-Bum Kim, Dongchan Kim, Joo-Kyung Kim, Ruhi Sarikaya
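The two-stage structure can be sketched as follows. The scorers here are toy stand-ins for the paper's learned models: lexical overlap plays the cheap shortlister, and overlap plus popularity plus a user-history bonus plays the richer reranker.

```python
from collections import Counter

def cheap_score(request, skill):
    """First stage: fast lexical overlap between the request and the
    skill's description (a stand-in for a learned shortlister)."""
    req, desc = Counter(request.split()), Counter(skill["description"].split())
    return sum((req & desc).values())

def rich_score(request, skill, user_history):
    """Second stage: re-rank with richer signals -- overlap plus skill
    popularity plus whether this user invoked the skill before (a
    stand-in for semantic and personalization features)."""
    score = cheap_score(request, skill)
    score += skill["popularity"]
    score += 5 if skill["name"] in user_history else 0
    return score

def route(request, skills, user_history, shortlist_size=3):
    """Shortlist with the cheap scorer, then re-rank only the
    shortlist with the expensive one."""
    shortlist = sorted(skills, key=lambda s: cheap_score(request, s),
                       reverse=True)[:shortlist_size]
    return max(shortlist, key=lambda s: rich_score(request, s, user_history))

skills = [
    {"name": "JazzRadio", "description": "play jazz music radio", "popularity": 2},
    {"name": "PopRadio",  "description": "play pop music radio",  "popularity": 4},
    {"name": "Weather",   "description": "daily weather forecast", "popularity": 9},
]
best = route("play some jazz music", skills, user_history={"JazzRadio"})
```

The design point is that the expensive scorer only ever sees a handful of candidates, so the system scales to thousands of skills.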
Forcing machine translations to make particular word choices, in a way that scales as the number of required words increases.
If the system is translating an economics paper into English, for instance, you want it to use the phrase “supply and demand”, not “availability and appetite”. Previous solutions to this problem scaled linearly with the number of constraints: requiring four times as many words made the translation take four times as long. The Amazon researchers’ method, by contrast, has constant overhead: it’s as efficient whether there’s one required word or 10.
Paper: “Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation”
Authors: Matt Post, David Vilar Torres
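The source of the constant overhead can be sketched in miniature. The paper's dynamic beam allocation keeps the total beam fixed and divides it among "banks" of hypotheses grouped by how many constraints they have satisfied so far; this toy allocator (a simplification, not the paper's algorithm) shows why the total work does not grow with the number of constraints.

```python
def allocate_beam(beam_size, n_constraints):
    """Split a FIXED total beam across n_constraints + 1 banks
    (hypotheses that have satisfied 0, 1, ..., n constraints so far),
    giving any remainder to the banks closest to completion.  Total
    work stays constant as the number of constraints grows."""
    banks = n_constraints + 1
    base, extra = divmod(beam_size, banks)
    return [base + (1 if banks - 1 - i < extra else 0) for i in range(banks)]
```

With no constraints the whole beam goes to ordinary decoding; with ten constraints the same beam is simply sliced eleven ways, rather than multiplied.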
Adapting generic machine translation systems to particular terminological or style conventions.
Amazon machine-learning scientist David Vilar Torres used news data and parliamentary speeches to train a neural network to translate German into English, but he added an extra layer of nodes that can throttle the output of other nodes up or down. He then trained that layer on TED talks, and in tests, the resulting system was better able to reproduce TED talks’ signature style than one using traditional adaptation techniques.
Paper: “Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models”
Author: David Vilar Torres
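The extra layer's throttling mechanism can be sketched directly. In this simplified numpy version, each hidden unit's output is scaled by 2·sigmoid(a), a factor between 0 and 2, and only the vector `a` would be trained on the in-domain data while the original network weights stay frozen.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lhuc_layer(hidden, a):
    """Scale each hidden unit by 2*sigmoid(a_i), a value in (0, 2)
    that can throttle the unit down or boost it up.  Only `a` is
    trained during adaptation; the base network is left untouched."""
    return hidden * (2.0 * sigmoid(a))

hidden = np.array([1.0, -2.0, 0.5])
a_neutral = np.zeros(3)                  # 2*sigmoid(0) = 1: unchanged
a_adapted = np.array([4.0, -4.0, 0.0])   # boost unit 0, suppress unit 1
out = lhuc_layer(hidden, a_adapted)
```

Because only the small throttle vector is learned, very little in-domain data (here, the TED talks) is enough to adapt the model without disturbing what it learned from news and parliamentary text.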
A more useful way to fill in the gaps in “knowledge bases,” structured collections of facts used in many A.I. applications.
The completeness of knowledge bases is usually evaluated using either human-specified criteria or aggregate statistics. The researchers demonstrate a computationally practical technique for using actual knowledge-base queries to determine what types of data about which entities are most important to users, enabling knowledge-base managers to better prioritize data additions.
Paper: “Demand-Weighted Completeness Prediction for a Knowledge Base”
Authors: Andrew Hopkinson, Amit Gurdasani, Dave Palfrey, Arpit Mittal
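The demand-weighting idea can be sketched with a toy knowledge base and query log (both invented for illustration): completeness is the fraction of lookups the knowledge base can answer, weighted by how often each lookup is actually asked, so the gaps users hit most drag the score down the most.

```python
from collections import Counter

def demand_weighted_completeness(kb, query_log):
    """Fraction of (entity, property) lookups in the query log that
    the knowledge base can answer, weighted by query frequency."""
    demand = Counter(query_log)
    total = sum(demand.values())
    answered = sum(n for (entity, prop), n in demand.items()
                   if prop in kb.get(entity, {}))
    return answered / total

kb = {
    "Paris": {"country": "France", "population": "2.1M"},
    "Berlin": {"country": "Germany"},
}
log = [("Paris", "country"), ("Paris", "country"),
       ("Berlin", "population"), ("Paris", "mayor")]
score = demand_weighted_completeness(kb, log)  # 2 of 4 lookups answered
```

A manager seeing this score would know that adding Berlin's population and Paris's mayor closes the gaps users actually encounter, regardless of how "complete" the base looks by aggregate statistics.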
A better way to learn “word embeddings” — concise representations of words that capture their semantic similarity — when data are scarce.
Word embeddings, which cluster words together if they perform similar functions in similar contexts, are useful for many natural-language-processing applications. But the machine-learning systems that produce them generally require huge sets of training data. The researchers present a new way to generate word embeddings that, in experiments involving small data sets in four different languages, outperformed its predecessors in 75% of test cases.
Paper: “Learning Word Embeddings for Low-resource Languages by PU Learning”
Authors: Chao Jiang, Cho-Jui Hsieh, Hsiang-Fu Yu, Kai-Wei Chang
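The baseline the paper improves on can be sketched in a few lines: count word co-occurrences in a tiny corpus and factorize the matrix, with the factor rows serving as embeddings. The paper's actual contribution, treating the zero cells as unlabeled rather than negative (PU learning), matters precisely when the corpus is this small; plain SVD below is just the underlying idea, not the paper's method.

```python
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks rose as markets rallied",
    "markets fell as stocks slid",
]

# Build a word-word co-occurrence matrix with a 2-word window.
vocab = sorted({w for line in corpus for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for c in words[max(0, i - 2):i] + words[i + 1:i + 3]:
            counts[index[w], index[c]] += 1

# Factorize the log-scaled matrix; rows of the left factor are the
# embeddings.  Words used in similar contexts end up close together.
u, s, _ = np.linalg.svd(np.log1p(counts))
embeddings = u[:, :2] * s[:2]

def similarity(w1, w2):
    a, b = embeddings[index[w1]], embeddings[index[w2]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
```

With only four sentences, most word pairs never co-occur at all, which is exactly the regime where treating absent counts as genuine negatives (as the baseline does) misleads the model.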
Using ancillary information — such as keystroke logs or speech prosody — to improve machine-learning systems trained on scanty data in new languages.
Many languages lack the rich, annotated data sets used to train natural-language-processing (NLP) systems in English, but they have other sources of readily available data, such as keystroke logs or audio recordings. These data may contain linguistic clues: in English, for instance, a rising inflection at the end of a sentence frequently denotes a question, while in keystroke data, pauses or revisions can reflect syntactic or semantic complexity. The researchers show that exploiting such information can improve the performance of NLP systems when training data is scarce.
Paper: “Unsupervised Induction of Linguistic Categories with Records of Reading, Speaking, and Writing”
Authors: Maria Barrett, Lea Frermann, Ana Valeria Gonzalez-Garduño, Anders Søgaard
A large data set that can be used to train machine-learning systems to extract facts from unstructured data — and verify their accuracy.
The data set contains more than 185,000 factual assertions, together with extracts from publicly available documents that human annotators used to substantiate them. The key to assembling the data set was a set of annotation guidelines and an interface that allowed multiple human annotators to converge quickly on a shared interpretation of the source texts.
An Arabic data set that can be used to train machine-learning systems to identify false claims.
The data set is the first, in any language, to contain annotations for both veracity — whether a claim is true — and “stance” — whether a particular commenter regards the claim as true. False assertions were extracted from an Arabic fact-checking website. The data set contains roughly 3,000 documents that bear on the veracity of 422 factual assertions.
Paper: “Integrating Stance Detection and Fact Checking in a Unified Corpus”
Authors: Ramy Baly, Mitra Mohtarami, James Glass, Lluís Màrquez, Alessandro Moschitti, Preslav Nakov
A system that determines the “stance” of a text toward a factual claim — whether the text agrees or disagrees with it — and extracts text that supports its conclusion.
One promising way to automatically identify false assertions is to canvass reliable sources’ reactions to them. The new system is a “memory network,” which can keep track of multiple pieces of information when assessing a claim’s validity. The researchers tested their system on the data set created by the grassroots organization Fake News Challenge, and it performed comparably to state-of-the-art systems, while also providing textual evidence for its inferences.
Paper: “Automatic Stance Detection Using End-to-End Memory Networks”
Authors: Mitra Mohtarami, Ramy Baly, James Glass, Preslav Nakov, Lluís Màrquez, Alessandro Moschitti
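The attention step at the heart of a memory network can be sketched in miniature. This toy version (invented example text, cosine similarity standing in for learned embeddings) scores every evidence snippet in memory against the claim and softmaxes the scores; the most-attended snippet doubles as the textual evidence for the system's conclusion.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector as a Counter."""
    return Counter(text.lower().split())

def cosine(c1, c2):
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def attend(claim, memory):
    """Score each memory slot against the claim, then softmax the
    scores into attention weights over the evidence."""
    scores = [cosine(bow(claim), bow(m)) for m in memory]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

claim = "the city banned plastic bags"
memory = [
    "officials confirmed the plastic bag ban takes effect in june",
    "the mayor attended a ribbon cutting ceremony",
    "local stores report higher sales of reusable bags",
]
weights = attend(claim, memory)
best = memory[weights.index(max(weights))]
```

Because the weights are explicit, the system can report not just a stance but which stored text drove the decision — the interpretability property the paragraph above highlights.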
A more sophisticated way to annotate the spoken utterances used to train NLU systems, to enable more complex tasks and improve performance on simpler ones.
Where a traditional annotation scheme might involve simply labeling words in a sample utterance, the Alexa Meaning Representation Language converts the utterance into a “graph,” which looks something like a network diagram but represents semantic properties of the utterance. Graphs can capture more sophisticated linguistic relationships than simple semantic labels can, and they can be reused across utterances with similar forms. The researchers also describe a new data set containing 20,000 utterances annotated in the new language, which can be used to train semantic parsers to produce graphs automatically.
Paper: “The Alexa Meaning Representation Language”
Authors: Thomas Kollar, Danielle Berry, Lauren Stuart, Karolina Owczarzak, Tagyoung Chung, Lambert Mathias, Michael Kayser, Bradford Snow, Spyros Matsoukas
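The difference between flat word labels and a graph can be made concrete with a toy structure. The node types and relation names below are illustrative, not the actual AMRL vocabulary: the point is that a typed subgraph (an Artist node attached via a "by" relation) can be reused across utterances with similar forms, which flat per-word labels cannot express.

```python
# A toy semantic graph for the utterance "play jazz by Miles Davis":
# nodes carry semantic types, edges carry relations.
utterance_graph = {
    "nodes": {
        "n0": {"type": "PlaybackAction"},
        "n1": {"type": "MusicGenre", "value": "jazz"},
        "n2": {"type": "Artist", "value": "Miles Davis"},
    },
    "edges": [
        ("n0", "object", "n1"),
        ("n1", "by", "n2"),
    ],
}

def neighbors(graph, node):
    """All (relation, target) pairs leaving `node`."""
    return [(rel, dst) for src, rel, dst in graph["edges"] if src == node]
```

A semantic parser trained on the 20,000 annotated utterances would map raw text to structures like this one automatically.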
More information about Amazon’s presence at NAACL is available here.