Dagstuhl Seminar 17042
From Characters to Understanding Natural Language (C2NLU): Robust End-to-End Deep Learning for NLP
January 22 – 27, 2017
Organizers
- Phil Blunsom (University of Oxford, GB)
- Kyunghyun Cho (New York University, US)
- Chris Dyer (Carnegie Mellon University - Pittsburgh, US)
- Hinrich Schütze (LMU München, DE)
Contact
- Simone Schilke (for administrative matters)
Deep learning is currently one of the most active areas of research in machine learning and its applications, including natural language processing (NLP). One hallmark of deep learning is end-to-end learning: all parameters of a deep learning model are optimized directly for the learning objective, e.g., for accuracy on the binary classification task "is the input image the image of a cat?". Crucially, the set of parameters that are optimized includes "first-layer" parameters that connect the raw input representation (e.g., pixels) to the first layer of internal representations of the network (e.g., edge detectors). In contrast, many other machine learning models employ hand-engineered features to take the role of these first-layer parameters.
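To make this concrete, here is a minimal sketch in PyTorch (our own illustration, not code from the seminar; all names are hypothetical). The first-layer embedding parameters sit in the same optimizer as every other parameter, so the task loss shapes them directly rather than relying on a fixed, hand-engineered feature extractor.

```python
# Minimal end-to-end sketch: the "first-layer" parameters (the embedding
# table) are trained jointly with the rest of the model on the task loss.
import torch
import torch.nn as nn

class EndToEndClassifier(nn.Module):
    def __init__(self, num_symbols=256, embed_dim=32, hidden_dim=64):
        super().__init__()
        # First layer: raw input symbols -> learned representations
        # (the analogue of learned edge detectors over pixels).
        self.embed = nn.Embedding(num_symbols, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)  # binary task: cat or not

    def forward(self, symbol_ids):
        x = self.embed(symbol_ids)          # (batch, seq, embed_dim)
        _, h = self.encoder(x)              # final hidden state
        return self.out(h[-1]).squeeze(-1)  # one logit per example

model = EndToEndClassifier()
# One optimizer over *all* parameters, first layer included:
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

inputs = torch.randint(0, 256, (4, 20))  # toy batch of symbol sequences
labels = torch.tensor([1., 0., 1., 0.])
loss = loss_fn(model(inputs), labels)
loss.backward()                           # gradients reach self.embed too
opt.step()
```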
Even though deep learning has had a number of successes in NLP, research on true end-to-end learning is just beginning to emerge. Most NLP deep learning models still start with a hand-engineered layer of representation at the level of tokens or words; i.e., the input is broken up into units by manually designed tokenization rules. Such rules often fail to capture structure both within tokens (e.g., morphology) and across multiple tokens (e.g., multi-word expressions).
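A toy illustration of the point (the rule below is a deliberately simple stand-in for real, hand-engineered tokenization rules):

```python
# A hand-engineered tokenizer fixes the unit of analysis before learning
# starts, hiding structure both below and above the token level.
import re

def rule_based_tokenize(text):
    # Typical manually designed rule: split on non-alphanumeric characters.
    return re.findall(r"\w+", text.lower())

print(rule_based_tokenize("unhappiness"))
# ['unhappiness'] -- the morphemes un-, happi-, -ness are invisible

print(rule_based_tokenize("He lives in New York"))
# ['he', 'lives', 'in', 'new', 'york'] -- the multi-word expression
# "New York" is split into two unrelated tokens
```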
Another problem of token-based end-to-end systems is that they currently have no principled and general way to generate tokens that are not part of the training vocabulary. Since a token is represented as a vocabulary index, and the parameters governing system behavior for that token refer to this index, a token without a vocabulary index cannot easily be generated in an end-to-end system. In contrast, character-based end-to-end systems can generate new vocabulary items, so that -- at least in theory -- they do not have an out-of-vocabulary problem.
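The contrast can be sketched in a few lines (all vocabularies and data here are hypothetical):

```python
# A token-level system maps each token to a fixed vocabulary index; anything
# unseen collapses to UNK and can never be *generated*. A character-level
# system works over a small, closed symbol set, so any new word is just a
# new character sequence.
word_vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}

def word_ids(tokens):
    return [word_vocab.get(t, word_vocab["<unk>"]) for t in tokens]

print(word_ids(["the", "cat", "miaowed"]))  # [1, 2, 0] -- 'miaowed' is lost

char_vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz ")}

def char_ids(text):
    return [char_vocab[c] for c in text]

print(char_ids("the cat miaowed"))  # every word is representable, and the
# inverse mapping lets a model output unseen words character by character
```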
Character-based processing is also interesting from a theoretical point of view for linguistics and computational linguistics. We generally assume that the relationship between signifiers (tokens) and the signified (meaning) is arbitrary. There are well-known cases of non-arbitrariness, including onomatopoeia and regularities in names (e.g., female vs. male first names), but these are usually considered exceptions. Character-based approaches can deal much better with such non-arbitrariness than token-based approaches. Thus, if non-arbitrariness is more pervasive than generally assumed, character-based approaches would have an additional advantage.
Given the success of end-to-end learning in other domains, it is likely that it will also be widely used in NLP to alleviate these issues and lead to great advances. This workshop will bring together an interdisciplinary group of researchers from deep learning, machine learning and computational linguistics to develop a research agenda for end-to-end deep learning applied to natural language.
The seminar brought together researchers from deep learning, general machine learning, natural language processing and computational linguistics to develop a research agenda for the coming years. The goal was to combine recent advances in deep learning architectures and algorithms with extensive domain knowledge about language to make true end-to-end learning for NLP possible.
Our goals were to make progress on the following research questions and objectives.
- So far, C2NLU approaches fall short of state-of-the-art word-level approaches in cases where token structure can easily be exploited (e.g., in well-edited newspaper text). What are promising avenues for developing C2NLU to match the state of the art even on text with well-defined token structure?
- Character-level models are computationally more expensive than word-level models: detecting syntactic and semantic relationships at the character level is more expensive (even though potentially more robust) than at the word level; see the sketch after this list. How can we address the resulting scalability challenges for character-level models?
- Part of the mantra of deep learning is that domain expertise is no longer necessary. Is this really true, or is knowledge about the fundamental properties of language necessary for C2NLU? Even if that expertise is not needed for feature engineering, is it needed to design model architectures, tasks and training regimes?
- NLP tasks are diverse, ranging from part-of-speech tagging through sentiment analysis to question answering. For which of these problems is C2NLU a promising approach, and for which is it not?
- More generally, what characteristics make an NLP problem more amenable to tokenization-based approaches, and what characteristics favor C2NLU approaches?
- What specifically can each of the two communities involved -- natural language processing and deep learning -- contribute to C2NLU?
- Create an NLP/deep learning roadmap for research in C2NLU over the next 5--10 years.
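As a back-of-the-envelope sketch of the scalability question above (our example, using a single toy sentence): a recurrent model takes one step per input unit, so character-level sequences are several times longer, and correspondingly slower to process, than word-level sequences over the same text.

```python
# Toy comparison of sequence lengths at the two levels of granularity:
sentence = "the quick brown fox jumps over the lazy dog"
word_steps = len(sentence.split())  # 9 processing steps at the word level
char_steps = len(sentence)          # 43 processing steps at the character level
print(word_steps, char_steps, char_steps / word_steps)  # roughly 4.8x more steps
```

Participants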
- Heike Adel (LMU München, DE) [dblp]
- Parnia Bahar (RWTH Aachen, DE) [dblp]
- Phil Blunsom (University of Oxford, GB) [dblp]
- Ondrej Bojar (Charles University - Prague, CZ) [dblp]
- Fabienne Cap (Uppsala University, SE) [dblp]
- Ryan Cotterell (Johns Hopkins University - Baltimore, US) [dblp]
- Vera Demberg (Universität des Saarlandes, DE) [dblp]
- Kevin Duh (Johns Hopkins University - Baltimore, US) [dblp]
- Chris Dyer (Carnegie Mellon University - Pittsburgh, US) [dblp]
- Desmond Elliott (University of Amsterdam, NL) [dblp]
- Manaal Faruqui (Carnegie Mellon University - Pittsburgh, US) [dblp]
- Orhan Firat (Middle East Technical University - Ankara, TR) [dblp]
- Alexander M. Fraser (LMU München, DE) [dblp]
- Vladimir Golkov (TU München, DE) [dblp]
- Jan Hajic (Charles University - Prague, CZ) [dblp]
- Georg Heigold (DFKI - Kaiserslautern, DE) [dblp]
- Karl Moritz Hermann (Google DeepMind - London, GB) [dblp]
- Thomas Hofmann (ETH Zürich, CH) [dblp]
- Hang Li (Huawei Technologies - Hong Kong, HK) [dblp]
- Adam Lopez (University of Edinburgh, GB) [dblp]
- Marie-Francine Moens (KU Leuven, BE) [dblp]
- Hermann Ney (RWTH Aachen, DE) [dblp]
- Jan Niehues (KIT - Karlsruher Institut für Technologie, DE) [dblp]
- Laura Rimell (University of Cambridge, GB) [dblp]
- Helmut Schmid (LMU München, DE) [dblp]
- Martin Schmitt (LMU München, DE) [dblp]
- Hinrich Schütze (LMU München, DE) [dblp]
- Kristina Toutanova (Google - Seattle, US) [dblp]
- Thang Vu (Universität Stuttgart, DE) [dblp]
- Yadollah Yaghoobzadeh (LMU München, DE) [dblp]
- Francois Yvon (LIMSI - Orsay, FR) [dblp]
Classification
- artificial intelligence / robotics
Keywords
- natural language processing
- computational linguistics
- deep learning
- robustness in learning
- end-to-end learning
- machine learning