Deep embodiment: grounding semantics in perceptual modalities
Authors
Abstract
Text-based semantic models represent word meanings as distributions over other words and therefore suffer from the grounding problem; multi-modal distributional semantic models address this shortcoming. This thesis advances the field of multi-modal semantics in two directions. First, it shows that transferred convolutional neural network representations outperform the traditional bag-of-visual-words method for obtaining visual features, and that these representations can be applied successfully to a variety of natural language processing tasks. Second, it reports the first experiments with grounding in the non-visual modalities of auditory and olfactory perception using raw data. Deep learning, a natural fit for deriving grounded representations, yields higher-quality representations than more traditional approaches. Multi-modal representation learning leads to improvements over language-only models across a variety of tasks. If we want to move towards human-level artificial intelligence, we will need multi-modal models that capture the full complexity of human meaning, including its grounding in our various perceptual modalities.
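A common way to combine a linguistic representation with a perceptual (e.g. CNN-derived visual) one is middle fusion: L2-normalize each modality's vector and concatenate them, optionally with a mixing weight. The sketch below is illustrative only; the toy vectors and the `alpha` weighting are assumptions, not the thesis's exact recipe.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length so neither modality dominates the fusion."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def fuse(linguistic_vec, perceptual_vec, alpha=0.5):
    """Middle fusion: weighted concatenation of the normalized modality vectors."""
    return np.concatenate([alpha * l2_normalize(linguistic_vec),
                           (1 - alpha) * l2_normalize(perceptual_vec)])

# Toy example: a 4-d text embedding fused with a 3-d visual feature vector,
# giving a 7-d multi-modal representation.
text_vec = np.array([0.2, -0.1, 0.4, 0.3])
visual_vec = np.array([1.5, 0.2, 0.7])
multimodal = fuse(text_vec, visual_vec)
```

Normalizing before concatenation matters in practice: raw CNN activations and word-embedding entries live on very different scales, and similarity measures over the fused space would otherwise be dominated by one modality.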
Similar works
Grounded Distributional Semantics for Abstract Words
Since Harnad (1990) pointed out the symbol grounding problem, cognitive science research has demonstrated that grounding in perceptual or sensorimotor experience is crucial to language. Recent embodied cognition theories have argued that language is more important for grounding abstract than concrete words; abstract words are grounded via language. Distributional semantics has recently addresse...
Multi- and Cross-Modal Semantics Beyond Vision: Grounding in Auditory Perception
Multi-modal semantics has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics, including measuring conceptual similarity and relatedness. We also evaluate cross-modal mappings, through a zero-shot learning task mapping between linguistic and auditory...
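The cross-modal zero-shot evaluation mentioned above can be sketched as learning a linear map from the linguistic space to the auditory space on "seen" concepts, then projecting an unseen concept's linguistic vector into auditory space for nearest-neighbour retrieval. The dimensions and synthetic data below are assumptions for illustration; the paper's actual embeddings and regression variant may differ (ridge regression is also common here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired data: 50 "seen" concepts with a 10-d linguistic vector
# and a 6-d auditory vector, related (for this toy) by an unknown linear map.
W_true = rng.normal(size=(10, 6))
ling_seen = rng.normal(size=(50, 10))
audio_seen = ling_seen @ W_true

# Fit the cross-modal map by least squares on the seen concepts.
W, *_ = np.linalg.lstsq(ling_seen, audio_seen, rcond=None)

# Zero-shot step: project an unseen concept's linguistic vector into the
# auditory space; retrieval then compares it to candidate auditory vectors.
ling_unseen = rng.normal(size=(10,))
predicted_audio = ling_unseen @ W
```

Because the toy data is exactly linear, the fitted map recovers the generating one; with real embeddings the map is only approximate, which is why the evaluation measures retrieval accuracy rather than exact reconstruction.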
Learning Neural Audio Embeddings for Grounding Semantics in Auditory Perception
Multi-modal semantics, which aims to ground semantic representations in perception, has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics. After having shown the quality of such auditorily grounded representations, we show how they can be applied t...
Virtual Embodiment: A Scalable Long-Term Strategy for Artificial Intelligence Research
Meaning has been called the “holy grail” of a variety of scientific disciplines, ranging from linguistics to philosophy, psychology and the neurosciences [1]. The field of Artificial Intelligence (AI) is very much a part of that list: the development of sophisticated natural language semantics is a sine qua non for achieving a level of intelligence comparable to humans. Embodiment theories in co...
Grounding Semantics in Olfactory Perception
Multi-modal semantics has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in olfactory (smell) data, through the construction of a novel bag of chemical compounds model. We use standard evaluations for multi-modal semantics, including measuring conceptual similarity and cross-modal zero-shot learning. To our knowledge, ...
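A "bag of chemical compounds" model, as described in the olfactory entry above, works like a bag of visual words: each concept is represented by counts of discrete perceptual tokens, here the compounds detected in smell data. The vocabulary and tokens below are made up for illustration, not taken from the paper.

```python
from collections import Counter

def bag_of_features(feature_tokens, vocabulary):
    """Count how often each vocabulary feature occurs, bag-of-visual-words style."""
    counts = Counter(feature_tokens)
    return [counts[f] for f in vocabulary]

# Hypothetical compound vocabulary and the compounds detected for one concept.
vocab = ["linalool", "limonene", "vanillin", "eugenol"]
vec = bag_of_features(["limonene", "linalool", "limonene"], vocab)
# vec aligns each vocabulary entry with its count: [1, 2, 0, 0]
```

The resulting count vectors can then be compared with the same similarity and zero-shot evaluations used for the visual and auditory modalities.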