Computational Integration of Human Vision and Natural Language through Bitext Alignment
Abstract
Multimodal integration of visual and linguistic data is a longstanding and crucial challenge for modeling human understanding. We propose a framework that uses an unsupervised bitext alignment method to integrate visual and linguistic data. We present an empirical study of the various parameters of the framework. Our results exceed baselines using both exact and delayed temporal correspondence. The resulting alignments can be used for image classification and retrieval.
Similar resources
Posterior Regularization for Learning with Side Information and Weak Supervision
Supervised machine learning techniques have been very successful for a variety of tasks and domains including natural language processing, computer vision, and computational biology. Unfortunately, their use often requires creation of large problem-specific training corpora that can make these methods prohibitively expensive. At the same time, we often have access to external problem-specific i...
Vision-Language Integration in AI: A Reality Check
Multimodal human to human interaction requires integration of the contents/meaning of the modalities involved. Artificial Intelligence (AI) multimodal prototypes attempt to go beyond technical integration of modalities to this kind of meaning integration that allows for coherent, natural, “intelligent” communication with humans. Though bringing many multimedia-related AI research fields togethe...
Determining Entailment of Questions in the Quora Dataset
Automating the process of finding duplicate questions is one of the most challenging tasks in Natural Language Processing for knowledge-sharing platforms like Quora. An accurate predictor would better organize the forums and make searching and answering questions more efficient. In this paper, we explore the effectiveness of several models from Stanford Natural Language Inference publications o...
A weighted finite state transducer translation template model for statistical machine translation
We present a Weighted Finite State Transducer Translation Template Model for statistical machine translation. This is a source-channel model of translation inspired by the Alignment Template translation model. The model attempts to overcome the deficiencies of word-to-word translation models by considering phrases rather than words as units of translation. The approach we describe allows us to ...
Bitext Alignment for Statistical Machine Translation
Bitext alignment is the task of finding translation equivalence between documents in two languages, collections of which are commonly known as bitext. This dissertation addresses the problems of statistical alignment at various granularities from sentence to word with the goal of creating Statistical Machine Translation (SMT) systems. SMT systems are statistical pattern processors based on para...