Textual Entailment Recognition using Word Overlap, Mutual Information and Subpath Set
Authors
Abstract
When two texts stand in an inclusion relation, the relationship between them is called entailment. The task of automatically detecting such a relation is called recognising textual entailment (RTE), which is essentially a form of semantic analysis. A variety of methods have been proposed for RTE, but how well they perform in combination has been unclear. We therefore used each method as a feature for machine learning in order to combine them. We treat RTE as a binary classification problem over pairs of texts and propose a method that uses machine learning to judge whether the two texts present the same content. We built a program that performs entailment judgment on the basis of word overlap (the matching rate of the words in the two texts), mutual information, and the similarity of the respective syntax trees (Subpath Set). Word overlap was calculated using the BiLingual Evaluation Understudy (BLEU) metric. Mutual information is based on co-occurrence frequency, and the Subpath Set was determined using the Japanese WordNet. A Confidence-Weighted Score of 68.6% was obtained in the mutual information experiment on RTE. Mutual information and the use of the three methods with an SVM were shown to be effective.
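As a rough illustration of two of the features named in the abstract, the sketch below computes a BLEU-style clipped n-gram precision (only the overlap component of BLEU, without the brevity penalty or geometric mean) and a pointwise mutual information score from document-level co-occurrence counts. The function names, the choice of pointwise mutual information, and the way the scores are averaged into a feature vector are assumptions made for illustration; the abstract does not specify these details, and the Subpath Set feature is not covered here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(text, hypothesis, n=1):
    """Clipped n-gram precision of the hypothesis against the text.

    This is only the overlap component of BLEU (assumption: the paper's
    exact BLEU configuration is not given in the abstract).
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    text_counts = Counter(ngrams(text, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the text.
    matched = sum(min(c, text_counts[g]) for g, c in hyp_counts.items())
    return matched / total


def pmi(word_a, word_b, documents):
    """Pointwise mutual information from co-occurrence in a list of word sets."""
    n_docs = len(documents)
    count_a = sum(1 for d in documents if word_a in d)
    count_b = sum(1 for d in documents if word_b in d)
    count_ab = sum(1 for d in documents if word_a in d and word_b in d)
    if count_ab == 0:
        return float("-inf")  # the two words never co-occur
    return math.log2(count_ab * n_docs / (count_a * count_b))


def pair_features(text, hypothesis, documents):
    """Hypothetical feature vector: [unigram overlap, bigram overlap, mean best PMI]."""
    best = []
    for h in sorted(set(hypothesis)):
        scores = [pmi(h, t, documents) for t in sorted(set(text))]
        finite = [s for s in scores if s != float("-inf")]
        if finite:
            best.append(max(finite))
    mean_pmi = sum(best) / len(best) if best else 0.0
    return [
        ngram_precision(text, hypothesis, 1),
        ngram_precision(text, hypothesis, 2),
        mean_pmi,
    ]
```

A vector like the one returned by `pair_features` could then be passed, one per text pair, to a binary classifier such as an SVM, which is how the abstract describes combining the individual methods.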
Similar resources
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the increasing growth of digital content on the internet and social media, sentiment analysis is one of the emerging fields. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of sentiment lexicon as a valuable language resource is one of the imp...
Whole-book recognition using mutual-entropy-driven model adaptation
We describe an approach to unsupervised high-accuracy recognition of the textual contents of an entire book using fully automatic mutual-entropy-based model adaptation. Given images of all the pages of a book together with approximate models of image formation (e.g. a character-image classifier) and linguistics (e.g. a word-occurrence probability model), we detect evidence for disagreements bet...
Modeling Phonological Recognition of Persian Words
A model of spoken word recognition is proposed. This model is particularly concerned with extraction of cues from the signal leading to a specification of a word in terms of bundles of distinctive features, which are assumed to be the building blocks of words. In the model proposed, auditory input is chunked into a set of successive time slices. It is assumed that the derivation of the underly...
Recognizing Textual Entailment with Statistical Methods a Thesis in Partial Fulfilment of the Requirements for the Degree of Master of Science in Computer Science
We study statistical methods based on the use of information retrieved from the Web in an attempt to solve two Natural Language Processing tasks: Word Sense Disambiguation and Recognizing Textual Entailment. For Word Sense Disambiguation, we present a measure for semantic relatedness based on the simple Lesk algorithm. We measure a kind of mutual information between the gloss of each sense of the wo...
Large Scale MMIE Training for Conversational Telephone Speech Recognition
This paper describes a lattice-based framework for maximum mutual information estimation (MMIE) of HMM parameters which has been used to train HMM systems for conversational telephone speech transcription using up to 265 hours of training data. These experiments represent the largest-scale application of discriminative training techniques for speech recognition of which the authors are aware, a...