Probabilistic Named Entity Verification
Abstract
Named entity (NE) recognition is an important task for many natural language applications, such as Internet search engines, document indexing, information extraction and machine translation. Moreover, in oriental languages (such as Chinese, Japanese and Korean), NE recognition is even more important because it significantly affects the performance of word segmentation, the most fundamental task for understanding the texts in oriental languages. In this paper, a probabilistic verification model is designed for verifying the correctness of a named entity candidate. This model assesses the confidence level of a candidate not only according to the candidate’s structure but also according to its context. In our design, the clues for confidence measurement are collected from both positive and negative examples in the training data in a statistical manner. Experimental results show that the proposed method significantly improves the F-measure of Chinese personal name recognition from 86.5% to 94.4%.
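The abstract does not give the model's equations, so the following is only a minimal sketch of one way such a verifier could work, not the authors' method. It assumes a candidate is described by a bag of structural and contextual features, and that confidence is the sum of smoothed log-likelihood ratios estimated from positive and negative training examples. The class name `LikelihoodRatioVerifier`, the feature strings, and the threshold of 0 are all hypothetical.

```python
import math
from collections import Counter


class LikelihoodRatioVerifier:
    """Toy verifier (an assumption, not the paper's model).

    Scores a candidate by summing, over its features, the log-likelihood
    ratio between correct (positive) and incorrect (negative) examples.
    """

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing   # add-k smoothing constant
        self.pos = Counter()         # feature counts in positive examples
        self.neg = Counter()         # feature counts in negative examples
        self.vocab = set()           # every feature seen in training

    def fit(self, examples):
        """examples: iterable of (list_of_features, is_correct) pairs."""
        for features, is_correct in examples:
            counts = self.pos if is_correct else self.neg
            counts.update(features)
            self.vocab.update(features)
        return self

    def _log_prob(self, counts, feature):
        # One extra vocabulary slot reserves mass for unseen features.
        total = sum(counts.values()) + self.smoothing * (len(self.vocab) + 1)
        return math.log((counts[feature] + self.smoothing) / total)

    def score(self, features):
        """Positive scores favour 'correct', negative favour 'incorrect'."""
        return sum(
            self._log_prob(self.pos, f) - self._log_prob(self.neg, f)
            for f in features
        )

    def accept(self, features, threshold=0.0):
        return self.score(features) >= threshold


# Hypothetical training data: structure features (surname) and
# context features (left/right neighbouring words).
training = [
    (["surname=Wang", "left=Mr.", "right=said"], True),
    (["surname=Li", "left=Mr.", "right=said"], True),
    (["surname=Wang", "left=the", "right=of"], False),
    (["surname=Gao", "left=very", "right=of"], False),
]

verifier = LikelihoodRatioVerifier().fit(training)
good = ["surname=Li", "left=Mr.", "right=said"]
bad = ["surname=Gao", "left=very", "right=of"]
print(verifier.accept(good), verifier.accept(bad))
```

Raising the threshold would trade recall for precision, which is the kind of trade-off the reported F-measure summarizes.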
Similar resources
A Trust Based Probabilistic Method for Efficient Correctness Verification in Database Outsourcing
Correctness verification of query results is a significant challenge in database outsourcing. Most of the proposed approaches impose high overhead, which makes them impractical in real scenarios. Probabilistic approaches are proposed in order to reduce the computation overhead pertaining to the verification process. In this paper, we use the notion of trust as the basis of our probabilistic app...
Named Entity Recognition Using a Character-based Probabilistic Approach
We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps word...
Improving Persian Named Entity Recognition Using the Ezafe Marker
Named entity recognition is the process of identifying people's names, names of places (cities, countries, seas, etc.), organizations (public and private companies, international institutions, etc.), dates, currency amounts and percentages in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
Named Entity Learning and Verification: Expectation Maximization in Large Corpora
The regularity of named entities is used to learn names and to extract named entities. Given only a few name elements and a set of patterns, the algorithm learns new names and their elements. A verification step assures quality using a large background corpus. Further improvement is achieved by classifying the newly learnt elements at the character level. Moreover, unsupervised rule learning is ...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...