Thai Named Entity Extraction by incorporating Maximum Entropy Model with Simple Heuristic Information

Authors

  • Hutchatai Chanlekha
  • Asanee Kawtrakul
Abstract

Named entity (NE) extraction plays an important role in many NLP tasks, such as information extraction. In Thai, NE extraction is considerably more difficult because of two characteristics of the language: it lacks orthographic cues (such as capitalization) that signal NEs, and it has no explicit boundary marker between words. In this paper, we present a Thai NE extraction system that combines a Maximum Entropy model with heuristic information and a dictionary. The system works in three steps. The first step identifies the boundaries of candidate NEs, which may span several words, using heuristic rules, a dictionary, and word co-occurrence statistics. The second step extracts NEs with the Maximum Entropy model. The final step recovers NEs missed by the earlier steps by matching the extracted NEs against the rest of the document. Evaluated on Thai political news test data, the system achieves F-measures of 90.44%, 82.16%, and 89.87% for person, location, and organization names, respectively.
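
The final step (matching already extracted NEs against the rest of the document) can be illustrated with a small sketch. It assumes exact string matching over the raw, unsegmented text, longest name first, and that spans found earlier are never overwritten or overlapped. The function `propagate_entities`, the `Span` tuple layout, and these matching rules are hypothetical illustrations, not the authors' published procedure.

```python
from typing import Dict, List, Tuple

# A labeled span: (start, end, NE type), with `end` exclusive.
Span = Tuple[int, int, str]


def propagate_entities(text: str, found: List[Span]) -> List[Span]:
    """Label further occurrences of NE strings that were already extracted.

    Hypothetical sketch, assuming:
      * exact string matching on the raw, unsegmented text (no word breaks);
      * longer names are matched first, so a full name wins over a substring;
      * existing spans are never overwritten or overlapped.
    """
    # Surface string -> NE type; if a string was labeled twice, the first
    # label in `found` wins (an assumption of this sketch).
    lexicon: Dict[str, str] = {}
    for start, end, label in found:
        if start < end:  # ignore empty spans
            lexicon.setdefault(text[start:end], label)

    # Mark characters that already belong to some span.
    covered = [False] * len(text)
    for start, end, _ in found:
        for i in range(start, end):
            covered[i] = True

    spans = list(found)
    # Longest names first, so e.g. an organization name containing a
    # location name is not split into a shorter LOC match.
    for name in sorted(lexicon, key=len, reverse=True):
        pos = text.find(name)
        while pos != -1:
            end = pos + len(name)
            if not any(covered[pos:end]):  # skip overlaps with known spans
                spans.append((pos, end, lexicon[name]))
                for i in range(pos, end):
                    covered[i] = True
            pos = text.find(name, pos + 1)
    return sorted(spans)


if __name__ == "__main__":
    # Toy Thai text with no spaces; only the first mention was extracted.
    name = "ทักษิณ"
    doc = name + "เดินทางไปเชียงใหม่" + name + "กล่าว"
    print(propagate_entities(doc, [(0, len(name), "PERSON")]))
```

Run as a script, this labels the second, previously missed mention of the name as PERSON. Because plain substring matching can also fire inside longer, unrelated words, this sketch would over-match on real text unless it were combined with word-boundary information such as that produced in the first step.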

Similar resources

Japanese Named Entity Recognition based on a Simple Rule Generator and Decision Tree Learning

Named entity (NE) recognition is a task in which proper nouns and numerical information in a document are detected and classified into categories such as person, organization, location, and date. NE recognition plays an essential role in information extraction systems and question answering systems. It is well known that hand-crafted systems with a large set of heuristic rules are difficult to ...

Named Entity Recognition using Maximum Entropy Models on Biologists’ Literature

Owing to the explosion of online biomedical texts, it has become more difficult to obtain exact information manually. Named entity recognition is the very first step for further text mining tasks such as information extraction and knowledge discovery. In this paper, we present our statistical named entity recognition method. Until now, there were some approaches using different statistic...

Named Entity Recognition for Indian Languages: A Survey

Named Entity Recognition (NER) is a subtask of Information Extraction (IE) used to identify and classify the names in any given data. Earlier studies were mostly based on hand-written rules, whereas nowadays Machine Learning models such as Hidden Markov Model (HMM), Maximum Entropy (MaxEnt), Maximum Entropy Markov model (MEMM), Support Vector Machine (SVM), Conditional Random Fields (CRFs) a...

Statistical Named Entity Recognizer Adaptation

Named entity recognition (NER) is a subtask of widely-recognized utility of information extraction (IE). NER has been explored in depth to provide rapid characterization of newswire data (Sundheim, 1995; Palmer and Day, 1997). The NER task involves both identification of spans of text referring to named entities, and categorization of these entities into classes based on the role they fill in c...

Extracting Caller Information from Voicemail

In this paper we address the problem of extracting the identities and phone numbers of the callers in voicemail messages. Previous work in information extraction from speech includes spoken document retrieval and named entity detection. This task differs from the named entity task in that the information we are interested in is a subset of the named entities in the message, and consequently, th...

Publication date: 2004