A Hybrid Feature Set based Maximum Entropy Hindi Named Entity Recognition

نویسندگان

  • Sujan Kumar Saha
  • Sudeshna Sarkar
  • Pabitra Mitra
چکیده

We describe our effort in developing a Named Entity Recognition (NER) system for Hindi using Maximum Entropy (MaxEnt) approach. We developed a NER annotated corpora for the purpose. We have tried to identify the most relevant features for Hindi NER task to enable us to develop an efficient NER from the limited corpora developed. Apart from the orthographic and collocation features, we have experimented on the efficiency of using gazetteer lists as features. We also worked on semi-automatic induction of context patterns and experimented with using these as features of the MaxEnt method. We have evaluated the performance of the system against a blind test set having 4 classes Person, Organization, Location and Date. Our system achieved a f-value of 81.52%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Approach for Named Entity Recognition in Indian Languages

In this paper we describe a hybrid system that applies maximum entropy model (MaxEnt), language specific rules and gazetteers to the task of named entity recognition (NER) in Indian languages designed for the IJCNLP NERSSEAL shared task. Starting with named entity (NE) annotated corpora and a set of features we first build a baseline NER system. Then some language specific rules are added to th...

متن کامل

A Hybrid Named Entity Recognition System for South and South East Asian Languages

In this paper we describe a hybrid system that applies Maximum Entropy model (MaxEnt), language specific rules and gazetteers to the task of Named Entity Recognition (NER) in Indian languages designed for the IJCNLP NERSSEAL shared task. Starting with Named Entity (NE) annotated corpora and a set of features we first build a baseline NER system. Then some language specific rules are added to th...

متن کامل

A Two Stage Language Independent Named Entity Recognition for Indian Languages

This paper describes about the development of a two stage hybrid Named Entity Recognition (NER) system for Indian Languages particularly for Hindi, Oriya, Bengali and Telugu. We have used both statistical Maximum Entropy Model (MaxEnt) and Hidden Markov Model (HMM) in this system. We have used variety of features and contextual information for predicting the various Named Entity (NE) classes. T...

متن کامل

Word Clustering and Word Selection Based Feature Reduction for MaxEnt Based Hindi NER

Statistical machine learning methods are employed to train a Named Entity Recognizer from annotated data. Methods like Maximum Entropy and Conditional Random Fields make use of features for the training purpose. These methods tend to overfit when the available training corpus is limited especially if the number of features is large or the number of values for a feature is large. To overcome thi...

متن کامل

Feature Subset Selection Using Genetic Algorithm for Named Entity Recognition

In this paper, genetic algorithm (GA) is utilized to search for the appropriate feature combination for constructing a maximum entropy (ME) based classifier for named entity recognition (NER). Features are encoded in the chromosomes. The ME classifier is evaluated for the 3-fold cross validation with the features, encoded in a particular chromosome, and its average F-measure value is used as th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008