Comparison of Unigram, Bigram, HMM and Brill’s POS Tagging Approaches for some South Asian Languages

Authors

  • Fahim Muhammad Hasan
  • Naushad UzZaman
  • Mumit Khan
Abstract

Part-of-Speech (POS) tagging is the process of assigning each word in a sentence a suitable tag from a given set of tags. POS tagging is important in various areas of Natural Language Processing. Different methods of automating the process have been developed and employed for English and other Western languages. Similar work, most of which uses stochastic approaches to POS tagging, has also been done for South Asian languages. We experimented with some of the widely used approaches to POS tagging on three South Asian languages, Bangla, Hindi and Telugu, using corpora of different sizes. We observed the performance of the approaches and found that Brill’s transformation-based tagger outperformed the other approaches in all of our experiments, though the use of this approach has been very limited until recently.
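To make two of the compared approaches concrete, the following is a minimal sketch of a unigram tagger and a bigram tagger that backs off to it. It is not the authors' implementation: the toy English corpus, the Penn-style tag names, the `<s>` start marker, and the function names `train_unigram`, `train_bigram` and `tag` are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag_ in sent:
            counts[word][tag_] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def train_bigram(tagged_sents):
    """Map (previous tag, word) to the most frequent tag in that context."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag_ in sent:
            counts[(prev, word)][tag_] += 1
            prev = tag_
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}

def tag(words, unigram, bigram=None, default="NN"):
    """Tag left to right; try the bigram context first, then back off."""
    prev, out = "<s>", []
    for w in words:
        t = bigram.get((prev, w)) if bigram is not None else None
        if t is None:
            t = unigram.get(w, default)  # unseen words get the default tag
        out.append((w, t))
        prev = t
    return out

# Toy training data (an assumption, not the paper's corpora).
TRAIN = [
    [("time", "NN"), ("flies", "VBZ")],
    [("the", "DT"), ("flies", "NNS"), ("buzz", "VBP")],
    [("the", "DT"), ("bird", "NN"), ("flies", "VBZ")],
]

uni = train_unigram(TRAIN)
bi = train_bigram(TRAIN)
print(tag(["the", "flies", "buzz"], uni))      # unigram only
print(tag(["the", "flies", "buzz"], uni, bi))  # bigram with unigram backoff
```

In this toy data the unigram tagger always labels "flies" as a verb because that is its most frequent tag, while the bigram tagger uses the preceding determiner to label it as a plural noun. An HMM tagger or Brill's transformation-based tagger would need additional machinery (Viterbi decoding or learned rewrite rules) that this sketch does not cover.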

Similar resources

Comparison of different POS Tagging Techniques (N-Gram, HMM and Brill’s tagger) for Bangla

There are different approaches to the problem of assigning a part-of-speech tag to each word of a text, which is known as Part-Of-Speech (POS) tagging. In this paper we compare the performance of a few POS tagging techniques for the Bangla language, e.g. the statistical approach (n-gram, HMM) and the transformation-based approach (Brill’s tagger). A supervised POS tagging approach requires a large amoun...

Training and Evaluation of POS Taggers on the French MULTITAG Corpus

The explicit introduction of morphosyntactic information into statistical machine translation approaches is receiving considerable attention. The current freely available Part of Speech (POS) taggers for the French language are based on a limited tagset which does not account for some flectional particularities. Moreover, there is a lack of a unified framework of training and evaluatio...

Part of Speech Tagging for English Text Data

A variety of Natural Language Processing (NLP) tasks, such as named entity recognition, stemming and question answering, benefit from knowledge of the words’ syntactic categories or Part-of-Speech (POS) [4][6]. POS taggers have been successfully applied to assign a single best POS to every word in a corpus [2][5][12]. This paper reports on the implementation and empirical comparison of three superv...

Proceedings of the IJCAI – 2007 Workshop On Shallow Parsing for South Asian

As part of the IJCAI workshop on “Shallow Parsing for South Asian Languages”, a contest was held in which the participants trained and tested their shallow parsing systems for Hindi, Bengali and Telugu. This paper gives a complete account of the contest in terms of how the data for the three languages was released, the performances of the participating systems and an overview of the approache...

Publication date: 2007