Tamil NER – Coping with Real Time Challenges
Abstract
This paper describes various challenges encountered while developing an automatic Named Entity Recognition (NER) system for Tamil using Conditional Random Fields (CRFs). We also discuss how we have overcome some of these challenges. Though most of the NER challenges discussed here are common to many Indian languages, the focus of this work is Tamil, a South Indian language belonging to the Dravidian language family. The corpus used in this work is web data, consisting of newspaper articles, blog posts, and articles from other online web portals.
Related resources
Communal proactive coping strategies among Tamil refugees in Norway: A case study in a naturalistic setting
BACKGROUND: An exclusive focus on individual or family coping strategies may be inadequate for people whose major point of concern may be collective healing on a more communal level. METHODS: To our knowledge, the current study is the first to make use of ethnographic fieldwork methods to investigate this type of coping as a process in a natural setting over time. Participant observation was em...
HITS@FIRE task 2015: Twitter based Named Entity Recognizer for Indian Languages
Natural Language Processing (NLP), in its pure sense, is a platform that provides the ability to transform natural language text into useful information. Named Entity Recognition (NER) is a key task in NLP for classifying named entities in natural languages. Though there are several algorithms for named entity classification, identifying named entities in Twitter data is a demanding tas...
AMRITA_CEN-NLP@FIRE 2015: CRF Based Named Entity Extractor For Twitter Microposts
The proposed method implements Named Entity Recognition (NER) for four languages: English, Tamil, Malayalam, and Hindi. The results obtained from this work were submitted to a research evaluation workshop, the Forum for Information Retrieval and Evaluation (FIRE 2015). It is a single-layered problem that is divided into multiple layers; this step is called pre-processing; it has thr...
Cross-Lingual Named Entity Recognition via Wikification
Named Entity Recognition (NER) models for language L are typically trained using annotated data in that language. We study cross-lingual NER, where a model for NER in L is trained on another, source, language (or multiple source languages). We introduce a language-independent method for NER, building on cross-lingual wikification, a technique that grounds words and phrases in non-English text in...
A System for Identifying and Classifying Names in Persian Texts
A named entity recognition (NER) system identifies one or more kinds of names in a text and classifies them into specified categories. These categories can be names of people, organizations, companies, places (country, city, street, etc.), times related to names (date and time), financial values, percentages, etc. Although during the past decade a lot of research has been done on NER in ...