PAMPO: using pattern matching and pos-tagging for effective Named Entities recognition in Portuguese

Authors

  • Conceição Rocha
  • Alípio Mário Jorge
  • Roberta Sionara
  • Paula Brito
  • Carlos Pimenta
  • Solange Oliveira Rezende

Abstract

This paper deals with the entity extraction task (named entity recognition) of a text mining process that aims at unveiling non-trivial semantic structures, such as relationships and interactions between entities or communities. We present a simple and efficient named entity extraction algorithm. The method, named PAMPO (PAttern Matching and POs tagging based algorithm for NER), relies on flexible pattern matching, part-of-speech tagging and lexical-based rules. It was developed to process texts written in Portuguese; however, it is potentially applicable to other languages as well. We compare our approach with current alternatives that support Named Entity Recognition (NER) for content written in Portuguese: Alchemy, Zemanta and Rembrandt. Evaluation of the efficacy of the entity extraction method on several texts written in Portuguese indicates a considerable improvement in recall and F1 measures.
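
The abstract names three ingredients (pattern matching, part-of-speech tagging and lexical rules) and reports results in terms of recall and F1. The following is a minimal sketch of that general idea, not the PAMPO algorithm itself: the regular expression, the `CONNECTORS` and `FUNCTION_WORDS` lists, and both function names are assumptions made for illustration, and the small function-word list merely stands in for a real POS tagger.

```python
import re

# Assumed connector words that may appear inside a Portuguese name
# (illustrative only; not taken from the paper).
CONNECTORS = r"(?:de|da|do|das|dos)"
# A capitalized word, including accented uppercase initials.
CAP_WORD = r"\b[A-ZÀ-ÖØ-Þ]\w*"
# One or more capitalized words, optionally joined by a connector.
CANDIDATE = re.compile(rf"{CAP_WORD}(?:\s+(?:{CONNECTORS}\s+)?{CAP_WORD})*")

# Stand-in for POS tagging: capitalized function words that often open a
# sentence and should not start an entity (hypothetical mini-lexicon).
FUNCTION_WORDS = {"o", "a", "os", "as", "em", "no", "na", "um", "uma"}


def extract_candidates(text):
    """Return capitalized word sequences after a simple lexical filter."""
    candidates = []
    for match in CANDIDATE.finditer(text):
        words = match.group(0).split()
        # Drop leading function words such as a sentence-initial article.
        while words and words[0].lower() in FUNCTION_WORDS:
            words.pop(0)
        if words:
            candidates.append(" ".join(words))
    return candidates


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of entity strings."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = ("O presidente Marcelo Rebelo de Sousa visitou "
                "a Universidade do Porto em Lisboa.")
    found = extract_candidates(sentence)
    print(found)
    print(precision_recall_f1(found, found + ["Portugal"]))
```

In this sketch, recall drops whenever a gold entity is missed while precision stays unaffected, which is why a method that extracts more valid candidates can improve recall and F1 together.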

Similar resources

A Part-of-Speech Tagging System for the Persian Language

Abstract: Part-of-speech (POS) tagging is essential for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

Correcting Word Segmentation and Part-of-speech Tagging Errors for Chinese Named Entity Recognition

In the exploration of Chinese named entity recognition for a specific domain, the authors found that errors introduced during word segmentation and part-of-speech (POS) tagging have obstructed the improvement of recognition performance. In order to further enhance recognition recall and precision, the authors propose an error correction approach for Chinese named entity recognition. In the e...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

Joint Part-of-Speech Tagging and Named Entity Recognition Using Factor Graphs

We present a machine learning-based method for jointly labeling POS tags and named entities. This joint labeling is performed by utilizing factor graphs. The variables of part of speech and named entity labels are connected by factors so the tagger jointly determines the best labeling for the two labeling tasks. Using the feature sets of SZTENER and the POS-tagger magyarlanc, we built a model t...

Unsupervised Part-Of-Speech Tagging Supporting Supervised Methods

This paper investigates the utility of an unsupervised part-of-speech (PoS) system in a task-oriented way. We use PoS labels as features for different supervised NLP tasks: Word Sense Disambiguation, Named Entity Recognition and Chunking. Further, we explore how much supervised tagging can gain from unsupervised tagging. A comparative evaluation between variants of systems using standard PoS, un...

Journal:
  • CoRR

Volume: abs/1612.09535

Publication date: 2016