parts of speech tagging

Search results for: parts of speech tagging

Number of results: 21177608

What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages

2014

Long Duong, Trevor Cohn, Karin M. Verspoor, Steven Bird, Paul Cook

In this paper we address the problem of multilingual part-of-speech tagging for resource-poor languages. We use parallel data to transfer part-of-speech information from resource-rich to resource-poor languages. Additionally, we use a small amount of annotated data to learn to “correct” errors from the projection approach, such as tagset mismatch between languages, achieving state-of-the-art performan...
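
The projection step this abstract describes can be illustrated with a minimal sketch. It assumes the parallel sentence pair has already been word-aligned (the list of `(source_index, target_index)` pairs is an assumed input format) and resolves many-to-one alignments by majority vote. `project_tags` is a hypothetical helper; it does not reproduce the authors' learned correction step.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def project_tags(
    src_tags: Sequence[str],
    alignment: Sequence[Tuple[int, int]],
    tgt_len: int,
) -> List[Optional[str]]:
    """Copy POS tags from source tokens to aligned target tokens (hypothetical helper).

    alignment holds (source_index, target_index) pairs from some word aligner.
    Target tokens with no aligned source token stay None (left for a later model).
    If several source tokens align to one target token, the most frequent tag wins.
    """
    votes = [Counter() for _ in range(tgt_len)]
    for s, t in alignment:
        votes[t][src_tags[s]] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Toy example: the second target token is unaligned.
print(project_tags(["DET", "NOUN", "VERB"], [(0, 0), (1, 2), (2, 3)], 4))
# ['DET', None, 'NOUN', 'VERB']
```

Unaligned tokens and tagset mismatches are exactly the gaps that the small annotated sample mentioned in the abstract is meant to help fix.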

Learning when to trust distant supervision: An application to low-resource POS tagging using cross-lingual projection

2016

Meng Fang, Trevor Cohn

Cross-lingual projection of linguistic annotation suffers from many sources of bias and noise, leading to unreliable annotations that cannot be used directly. In this paper, we introduce a novel approach to sequence tagging that learns to correct the errors from cross-lingual projection using an explicit debiasing layer. This is framed as joint learning over two corpora, one tagged with gold st...
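
The abstract does not spell out the debiasing layer. One common way to model such a layer, assumed here rather than taken from the paper, is a tag-to-tag transition matrix that maps the tagger's distribution over true tags to a distribution over projected tags, so that gold-tagged and projected tokens are scored differently in the joint loss. The function names and the toy matrix are hypothetical.

```python
import math
from typing import List, Sequence


def noisy_tag_probs(clean_probs: Sequence[float], transition: Sequence[Sequence[float]]) -> List[float]:
    """p(projected tag z | x) = sum_y p(true tag y | x) * transition[y][z].

    transition is an assumed K x K row-stochastic matrix (each row sums to 1);
    in a real system it would be learned jointly with the tagger.
    """
    k = len(clean_probs)
    return [sum(clean_probs[y] * transition[y][z] for y in range(k)) for z in range(k)]


def token_loss(clean_probs: Sequence[float], tag: int, from_gold: bool,
               transition: Sequence[Sequence[float]]) -> float:
    """Negative log-likelihood of one observed tag index.

    Gold-annotated tokens are scored against the tagger's clean distribution;
    tokens whose tag came from projection are scored after the noise layer,
    so systematic projection errors can be absorbed by the transition matrix.
    """
    probs = clean_probs if from_gold else noisy_tag_probs(clean_probs, transition)
    return -math.log(probs[tag])


# Toy example with two tags (index 0 = NOUN, 1 = VERB): projection sometimes turns NOUN into VERB.
T = [[0.8, 0.2],
     [0.0, 1.0]]
print(noisy_tag_probs([0.9, 0.1], T))  # approximately [0.72, 0.28]
```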

Part of Speech Tagging for Mongolian Corpus

2009

Purev Jaimai, Odbayar Chimeddorj

This paper presents the current results of a research project that aims to build a 5-million-word tagged corpus for Mongolian. So far, around 1 million words have been automatically tagged using a newly developed POS tagset and a bigram POS tagger.
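
The abstract does not say which bigram model was used. One standard way to build a bigram tagger is a hidden Markov model decoded with the Viterbi algorithm; a minimal sketch follows. The probability tables and English toy words are hypothetical and stand in for parameters that would be estimated from a tagged Mongolian corpus.

```python
import math
from typing import Dict, List, Sequence


def viterbi(
    words: Sequence[str],
    tags: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
    unk: float = 1e-6,
) -> List[str]:
    """Most likely tag sequence under a bigram HMM (log space).

    start[t] = p(t | sentence start), trans[s][t] = p(t | previous tag s),
    emit[t][w] = p(w | t); words missing from emit[t] get the floor probability unk.
    """
    def lp(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: log-score of the best path that ends in tag t at word i
    best = [{t: lp(start.get(t, 0.0)) + lp(emit[t].get(words[0], unk)) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + lp(trans[s].get(t, 0.0)))
            best[i][t] = best[i - 1][prev] + lp(trans[prev].get(t, 0.0)) + lp(emit[t].get(words[i], unk))
            back[i][t] = prev
    # Trace back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model (English words, not Mongolian data)
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.8, "VERB": 0.2}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
EMIT = {"NOUN": {"dogs": 0.5, "bark": 0.1}, "VERB": {"dogs": 0.05, "bark": 0.6}}
print(viterbi(["dogs", "bark"], TAGS, START, TRANS, EMIT))  # ['NOUN', 'VERB']
```

Working in log space avoids numeric underflow on long sentences, and the `unk` floor keeps unseen words from zeroing out every path.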

Cross-Lingual POS Tagging through Ambiguous Learning: First Experiments (Apprentissage partiellement supervisé d'un étiqueteur morpho-syntaxique par transfert cross-lingue) [in French]

2014

Guillaume Wisniewski, Nicolas Pécheux, Elena Knyazeva, Alexandre Allauzen, François Yvon

When part-of-speech annotated data is scarce, e.g. for under-resourced languages, one can turn to cross-lingual transfer and crawled dictionaries to collect partially supervised data. We cast this problem in the framework of ambiguous learning and show how to learn an accurate history-based model. This method is evaluated on four languages and yields improvements over the state of the art for three ...
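
In ambiguous (partial-label) learning, each training token carries a set of candidate tags instead of a single tag. A minimal sketch of the corresponding loss is shown below, assuming a probabilistic model that outputs one score per tag. This generic objective is an illustration only; it does not reproduce the paper's history-based model or its training procedure, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def ambiguous_nll(scores: Dict[str, float], allowed: Iterable[str]) -> float:
    """Loss for a token whose label is only known to lie in a candidate set.

    The model is rewarded for putting probability mass anywhere in `allowed`
    (e.g. the tags a crawled dictionary or a projected alignment permits),
    rather than on one specific gold tag.
    """
    probs = softmax(scores)
    return -math.log(sum(probs[t] for t in allowed))


scores = {"NOUN": 0.0, "VERB": 0.0, "ADJ": 0.0}
print(ambiguous_nll(scores, {"NOUN", "VERB"}))  # -log(2/3), about 0.405
```

With a single allowed tag this reduces to the usual cross-entropy loss; with every tag allowed the loss is zero, so such tokens contribute nothing.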

Improving POS Tagging in Old Spanish Using TEITOK

2017

Maarten Janssen, Josep Ausensi, Josep Fontana

In this paper, we describe how the TEITOK corpus tools helped to create a diachronic corpus for Old Spanish that contains both paleographic and linguistic information, is easy to use for non-specialists, and makes it easy to manually correct automatically assigned POS tags and lemmas.

Automatic Segmentation and Labeling for Continuous Number Recognition

2006

S.A.R. Al-Haddad, Salina Abdul Samad, Aini Hussein

This study focuses on continuous number speech recognition, with the aim of distinguishing speech from non-speech segments and segmenting the speech into individual digits. It proposes an algorithm for the automatic segmentation of male and female voiced speech. Log energy and zero-crossing rate are computed from the speech samples to perform the segmentation. The thresholds are...
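
The abstract names the two features and a thresholding step but not the exact settings. A minimal sketch of frame-level speech/non-speech labeling from log energy and zero-crossing rate follows. The frame length, hop size, threshold values, and the decision rule (voiced speech = high energy and low zero-crossing rate) are assumptions, not the paper's values; a digit segmenter would further group consecutive speech frames into segments.

```python
import math
from typing import List, Sequence


def frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into overlapping frames (trailing samples that do not fill a frame are dropped)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def log_energy(frame: Sequence[float], eps: float = 1e-10) -> float:
    return math.log10(sum(x * x for x in frame) + eps)  # eps avoids log(0) on silence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def speech_mask(signal: Sequence[float], frame_len: int = 200, hop: int = 100,
                energy_thr: float = -1.0, zcr_thr: float = 0.5) -> List[bool]:
    """True for frames that look like voiced speech: high log energy and low ZCR.

    The default thresholds are placeholders; in practice they are tuned on data.
    """
    return [log_energy(f) > energy_thr and zero_crossing_rate(f) < zcr_thr
            for f in frames(signal, frame_len, hop)]


# Synthetic signal: 400 samples of silence followed by 400 samples of a low-frequency tone.
sig = [0.0] * 400 + [0.5 * math.sin(2 * math.pi * i / 80) for i in range(400)]
print(speech_mask(sig))  # [False, False, False, True, True, True, True]
```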

Improving Part-of-Speech Tagging for NLP Pipelines

Journal: CoRR, 2017

Vishaal Jatav, Ravi Teja, Srini Bharadwaj, Venkat Srinivasan

This paper outlines the results of sentence-level, linguistics-based rules for improving part-of-speech tagging. It is well known that the performance of complex NLP systems suffers if any of the preliminary stages is less than perfect: errors in the initial stages of the pipeline have a snowballing effect on the pipeline’s end performance. We have created a set of linguistics bas...
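
The abstract does not list the authors' rules. The general pattern of post-processing a tagger's output with sentence-level rules can be sketched as follows. The rule shown (`det_verb_to_noun`) is a made-up illustration using universal-style tag names, not one of the paper's rules, and `apply_rules` is a hypothetical helper.

```python
from typing import Callable, List, Optional, Tuple

Tagged = List[Tuple[str, str]]                 # (word, tag) pairs for one sentence
Rule = Callable[[Tagged, int], Optional[str]]  # returns a replacement tag or None


def det_verb_to_noun(sent: Tagged, i: int) -> Optional[str]:
    """Hypothetical rule: a token tagged VERB right after a DET is retagged NOUN."""
    if i > 0 and sent[i][1] == "VERB" and sent[i - 1][1] == "DET":
        return "NOUN"
    return None


def apply_rules(sent: Tagged, rules: List[Rule]) -> Tagged:
    """Apply each rule left to right over a copy of the tagger's output."""
    out = list(sent)
    for rule in rules:
        for i in range(len(out)):
            new_tag = rule(out, i)
            if new_tag is not None:
                out[i] = (out[i][0], new_tag)
    return out


tagged = [("the", "DET"), ("run", "VERB"), ("ended", "VERB")]
print(apply_rules(tagged, [det_verb_to_noun]))
# [('the', 'DET'), ('run', 'NOUN'), ('ended', 'VERB')]
```

Because later rules see the tags that earlier rules produced, rule order matters; the input sentence itself is left unchanged.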

An Overview of Data-Driven Part-of-Speech Tagging

2016

Dan Tufiş

Over the last twenty years or so, approaches to part-of-speech tagging based on machine learning techniques have been developed or ported to provide high-accuracy morpho-lexical annotation for an increasing number of languages. Given the large number of morpho-lexical descriptors for a morphologically complex language, one has to consider ways to avoid the data sparseness threat in standard ...

Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey

2010

Dinesh Kumar, Gurpreet Singh Josan

The problem of tagging in natural language processing is to find a way to tag every word in a text with a particular part of speech, e.g., proper noun. POS tagging is a very important preprocessing step for language processing tasks. This paper reports on the part-of-speech (POS) taggers proposed for various Indian languages such as Hindi, Punjabi, Malayalam, Bengali and Telugu. Various p...

BKTreebank: Building a Vietnamese Dependency Treebank

Journal: CoRR, 2017

Kiem-Hieu Nguyen

A dependency treebank is an important resource for any language. In this paper, we present our work on building BKTreebank, a dependency treebank for Vietnamese. Important points in designing the POS tagset, dependency relations, and annotation guidelines are discussed. We describe experiments on POS tagging and dependency parsing on the treebank. Experimental results show that the treebank is a usefu...
