Search results for: parts of speech tagging

Number of results: 21177608

2014
Long Duong Trevor Cohn Karin M. Verspoor Steven Bird Paul Cook

In this paper we address the problem of multilingual part-of-speech tagging for resource-poor languages. We use parallel data to transfer part-of-speech information from resource-rich to resource-poor languages. Additionally, we use a small amount of annotated data to learn to “correct” errors from the projected approach, such as tagset mismatch between languages, achieving state-of-the-art performan...

2016
Meng Fang Trevor Cohn

Cross-lingual projection of linguistic annotation suffers from many sources of bias and noise, leading to unreliable annotations that cannot be used directly. In this paper, we introduce a novel approach to sequence tagging that learns to correct the errors from cross-lingual projection using an explicit debiasing layer. This is framed as joint learning over two corpora, one tagged with gold st...

2009
Purev Jaimai Odbayar Chimeddorj

This paper presents the current results of a research project that aims to build a corpus of 5 million tagged words for Mongolian. So far, around 1 million words have been tagged automatically by developing a POS tagset and a bigram POS tagger.
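To illustrate what a bigram POS tagger does in general, here is a minimal sketch in Python. It is not the authors' implementation: the toy English corpus, the tag names, the add-one smoothing, and the greedy left-to-right decoding are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data; a real tagger would be trained on a tagged corpus.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
]

def train_bigram_tagger(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of the next tag
    emissions = defaultdict(Counter)    # tag -> counts of the words it labels
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            prev = tag
    return transitions, emissions

def tag(words, transitions, emissions):
    """Greedy decoding: for each word, pick the tag that maximizes
    P(tag | previous tag) * P(word | tag), both with add-one smoothing."""
    tags = sorted(emissions)
    vocab_size = len({w for counts in emissions.values() for w in counts})
    result = []
    prev = "<s>"
    for word in words:
        def score(candidate):
            trans_total = sum(transitions[prev].values())
            trans = (transitions[prev][candidate] + 1) / (trans_total + len(tags))
            emit_total = sum(emissions[candidate].values())
            # The extra +1 in the denominator reserves mass for unseen words.
            emit = (emissions[candidate][word.lower()] + 1) / (emit_total + vocab_size + 1)
            return trans * emit
        best = max(tags, key=score)
        result.append(best)
        prev = best
    return result

transitions, emissions = train_bigram_tagger(TOY_CORPUS)
print(tag(["a", "dog", "sleeps"], transitions, emissions))
```

Greedy decoding keeps the sketch short; a Viterbi search over the same counts would instead find the best tag sequence for the whole sentence.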

2014
Guillaume Wisniewski Nicolas Pécheux Elena Knyazeva Alexandre Allauzen François Yvon

When Part-of-Speech annotated data is scarce, e.g. for under-resourced languages, one can turn to cross-lingual transfer and crawled dictionaries to collect partially supervised data. We cast this problem in the framework of ambiguous learning and show how to learn an accurate history-based model. This method is evaluated on four languages and yields improvements over state-of-the-art for three ...

2017
Maarten Janssen Josep Ausensi Josep Fontana

In this paper, we describe how the TEITOK corpus tools helped to create a diachronic corpus for Old Spanish that contains both paleographic and linguistic information, which is easy to use for nonspecialists, and in which it is easy to perform manual improvements to automatically assigned POS tags and lemmas.

2006
S.A.R. AL-HADDAD SALINA ABDUL SAMAD AINI HUSSEIN

This study is focused on continuous number speech recognition with the intention to distinguish speech and non-speech segments and segment it as one digit. This study proposes an algorithm for automatic segmentation of male and female voiced speech. The calculations of log energy and zero-crossing rate are used to process speech samples to accomplish the segmentation. The thresholds are...
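The two quantities named in this abstract, short-time log energy and zero-crossing rate, can be sketched as follows. This is a generic illustration, not the authors' algorithm: the non-overlapping framing, the decibel scaling, the small epsilon, and the single energy threshold in `speech_frames` are assumptions, and the abstract's own thresholds are not reproduced here.

```python
import math

def frame_features(samples, frame_len):
    """Return (log_energy_db, zero_crossing_rate) for each non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        # The small epsilon avoids log(0) on silent frames.
        log_energy = 10.0 * math.log10(energy + 1e-12)
        # A zero crossing is a sign change between two consecutive samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)
        features.append((log_energy, zcr))
    return features

def speech_frames(samples, frame_len, energy_threshold_db):
    """Hypothetical decision rule: a frame counts as speech when its
    log energy exceeds the given threshold."""
    return [
        log_energy > energy_threshold_db
        for log_energy, _ in frame_features(samples, frame_len)
    ]

# Synthetic signal: one silent frame followed by one alternating-sign frame.
signal = [0.0] * 8 + [1.0, -1.0] * 4
features = frame_features(signal, 8)
print(features)
print(speech_frames(signal, 8, 0.0))
```

In practice the zero-crossing rate is often used alongside energy because low-energy unvoiced sounds tend to have a high crossing rate, which an energy threshold alone would miss.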

Journal: CoRR 2017
Vishaal Jatav Ravi Teja Srini Bharadwaj Venkat Srinivasan

This paper outlines the results of sentence-level, linguistics-based rules for improving part-of-speech tagging. It is well known that the performance of complex NLP systems is negatively affected if one of the preliminary stages is less than perfect. Errors in the initial stages of the pipeline have a snowballing effect on the pipeline’s end performance. We have created a set of linguistics bas...

2016
Dan TUFIŞ

Over the last twenty years or so, the approaches to part-of-speech tagging based on machine learning techniques have been developed or ported to provide high-accuracy morpho-lexical annotation for an increasing number of languages. Given the large number of morpho-lexical descriptors for a morphologically complex language, one has to consider ways to avoid the data sparseness threat in standard ...

2010
Dinesh Kumar Gurpreet Singh Josan

The problem of tagging in natural language processing is to find a way to tag every word in a text as a particular part of speech, e.g., proper pronoun. POS tagging is a very important preprocessing task for language processing activities. This paper reports on the Part of Speech (POS) taggers proposed for various Indian languages such as Hindi, Punjabi, Malayalam, Bengali and Telugu. Various p...

Journal: CoRR 2017
Kiem-Hieu Nguyen

A dependency treebank is an important resource for any language. In this paper, we present our work on building BKTreebank, a dependency treebank for Vietnamese. Important points in designing the POS tagset, dependency relations, and annotation guidelines are discussed. We describe experiments on POS tagging and dependency parsing on the treebank. Experimental results show that the treebank is a usefu...
