Cross-Lingual POS Tagging through Ambiguous Learning: First Experiments (Apprentissage partiellement supervisé d'un étiqueteur morpho-syntaxique par transfert cross-lingue) [in French]

Authors

  • Guillaume Wisniewski
  • Nicolas Pécheux
  • Elena Knyazeva
  • Alexandre Allauzen
  • François Yvon
Abstract

When part-of-speech annotated data is scarce, e.g. for under-resourced languages, one can turn to cross-lingual transfer and crawled dictionaries to collect partially supervised data. We cast this problem in the framework of ambiguous learning and show how to learn an accurate history-based model. The method is evaluated on four languages and improves over the state of the art for three of them, with gains of up to 3.9% absolute or 35.8% relative.

Keywords: partially supervised learning, morpho-syntactic analysis, cross-lingual transfer.


Similar resources

Better pos-tagging for "que" through targeted features and rules (Améliorer l'étiquetage de "que" par les descripteurs ciblés et les règles) [in French]

Robust statistical NLP tools, and in particular POS taggers, often use knowledge-poor features, which are easily applicable to any language but do not look beyond 1 or 2 tokens to the right and left and do not make use of syntactic equivalence classes. Although POS tagging tends to get high accuracy scores (around 97%), the remaining 3% of errors systematically result in a 3% loss in parsing accur...


Détection et correction automatique d'erreurs d'annotation morpho-syntaxique du French TreeBank (Detecting and Correcting POS Annotation in the French TreeBank) [in French]

The quality of the Part-Of-Speech (POS) annotation in a corpus has a large impact on training and evaluating POS taggers. In this paper, we present a series of experiments that we have conducted on automatically detecting and correcting annotation errors in the French TreeBank. Two methods are used. The first simply relies on identi...


Logiciel d'aide à l'étiquetage morpho-syntaxique de textes de spécialité (Software to Assist Part-of-Speech Tagging of Specialized Texts) [in French]

Understanding specialized texts requires high-quality part-of-speech tagging. However, when the texts under study come from specific, rarely covered domains, reliable dictionaries and other lexical resources are seldom available. The software we propose makes it possible to start from the output of a general-purpose tagger and then improve that tag...


Un segmenteur-étiqueteur et un chunker pour le français (A Segmenter-POS Labeller and a Chunker for French) [in French]

We propose a demo of two software tools: a Segmenter-POS Labeller for French and a Chunker for texts processed by the first program. Both were trained on the French Tree Bank. Keywords: POS tagging, chunking, machine learning, French Tree Bank, CRF.


Building Monolingual Comparable and Annotated Corpora: An experimental study from a pos tagged corpus (Construire un corpus monolingue annoté comparable Expérience à partir d'un corpus annoté morpho-syntaxiquement) [in French]

This work is motivated by the goal of creating a new part-of-speech annotated corpus in French from an existing one. We propose a general and operational definition of the comparability relation between annotated monolingual corpora. We also propose a comparability measure and a procedure to semi-automatically build a comparable corpus from a source one. We study the use of the perplexity (info...



Publication date: 2014