(Better than) State-of-the-Art PoS-tagging for Italian Texts
Abstract
English. This paper presents experiments on building a high-performance PoS-tagger for Italian using deep neural network (DNN) techniques integrated with a powerful Italian morphological analyser. The results obtained by the proposed system on standard datasets taken from the EVALITA campaigns show large accuracy improvements over previous systems in the literature.
Italiano. Questo contributo presenta alcuni esperimenti per la costruzione di un PoS-tagger ad alte prestazioni per l’italiano utilizzando reti neurali ‘deep’ integrate con un potente analizzatore morfologico. I risultati ottenuti sui dataset delle campagne EVALITA da parte del sistema proposto mostrano incrementi di accuratezza piuttosto rilevanti in confronto ai precedenti sistemi in letteratura.
Similar resources
bot.zen @ EVALITA 2016 - A minimally-deep learning PoS-tagger (trained for Italian Tweets)
English. This article describes the system that participated in the PoS tagging for Italian Social Media Texts (PoSTWITA) task of EVALITA 2016, the 5th periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. The work is a continuation of Stemle (2016) with minor modifications to the system and different data sets. It combines a small assertion...
Simpler unsupervised POS tagging with bilingual projections
We present an unsupervised approach to part-of-speech tagging based on projections of tags in a word-aligned bilingual parallel corpus. In contrast to the existing state-of-the-art approach of Das and Petrov, we have developed a substantially simpler method by automatically identifying “good” training sentences from the parallel corpus and applying self-training. In experimental results on eigh...
Character Embeddings PoS Tagger vs HMM Tagger for Tweets
English. The paper describes our submissions to the task on PoS tagging for Italian Social Media Texts (PoSTWITA) at EVALITA 2016. We compared two approaches: a traditional HMM trigram PoS tagger and a Deep Learning PoS tagger using both character-level and word-level embeddings. The character-level embeddings performed better, proving that they can provide a finer representation of words that a...
NLP-NITMZ: Part-of-Speech Tagging on Italian Social Media Text using Hidden Markov Model
English. This paper describes our approach to Part-of-Speech tagging for Italian Social Media Texts (PoSTWITA), which is one of the tasks of the EVALITA 2016 campaign. EVALITA is an evaluation campaign in which teams participate and submit their systems to advance the development of tools for Natural Language Processing (NLP) and speech for the Italian language. Our team NLP–NITMZ participated in t...
The TextPro Tool Suite
We present TextPro, a suite of modular Natural Language Processing (NLP) tools for analysis of Italian and English texts. The suite has been designed so as to integrate and reuse state of the art NLP components developed by researchers at FBK. The current version of the tool suite provides functions ranging from tokenization to chunking and Named Entity Recognition (NER). The system's architect...