Tagging Complex Non-Verbal German Chunks with Conditional Random Fields
Abstract
We report on chunk tagging methods for German that recognize complex non-verbal phrases using structural chunk tags with Conditional Random Fields (CRFs). This state-of-the-art method for sequence classification achieves 93.5% accuracy on newspaper text. For the same task, a classical trigram tagger approach based on Hidden Markov Models reaches a baseline of 88.1%. CRFs allow for a clean and principled integration of linguistic knowledge such as part-of-speech tags, morphological constraints and lemmas. The structural chunk tags encode phrase structures up to a depth of 3 syntactic nodes. They include complex prenominal and postnominal modifiers that occur frequently in German noun phrases.
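As an illustration of how part-of-speech tags, morphological information and lemmas can be handed to a CRF as per-token features, here is a minimal sketch. It is not the authors' feature set: the token keys (`form`, `pos`, `lemma`, `morph`), the feature names and the one-token context window are all assumptions made for this example, and the morphological strings are purely illustrative. The output is a list of feature dictionaries, a format that common CRF toolkits accept.

```python
from typing import Dict, List

# Hypothetical token representation: one dict per token with the keys
# "form", "pos", "lemma" and "morph" (names chosen for this sketch only).
Token = Dict[str, str]


def token_features(sentence: List[Token], i: int) -> Dict[str, str]:
    """Build a feature dict for the token at position i."""
    tok = sentence[i]
    feats = {
        "form.lower": tok["form"].lower(),
        "pos": tok["pos"],
        "lemma": tok["lemma"],
        "morph": tok["morph"],
    }
    # Context window of one token on each side (POS tags only).
    if i > 0:
        feats["pos[-1]"] = sentence[i - 1]["pos"]
    else:
        feats["BOS"] = "1"  # beginning-of-sentence marker
    if i < len(sentence) - 1:
        feats["pos[+1]"] = sentence[i + 1]["pos"]
    else:
        feats["EOS"] = "1"  # end-of-sentence marker
    return feats


def sentence_features(sentence: List[Token]) -> List[Dict[str, str]]:
    """Feature dicts for every token of a sentence, in order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# "der alte Mann" with STTS part-of-speech tags; morph values are illustrative.
example = [
    {"form": "der", "pos": "ART", "lemma": "der", "morph": "Nom.Sg.Masc"},
    {"form": "alte", "pos": "ADJA", "lemma": "alt", "morph": "Nom.Sg.Masc"},
    {"form": "Mann", "pos": "NN", "lemma": "Mann", "morph": "Nom.Sg.Masc"},
]

features = sentence_features(example)
```

In a setup like this, each feature dictionary would be paired with one structural chunk tag per token for training; the tag inventory itself is defined in the paper and is not reproduced here.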
Similar resources
Studies for Segmentation of Historical Texts: Sentences or Chunks?
We present some experiments on text segmentation for German texts aimed at developing a method of segmenting historical texts. Since such texts have no (consistent) punctuation, we use a machine learning approach to label tokens with their relative positions in text segments using Conditional Random Fields. We compare the performance of this approach on the task of segmenting text into sente...
Semantic Tagging of Web Search Queries
We present a novel approach to parse web search queries for the purpose of automatic tagging of the queries. We will define a set of probabilistic context-free rules, which generates bags (i.e. multi-sets) of words. Using this new type of rule in combination with the traditional probabilistic phrase structure rules, we define a hybrid grammar, which treats each search query as a bag of chunks (...
Part of Speech Tagging for Amharic using Conditional Random Fields
We applied Conditional Random Fields (CRFs) to the tasks of Amharic word segmentation and POS tagging using a small annotated corpus of 1000 words. Given the size of the data and the large number of unknown words in the test corpus (80%), an accuracy of 84% for Amharic word segmentation and 74% for POS tagging is encouraging, indicating the applicability of CRFs for a morphologically complex la...
Midterm Report for National Undergraduate Innovational Experimental Program Hierarchical Conditional Random Fields for Chinese Part-Of-Speech Tagging
We explore methods for applying Conditional Random Fields (CRFs) to Chinese part-of-speech tagging. We focus on POS tagging without pre-segmentation and propose a hierarchical Conditional Random Field that performs segmentation and POS tagging in a single step. Experiments comparing this method with existing methods on this task are planned.
Arabic Named Entity Recognition using Conditional Random Fields
The Named Entity Recognition (NER) task consists in determining and classifying proper names within an open-domain text. This Natural Language Processing task proved to be harder for languages with a complex morphology such as the Arabic language. NER was also proved to help Natural Language Processing tasks such as Machine Translation, Information Retrieval and Question Answering to obtain a h...