parts of speech tagging

Using Prior Probabilities and Density Estimation for Relational Classification

1998

James Cussens

A Bayesian method for incorporating probabilistic background knowledge into ILP is presented. Positive only learning is extended to allow density estimation. Estimated densities and defined prior are combined in Bayes' theorem to perform relational classification. An initial application of the technique is made to part-of-speech (POS) tagging. A novel use of Gibbs sampling for POS tagging is given.


Supertagging With LSTMs

2016

Ashish Vaswani Yonatan Bisk Kenji Sagae Ryan Musa

In this paper we present new state-of-the-art performance on CCG supertagging and parsing. Our model outperforms existing approaches by an absolute gain of 1.5%. We analyze the performance of several neural models and demonstrate that while feed-forward architectures can compete with bidirectional LSTMs on POS tagging, models that encode the complete sentence are necessary for the long range sy...


An Empirical Exploration of Skip Connections for Sequential Tagging

2016

Huijia Wu Jiajun Zhang Chengqing Zong

In this paper, we empirically explore the effects of various kinds of skip connections in stacked bidirectional LSTMs for sequential tagging. We investigate three kinds of skip connections connecting to LSTM cells: (a) skip connections to the gates, (b) skip connections to the internal states and (c) skip connections to the cell outputs. We present comprehensive experiments showing that skip co...


Turkish PoS Tagging by Reducing Sparsity with Morpheme Tags in Small Datasets

2016

Burcu Can Ahmet Üstün Murathan Kurfali

Sparsity is one of the major problems in natural language processing. The problem becomes even more severe in agglutinating languages that are highly prone to be inflected. We deal with sparsity in Turkish by adopting morphological features for part-of-speech tagging. We learn inflectional and derivational morpheme tags in Turkish by using conditional random fields (CRF) and we employ the morph...


Different Flavors of GUM: Evaluating Genre and Sentence Type Effects on Multilayer Corpus Annotation Quality

2016

Amir Zeldes Dan Simonson

Genre and domain are well known covariates of both manual and automatic annotation quality. Comparatively less is known about the effect of sentence types, such as imperatives, questions or fragments, and how they interact with text type effects. Using mixed effects models, we evaluate the relative influence of genre and sentence types on automatic and manual annotation quality for three relate...


A Two-Stage Approach to Chinese Part-of-Speech Tagging

2008

Aitao Chen Ya Zhang Gordon Sun

This paper describes a Chinese part-of-speech tagging system based on the maximum entropy model. It presents a novel two-stage approach to using the part-of-speech tags of the words on both sides of the current word in Chinese part-of-speech tagging. The system is evaluated on four corpora at the Fourth SIGHAN Bakeoff in the close track of the Chinese part-of-speech tagging task.


Part Of Speech Tagging Using A Hybrid System

2005

Sean Finney Mark Angelillo

A procedure is proposed for tagging part of speech using a hybrid system that consists of a statistical based rule finder and a genetic algorithm which decides how to use those rules. This procedure will try to improve upon an already very good method of part of speech tagging.


The Impact of Musical Texts on the Text Recall of Young Learners of English in Isfahan Junior High Schools

Thesis: Ministry of Science, Research and Technology - Sheikh Bahaei University - Faculty of Foreign Languages, 1392

Morteza Azadi, Mohammad Hassan Tahririan

Abstract: Although music possesses some kind of power and its use has been welcomed by many students in language classrooms, it seems that they form a non-serious image of the lesson while listening to songs and may think that it is a matter of fun. The main objective of the present study was to investigate whether learning a foreign language through musical texts (songs) can have an impac...


Combining Multiple Classifiers to Improve Part of Speech Tagging: A Case Study for Brazilian Portuguese

2000

Rachel V. Xavier Aires Sandra M. Aluísio Denise C. S. Kuhn Marcio L. B. Andreeta Osvaldo N. Oliveira

Four taggers have been trained on a 100,000-word corpus of Brazilian Portuguese, namely Unigram (Treetagger), N-gram (Treetagger), transformation-based (TBL) and Maximum-Entropy tagging (MXPOST). The latter displayed the best accuracy (88.73%), which is still much lower than the state-of-the-art accuracy for English. The low accuracy is attributed to the reduced size of the training corpus. Twel...


Decision Strategies for Incremental POS Tagging

2011

Niels Beuck Arne Köhn Wolfgang Menzel

In an incremental NLP pipeline every module needs to work incrementally. However, an incremental processing mode can lead to a degradation of accuracy due to the missing context to the right. We discuss three properties of incremental output that can be traded for accuracy, namely timeliness, monotonicity and decisiveness. The consequences of these trade-offs are evaluated systematically for th...
