Feature Ablation for Preposition Disambiguation

Author

  • Ken Litkowski
Abstract

The development of classification models for preposition disambiguation involves the generation of thousands of features describing the context of the preposition. The best modeling technique, support-vector machines, produces weights for each feature, but these weights are difficult to interpret and use for determining the most important features. One technique that can aid in this identification is feature ablation: removing features and assessing the effect on classification performance. We build upon standard approaches for feature ablation, removing feature sets one at a time and performing upwards of 5000 iterations for each preposition. We describe our algorithm in detail, including the detailed results that are generated with each iteration. We examine these results, suggesting that intuitions about how to describe preposition behavior might not hold. In particular, factors other than the syntactic and semantic properties of the complement and the governor frequently emerge as important.

In this paper, we describe an algorithm for feature ablation using support-vector machine (SVM) modeling. In section 1, we provide background for SVM modeling used in preposition disambiguation, particularly identifying the types of features that are used. In section 2, we describe the algorithm that drills down through word-finding features, syntactic and semantic characterization features, and combinations of the two types. We describe the criteria used to identify feature sets to be ablated and the measures that are generated. In section 3, we examine these results and measures and how they might be used in characterizing preposition behavior. In section 4, we interpret the results and discuss the need for further investigations to use the results to aid in describing preposition behavior.

1. Features Used in SVM Modeling for Preposition Disambiguation

As described in Litkowski (2014) and Litkowski (2016), the Pattern Dictionary of English Prepositions (PDEP) has processed 81509 sentences in three corpora using a lemmatizer, part-of-speech tagger, and dependency parser (Tratz and Hovy, 2011). Using the parse results, in CoNLL-X format, features are extracted to describe the context of a specified preposition in each sentence. Each feature consists of three components: a word position relative to the preposition, a syntactic or semantic characterization of the element at the word position, and a value for the feature, depending on the word position and the type of characterization. We describe these features in more detail below.

PDEP includes three corpora, collectively called the TPP Corpora (Litkowski, 2013a). The first was all FrameNet sentences (57 prepositions, 26739 instances), not just those used in SemEval (24 prepositions, which were divided into training and test sets). The second was a set of 20 sentences drawn from the Oxford English Corpus (OEC) to exemplify each sense in ODE, notably providing instances for multiword prepositional phrases (7485 sentences). The third was a set of sentences from the written
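The ablation idea summarized in the abstract (remove one feature set at a time and measure the change in classification performance) can be illustrated with a minimal Python sketch. It is not the paper's algorithm. Several things here are assumptions made purely for illustration: features are grouped into sets by their (word position, characterization) pair; the codes `h` (governor), `c` (complement), `l` (lemma) and `pos` are hypothetical; the sentences are invented toy data; and `train_and_score` is a simple voting classifier standing in for SVM training and testing.

```python
from collections import Counter, defaultdict


def feature_set(feature):
    """Map a (position, characterization, value) triple to its feature set."""
    position, characterization, _value = feature
    return (position, characterization)


def train_and_score(train, test, active_sets):
    """Toy stand-in for SVM training and testing (this is NOT an SVM).

    Each training feature in an active set votes for the senses it occurred
    with; the return value is accuracy on the test instances.
    """
    senses = sorted({sense for _feats, sense in train})
    votes = defaultdict(Counter)
    for feats, sense in train:
        for f in feats:
            if feature_set(f) in active_sets:
                votes[f][sense] += 1
    correct = 0
    for feats, sense in test:
        totals = Counter()
        for f in feats:
            if feature_set(f) in active_sets and f in votes:
                totals.update(votes[f])
        # Ties go to the alphabetically first sense, so results are deterministic.
        predicted = max(senses, key=lambda s: totals[s])
        correct += predicted == sense
    return correct / len(test)


def ablate(train, test):
    """Remove each feature set in turn and record the drop in accuracy."""
    all_sets = sorted({feature_set(f) for feats, _sense in train for f in feats})
    baseline = train_and_score(train, test, set(all_sets))
    drops = {}
    for fs in all_sets:
        remaining = set(all_sets) - {fs}
        drops[fs] = baseline - train_and_score(train, test, remaining)
    return baseline, drops


def inst(governor, complement, sense):
    """Build one invented instance of the preposition 'in'."""
    feats = {("h", "l", governor), ("c", "l", complement), ("c", "pos", "NN")}
    return feats, sense


TRAIN = [
    inst("live", "city", "spatial"),
    inst("live", "house", "spatial"),
    inst("arrive", "summer", "temporal"),
    inst("arrive", "morning", "temporal"),
]
TEST = [
    inst("live", "village", "spatial"),
    inst("arrive", "winter", "temporal"),
]

baseline, drops = ablate(TRAIN, TEST)
# List feature sets from most to least harmful to remove.
for fs, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(fs, drop)
```

In this toy data only the governor lemma set matters, so removing it is the only ablation that lowers accuracy. With a real SVM and thousands of features, the same loop would simply call the actual training and evaluation routine in place of `train_and_score`.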

Similar resources

Exploiting Semantic Role Resources for Preposition Disambiguation

This article describes how semantic role resources can be exploited for preposition disambiguation. The main resources include the semantic role annotations provided by the Penn Treebank and FrameNet tagged corpora. The resources also include the assertions contained in the Factotum knowledge base, as well as information from Cyc and Conceptual Graphs. A common inventory is derived from these i...

MELB-YB: Preposition Sense Disambiguation Using Rich Semantic Features

This paper describes a maxent-based preposition sense disambiguation system entered in the preposition sense disambiguation task of SemEval 2007. This system uses a wide variety of semantic and syntactic features to perform the disambiguation task and achieves a precision of 69.3% over the test data.

Models and Training for Unsupervised Preposition Sense Disambiguation

We present a preliminary study on unsupervised preposition sense disambiguation (PSD), comparing different models and training techniques (EM, MAP-EM with L0 norm, Bayesian inference using Gibbs sampling). To our knowledge, this is the first attempt at unsupervised preposition sense disambiguation. Our best accuracy reaches 56%, a significant improvement (at p <.001) of 16% over the most-freque...

What's in a Preposition? Dimensions of Sense Disambiguation for an Interesting Word Class

Choosing the right parameters for a word sense disambiguation task is critical to the success of the experiments. We explore this idea for prepositions, an often overlooked word class. We examine the parameters that must be considered in preposition disambiguation, namely context, features, and granularity. Doing so delivers an increased performance that significantly improves over two state-of...

Semi Supervised Preposition-Sense Disambiguation using Multilingual Data

Prepositions are very common and very ambiguous, and understanding their sense is critical for understanding the meaning of the sentence. Supervised corpora for the preposition-sense disambiguation task are small, suggesting a semi-supervised approach to the task. We show that signals from unannotated multilingual data can be used to improve supervised preposition-sense disambiguation. Our appro...

Publication date: 2016