Identifying Prepositional Phrases in Chinese Patent Texts with Rule-based and CRF Methods
Abstract
Identification of prepositional phrases (PPs) has been a persistent issue in the field of Natural Language Processing (NLP). In this paper, we present a rule-based method and a CRF-based method for identifying PPs in Chinese patent texts. In the rule-based method, we manually write targeted formal identification rules based on the special features and expressions of PPs; in the CRF approach, we label the sentences with features and then use a typical CRF toolkit to train a model for identifying PPs. We conduct experiments to test the performance of the two methods; the final precision rates exceed 90%, indicating that the proposed methods are effective and feasible.
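The CRF approach described above treats PP identification as labelling each token in a sentence. The following is a minimal sketch of one common way to encode phrase spans as per-token labels (the B/I/O scheme) and decode them again. The label names `B-PP`, `I-PP`, and `O`, the example sentence, its part-of-speech tags, and the function names are all illustrative assumptions; the abstract does not state which label set, features, or toolkit the authors used.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def spans_to_bio(n_tokens: int, pp_spans: List[Span]) -> List[str]:
    """Encode PP spans as B-PP / I-PP / O labels, one label per token."""
    labels = ["O"] * n_tokens
    for start, end in pp_spans:
        labels[start] = "B-PP"
        for i in range(start + 1, end):
            labels[i] = "I-PP"
    return labels


def bio_to_spans(labels: List[str]) -> List[Span]:
    """Decode a label sequence (e.g. a tagger's output) back into PP spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == "B-PP" or (label == "I-PP" and start is None):
            # A new phrase begins; close any phrase that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif label == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


# Hypothetical segmented sentence: "In the present invention, this method is adopted."
tokens = ["在", "本", "发明", "中", "，", "采用", "该", "方法"]
pos_tags = ["P", "DT", "NN", "LC", "PU", "VV", "DT", "NN"]  # assumed tags
labels = spans_to_bio(len(tokens), [(0, 4)])  # PP covers "在 本 发明 中"

# One token per line with its features and label, a column layout that
# many CRF toolkits accept as training data (an assumption here).
for token, pos, label in zip(tokens, pos_tags, labels):
    print(f"{token}\t{pos}\t{label}")
```

Decoding with `bio_to_spans(labels)` recovers the span `(0, 4)`, which is how predicted labels would be turned back into phrase boundaries when measuring precision.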
Similar resources
A CRF Method of Identifying Prepositional Phrases in Chinese Patent Texts
This paper presents a Conditional Random Field (CRF) method of identifying prepositional phrases (PP) in Chinese patent documents. By using the CRF model, the identification process can be treated as a sequence labelling problem. After analyzing the characteristics of PP chunks in a large-scale corpus, we design several essential and helpful features and feature templates for recognizing PP chunks...
A new approach to identifying Chinese maximal-length phrases by combining bidirectional labeling
Chinese maximal-length phrases (maximal-length noun phrases and prepositional phrases) possess notable linguistic properties. Bidirectional labeling results of the Chinese maximal-length phrases obtained by sequential classifiers reveal the complementary properties in the two directions of Chinese sentences. In this paper, both left-right and right-left sequential labeling are used to identify ...
CFN-based Semantic Role Labeling of Chinese Prepositional Phrase
Prepositional Phrases are often among the most frequent expressions in Chinese, but they have been ignored on the grounds of being syntactically promiscuous and semantically vacuous, and relegated to the ignominious rank of “stop word”. The Chinese FrameNet (CFN) is a lexical resource project developed by Shanxi University, Taiyuan, based on the principles of Frame Semantics and supported by co...
Improvement of CRF-Based Accent Sandhi Prediction Using The Features Derived from Accent Rules
When developing Japanese text-to-speech (TTS) systems, algorithms that accurately predict the accent type of each constituent phrase are essential for better output speech quality. In our previous studies on the accent type estimation, a CRF-based method was realized. Although this method outperformed the conventional rule-based method, the estimation accuracy of particular phrases such as those incl...
Goal-Source Asymmetry and Russian Spatial Prefixes
In this paper, I draw on data from Russian to argue for an asymmetry between Goal and Source prepositional phrases. Source prepositional phrases are structurally ambiguous; they can occur both as arguments and adjuncts in certain syntactic contexts. Goal prepositional phrases are unambiguously arguments. I claim that Source prepositions have lexically specified semantics, which determines their...