Coarse-grained Candidate Generation and Fine-grained Re-ranking for Chinese Abbreviation Prediction
Authors
Abstract
Correctly predicting the abbreviation of a given full form is important in many natural language processing systems. In this paper we propose a two-stage method for finding the abbreviation that corresponds to a given full form. We first use contextual information from a large corpus to generate abbreviation candidates for each full form and obtain a coarse-grained ranking through a graph random walk. This coarse-grained ranked list restricts the search space to the top-ranked candidates. We then apply a similarity-sensitive re-ranking strategy that uses features of the candidates to produce a fine-grained re-ranking and select the final result. Our method achieves good results and outperforms state-of-the-art systems. One advantage of our method is that it needs only weak supervision and obtains competitive results with less training data. Candidate generation and coarse-grained ranking are entirely unsupervised, and the re-ranking phase can reach a reasonably good result with a very small amount of training data.
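The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the graph is built or which features the re-ranker uses, so everything below is an assumption made for illustration, not the authors' implementation: the function names `coarse_rank` and `rerank`, the toy co-occurrence counts, the restart probability, and the two hand-set scoring features (which stand in for a learned re-ranking model) are all hypothetical.

```python
from collections import defaultdict

def coarse_rank(contexts, full_form, restart=0.15, iterations=50):
    """Stage 1 (unsupervised): random walk with restart on a bipartite
    graph linking terms to the context words they co-occur with.
    `contexts` maps each term to {context word: count} (toy data)."""
    graph = defaultdict(dict)
    for term, words in contexts.items():
        for word, count in words.items():
            graph[("term", term)][("ctx", word)] = count
            graph[("ctx", word)][("term", term)] = count

    start = ("term", full_form)  # the full form must appear in `contexts`
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in prob.items():
            total = sum(graph[node].values())
            for neighbor, weight in graph[node].items():
                # Follow an edge with probability proportional to its weight.
                nxt[neighbor] += (1 - restart) * mass * weight / total
        nxt[start] += restart  # jump back to the full form
        prob = nxt

    scores = {node[1]: p for node, p in prob.items()
              if node[0] == "term" and node[1] != full_form}
    return sorted(scores, key=scores.get, reverse=True)

def is_subsequence(short, long):
    """True if the characters of `short` appear in `long` in order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)

def rerank(full_form, ranked, top_k=3):
    """Stage 2: re-score only the top-k coarse candidates.
    The two features and their weights are hypothetical stand-ins."""
    def score(candidate):
        keeps_order = 1.0 if is_subsequence(candidate, full_form) else 0.0
        keeps_first = 0.5 if candidate[0] == full_form[0] else 0.0
        return keeps_order + keeps_first
    return max(ranked[:top_k], key=score)

# Toy co-occurrence counts, invented for this example.
contexts = {
    "北京大学": {"教授": 3, "学生": 2, "校园": 1},
    "北大": {"教授": 2, "学生": 2},
    "京大": {"学生": 1},
    "清华": {"校园": 1, "工程": 1},
}

ranked = coarse_rank(contexts, "北京大学")
best = rerank("北京大学", ranked)
print(ranked, best)
```

The coarse stage needs no labels, which mirrors the abstract's claim that candidate generation and coarse ranking are unsupervised. Only the scoring function in the second stage would be fitted to training data in a real system.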
Similar resources
Predicting Chinese Abbreviations with Minimum Semantic Unit and Global Constraints
We propose a new Chinese abbreviation prediction method which can incorporate rich local information while generating the abbreviation globally. Different to previous character tagging methods, we introduce the minimum semantic unit, which is more fine-grained than character but more coarse-grained than word, to capture word level information in the sequence labeling framework. To solve the “ch...
PETROLOGICAL AND GEOCHEMICAL STUDY OF CRUSTAL XENOLITHS FROM 1961 ERUPTION OF CALBUCO VOLCANO, CHILE (LATITUDE 41°20′ S)
Twenty four samples of xenoliths and country rocks from the 1961 lava flow of Calbuco volcano have been studied. Fourteen samples have been analyzed for major elements and P, Ni, Ba, Cr, V, Zr, Sc, Y, and Sr. Five of these samples were further analyzed for Sm, Nd, Sr, and Pb isotope ratios. Seventeen samples were analyzed under the microscope and three samples were analyzed by microprobe fo...
Cannabis_TREATS_cancer: Incorporating Fine-Grained Ontological Relations in Medical Document Ranking
Previous work has justified the assumption that document ranking can be improved by further considering coarse-grained relations at various linguistic levels (e.g., lexical, syntactic, and semantic). To the best of our knowledge, little work has been reported on incorporating fine-grained ontological relations (e.g., ) in document ranking. Two contributions are wo...
Ultra-Fine Grained Dual-Phase Steels
This paper provides an overview on obtaining low-carbon ultra-fine grained dual-phase steels through rapid intercritical annealing of cold-rolled sheet as improved materials for automotive applications. A laboratory processing route was designed that involves cold-rolling of a tempered martensite structure followed by a second tempering step to produce a fine grained aggregate of ferrite and ca...
NUS-PT: Exploiting Parallel Texts for Word Sense Disambiguation in the English All-Words Tasks
We participated in the SemEval-2007 coarse-grained English all-words task and fine-grained English all-words task. We used a supervised learning approach with SVM as the learning algorithm. The knowledge sources used include local collocations, parts-of-speech, and surrounding words. We gathered training examples from English-Chinese parallel corpora, SEMCOR, and DSO corpus. While the fine-grai...