Coarse-grained Candidate Generation and Fine-grained Re-ranking for Chinese Abbreviation Prediction

Authors

  • Longkai Zhang
  • Houfeng Wang
  • Xu Sun

Abstract

Correctly predicting the abbreviation of a given full form is important for many natural language processing systems. In this paper we propose a two-stage method for finding the abbreviation that corresponds to a given full form. We first use contextual information from a large corpus to obtain abbreviation candidates for each full form, and produce a coarse-grained ranking of these candidates through a random walk on a graph. This coarse-grained ranked list restricts the search space to the top-ranked candidates. We then apply a similarity-sensitive re-ranking strategy that uses the features of the candidates to produce a fine-grained re-ranking and select the final result. Our method achieves good results and outperforms the state-of-the-art systems. One advantage of the method is that it needs only weak supervision and obtains competitive results with less training data: candidate generation and coarse-grained ranking are entirely unsupervised, and the re-ranking phase reaches reasonably good results with a very small amount of training data.
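The abstract does not say how the candidate graph is built or weighted, so the following Python sketch only illustrates the general shape of the unsupervised first stage, under stated assumptions. It assumes that candidates are ordered character subsequences of the full form that occur in a corpus (modeled here as a hypothetical frequency dictionary `corpus_counts`), and that the coarse-grained ranking is a random walk with restart over a hypothetical graph linking the full form to each candidate by corpus frequency and linking candidates to one another by character Jaccard similarity. The names `generate_candidates`, `char_jaccard` and `random_walk_rank` are illustrative, not the authors' code.

```python
from itertools import combinations


def generate_candidates(full_form, corpus_counts, min_len=2):
    """Return ordered character subsequences of full_form seen in the corpus.

    Assumption (not from the paper): an abbreviation keeps some characters of
    the full form in their original order, and corpus_counts is a hypothetical
    dict mapping strings to corpus frequencies.
    """
    n = len(full_form)
    candidates = set()
    # Proper subsequences only: at least min_len characters, fewer than n.
    for length in range(min_len, n):
        for idx in combinations(range(n), length):
            cand = "".join(full_form[i] for i in idx)
            if corpus_counts.get(cand, 0) > 0:
                candidates.add(cand)
    return sorted(candidates)


def char_jaccard(a, b):
    """Jaccard similarity between the character sets of two strings."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def random_walk_rank(full_form, candidates, corpus_counts,
                     restart=0.15, iters=100):
    """Rank candidates by a random walk with restart from the full form.

    Hypothetical graph: node 0 is the full form and links to each candidate
    with weight = corpus frequency; candidates link to each other with
    weight = char_jaccard. These weights are illustrative only.
    """
    nodes = [full_form] + list(candidates)
    size = len(nodes)
    weights = [[0.0] * size for _ in range(size)]
    for j, cand in enumerate(candidates, start=1):
        w = float(corpus_counts[cand])
        weights[0][j] = weights[j][0] = w
        for k in range(j + 1, size):
            sim = char_jaccard(cand, nodes[k])
            weights[j][k] = weights[k][j] = sim
    # Row-normalize the weights into transition probabilities.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total for w in row] if total else [0.0] * size)
    # Power iteration: follow an edge with prob. 1 - restart, else jump home.
    score = [1.0] + [0.0] * (size - 1)
    for _ in range(iters):
        nxt = [0.0] * size
        for i in range(size):
            for j in range(size):
                nxt[j] += (1 - restart) * score[i] * trans[i][j]
        nxt[0] += restart
        score = nxt
    # Drop the full-form node and sort candidates by visiting probability.
    return sorted(zip(candidates, score[1:]), key=lambda p: -p[1])


if __name__ == "__main__":
    counts = {"北大": 50, "北京": 20, "京大": 5, "清华": 30}  # made-up toy counts
    cands = generate_candidates("北京大学", counts)
    for cand, s in random_walk_rank("北京大学", cands, counts):
        print(cand, round(s, 4))
```

Truncating the returned list (for example `ranked[:top_k]`) gives the restricted search space that a second, supervised re-ranking stage would then score with candidate features. The toy run also shows why that stage is needed: a frequent ordinary word such as 北京 is a subsequence of 北京大学 and still ranks highly, even though it is not an abbreviation of it. The re-ranking stage itself is not sketched here.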

Publication date: 2014