Technical Term Extraction Using Measures of Neology

Authors

  • Christopher Norman
  • Akiko Aizawa
Abstract

This study aims to show that the frequency of occurrence over time of technical terms and keyphrases differs from that of general-language terms: technical terms and keyphrases show a strong tendency to be recent coinages, and this difference can be exploited for their automatic identification and extraction. To this end, we propose two features, extracted from temporally labelled datasets, that are designed to capture surface-level n-gram neology. Our analysis shows that these features, calculated over consecutive bigrams, are highly indicative of technical terms and keyphrases, which suggests that both are strongly biased towards being surface-level neologisms. Finally, we evaluate the proposed features on a gold-standard dataset for technical term extraction and show that they are comparable or superior to a number of features commonly used for technical term extraction.
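The abstract does not define the two proposed features, so the following is only a minimal sketch of what bigram-level neology features over a temporally labelled corpus could look like. The function name `neology_features`, the `split_year` parameter, and both features (year of first occurrence, and share of occurrences falling in the later period) are assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter


def bigrams(tokens):
    """Return the consecutive bigrams of a token list."""
    return list(zip(tokens, tokens[1:]))


def neology_features(docs, split_year):
    """Compute two illustrative (hypothetical) neology features per bigram.

    docs: iterable of (year, tokens) pairs, i.e. a temporally labelled corpus.
    split_year: assumed boundary between the "early" and "late" periods.

    Returns a dict mapping each bigram to a tuple:
      (year of first occurrence, fraction of occurrences in the late period)
    """
    first_seen = {}
    early = Counter()
    late = Counter()
    for year, tokens in docs:
        for bg in bigrams(tokens):
            # Track the earliest year in which the bigram is attested.
            if bg not in first_seen or year < first_seen[bg]:
                first_seen[bg] = year
            # Count occurrences on either side of the period boundary.
            if year < split_year:
                early[bg] += 1
            else:
                late[bg] += 1
    features = {}
    for bg, year in first_seen.items():
        total = early[bg] + late[bg]
        features[bg] = (year, late[bg] / total)
    return features


# Toy corpus, invented purely for illustration.
docs = [
    (1990, ["the", "model", "is", "good"]),
    (2010, ["the", "model", "uses", "deep", "learning"]),
    (2012, ["deep", "learning", "is", "good"]),
]
feats = neology_features(docs, split_year=2000)
```

In this toy corpus, a bigram such as ("deep", "learning") is first attested late and occurs only in the later period, whereas ("the", "model") is attested from the start and is spread across both periods. Under the abstract's hypothesis, the former profile would be the one more typical of technical terms.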

Similar resources

Improvement of the Solvent Extraction of Rhenium from Molybdenite Roasting Dust Leaching Solution using Counter-current Extraction by a Mixer-settler Extractor (TECHNICAL NOTE)

Continuous counter-current extraction of rhenium from roasting dust leach liquor was carried out using a mixer-settler extractor. Tributylphosphate was used as the extractant diluted in kerosene. The effects of the flow rates and extraction stages number were investigated. The extraction efficiency was affected by the flow rates of the aqueous and organic phases, and its mechanism was qualitati...

Term Extraction Using Non-technical Corpora as a Point of Leverage

This paper describes a new hybrid term extraction technique for technical corpora. Our main goal is to reduce the amount of noise in the list of candidate terms by restricting the lexical items that can appear inside candidate terms. In order to do so, we base our term extraction process on lexical items selected by a statistical test that targets items that are highly specific to t...

Web-based Extraction of Technical Features of Products

We present a novel symbolic approach to extract domain-specific technical features of products from large German corpora. Our prototypical implementation extracts terms like “Auflösung” (resolution), “Speicherplatz” (storage capacity), etc. The proposed methods depend on manually added lists of technical measures of the target domain (in our case measures like “Megapixel” or “MB”). We applied th...

HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID

The SemEval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID’s facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content pro...

Evaluation of Automatic Term Recognition of Nuclear Receptors from MEDLINE

The C/NC value method combines linguistic and statistical information for the automatic extraction of multi-word technical terms from machine readable corpora. C-value enhances the commonly used statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word terms, nested terms. Nested terms are those which also exist as substrings of o...

Publication date: 2015