Technical Term Extraction Using Measures of Neology
Abstract
This study aims to show that the frequency of occurrence over time of technical terms and keyphrases differs from that of general-language terms: technical terms and keyphrases show a strong tendency to be recent coinages, and this difference can be exploited for their automatic identification and extraction. To this end, we propose two features, extracted from temporally labelled datasets, that are designed to capture surface-level n-gram neology. Our analysis shows that these features, calculated over consecutive bigrams, are highly indicative of technical terms and keyphrases, which suggests that both are strongly biased towards being surface-level neologisms. Finally, we evaluate the proposed features on a gold-standard dataset for technical term extraction and show that they are comparable or superior to a number of features commonly used for that task.
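The abstract does not define its two features, so the following is only a minimal sketch of what surface-level bigram neology measures over a temporally labelled corpus could look like. The function names, the choice of "first year seen" and "share of occurrences on or after a cutoff year" as measures, and the toy corpus are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter, defaultdict


def bigram_counts_by_year(docs):
    """Count consecutive bigrams per year.

    docs: iterable of (year, tokens) pairs (hypothetical input format).
    Returns {bigram: Counter({year: count})}.
    """
    counts = defaultdict(Counter)
    for year, tokens in docs:
        # zip(tokens, tokens[1:]) yields each pair of adjacent tokens
        for bigram in zip(tokens, tokens[1:]):
            counts[bigram][year] += 1
    return counts


def first_seen_year(year_counts):
    """Illustrative measure 1: earliest year in which the bigram occurs."""
    return min(year_counts)


def recent_share(year_counts, cutoff_year):
    """Illustrative measure 2: fraction of occurrences at or after cutoff_year."""
    total = sum(year_counts.values())
    recent = sum(c for y, c in year_counts.items() if y >= cutoff_year)
    return recent / total


# Toy temporally labelled corpus (invented purely for this example)
docs = [
    (1950, "the old house on the hill".split()),
    (1990, "the old house and the neural network".split()),
    (2005, "a neural network in the old house".split()),
]
counts = bigram_counts_by_year(docs)

for bigram in [("old", "house"), ("neural", "network")]:
    yc = counts[bigram]
    print(bigram, first_seen_year(yc), round(recent_share(yc, 1990), 2))
```

In this toy data the term-like bigram ("neural", "network") first appears later and has all of its occurrences after the cutoff, while the general-language bigram ("old", "house") is spread across the whole period; a real study would compute such measures over a large dated corpus and feed them to a term-extraction classifier or ranker.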
Similar resources
Improvement of the Solvent Extraction of Rhenium from Molybdenite Roasting Dust Leaching Solution using Counter-current Extraction by a Mixer-settler Extractor (TECHNICAL NOTE)
Continuous counter-current extraction of rhenium from roasting dust leach liquor was carried out using a mixer-settler extractor. Tributylphosphate was used as the extractant diluted in kerosene. The effects of the flow rates and extraction stages number were investigated. The extraction efficiency was affected by the flow rates of the aqueous and organic phases, and its mechanism was qualitati...
Term Extraction Using Non-technical Corpora as a Point of Leverage
This paper describes a new hybrid term extraction technique for technical corpora. Our main goal is to reduce the amount of noise in the list of candidate terms by restricting the lexical items that can appear inside candidate terms. In order to do so, we base our term extraction process on lexical items selected by a statistical test that targets items that are highly specific to t...
Web-based Extraction of Technical Features of Products
We present a novel symbolic approach to extract domain-specific technical features of products from large German corpora. Our prototypical implementation extracts terms like “Auflösung” (resolution), “Speicherplatz” (storage capacity), etc. The proposed methods depend on manually added lists of technical measures of the target domain (in our case measures like “Megapixel” or “MB”). We applied th...
HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID
The Semeval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID’s facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content pro...
Evaluation of Automatic Term Recognition of Nuclear Receptors from MEDLINE
The C/NC value method combines linguistic and statistical information for the automatic extraction of multi-word technical terms from machine-readable corpora. C-value enhances the commonly used statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term: nested terms. Nested terms are those which also exist as substrings of o...