Beyond Terms: Multi-Word Units in MultiTerm Extract
Abstract
Multi-word units are lexical units that are written as more than one word. They constitute a rather heterogeneous class, whose only unifying feature is that they represent a mismatch between orthographic representation and lexical units. Included in this class are syntactically governed combinations (e.g. correspond with), complex prepositions (e.g. in spite of), collocations (e.g. put into practice), idioms (e.g. have a bee in one's bonnet), etc.
Similar resources
On multiword lexical units and their role in maritime dictionaries
Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...
Preparatory Work on Automatic Extraction of Bilingual Multi-Word Units from Parallel Corpora
Automatic extraction of bilingual Multi-Word Units is an important subject of research in the automatic bilingual corpus alignment field. There are many cases of single source words corresponding to target multi-word units. This paper presents an algorithm for the automatic alignment of single source words and target multi-word units from a sentence-aligned parallel spoken language corpus. On t...
Extracting Chinese Multi-Word Units from Large-Scale Balanced Corpus
Automatic extraction of multi-word units is an important issue in Natural Language Processing. This paper proposes a new statistical method, based on a large-scale balanced corpus, for extracting multi-word units. We use improved versions of two traditional parameters, mutual information and log-likelihood ratio, and have increased the precision for the top 10,000 words extracted through the method to 80....
Identifying Fixed Expressions: A Comparison of SDL MultiTerm Extract and Déjà Vu’s Lexicon
The term fixed expression refers to a formally quite heterogeneous group of expressions, such as adjective-noun collocations (e.g. heavy smoker), prepositional expressions (e.g. in spite of), verbal expressions (e.g. break the ice), dual expressions (e.g. black and white), foreign phrases (e.g. per capita), etc. The properties that unite them are that they consist of more than one word and are ...
Towards Bilingual Term Extraction in Comparable Patents
In this paper we present a novel framework for extracting bilingual terms from a corpus of comparable patents. The framework includes the following major steps: 1) extract monolingual single-word and multi-word term candidates in monolingual patents; 2) find parallel sentences in comparable patents; 3) extract bilingual single-word and multi-word term candidates; 4) identify correct bilingu...