Selecting level-specific specialized vocabulary using statistical measures
Abstract
To find an easy-to-use, automated tool for identifying technical vocabulary suited to learners at various levels, nine statistical measures were applied to the 7.3-million-word ‘commerce and finance’ component of the British National Corpus. The resulting word lists showed that each statistical measure extracted a different level of specialized vocabulary, as measured by word length, vocabulary level, U.S. native speaker grade level, and Japanese school textbook vocabulary coverage. The measures therefore produced level-specific words: beginning-level basic business words were identified using cosine and the complementary similarity measure; intermediate-level business words were extracted using log-likelihood, the chi-square test, and the chi-square test with Yates’s correction; and advanced-level business word lists were created using mutual information and McNemar’s test. We conclude that these statistical measures are effective tools for identifying multi-level specialized vocabulary for pedagogical purposes.
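The abstract names the measures but gives no formulas, so the following is only a minimal sketch of how four of them are commonly computed from a 2x2 contingency table of word counts. The textbook formulations used here, the function name `association_scores`, and the counts in the usage example are assumptions for illustration; the exact variants the authors used are not stated.

```python
import math


def association_scores(a, b, c, d):
    """Score one word from a 2x2 contingency table (hypothetical helper).

    a: occurrences of the word in the specialized corpus
    b: occurrences of the word in the reference corpus
    c: all other tokens in the specialized corpus
    d: all other tokens in the reference corpus
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d      # word vs. all other tokens
    col1, col2 = a + c, b + d      # specialized vs. reference corpus
    denom = row1 * row2 * col1 * col2
    diff = a * d - b * c

    # Pearson chi-square for a 2x2 table.
    chi2 = n * diff ** 2 / denom if denom else 0.0

    # Yates's continuity correction shrinks |ad - bc| by n/2 (floored at 0).
    yates = n * max(0.0, abs(diff) - n / 2) ** 2 / denom if denom else 0.0

    # Log-likelihood (G2): 2 * sum(O * ln(O / E)) over the four cells;
    # cells with an observed count of 0 contribute nothing.
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    log_likelihood = 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

    # Pointwise mutual information between the word and the specialized corpus.
    mutual_information = (
        math.log2(a * n / (row1 * col1)) if a > 0 else float("-inf")
    )

    return {
        "chi_square": chi2,
        "chi_square_yates": yates,
        "log_likelihood": log_likelihood,
        "mutual_information": mutual_information,
    }


# Usage with made-up counts: a word seen 20 times in a 100-token
# specialized sample and 10 times in a 100-token reference sample.
for name, value in association_scores(20, 10, 80, 90).items():
    print(f"{name}: {value:.4f}")
```

Ranking every word in the specialized corpus by one of these scores and keeping the top of the list gives one candidate word list per measure, which is the kind of per-measure comparison the abstract describes.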
Similar resources
Extracting Level-Specific Science and Technology Vocabulary
With rapid advances in technology come rapid advances in the language of technology, or English for Science and Technology (EST). We have had success in our earlier research in devising a systematic means of extracting level- and domain-specific words from the British National Corpus. In this study, we apply a similar methodology to the Corpus of Professional English (CPE), a 20-million-word comp...
Speech Recognition Methods and their Potential for Dialogue Systems in Mobile Environments
The DaimlerChrysler speech recognizer is specialized for robust speech recognition in noisy environments, in particular for command and control applications. The recognizer that is used in cars has fixed grammars, which restrict the speaker to using short commands. This paper presents methods that allow the user to speak more freely and add spontaneous words to the commands: language modelling,...
A Suite to Compile and Analyze an LSP Corpus
This paper presents a series of tools for extracting specialized corpora from the web and subsequently analyzing them, mainly with statistical techniques. It is an integrated system of original as well as standard tools, with a modular design that facilitates its re-integration on different systems. The first part of the paper describes the original techniques, which are devoted to the ...
The Specialized Vocabulary of Modern Patent Language: Semantic Associations in Patent Lexis
This paper presents an analysis of the language of patents, as a contribution to the field of English for Specific Purposes (ESP). While there is work that appears to fill a niche in the ESP field (and particularly in English for Occupational Legal Purposes), the present study insists that a statistical approach is necessary for compiling a patent technical word list for ESP. Since research studies on ...
Measuring Similarity between Flamenco Rhythmic Patterns
Music similarity underlies a large part of a listener’s experience, as it relates to familiarity and associations between different pieces or parts. Rhythmic similarity has received scant research attention in comparison with other aspects of music similarity such as melody or harmony. Mathematical measures of rhythmic similarity have been proposed, but none of them has been compared to human j...