Language Independent Statistical Software for Corpus Exploration
Authors
Abstract
This report describes two programs for the statistical analysis of concordance lines. The programs were developed for analysing the lexical context of a given word. It is shown how different parameter settings influence the outcome of collocational analysis, and how the concept of collocation can be extended to allow the extraction of lines typical of a word from a set of concordance lines. Even though all the examples are for English, the software is completely language-independent and requires only minimal linguistic resources.
Similar resources
Trameur: A Framework for Annotated Text Corpora Exploration
Corpus resources with complex linguistic annotations are becoming increasingly important in the work of language specialists. They often need to perform extensive corpus research, including Natural Language Processing (NLP), statistical modelling and data visualisation. Our software system, called Trameur, aims at making these analyses possible within a single graphical user interface. It relie...
Proof Mining with Dependent Types
Several approaches exist to data-mining big corpora of formal proofs. Some of these approaches are based on statistical machine learning, and some – on theory exploration. However, most are developed for either untyped or simply-typed theorem provers. In this paper, we present a method that combines statistical data mining and theory exploration in order to analyse and automate proofs in depend...
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
Interactive Part-of-Speech Exploration
We discuss the design of a tool for the interactive exploration of part-of-speech classes using structural features. At the heart of the tool are incremental hierarchical clustering algorithms. The algorithms are used to detect classes using morphological and syntactical features. The algorithms have been modified or designed to allow interactive exploration and constrained clustering. We prese...
Language-independent exploration of repetition and variation in longitudinal child-directed speech: a tool and resources
We present a language-independent tool, called Varseta, for extracting variation sets in child-directed speech. This tool is evaluated against a gold standard corpus annotated with variation sets, MINGLE-3-VS, and used to explore variation sets in 26 languages in CHILDES-26-VS, a comparable corpus derived from the CHILDES database. The tool and the resources are freely available for research.
Journal title:
- Computers and the Humanities
Volume 31, Issue
Pages -
Publication date 1997