Significance testing of word frequencies in corpora
Similar resources
Significance testing of word frequencies in corpora
Finding out whether a word occurs significantly more often in one text or corpus than in another is an important question in analysing corpora. As noted by Kilgarriff (2005), the use of the χ² and log-likelihood ratio tests is problematic in this context, as they are based on the assumption that all samples are statistically independent of each other. However, words within a text are not indepen...
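For orientation, the two tests named in this abstract can be sketched for a single word as a 2×2 contingency table (occurrences of the word versus all other tokens, in corpus A versus corpus B). This is a minimal illustration, not the method of the paper: the function name `two_corpus_tests` and its parameters are hypothetical, and the computation treats every token as an independent draw, which is exactly the assumption the abstract describes as problematic.

```python
import math


def two_corpus_tests(freq_a, size_a, freq_b, size_b):
    """Return (chi2, g2) for one word observed in two corpora.

    freq_* is the word's count and size_* the total token count of each
    corpus. Hypothetical helper for illustration; it assumes tokens are
    independent samples.
    """
    observed = [
        [freq_a, size_a - freq_a],  # corpus A: the word, all other tokens
        [freq_b, size_b - freq_b],  # corpus B: the word, all other tokens
    ]
    total = size_a + size_b
    row_totals = [size_a, size_b]
    col_totals = [freq_a + freq_b, total - freq_a - freq_b]

    chi2 = 0.0
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                continue  # degenerate table: this cell carries no information
            obs = observed[i][j]
            chi2 += (obs - expected) ** 2 / expected  # Pearson chi-squared term
            if obs > 0:
                g2 += 2.0 * obs * math.log(obs / expected)  # log-likelihood ratio term
    return chi2, g2


chi2, g2 = two_corpus_tests(30, 1000, 10, 1000)
print(f"chi2={chi2:.3f}, G2={g2:.3f}")
```

Both statistics are conventionally compared against a χ² distribution with one degree of freedom (critical value about 3.84 at the 5% level). Because words cluster within texts, the resulting p-values tend to overstate significance.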
Reconsidering the significance of genomic word frequencies.
By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to precisely characterize occurrences in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence follows primarily a Pareto-lognormal distribution, which encapsulates lognormal and power-law ...
Null-hypothesis significance testing of word frequencies: a follow-up on Kilgarriff*
In this issue of Corpus Linguistics and Linguistic Theory, Adam Kilgarriff discusses several issues concerned with the role of probabilistic modelling and statistical hypothesis testing in the domain of corpus linguistics and computational linguistics. Given the overall importance of these issues to the above-mentioned fields, I felt that the topic merits even more discussion and decided to add...
The Significance of Education and Gender in Persian Word-selection
This study strives to investigate the importance of ‘education’ and ‘gender’, as two major sociolinguistic variables, in accepting or rejecting the words coined by the Iranian Academy of Persian Language and Literature (APLL). A total of 500 students from state universities in Tehran were chosen as subjects and provided with a questionnaire consisting of 50 APLL equivalents. The respondents’ ac...
Reconsidering the Significance of Genomic Word Frequencies (short title: Genomic Word Frequencies)
NOTICE: this is the authors' version of a work that was accepted for publication in Trends in Genetics. Changes resulting from the publishing process such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Abstract: By conventiona...
Journal
Journal title: Digital Scholarship in the Humanities
Year: 2014
ISSN: 2055-7671,2055-768X
DOI: 10.1093/llc/fqu064