Significance testing of word frequencies in corpora



Similar resources

Significance testing of word frequencies in corpora

Finding out whether a word occurs significantly more often in one text or corpus than in another is an important question in analysing corpora. As noted by Kilgarriff (2005), the use of the χ² and log-likelihood ratio tests is problematic in this context, as they are based on the assumption that all samples are statistically independent of each other. However, words within a text are not indepen...
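For orientation, here is a minimal sketch of the log-likelihood ratio (G²) test the abstract refers to, applied to one word's counts in two corpora. It is not taken from the paper; the function names `log_likelihood_g2` and `p_value_chi2_1df` are hypothetical. The p-value treats every token as an independent draw, which is exactly the assumption the abstract calls problematic, so the result is best read as a baseline.

```python
import math


def log_likelihood_g2(count_a, size_a, count_b, size_b):
    """G2 statistic for a word seen count_a times in corpus A (size_a tokens)
    and count_b times in corpus B (size_b tokens)."""
    # 2x2 contingency table: rows are corpora, columns are (word, other tokens).
    observed = [
        [count_a, size_a - count_a],
        [count_b, size_b - count_b],
    ]
    total = size_a + size_b
    row_sums = [size_a, size_b]
    col_sums = [count_a + count_b, total - count_a - count_b]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = row_sums[i] * col_sums[j] / total
                g2 += 2.0 * o * math.log(o / e)
    return g2


def p_value_chi2_1df(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    Assumes tokens are independent draws (the assumption questioned above)."""
    return math.erfc(math.sqrt(stat / 2.0))


# Illustrative counts only, not data from any corpus.
g2 = log_likelihood_g2(120, 100_000, 80, 100_000)
p = p_value_chi2_1df(g2)
```

Because words cluster within texts, a small `p` from this sketch may overstate the evidence for a real difference between the corpora.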


Reconsidering the significance of genomic word frequencies.

By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to precisely characterize occurrences in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence follows primarily a Pareto-lognormal distribution, which encapsulates lognormal and power-law ...


Null-hypothesis significance testing of word frequencies: a follow-up on Kilgarriff

In this issue of Corpus Linguistics and Linguistic Theory, Adam Kilgarriff discusses several issues concerned with the role of probabilistic modelling and statistical hypothesis testing in the domain of corpus linguistics and computational linguistics. Given the overall importance of these issues to the above-mentioned fields, I felt that the topic merits even more discussion and decided to add...


The Significance of Education and Gender in Persian Word-selection

This study strives to investigate the importance of ‘education’ and ‘gender’, as two major sociolinguistic variables, in accepting or rejecting the words coined by the Iranian Academy of Persian Language and Literature (APLL). A total of 500 students from state universities in Tehran were chosen as subjects and provided with a questionnaire consisting of 50 APLL equivalents. The respondents’ ac...


Reconsidering the Significance of Genomic Word Frequencies (short title: Genomic Word Frequencies)

NOTICE: this is the authors' version of a work that was accepted for publication in Trends in Genetics. Changes resulting from the publishing process such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Abstract: By conventiona...



Journal

Journal title: Digital Scholarship in the Humanities

Year: 2014

ISSN: 2055-7671,2055-768X

DOI: 10.1093/llc/fqu064