Search results for: corpus linguistics

Number of results: 98006

2001
Chu-Ren Huang

Adopting corpus-based empirical approaches to linguistics, this paper has two main goals: the first is to propose formal methodology to extract meaningful quantitative characterizations from Chinese corpora, the second is to achieve generalizations about theoretically significant linguistic qualities based on these quantitative data. The quantitative scales discussed include mutual information,...

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Journal: Linguistics 2021

Linguistics, English linguistics in particular, has witnessed a remarkable quantitative turn since the 1990s and early 2000s. It was a turn in both scale and quality, concerning the degree (including the degree of sophistication) to which empirical studies, statistical techniques, and modelling have come to be used to determine linguistic research. Which role corpus probabilistic linguistics, including usage-based approache...

1999
Maria Lapata

The acquisition of linguistic knowledge, i.e., the identification, extraction, and encoding of linguistic information in a corpus, has been one of the main motivations for data-driven approaches to natural language. Methods have been developed for the acquisition of, for instance, parts of speech, noun compounds, collocations, support verbs, subcategorization frames, phrase structure rules, selec...

Journal: Synthesis Lectures on Human Language Technologies 2022

Labelling data is one of the most fundamental activities in science, and has underpinned practice, particularly in medicine, for decades, as well as research in corpus linguistics since at least d

2006
Yousif Almas, Khurshid Ahmad

An unsupervised learning method, based on corpus linguistics and special language terminology, is described that can extract time-varying information from text streams. The method is shown to be ‘language-independent’ in that its use leads to sets of regular expressions that can be used to extract the information in typologically distinct languages like English and Arabic. The method uses the i...

2016
Sarah Ita Levitan, Guozhen An, Min Ma, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg

Improving methods of automatic deception detection is an important goal of many researchers from a variety of disciplines, including psychology, computational linguistics, and criminology. We present a system to automatically identify deceptive utterances using acoustic-prosodic, lexical, syntactic, and phonotactic features. We train and test our system on the Interspeech 2016 ComParE challenge...

1999
Nelleke Oostdijk

In this paper the Spoken Dutch Corpus Project is presented, a joint Flemish-Dutch undertaking aimed at the compilation and annotation of a 10-million-word corpus of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in the fields of computational linguistics and language and speech technology. The paper first gives an overview of the project. It then goes ...
