linguistic corpus

Search results for: linguistic corpus

Number of results: 113027

Subjectivity in Japanese: A Corpus-Linguistic Study

Journal: International Journal of English Linguistics, 2019

A Japanese National Project on Spontaneous Speech Corpus and Processing Technology

2003

Sadaoki Furui, Kikuo Maekawa, Hitoshi Isahara

A new national project for raising the technological level of speech recognition and understanding has recently commenced in Japan. This project aims at a) building a large-scale spontaneous speech corpus consisting of roughly 7M words and 800 hours of speech, b) acoustic and linguistic modeling for spontaneous speech understanding and summarization using linguistic as well as para-linguistic i...

Multilevel corpus analysis: generating and querying an AGset of spoken Italian (SpIt-MDb)

2006

Renata Savy, Francesco Cutugno, Claudia Crocco

In this paper we present an application of AGTK to a corpus of spoken Italian annotated at many different linguistic levels. The work consists of two parts: a) the presentation of AG-SpIt, a toolkit devoted to corpus data management that we developed according to AGTK proposals; b) the presentation of the corpus structure together with some examples and results of cross-level linguistic analyses o...

An Extended Version of the KoKo German L1 Learner Corpus

2016

Andrea Abel, Aivars Glaznieks, Lionel Nicolas, Egon Stemle

This paper describes an extended version of the KoKo corpus (version KoKo4, Dec 2015), a corpus of written German L1 learner texts from three different German-speaking regions in three different countries. The KoKo corpus is richly annotated with learner language features on different linguistic levels such as errors or other linguistic characteristics that are not deficit-oriented, an...

A Linguistic Search Tool for Semitic Languages

2010

Alon Itai

The paper discusses searching a corpus for linguistic patterns. Semitic languages have complex morphology and ambiguous writing systems. We explore the properties of Semitic Languages that challenge linguistic search and describe how we used the Corpus Workbench (CWB) to enable linguistic searches in Hebrew corpora.

Using Chinese Gigaword Corpus and Chinese Word Sketch in Linguistic Research

2006

Jia-Fei Hong, Chu-Ren Huang

We explore the possibility of deeper linguistic research based on corpus and computational linguistic tools in this paper. In particular, we adopt Chinese Word Sketch, the application of Word Sketch Engine to Chinese GigaWord Corpus, for linguistic research. We apply Chinese Sketch Engine results to deeper linguistic account such as selectional restriction and event type selection. The study is...

What if? Conditionals in educational registers

2008

Max M. Louwerse, Scott A. Crossley, Patrick Jeuniaux

Many corpus linguistic studies have investigated classification of texts into genres and registers, but relatively few of these studies have looked at linguistic features in educational registers. From a pedagogical perspective it is important to determine whether certain linguistic features behave differently across registers within particular disciplines. The current study investigates condit...

Automatic Acquisition of Linguistic Knowledge: From Sinica Corpus to Gigaword Corpus

2006

Chu-Ren Huang

The raison d’être for a corpus, as it was first conceived by Francis and Kucera in 1963, was to provide a body of linguistic facts from which linguistic knowledge could be generalized [1]. The methods of acquisition have evolved as corpus size and technology have advanced in the past 40 years. Originally, corpus-based concordances helped linguists form generalizations. This was what Fillmo...

Multi-language Speech Collection for NIST LRE

2016

Karen Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright

The Multi-language Speech (MLS) Corpus supports NIST’s Language Recognition Evaluation series by providing new conversational telephone speech and broadcast narrowband data in 20 languages/dialects. The corpus was built with the intention of testing system performance in the matter of distinguishing closely related or confusable linguistic varieties, and careful manual auditing of collected dat...

An Efficient Approach to Gold-Standard Annotation: Decision Points for Complex Tasks

2006

Julie Medero, Kazuaki Maeda, Stephanie Strassel, Christopher Walker

Inter-annotator consistency is a concern for any corpus building effort relying on human annotation. Adjudication is an effective way to locate and correct discrepancies of various kinds. It can also be both difficult and time-consuming. This paper introduces the Linguistic Data Consortium (LDC)’s model for decision point-based annotation and adjudication, and describes the annotation tools develop...
