Search results for: linguistic corpus

Number of results: 113027

1993
Masaki Kiyono Jun'ichi Tsujii

A semi-automatic procedure of linguistic knowledge acquisition is proposed, which combines corpus-based techniques with the conventional rule-based approach. The rule-based component generates all the possible hypotheses of defects which the existing linguistic knowledge might contain, when it fails to parse a sentence. The rule-based component does not try to identify the defects, but generate...

Journal: Procesamiento del Lenguaje Natural 2008
Arantza Díaz de Ilarraza Enrique Fernández-Terrones Izaskun Aldezabal María Jesús Aranzabe

In this paper the process for turning a dependency-based corpus into a constituent-based one is explained. For this purpose, first both the Dependency and the Constituent formalism are analyzed and then the corresponding equivalences of linguistic phenomena are treated. This process has had different phases in which the linguistic equivalences have been improved. Finally, the evaluation process is...

2009
Stefan Th. Gries

• frequencies of occurrence of linguistic elements, which can be studied from two different perspectives:
  o how frequent are morphemes or words or patterns/constructions in (parts of) a corpus? This information can be provided in various different forms of frequency lists;
  o how evenly are morphemes or words or patterns/constructions distributed across (parts of) a corpus? This information can ...

2016
Miki Nishioka Shiro Akasegawa

In this paper, we discuss our creation of a web corpus of spoken Hindi (COSH), one of the Indo-Aryan languages spoken mainly in the Indian subcontinent. We also point out notable problems we’ve encountered in the web corpus and the special concordancer. After observing the kind of technical problems we encountered, especially regarding annotation tagged by Shiva Reddy’s tagger, we argue how the...

Journal: IJCLCLP 2004
Jia-Yan Jian Yu-Chia Chang Jason S. Chang

In this paper, we propose a new method for extracting bilingual collocations from a parallel corpus to provide phrasal translation memories. The method integrates statistical and linguistic information to achieve effective extraction of bilingual collocations. The linguistic information includes parts of speech, chunks, and clauses. The method involves first obtaining an extended list of Englis...

2003
Véronique Aubergé Nicolas Audibert Albert Rilliard

Affects are expressed at different levels of speech: metalinguistic (expressiveness), linguistic (attitudes), both anchored in the “linguistic time”, and para-linguistic (emotional expressions), anchored in the timing of the emotional causes. In an experimental approach, corpora are the basis of analysis. Most emotional corpora have been produced by acting/eliciting speakers on one sid...

2015
Wenda Chen Nancy F. Chen Boon Pang Lim Bin Ma

In this paper, we evaluate a set of linguistic rules for pronunciation variations in Singapore English. We collect and annotate a speech corpus for Singapore English and label it with IPA narrow transcriptions. Data-driven pronunciation rules are derived using American English (Buckeye corpus) as a reference. We compare the data-driven rules with linguistic rules proposed by phoneticians, and f...

1995
I Lewin S G Pulman

We discuss the treatment of ellipsis in a spoken language route planning enquiry service which uses the Core Language Engine (CLE) as its linguistic processor. We show how use of the CLE allows us to separate the interpretation of ellipsis in a dialogue context from the more general issue of dialogue management in a dialogue context and, especially, to factor out the linguistic influences on suc...

2002
Mathias Géry Dominique Vaufreydaz

The Web is a rich and diversified source of information. In this article, we propose to benefit from this richness to collect and analyze documents, with the aim of a relational indexation based on noun phrases. The proposed data processing chain includes a spider collecting data to build textual corpora, and a linguistic module analyzing text to extract information. Comparison of obtained corpus with ...

2016
Rita de Carvalho Andreia Querido Marisa Campos Rita Valadas Pereira João Ricardo Silva António Branco

This paper presents a new linguistic resource for the study and computational processing of Portuguese. CINTIL DependencyBank PREMIUM is a corpus of Portuguese news text, accurately manually annotated with a wide range of linguistic information (morpho-syntax, named-entities, syntactic function and semantic roles), making it an invaluable resource especially for the development and evaluation of...
