Search results for: text linguistic

Number of results: 208900

2000
Marcello Federico Dimitri Giordani Paolo Coletti

This paper reports on the development and evaluation of an Italian broadcast news corpus at ITC-irst, under a contract with the European Language Resources Distribution Agency (ELDA). The corpus consists of 30 hours of recordings transcribed and annotated with conventions similar to those adopted by the Linguistic Data Consortium for the DARPA HUB-4 corpora. The corpus will be completed and rel...

Gayane Y. Maltseva Irina I. Chumak-Zhun Larisa I. Plotnikova Sofiya M. Boldyreva Svetlana A. Kosharnaya

Modern text linguistics pays serious attention to the significant structural elements of the text, which carry special knowledge. Such structural elements include the title. In this article, the title is considered as a linguistic and cognitive characteristic and a spatially fixed structural element of the text – «frame», which is located around/before/behind the text, focusing on the importanc...

2006
Andrew W. Cole

This paper will discuss issues relevant to corpus development and publication at the LDC and will illustrate those issues by examining the history of three LDC corpora. This paper will also briefly examine alternative corpus creation and distribution methods and their challenges. The intent of this paper is to increase the available linguistic resources by describing the regulatory and technica...

2008
Péter Halácsy András Kornai Péter Németh Dániel Varga

For increased speed in developing gigaword language resources for medium resource density languages we integrated several FOSS tools in the HUN* toolkit. While the speed and efficiency of the resulting pipeline has surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing ...

2007
Hwee Tou Ng Yee Seng Chan

We made use of parallel texts to gather training and test examples for the English lexical sample task. Two tracks were organized for our task. The first track used examples gathered from an LDC corpus, while the second track used examples gathered from a Web corpus. In this paper, we describe the process of gathering examples from the parallel corpora, the differences with similar tasks in pre...

2004
Stephanie Strassel

This paper describes ongoing efforts at the Linguistic Data Consortium to create shared evaluation resources for improved speech-to-text technology. The DARPA EARS Program (Effective, Affordable, Reusable Speech-to-Text) is focused on enabling core STT technology to produce rich, highly accurate output in a range of languages and speaking styles. The aggressive EARS program goals motivate new appro...

Journal: Ena da Kultura 2023

Advertising uses a rich range of expressive techniques at all levels of language. Tropes are used in advertising fairly often. The most common types of tropes are allegory, hyperbole, irony, metaphor, metonymy, comparison, and epithet. Phraseological expressions are no less expressive. Journalists often use phraseology as not only language but a...

Journal: Academic Journal of Interdisciplinary Studies 2020

2009
Éric Villemonte de la Clergerie Benoît Sagot Rosa Stern Pascal Denis Gaëlle Recourcé Victor Mignot

We introduce SAPIENS, a platform for extracting quotations from news wires, associated with their author and context. The originality of SAPIENS is that it relies on a deep linguistic processing chain, which allows for extracting quotations with a wide coverage and an extended definition, including quotations which are only partially quotes-delimited verbatim transcripts. We describe the archit...

2004
Kazuaki Maeda Stephanie Strassel

Large-scale corpus development demands substantial infrastructure. As part of this infrastructure, the Linguistic Data Consortium (LDC) has adopted the Annotation Graph Toolkit (AGTK) as a primary resource for annotation tool development. This paper reports on LDC’s experiences using AGTK to develop and implement highly customized annotation tools for a variety of large-scale corpus creation ef...
