Search results for: corpora creation

Number of results: 147847

2014
Adam Kilgarriff Pavel Rychlý Milos Jakubícek Vojtech Kovár Vít Baisa Lucia Kocincová

The NLP researcher or application-builder often wonders “what corpus should I use, or should I build one of my own? If I build one of my own, how will I know if I have done a good job?” Currently there is very little help available for them. They are in need of a framework for evaluating corpora. We develop such a framework, in relation to corpora which aim for good coverage of ‘general languag...

2003
Elviira Hartikainen Giulio Maltese Asunción Moreno Shaunie Shammass Ute Ziegenhain

This paper presents the corpora collection and lexica creation for the purposes of Automatic Speech Recognition (ASR) and Text-to-speech (TTS) that are needed in speech-to-speech translation (SST). These lexica will be specified, built and validated within the scope of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) during the years 2002-2005. Large lexic...

2010
Kalina Bontcheva Hamish Cunningham Ian Roberts Valentin Tablan

In this paper we present Teamware, a novel web-based collaborative annotation environment which enables users to carry out complex corpus annotation projects, involving less skilled, cheaper annotators working remotely. It has been evaluated by us through the creation of several gold standard corpora, as well as through external evaluation in commercial annotation projects.

2010
Jirka Hana Anna Feldman

We describe the challenges of resource creation for a resource-light system for morphological tagging of fusional languages (Feldman and Hana, 2010). The constraints on resources (time, expertise, and money) introduce challenges that are not present in the development of morphological tools and corpora in the usual, resource-intensive way.

2006
Anna Pappa

This paper presents a robust rule-based system of shallow parsing for part-of-speech (PoS) recognition and tagging. Unlike previous work, the system uses parsing for tagging, based on unsupervised learning methods, with no prior knowledge, training, or pre-tagged corpora. START (System of Textual Analysis Recognition and Tagging) has been evaluated on both French and Greek non-annotated corpora,...

2003
Georgios Sigletos Dimitra Farmakiotou Konstantinos Stamatakis Georgios Paliouras Vangelis Karkaletsis

This paper outlines our approach to the creation of annotated corpora for the purposes of Web Information Extraction, and presents the Web Annotation tool. This tool enables the annotation of Web pages from different domains and for different information extraction tasks providing a user-friendly interface to human annotators. Annotated information is stored in a representation format that can ...

2013
Anne-Kathrin Schumann

This paper describes the collection, annotation and linguistic analysis of a gold standard for knowledge-rich context extraction on the basis of Russian and German web corpora as part of ongoing PhD thesis work. In the following sections, the concept of knowledge-rich contexts is refined and gold standard creation is described. Linguistic analyses of the gold standard data and their results are...

2000
Christopher Cieri David Graff Mark Liberman Nii Martey Stephanie Strassel

This paper describes the creation and content of two corpora, TDT-2 and TDT-3, created for the DARPA-sponsored Topic Detection and Tracking project. The research goal in the TDT program is to create the core technology of a news understanding system that can process multilingual news content, categorizing individual stories according to the topic(s) they describe. The research tasks include segment...

Journal: Language Resources and Evaluation 2008
Birte Lönneker-Rodman

This paper concerns metaphor resource creation. It provides an account of methods used, problems discovered, and insights gained at the Hamburg Metaphor Database project, intended to inform similar resource creation initiatives, as well as future metaphor processing algorithms. After introducing the project, the theoretical underpinnings that motivate the subdivision of represented information ...

2012
Elias Iosif Alexandros Potamianos

We investigate the creation of corpora from web-harvested data following a scalable approach that has linear query complexity. Individual web queries are posed for a lexicon that includes thousands of nouns and the retrieved data are aggregated. A lexical network is constructed, in which the lexicon nouns are linked according to their context-based similarity. We introduce the notion of semanti...
