corpora creation

Search results for: corpora creation

Number of results: 147847

Leiomyoma of Corpora Cavernosa

Journal: Acta Medica Iranica

Darab Mehraban Ali Shahrazad Dr. Oskoyi Nasser Kamalian

A rare case of leiomyoma of the penis is reported and a review of the literature conducted. This is the first such case yet reported in the corpora cavernosa. It was also bigger than 1 cm and had a nodular surface.


Using Wikipedia to Collect a Corpus for Automatic Definition Extraction: Comparing English and Portuguese Languages

2012

Systems for the detection and extraction of definitions are being developed for different purposes, such as glossary creation [5, 3], lexical databases [6], ontologies [2], question answering [1], etc. All these systems use annotated corpora to build a set of rules or patterns capable of identifying a definition in a different text. The basic structure of a definition should resemble an equation...


Discourse Annotation Working Group Report

2007

Manfred Stede Janyce Wiebe Eva Hajicová Brian Reese Simone Teufel Bonnie Webber Theresa Wilson

The classical “success story” of corpus annotation are the various syntax treebanks that provide structural analyses of sentences and have enabled researchers to develop a range of new and highly successful data-oriented approaches to sentence parsing. In recent years, however, a number of corpora have been constructed that provide annotations on the discourse level, i.e. information that reach...


Creating a Corpus of Auslan within an Australian National Corpus

2009

Trevor Johnston

The creation of signed language (SL) corpora presents special challenges to linguists. They are face-to-face visual-gestural languages that have no widely accepted written forms or standard specialist notation system, making even superficial transcription problematic. SL corpora need to be created taking these facts into account. Using the example of Auslan (Australian Sign Language) this paper...


Cross-Domain and Cross-Language Porting of Shallow Parsing

2014

Evgeny A. Stepanov Giuseppe Riccardi

English was the main focus of attention of the Natural Language Processing (NLP) community for years. As a result, there are significantly more annotated linguistic resources in English than in any other language. Consequently, data-driven tools for automatic text or speech processing are developed mainly for English. Developing similar corpora and tools for other languages is an important issu...


Automatic Thesaurus Generation from Raw Text using Knowledge-Poor Techniques

1993

Gregory Grefenstette

In addition to showing how lexical units are related within a field, domain-specific thesauri give an idea of what subjects are important to that field and are thus useful at many points in an information system. The major impediment to creation of thesauri has been the cost of their manual creation. We present here a number of automatic techniques that jointly produce a first draft of a thesaurus fro...


Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora

2011

Matteo Negri Luisa Bentivogli Yashar Mehdad Danilo Giampiccolo Alessandro Marchetti

We address the creation of cross-lingual textual entailment corpora by means of crowdsourcing. Our goal is to define a cheap and replicable data collection methodology that minimizes the manual work done by expert annotators, without resorting to preprocessing tools or already annotated monolingual datasets. In line with recent works emphasizing the need of large-scale annotation efforts for te...


Stimmen: A citizen science approach to minority language sociolinguistics

Journal: Linguistics Vanguard 2021

This paper presents the project Stimmen fan Fryslân ‘Voices of Fryslân’. The project relies on a smartphone application developed to involve local communities in the creation of speech corpora, particularly for lesser used languages. The paper lays out the scientific and societal context of the project, showcases the application, and gives an overview of the results from the project, which attracted more than 15,000 users. Some key methodological issues are consider...


Corpora and lexis

Journal: ICAME Journal 2019


A Bilingual Corpus of Inter-linked Events

2008

Tommaso Caselli Nancy Ide Roberto Bartolini

This paper describes the creation of a bilingual corpus of inter-linked events for Italian and English. Linkage is accomplished through the Inter-Lingual Index (ILI) that links ItalWordNet with WordNet. The availability of this resource, on the one hand, enables contrastive analysis of the linguistic phenomena surrounding events in both languages, and on the other hand, can be used to perform m...
