linguistic

The NIST 2010 speaker recognition evaluation

2010

Alvin F. Martin Craig S. Greenberg

The 2010 NIST Speaker Recognition Evaluation continues a series of evaluations of text-independent speaker detection begun in 1996. It utilizes the newly collected Mixer-6 and Greybeard Corpora from the Linguistic Data Consortium. Major test conditions to be examined include variations in channel, speech style, vocal effort, and the effect of speaker aging over a multi-year period. A new primar...

Transliteration Alignment

2009

Vladimir Pervouchine Haizhou Li Bo Lin

This paper studies transliteration alignment, its evaluation metrics and applications. We propose a new evaluation metric, alignment entropy, grounded in information theory, to evaluate alignment quality without the need for a gold standard reference, and compare the metric with F-score. We study the use of phonological features and affinity statistics for transliteration alignment at...

A Semantic Approach To Textual Entailment: System Evaluation and Task Analysis

2007

Aljoscha Burchardt Nils Reiter Stefan Thater Anette Frank

This paper discusses our contribution to the third RTE Challenge – the SALSA RTE system. It builds on an earlier system based on a relatively deep linguistic analysis, which we complement with a shallow component based on word overlap. We evaluate their (combined) performance on various data sets. However, earlier observations that the combination of features improves the overall accuracy could...

The LogAnswer Project at ResPubliQA 2010

2010

Ingo Glöckner Björn Pelzer

The LogAnswer project investigates the potential of deep linguistic processing and logical reasoning for question answering. The paragraph selection task of ResPubliQA 2010 offered the opportunity to validate improvements of the LogAnswer QA system that reflect our experience from ResPubliQA 2009. Another objective was to demonstrate the benefit of QA technologies over a pure IR approach. Two r...

Linguistic Resources for Handwriting Recognition and Translation Evaluation

2012

Zhiyi Song Safa Ismael Stephen Grimes David S. Doermann Stephanie Strassel

We describe efforts to create corpora to support development and evaluation of handwriting recognition and translation technology. LDC has developed a stable pipeline and infrastructures for collecting and annotating handwriting linguistic resources to support the evaluation of MADCAT and OpenHaRT. We collect handwritten samples of pre-processed Arabic and Chinese data that has been already tra...

Question Answering with QED and Wee at TREC 2004

2004

Kisuh Ahn Johan Bos Stephen Clark Tiphaine Dalmas Jochen L. Leidner Matthew Smillie Bonnie L. Webber James R. Curran

This report describes the experiments of the University of Edinburgh and the University of Sydney at the TREC-2004 question answering evaluation exercise. Our system combines two approaches: one with deep linguistic analysis using IR on the AQUAINT corpus applied to answer extraction from text passages, and one with a shallow linguistic analysis and shallow inference applied to a large set of s...

ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions

2016

Olga Uryupina Ron Artstein Antonella Bristot Federica Cavicchio Kepa Joseba Rodríguez Massimo Poesio

This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, considerable effort has been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phen...

Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

Journal: CoRR 2008

Ibrahim Abu El-Khair

The effectiveness of three stop word lists for Arabic Information Retrieval (General Stoplist, Corpus-Based Stoplist, and Combined Stoplist) was investigated in this study. Three popular weighting schemes were examined: the inverse document frequency weight, probabilistic weighting, and statistical language modelling. The idea is to combine the statistical approaches with linguistic approaches ...

Digital Museum of Greek Oral History: How Dialectal Speech Corpora Remain Vivid in Class

2014

A. Sfakianaki

Dialectal variants are complete linguistic systems just like standard languages (cf. Kontosopoulos 1997, Ntinas & Zarkogianni 2009). The teaching of different linguistic varieties of a standard language gives pupils the possibility a) to be acquainted with the treasures of the expressive means of their mother language, b) to embody the mother language in a broader cultural and historical contex...

Adapting an Example-Based Translation System to Chinese

2001

Ying Zhang Ralf D. Brown Robert E. Frederking

We describe an Example-Based Machine Translation (EBMT) system and the adaptations and enhancements made to create a Chinese-English translation system from the Hong Kong legal code and various other bilingual resources available from the Linguistic Data Consortium (LDC).
