Search results for: linguistic
Number of results: 52525
The 2010 NIST Speaker Recognition Evaluation continues a series of evaluations of text independent speaker detection begun in 1996. It utilizes the newly collected Mixer-6 and Greybeard Corpora from the Linguistic Data Consortium. Major test conditions to be examined include variations in channel, speech style, vocal effort, and the effect of speaker aging over a multi-year period. A new primar...
This paper studies transliteration alignment, its evaluation metrics and applications. We propose a new evaluation metric, alignment entropy, grounded in information theory, to evaluate alignment quality without the need for a gold-standard reference, and compare the metric with F-score. We study the use of phonological features and affinity statistics for transliteration alignment at...
This paper discusses our contribution to the third RTE Challenge – the SALSA RTE system. It builds on an earlier system based on a relatively deep linguistic analysis, which we complement with a shallow component based on word overlap. We evaluate their (combined) performance on various data sets. However, earlier observations that the combination of features improves the overall accuracy could...
The LogAnswer project investigates the potential of deep linguistic processing and logical reasoning for question answering. The paragraph selection task of ResPubliQA 2010 offered the opportunity to validate improvements of the LogAnswer QA system that reflect our experience from ResPubliQA 2009. Another objective was to demonstrate the benefit of QA technologies over a pure IR approach. Two r...
We describe efforts to create corpora to support the development and evaluation of handwriting recognition and translation technology. LDC has developed a stable pipeline and infrastructure for collecting and annotating handwriting linguistic resources to support the evaluation of MADCAT and OpenHaRT. We collect handwritten samples of pre-processed Arabic and Chinese data that has already been tra...
This report describes the experiments of the University of Edinburgh and the University of Sydney at the TREC-2004 question answering evaluation exercise. Our system combines two approaches: one with deep linguistic analysis using IR on the AQUAINT corpus applied to answer extraction from text passages, and one with a shallow linguistic analysis and shallow inference applied to a large set of s...
This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough, linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, a considerable effort has been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phen...
This study investigated the effectiveness of three stop-word lists for Arabic information retrieval: a General Stoplist, a Corpus-Based Stoplist, and a Combined Stoplist. Three popular weighting schemes were examined: inverse document frequency weighting, probabilistic weighting, and statistical language modelling. The idea is to combine the statistical approaches with linguistic approaches ...
Dialectal variants are complete linguistic systems, just like standard languages (cf. Kontosopoulos 1997, Ntinas & Zarkogianni 2009). Teaching the different linguistic varieties of a standard language gives pupils the opportunity a) to become acquainted with the treasures of the expressive means of their mother language, b) to embody the mother language in a broader cultural and historical contex...
We describe an Example-Based Machine Translation (EBMT) system and the adaptations and enhancements made to create a Chinese-English translation system from the Hong Kong legal code and various other bilingual resources available from the Linguistic Data Consortium (LDC).