Search results for: source text

Number of results: 572362

2001
Esther Klabbers, Karlheinz Stöber

In this paper we present the procedure for creating a new speech corpus for the Bonn Open Synthesis System (BOSS). BOSS has several advantages which make this procedure particularly straightforward and fast. BOSS is open source, allowing flexible use of components and corpora. It shows a clear separation between data and architecture, which means that a change in corpus does not require a chang...

2009
Mark Kane, Julie Mauclair, Julie Carson-Berndsen

This paper presents a novel approach to the identification of phonetic similarity using properties observed during the speech recognition process. An experiment is presented whereby specific phones are removed during the training phase of a statistical speech recognition system so that the behaviour of the system can be analysed to see which alternative phone is selected. The domain of the anal...

2016
Francis M. Tyers, Aziyana Bayyr-ool, Aelita Salchak, Jonathan Washington

This paper describes the development of free/open-source finite-state morphological transducers for Tuvan, a Turkic language spoken in and around the Tuvan Republic in Russia. The finite-state toolkit used for the work is the Helsinki Finite-State Toolkit (HFST); we use the lexc formalism for modelling the morphotactics and the twol formalism for modelling morphophonological alternations. We presen...

2009
Barbara McGillivray, Marco Passarotti

We present a valency lexicon for Latin verbs extracted from the Index Thomisticus Treebank, a syntactically annotated corpus of Medieval Latin texts by Thomas Aquinas. In our corpus-based approach, the lexicon reflects the empirical evidence of the source data. Verbal arguments are induced directly from annotated data. The lexicon contains 432 Latin verbs with 270 valency frames. The lexicon is...

2010
Tobias Marschall, Sven Rahmann

The overlapping structure of complex patterns, such as IUPAC motifs, significantly affects their statistical properties and should be taken into account in motif discovery algorithms. The contribution of this paper is twofold. On the one hand, we give surprisingly simple formulas for the expected size and weight of motif clumps (maximal overlapping sets of motif matches in a text). In contrast ...

2009
Olli Sjöblom

Successful data mining is an iterative process during which data are refined and adjusted to achieve more accurate mining results. The most important tools in the text mining context are lists of stop words and lists of synonyms. The size and richness of these lists depend on the structure of the language used in the text to be mined. English, for example, is an “easy” language for search...

2016
Amal Htait, Sébastien Fournier, Patrice Bellot

In this paper, we present the automatic annotation of the bibliographical reference zone in papers and articles in XML/TEI format. Our work proceeds in two phases: first, we use machine learning to classify paragraphs in papers as bibliographical or non-bibliographical, by means of a model that was initially created to differentiate between the footnotes containing or not containi...

2015
Pablo Ruiz, Thierry Poibeau, Frédérique Mélanie

The performance of Entity Linking (EL) systems is uneven across corpora and entity types. To help overcome this issue, we propose an EL workflow that combines the outputs of several open source EL systems and selects annotations via weighted voting. The results are displayed in a UI that allows users to navigate the corpus and to evaluate annotation quality based on several metrics.

2000
Laila Dybkjær, Niels Ole Bernsen

Since early 1998, the European Telematics project MATE has worked towards facilitating re-use of annotated spoken language data, addressing theoretical issues and implementing practical solutions which could serve as standards in the field. The resulting MATE Workbench for corpus annotation is now available as licensed open source software. This paper describes the MATE markup framework which b...

2012
Maha Shaikh

Juxtaposing two local council cases of open source software adoption in the UK, we highlight their differences and similarities in open source adoption and implementation. Our narratives indicate that in both cases there was strong goodwill towards open source, yet the trajectories of implementation differed widely. We draw on Deleuze and Guattari’s ideas of becoming, tracing versus mapping and ...
