text coverage

Optimal data selection for unit selection synthesis

2001

Alan W. Black Kevin A. Lenzo

In this work, we address the issue of creating a set of utterances with optimal coverage for reliable, high quality concatenative synthesis, whether for general synthesis or domain synthesis. We present an automatic method that takes into account the acoustic distinctions made by a particular speaker and selects prompts from large databases of typical utterances. A general unit selection text-t...
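As a rough illustration of coverage-driven prompt selection, the sketch below greedily picks utterances that add the most not-yet-covered units. It is a generic greedy set-cover heuristic, not the acoustically informed method the abstract describes; the `units` function (character bigrams) and the `greedy_select` name are hypothetical stand-ins chosen for the example.

```python
def units(utterance):
    """Toy stand-in for synthesis units: character bigrams of the lowercased text.
    (Assumption: a real system would use phonetic or acoustic units instead.)"""
    text = utterance.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def greedy_select(candidates, max_prompts):
    """Repeatedly pick the utterance that adds the most uncovered units."""
    covered = set()
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < max_prompts:
        best = max(remaining, key=lambda u: len(units(u) - covered))
        if not units(best) - covered:
            break  # nothing left adds new coverage
        chosen.append(best)
        covered |= units(best)
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    prompts, covered = greedy_select(["abc", "abc", "cde", "ab"], max_prompts=3)
    print(prompts, sorted(covered))
```

The loop stops early once no remaining candidate contributes a new unit, so duplicate or fully redundant utterances are never selected.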


Large-Coverage Root Lexicon Extraction for Hindi

2009

Cohan Sujay Carlos Monojit Choudhury Sandipan Dandapat

This paper describes a method, using morphological rules and heuristics, for the automatic extraction of large-coverage lexicons of stems and root word-forms from a raw text corpus. We cast the problem of high-coverage lexicon extraction as one of stemming followed by root word-form selection. We examine the use of POS tagging to improve precision and recall of stemming and thereby the coverage ...


Text Summarization Model based on Redundancy-Constrained Knapsack Problem

2012

Hitoshi Nishikawa Tsutomu Hirao Toshiro Makino Yoshihiro Matsuo

In this paper we propose a novel text summarization model, the redundancy-constrained knapsack model. We add to the Knapsack problem a constraint to curb redundancy in the summary. We also propose a fast decoding method based on the Lagrange heuristic. Experiments based on ROUGE evaluations show that our proposals outperform a state-of-the-art text summarization model, the maximum coverage mode...
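To make the idea of a knapsack with a redundancy constraint concrete, here is a small exhaustive-search sketch. The exact form of the constraint is an assumption made for illustration (each concept may appear in at most `max_repeats` selected sentences); the abstract does not specify it, and the brute-force search stands in for, and is unrelated to, the Lagrange-heuristic decoder the authors propose. The name `best_summary` and the tuple layout are hypothetical.

```python
from itertools import combinations


def best_summary(sentences, max_length, max_repeats):
    """Pick the highest-scoring subset of sentences.

    sentences: list of (score, length, concepts) tuples.
    Constraints (assumed for illustration):
      - total length of chosen sentences <= max_length (knapsack capacity)
      - every concept occurs in at most max_repeats chosen sentences
    """
    best_score, best_subset = 0.0, ()
    indices = range(len(sentences))
    for k in range(1, len(sentences) + 1):
        for subset in combinations(indices, k):
            if sum(sentences[i][1] for i in subset) > max_length:
                continue  # violates the length budget
            counts = {}
            for i in subset:
                for concept in sentences[i][2]:
                    counts[concept] = counts.get(concept, 0) + 1
            if any(v > max_repeats for v in counts.values()):
                continue  # violates the redundancy limit
            score = sum(sentences[i][0] for i in subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset


if __name__ == "__main__":
    toy = [(5.0, 4, {"x"}), (4.0, 4, {"x"}), (3.0, 4, {"y"})]
    print(best_summary(toy, max_length=8, max_repeats=1))
```

Exhaustive search is exponential in the number of sentences, so this is only useful for seeing how the two constraints interact on toy inputs.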


Automatic Text Summarization by Providing Coverage, Non-Redundancy, and Novelty Using Sentence Graph

Journal: Journal of Information Technology Research, 2022

The day-to-day growth of online information necessitates intensive research in automatic text summarization (ATS). ATS software produces a summary by extracting important information from the original text. With the help of summaries, users can easily read and understand documents of interest. Most approaches for ATS used only local properties of the text. Moreover, numerous properties make sentence selection difficult and complicated. So this artic...


The Mobile Solutions for Immunization (M-SIMU) Trial: A Protocol for a Cluster Randomized Controlled Trial That Assesses the Impact of Mobile Phone Delivered Reminders and Travel Subsidies to Improve Childhood Immunization Coverage Rates and Timeliness in Western Kenya

2016

Dustin G Gibson E Wangeci Kagucia Benard Ochieng Nisha Hariharan David Obor Lawrence H Moulton Peter J Winch Orin S Levine Frank Odhiambo Katherine L O'Brien Daniel R Feikin

BACKGROUND Text message (short message service, SMS) reminders and incentives are two demand-side interventions that have been shown to improve health care-seeking behaviors by targeting participant characteristics such as forgetfulness, lack of knowledge, and transport costs. Applying these interventions to routine pediatric immunizations may improve vaccination coverage and timeliness. OBJE...


Road-testing the English Resource Grammar Over the British National Corpus

2004

Timothy Baldwin Emily M. Bender Dan Flickinger Ara Kim Stephan Oepen

This paper addresses two questions: (1) when a large deep processing resource developed for relatively closed domains is run over open text, what coverage does it have, and (2) what are the most effective and time-efficient ways of consolidating gaps in the coverage of


Analysing Finnish with word lists: the DDI approach to morphology revisited

2018

Atro Voutilainen

Morphological lexicons for morphologically complex languages provide good text coverage at the cost of overgeneration, difficulty of modification, and sometimes performance issues. Use of simple, manageable lexicon forms – especially lists – for morphologically complex languages may appear unviable because the number of possible word-forms in a morphologically complex language can be prohibitiv...


Corpus Design for Malay Corpus-based Speech Synthesis System

2009

Tian-Swee Tan

Problem statement: The speech corpus is one of the major components in corpus-based synthesis. The quality and coverage of the speech corpus affect the quality of the synthesized speech. Approach: This study proposes a corpus design for a Malay corpus-based speech synthesis system. This includes the study of design criteria in corpus-based speech synthesis, Malay corpus based database design and the...


A Survey on magiran.com: A Database for the Magazines of Iran

2007

Mortaza Kokabi

This paper presents the design and function of magiran.com, a database of periodicals published in Iran. It also attempts to answer the following questions: How many of the total periodicals published in Iran are covered by magiran? What is the subject coverage of the periodicals covered? Which subjects seem to have been given importance among the periodicals covered? How many of the periodicals ...


Word unit based multilingual comparative analysis of text corpora

2001

Géza Németh Csaba Zainkó

A parallel study of three very different languages, Hungarian, German and English, using text corpora of a similar size makes it possible to explore both similarities and differences. Corpora from publicly available Internet sources were used. The corpus size was the same (approx. 20 Mbytes, 2.5-3.5 million word forms) for all languages. Besides traditional corpus coverage, word length and o...
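Corpus coverage in the traditional sense can be sketched as the fraction of running tokens accounted for by the N most frequent word forms. The snippet below is a minimal illustration under that assumption; the function name `coverage_curve` and the whitespace tokenization are hypothetical choices, not taken from the paper.

```python
from collections import Counter


def coverage_curve(tokens, sizes):
    """Return {n: share of tokens covered by the n most frequent word forms}."""
    counts = Counter(tokens)
    total = sum(counts.values())
    ranked = [count for _, count in counts.most_common()]  # descending frequency
    return {n: (sum(ranked[:n]) / total if total else 0.0) for n in sizes}


if __name__ == "__main__":
    # Assumption: naive whitespace tokenization of a toy "corpus".
    tokens = "a a a b b c".split()
    print(coverage_curve(tokens, [1, 2, 3]))
```

Running the same function on corpora of equal size in different languages gives directly comparable curves, which is the kind of comparison the abstract refers to.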
