Lexicalized Phonotactic Word Segmentation
Abstract
This paper presents a new unsupervised algorithm (WordEnds) for inferring word boundaries from transcribed adult conversations. Phone ngrams before and after observed pauses are used to bootstrap a simple discriminative model of boundary marking. This fast algorithm delivers high performance even on morphologically complex words in English and Arabic, and promising results on accurate phonetic transcriptions with extensive pronunciation variation. Expanding training data beyond the traditional miniature datasets pushes performance numbers well above those previously reported. This suggests that WordEnds is a viable model of child language acquisition and might be useful in speech understanding.
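The bootstrapping idea in the abstract (phone n-grams next to observed pauses serve as evidence for word boundaries elsewhere) can be illustrated with a toy sketch. This is not the WordEnds implementation: the function names `train` and `segment`, the fixed n-gram length, the one-character-per-phone encoding, the averaging rule, and the threshold are all assumptions made for illustration, and the sketch omits the lexicalization the title refers to.

```python
from collections import Counter

def train(utterances, n=2):
    """Count phone n-grams at pauses (utterance edges) and at every position.

    Each utterance is a string with one character per phone (an assumption);
    the start and end of an utterance stand in for observed pauses.
    """
    end_counts, start_counts = Counter(), Counter()
    left_all, right_all = Counter(), Counter()
    for u in utterances:
        if not u:
            continue
        for i in range(len(u) + 1):
            left_all[u[max(0, i - n):i]] += 1   # context preceding position i
            right_all[u[i:i + n]] += 1          # context following position i
        end_counts[u[-n:]] += 1                 # n-gram right before a pause
        start_counts[u[:n]] += 1                # n-gram right after a pause
    return end_counts, start_counts, left_all, right_all

def segment(utterance, model, n=2, threshold=0.3):
    """Insert a boundary wherever the surrounding n-grams look pause-like."""
    end_counts, start_counts, left_all, right_all = model
    words, start = [], 0
    for i in range(1, len(utterance)):
        left = utterance[max(0, i - n):i]
        right = utterance[i:i + n]
        # Relative frequency with which each context occurs next to a pause.
        p_end = end_counts[left] / left_all[left] if left_all[left] else 0.0
        p_start = start_counts[right] / right_all[right] if right_all[right] else 0.0
        if (p_end + p_start) / 2 > threshold:
            words.append(utterance[start:i])
            start = i
    words.append(utterance[start:])
    return words

model = train(["thedog", "dog", "the", "thecat", "cat"])
print(segment("thedog", model))  # ['the', 'dog']
```

In this toy corpus the only evidence for the boundary inside "thedog" is that "he" sometimes ends an utterance and "do" sometimes begins one; an unseen sequence such as "catdog" is split the same way.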
Similar resources
Phonotactic and acoustic cues for word segmentation in English
This study investigates the influence of both phonotactic and acoustic cues on the segmentation of spoken English. Listeners detected embedded English words in nonsense sequences (word spotting). Words aligned with phonotactic boundaries were easier to detect than words without such alignment. Acoustic cues to boundaries could also have signaled word boundaries, especially when word onsets lack...
Phonotactic and prosodic effects on word segmentation in infants.
This research examines the issue of speech segmentation in 9-month-old infants. Two cues known to carry probabilistic information about word boundaries were investigated: Phonotactic regularity and prosodic pattern. The stimuli used in four head turn preference experiments were bisyllabic CVC.CVC nonwords bearing primary stress in either the first or the second syllable (strong/weak vs. weak/st...
Modeling the contribution of phonotactic cues to the problem of word segmentation.
How do infants find the words in the speech stream? Computational models help us understand this feat by revealing the advantages and disadvantages of different strategies that infants might use. Here, we outline a computational model of word segmentation that aims both to incorporate cues proposed by language acquisition researchers and to establish the contributions different cues can make to...
Effects of prior phonotactic knowledge on infant word segmentation: the case of nonadjacent dependencies.
PURPOSE: In this study, the authors explored whether French-learning infants use nonadjacent phonotactic regularities in their native language, which they learn between the ages of 7 and 10 months, to segment words from fluent speech. METHOD: Two groups of 20 French-learning infants were tested using the head-turn preference procedure at 10 and 13 months of age. In Experiment 1, infants were fa...
Does Korean defeat phonotactic word segmentation?
Computational models of infant word segmentation have not been tested on a wide range of languages. This paper applies a phonotactic segmentation model to Korean. In contrast to the undersegmentation pattern previously found in English and Russian, the model exhibited more oversegmentation errors and more errors overall. Despite the high error rate, analysis suggested that lexical acquisition m...