A pilot study for a Corpus of Dutch Aphasic Speech (CoDAS)
Abstract
In this paper, a pilot study for the development of a Corpus of Dutch Aphasic Speech (CoDAS) is presented. Given the lack of resources of this kind, not only for Dutch but also for other languages, CoDAS will be able to set standards and will contribute to future research in this area. We have established the basic requirements with respect to text types, metadata, and annotation levels that CoDAS should fulfill. Given the special character of the speech contained in CoDAS, we cannot simply carry over the design and annotation protocols of existing corpora, such as the Spoken Dutch Corpus (CGN) or CHILDES. However, these corpora have been taken as a starting point. We have investigated whether and how the procedures and protocols for orthographic transcription and part-of-speech tagging used for the CGN should be adapted in order to transcribe and annotate aphasic speech properly.
Similar resources
Evidence from /t/-lenition in Dutch
In everyday speech, words may be reduced. Little is known about the consequences of such reductions for spoken word comprehension. This study investigated /t/-lenition in Dutch in two corpus studies and three perceptual experiments. The production studies revealed that /t/-lenition is most likely to occur after [s] and before bilabial consonants. The perception experiments showed that listeners...
Pattern of Language Impairment in Aphasic Patients Applying the P-DAB-1 Test
Background and purpose: The Persian Diagnostic Aphasia Battery (P-DAB-1) is one of the tests available for screening and determining the severity of aphasia. The test classifies the patients in seven major diagnostic classes based on the extent of the impairment in different linguistic modalities. The present study aimed to describe the pattern of linguistic impairment in four aphasic patients ...
Improving Automatic Recognition of Aphasic Speech with AphasiaBank
Automatic recognition of aphasic speech is challenging due to various speech-language impairments associated with aphasia as well as a scarcity of training data appropriate for this speaker population. AphasiaBank, a shared database of multimedia interactions primarily used by clinicians to study aphasia, offers a promising source of data for Deep Neural Network acoustic modeling. In this paper...
From D-Coi to SoNaR: a reference corpus for Dutch
The computational linguistics community in The Netherlands and Belgium has long recognized the dire need for a major reference corpus of written Dutch. In part to answer this need, the STEVIN programme was established. To pave the way for the effective building of a 500-million-word reference corpus of written Dutch, a pilot project was established. The Dutch Corpus Initiative project or D-Coi ...
Large Scale Syntactic Annotation of Written Dutch: Lassy
The construction of a 500-million-word reference corpus of written Dutch has been identified as one of the priorities in the STEVIN programme. The focus is on written language in order to complement the Spoken Dutch Corpus (CGN) [13], completed in 2003. In D-COI (a pilot project funded by STEVIN), a 50-million-word pilot corpus has been compiled, parts of which were enriched with verified synta...