A pilot study for a Corpus of Dutch Aphasic Speech (CoDAS)

Authors

  • Eline Westerhout
  • Paola Monachesi
Abstract

In this paper, a pilot study for the development of a Corpus of Dutch Aphasic Speech (CoDAS) is presented. Because resources of this kind are lacking not only for Dutch but also for other languages, CoDAS will be able to set standards and will contribute to future research in this area. We have established the basic requirements that CoDAS should fulfill with respect to text types, metadata, and annotation levels. Given the special character of the speech contained in CoDAS, we cannot simply carry over the design and annotation protocols of existing corpora, such as the Spoken Dutch Corpus (CGN) or CHILDES. However, they have been taken as a starting point. We have investigated whether and how the CGN procedures and protocols for orthographic transcription and part-of-speech tagging should be adapted in order to transcribe and annotate aphasic speech properly.

Similar resources

Evidence from /t/-lenition in Dutch

In everyday speech, words may be reduced. Little is known about the consequences of such reductions for spoken word comprehension. This study investigated /t/-lenition in Dutch in two corpus studies and three perceptual experiments. The production studies revealed that /t/-lenition is most likely to occur after [s] and before bilabial consonants. The perception experiments showed that listeners...

Pattern of Language Impairment in Aphasic Patients Applying the P-DAB-1 Test

Background and purpose: The Persian Diagnostic Aphasia Battery (P-DAB-1) is one of the tests available for screening and determining the severity of aphasia. The test classifies the patients in seven major diagnostic classes based on the extent of the impairment in different linguistic modalities. The present study aimed to describe the pattern of linguistic impairment in four aphasic patients ...

Improving Automatic Recognition of Aphasic Speech with AphasiaBank

Automatic recognition of aphasic speech is challenging due to various speech-language impairments associated with aphasia as well as a scarcity of training data appropriate for this speaker population. AphasiaBank, a shared database of multimedia interactions primarily used by clinicians to study aphasia, offers a promising source of data for Deep Neural Network acoustic modeling. In this paper...

From D-Coi to SoNaR: a reference corpus for Dutch

The computational linguistics community in The Netherlands and Belgium has long recognized the dire need for a major reference corpus of written Dutch. In part to answer this need, the STEVIN programme was established. To pave the way for the effective building of a 500-million-word reference corpus of written Dutch, a pilot project was established. The Dutch Corpus Initiative project or D-Coi ...

Large Scale Syntactic Annotation of Written Dutch: Lassy

The construction of a 500-million-word reference corpus of written Dutch has been identified as one of the priorities in the STEVIN programme. The focus is on written language in order to complement the Spoken Dutch Corpus (CGN) [13], completed in 2003. In D-COI (a pilot project funded by STEVIN), a 50-million-word pilot corpus has been compiled, parts of which were enriched with verified synta...

Publication date: 2006