Harvesting Dutch Trees: Syntactic Properties of Spoken Dutch
Abstract
In this paper, we report on quantitative research into certain word order phenomena in Dutch. In our research, we use the Spoken Dutch Corpus (CGN), a major new resource for research into contemporary spoken Dutch. After briefly introducing the primary data, the annotations added, and some of the tools to explore the primary data and the annotations, we illustrate how the Corpus may be utilized to answer certain linguistic questions concerning the Dutch language.
Similar resources
Syntactic Analysis in the Spoken Dutch Corpus (CGN)
The paper describes the syntactic annotation of the Spoken Dutch Corpus (“Corpus Gesproken Nederlands” or CGN), the Dutch-Flemish project (1998-2003) aiming at the collection, description and annotation of ten million words of spoken Dutch. In the first part, the background of the parsing strategy is discussed, as well as some details concerning the actual implementation of the parsing process....
Study on two species of Ophiostoma in relation with Dutch elm disease in Iran
An investigation was carried out in some areas of Golestan Province including: Loveh forest, Soosara, Daland forest park, Tooskestan; Gilan Province including Siahkal and Asalem forests; Arasbaran and landscape of urban trees during 1999–2007. In this investigation, based on some morphological, physiological and molecular characteristics and also comparison with standard isolates two species Op...
Belgian Standard Dutch
Dutch is a language spoken by about 20 million people in the Netherlands and Belgium. This region is not only characterised by a complex dialect situation, but also by the use of two institutionalised varieties of the Standard language: Netherlandic Dutch is spoken in the Netherlands and is documented in Collins & Mees (1982), Mees & Collins (1983) and Gussenhoven (1999), while Belgian Dutch is...
Spontaneous Speech in the Spoken Dutch Corpus
In this paper the Spoken Dutch Corpus project is presented, a joint Flemish-Dutch undertaking aimed at the compilation and annotation of a corpus of 1,000 hours of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in the fields of (computational) linguistics and language and speech technology. Although the corpus will contain a fair amount of read speech...
Syntactic Annotation for the Spoken Dutch Corpus Project (CGN)
Of the ten million words of contemporary standard Dutch in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a selection of one million words of natural spoken language will be annotated syntactically. In the present paper we discuss the tag sets and the annotation procedures that are currently being developed and tested. The annotation tags provide information about syntactic constit...