Toward the Harmonization of Metadata Practice for Spoken Languages Resources
Authors
Abstract
This paper addresses issues related to the elicitation and encoding of demographic, situational and attitudinal metadata for sociolinguistic research, with an eye toward standardization to facilitate data sharing. The discussion results from a series of workshops that have recently taken place at the NWAV and LSA conferences. These discussions have focused principally on the granularity of the metadata and the subset of categories that could be considered required for sociolinguistic fieldwork generally. Although a great deal of research on quantitative sociolinguistics has taken place in the United States, the workshop participants represent research conducted in North and South America, Europe, Asia, the Middle East, Africa and Oceania. Although the paper does not attempt to consider the metadata necessary to characterize every possible speaker population, we present evidence that the methodological issues and findings apply generally to speech collections concerned with the demographics and attitudes of the speaker pools and the situations under which speech is elicited.
Similar resources
Principles of the ‘Lingua Franca Approach’ and their implications for pedagogical practice in the Iranian context
The last thirty-five years have created a challenging situation for Iran and its people: on the one hand, the discriminatory British and American policies towards the country have given rise to considerable bitterness; on the other, we continue to teach both British and American English. If Iranian people wish to play a more active role internationally, it is time to review our English ...
The Assessment of Pragmatic Knowledge in the Online General IELTS-Practice Resources: A Corpus Analysis of Writing Tasks
Motivated by the concept of Communicative Language Ability and the eminence of the IELTS exam, this study intended to scrutinize the representation of functional knowledge (FK) and socio-linguistic knowledge (SK) as sub-components of pragmatic knowledge in the writing performances of both tasks of the online General IELTS-practice resources across three band scores. This quantitative inter-scor...
Semantic Interoperability, Communities of Practice and the CanCore Learning Object Metadata Profile
The vision of reusable digital learning resources or objects, made accessible through coordinated repository architectures and metadata technologies, has gained considerable attention within education and training communities. However, the pivotal role of metadata in this vision (and in more general conceptions of the Semantic Web) raises important and longstanding issues about classification, ...
Tagging spoken corpus
Spoken languages are more flexible in usage than written languages. Thus, tagging spoken corpus differs from tagging traditional written corpus. This paper proposes a new framework for tagging spoken corpus. The framework adopts the written tagger to process spoken data with the special consideration of the characteristics of spoken language. Besides, the problems of different tagging sets betw...
Comparison of spectral methods for spoken language identification
Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...