Developing Open Data Models for Linguistic Field Data
Abstract
The UQ Flint Archive houses the field notes and elicitation recordings made by Elwyn Flint in the 1950s and 1960s during extensive linguistic survey work across Queensland, Australia. Digitizing the contents of the archive poses a number of interesting challenges in the context of EMELD. Firstly, all of the linguistic data is for languages that are either endangered or extinct, and as such it forms a valuable ethnographic repository. Secondly, the physical format of the data is itself at risk of deterioration, so digitization is an important preservation task in the short to medium term. Thirdly, the adoption of open standards for encoding and presenting the text and audio of linguistic field data, whilst enabling preservation, represents a new field of research in itself, where best practice has yet to be formalised. Fourthly, providing this linguistic data online as a new data source for future research raises concerns of data portability and longevity. This paper outlines the origins of the data model, the content creation components, the presentation forms based on the data model, the data capture tools and the media conversion components. It also addresses some of the larger questions regarding the digitization and annotation of linguistic field work, based on experience gained through work with the Flint Archive contents.
Similar resources
Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information
With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...
A New Look into the Construct Validity of the IELTS Speaking Module
The aim of this study was to investigate the role of linguistic and intelligence factors in the Iranian IELTS candidates’ speaking performance. Linguistic factors include depth and breadth of vocabulary knowledge as well as grammar knowledge. Narrative and verbal intelligences represent the non-linguistic factors. The participants included 329 learners who took 5 validated tests and also partic...
Digital Geolinguistics: On the Use of Linked Open Data for Data-Level Interoperability Between Geolinguistic Resources
The Open Language Archives Community, which recently celebrated its first 10 years of activity, is a worldwide network dedicated to collecting information on language resources and developing standard protocols for interoperability. In this context, the Linked Open Data paradigm is very promising, because it eases interoperability between different systems by allowing the definition of data-driven m...
The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Linked Open Data (LLOD) cloud, an LOD (sub-)cloud o...
Towards the Representation of Hashtags in Linguistic Linked Open Data Format
A pilot study is reported on developing the basic Linguistic Linked Open Data (LLOD) infrastructure for hashtags from social media posts. Our goal is the encoding of linguistically and semantically enriched hashtags in a formally compact way using the machine-readable OntoLex model. Initial hashtag processing consists of data-driven decomposition of multi-element hashtags, the linking of spellin...
Journal title:
- CoRR
Volume: cs.DL/0305053
Publication date: 2003