From Language Documentation Data to LLOD: A Case Study in Turkic Lemon Dictionaries

Authors

  • Christian Chiarcos
  • Désirée Walther
  • Maxim Ionov

Abstract

In this paper, we describe the Lemon-OntoLex modeling of dictionaries created within language documentation efforts. We focus on exemplary resources for two less-resourced languages from the Turkic language family, Chalkan and Tuvan. Both datasets have been converted into a Linked Data representation using the Lemon-OntoLex data model, with an extensible converter written in Python. We compare the conversion process for both lexical resources, analyze the difficulties we encountered, and discuss the cases that caused the most common problems during conversion. Furthermore, we evaluate the quality of the converted dictionaries using specially designed SPARQL queries and by manually checking random samples of the data. Finally, we describe the future application of this data within a lexicographic-comparative workbench designed to facilitate language contact studies.
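
The abstract mentions an extensible Python converter into Lemon-OntoLex. As a rough illustration only (this is not the authors' converter, whose internals are not described here), the sketch below shows how one flat dictionary record might be serialized as an `ontolex:LexicalEntry` with a canonical form and a sense. The record layout (`id`, `lemma`, `lang`, `pos`, `gloss`), the placeholder base IRI `http://example.org/lexicon/`, the POS mapping, and the use of `skos:definition` for glosses are all hypothetical assumptions.

```python
"""Minimal sketch: serialize flat dictionary records as OntoLex-Lemon Turtle.

Hypothetical assumptions (not the authors' converter): each record is a dict
with 'id', 'lemma', 'lang', 'pos' and 'gloss' keys; IRIs are minted under a
placeholder base; part-of-speech labels map to LexInfo individuals.
"""

BASE = "http://example.org/lexicon/"  # hypothetical placeholder base IRI

PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
)

# Illustrative mapping from dictionary POS labels to LexInfo individuals.
POS_MAP = {"noun": "lexinfo:noun", "verb": "lexinfo:verb", "adj": "lexinfo:adjective"}


def literal(text: str, lang: str) -> str:
    """Return a language-tagged Turtle literal, escaping \\, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def entry_to_turtle(rec: dict) -> str:
    """Convert one record into a LexicalEntry with one Form and one LexicalSense."""
    entry = f"<{BASE}{rec['id']}>"
    form = f"<{BASE}{rec['id']}_form>"
    sense = f"<{BASE}{rec['id']}_sense>"
    lines = [f"{entry} a ontolex:LexicalEntry ;"]
    pos = POS_MAP.get(rec.get("pos", ""))
    if pos:  # unknown or missing POS labels are simply skipped
        lines.append(f"    lexinfo:partOfSpeech {pos} ;")
    lines.append(f"    ontolex:canonicalForm {form} ;")
    lines.append(f"    ontolex:sense {sense} .")
    lines.append(f"{form} a ontolex:Form ;")
    lines.append(f"    ontolex:writtenRep {literal(rec['lemma'], rec['lang'])} .")
    lines.append(f"{sense} a ontolex:LexicalSense ;")
    lines.append(f"    skos:definition {literal(rec['gloss'], 'en')} .")
    return "\n".join(lines) + "\n"


def convert(records: list[dict]) -> str:
    """Serialize all records into a single Turtle document."""
    return PREFIXES + "\n" + "\n".join(entry_to_turtle(r) for r in records)


if __name__ == "__main__":
    # Placeholder record; the strings are not real dictionary data.
    sample = [{"id": "e1", "lemma": "headword", "lang": "tyv",
               "pos": "noun", "gloss": "translation"}]
    print(convert(sample))
```

Emitting Turtle as plain text keeps the sketch dependency-free. A fuller converter could instead build an RDF graph with a library such as rdflib, which also allows SPARQL queries to be run directly over the converted data.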




Publication date: 2017