LIDIOMS: A Multilingual Linked Idioms Data Set
Abstract
In this paper, we describe the LIDIOMS data set, a multilingual RDF representation of idioms that currently covers five languages: English, German, Italian, Portuguese, and Russian. The data set is intended to support natural language processing applications by providing links between idioms across languages. The underlying data was crawled and integrated from various sources. To ensure the quality of the crawled data, all idioms were evaluated by at least two native speakers. Herein, we present the model devised for structuring the data. We also describe how LIDIOMS is linked to well-known multilingual data sets such as BabelNet. The resulting data set complies with the best practices of the Linguistic Linked Open Data (LLOD) community.
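The abstract describes idioms represented in RDF and linked to their equivalents in other languages. The following is a minimal, illustrative sketch of that idea in plain Python (no RDF library). It is not the LIDIOMS model itself: the namespace http://example.org/idiom/ and the linking predicate equivalentIdiom are hypothetical, and the idiom and its translations are illustrative examples rather than entries taken from the data set. Only rdfs:label is a standard RDF Schema property.

```python
"""Illustrative sketch: idioms as RDF-style triples linked across languages.

Assumptions (hypothetical, NOT the LIDIOMS vocabulary): the base IRI
http://example.org/idiom/ and the predicate equivalentIdiom.
rdfs:label is the standard RDF Schema label property.
"""
from typing import NamedTuple


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str
    lang: str  # language tag, e.g. "en"


EX = "http://example.org/idiom/"                      # hypothetical namespace
RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")
EQUIV = IRI(EX + "equivalentIdiom")                   # hypothetical predicate

# (subject IRI, language tag, surface form): one idiom in five languages
IDIOMS = [
    (EX + "en/kill-two-birds", "en", "kill two birds with one stone"),
    (EX + "de/zwei-fliegen", "de", "zwei Fliegen mit einer Klappe schlagen"),
    (EX + "it/due-piccioni", "it", "prendere due piccioni con una fava"),
    (EX + "pt/dois-coelhos", "pt", "matar dois coelhos com uma cajadada só"),
    (EX + "ru/dvuh-zaytsev", "ru", "убить двух зайцев одним выстрелом"),
]


def build_graph(idioms):
    """Return a set of (subject, predicate, object) triples: one language-tagged
    label per idiom plus equivalence links between every pair, both directions."""
    triples = set()
    for iri, lang, text in idioms:
        triples.add((IRI(iri), RDFS_LABEL, Literal(text, lang)))
    iris = [IRI(i) for i, _, _ in idioms]
    for a in iris:
        for b in iris:
            if a != b:
                triples.add((a, EQUIV, b))
    return triples


def translate(triples, label, src, tgt):
    """Labels in language `tgt` of idioms linked to the idiom labelled `label` in `src`."""
    subjects = {s for s, p, o in triples if p == RDFS_LABEL and o == Literal(label, src)}
    linked = {o for s, p, o in triples if s in subjects and p == EQUIV}
    return sorted(
        o.value
        for s, p, o in triples
        if s in linked and p == RDFS_LABEL and isinstance(o, Literal) and o.lang == tgt
    )


def to_ntriples(triples):
    """Serialize the triples as N-Triples lines, sorted for stable output."""
    def term(t):
        if isinstance(t, Literal):
            esc = t.value.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{esc}"@{t.lang}'
        return f"<{t.value}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in sorted(triples))


if __name__ == "__main__":
    g = build_graph(IDIOMS)
    print(translate(g, "kill two birds with one stone", "en", "de"))
    # prints ['zwei Fliegen mit einer Klappe schlagen']
```

Linking every pair of equivalents directly keeps lookups to a single hop; a real data set may instead link each idiom to a shared concept (for example, an external resource such as BabelNet) and derive translations through it.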
Related resources
A Multilingual Database of Idioms
This paper presents a possible architecture for a multilingual database of idioms. We discuss the challenges that idioms present to the creation of such a database and propose a possible encoding that maximises the amount of information that can be stored for different languages. Such a resource provides important information for linguistic, computational linguistic and psycholinguistic use, an...
An investigation of acoustic models for multilingual code-switching
Multilingual speech processing continues to develop as speech technology spreads to heterogeneous clients and applications. We address a distinct problem of code-switching — the spontaneous but occasional use, within speech in one language (referred to as L1), of words, phrases, expressions or idioms from a second language (L2). We examine two alternatives for modeling the acoustics of such wor...
The Comparative Effect of Using Idioms in Conversation and Paragraph Writing on EFL Learners’ Idiom Learning
This study investigated the comparative effect of teaching idiomatic expressions through practicing them in conversation and paragraph writing on intermediate EFL learners’ idiom learning. The participants were sorted out of a population of 134 intermediate students in Zabansara Language School in Khorramabad based on their scores on a Preliminary English Test (PET) and an idiom test piloted in...
Improving Machine Translation through Linked Data
With the ever increasing availability of linked multilingual lexical resources, there is a renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. In the case of Machine Translation, MT systems can potentially benefit from such a resource. Unknown words and ambiguous translat...
Multiword Verbs in WordNets
In this paper, we describe how wordnets treat multiword verbs. We pay special attention to the English and Hungarian wordnets and we argue that from a multilingual perspective it is recommended to store idioms and light verb constructions as a whole rather than listing their parts separately. In order to enhance their applicability in multilingual applications, a unified treatment should be app...
Journal title:
- CoRR
Volume: abs/1802.08148
Publication date: 2018