Multilingual Image Corpus

Abstract

The ELG pilot project Multilingual Image Corpus (MIC 21) provides a large image dataset with annotated objects and multilingual descriptions in 25 languages. Our main contributions are: the provision of a collection of high-quality, copyright-free images; the formulation of an ontology of visual objects based on WordNet noun hierarchies; precise manual correction of automatic image segmentation and annotation of object classes; and the association of objects and images with extended multilingual descriptions. The dataset is designed for image classification, object detection and semantic segmentation. It can also be used for multilingual image caption generation, image-to-text alignment and automatic question answering for images and videos.

Similar resources

The Phrase Detective Multilingual Corpus, Release 0.1

The Phrase Detectives Game-With-A-Purpose for anaphoric annotation has been live since December 2008, collecting over 2.5 million judgments on the anaphoric expressions in texts in two languages (English and Italian) from around 9,000 players. In this paper we summarize our recent work on creating a corpus using these annotations.

Multilingual Corpus Development for Opinion Mining

Opinion Mining is a discipline that has attracted some attention lately. Most of the research in this field has been done for English or Asian languages, due to the lack of resources in other languages. In this paper we describe our methodology for developing a manually annotated multilingual corpus with fine-grained opinion and target annotations. The languages represented in the corpus are En...

TLAXCALA: a multilingual corpus of independent news

We acquire corpora from the domain of independent news from the Tlaxcala website. We build monolingual corpora for 15 languages and parallel corpora for all the combinations of those 15 languages. These corpora include languages for which only very limited such resources exist (e.g. Tamazight). We present the acquisition process in detail and we also present detailed statistics of the produced ...

Building a Multilingual Parallel Subtitle Corpus

In this paper on-going work of creating an extensive multilingual parallel corpus of movie subtitles is presented. The corpus currently contains roughly 23,000 pairs of aligned subtitles covering about 2,700 movies in 29 languages. Subtitles mainly consist of transcribed speech, sometimes in a very condensed way. Insertions, deletions and paraphrases are very frequent which makes them a challen...

Multilingual Topic Detection Using a Parallel Corpus

We have developed an approach for topic detection from multilingual news, in particular Chinese and English. We extract named entities such as people names, geographical location names, and organization names automatically from the news content by transformation-based linguistic taggers. These sets of named entities together with the remaining content terms form the basis of news representation...

Journal

Journal title: Cognitive Technologies

Year: 2022

ISSN: 2197-6635, 1611-2482

DOI: https://doi.org/10.1007/978-3-031-17258-8_22