Multilingual Image Corpus
Abstract
The ELG pilot project Multilingual Image Corpus (MIC 21) provides a large image dataset with annotated objects and multilingual descriptions in 25 languages. Our main contributions are: the provision of a collection of high-quality, copyright-free images; the formulation of an ontology of visual objects based on WordNet noun hierarchies; precise manual correction of automatic segmentation and annotation of object classes; and the association of images with extended multilingual descriptions. The dataset is designed for image classification, object detection, and semantic segmentation. It can also be used for caption generation, image-to-text alignment, and question answering on images and videos.
Similar resources
The Phrase Detectives Multilingual Corpus, Release 0.1
The Phrase Detectives Game-With-A-Purpose for anaphoric annotation has been live since December 2008, collecting over 2.5 million judgments on the anaphoric expressions in texts in two languages (English and Italian) from around 9,000 players. In this paper we summarize our recent work on creating a corpus using these annotations.
Multilingual Corpus Development for Opinion Mining
Opinion Mining is a discipline that has attracted some attention lately. Most of the research in this field has been done for English or Asian languages, due to the lack of resources in other languages. In this paper we describe our methodology for developing a manually annotated multilingual corpus with fine-grained opinion and target annotations. The languages represented in the corpus are En...
TLAXCALA: a multilingual corpus of independent news
We acquire corpora from the domain of independent news from the Tlaxcala website. We build monolingual corpora for 15 languages and parallel corpora for all the combinations of those 15 languages. These corpora include languages for which only very limited such resources exist (e.g. Tamazight). We present the acquisition process in detail and we also present detailed statistics of the produced ...
Building a Multilingual Parallel Subtitle Corpus
In this paper on-going work of creating an extensive multilingual parallel corpus of movie subtitles is presented. The corpus currently contains roughly 23,000 pairs of aligned subtitles covering about 2,700 movies in 29 languages. Subtitles mainly consist of transcribed speech, sometimes in a very condensed way. Insertions, deletions and paraphrases are very frequent which makes them a challen...
Multilingual Topic Detection Using a Parallel Corpus
We have developed an approach for topic detection from multilingual news, in particular Chinese and English. We extract named entities such as people names, geographical location names, and organization names automatically from the news content by transformation-based linguistic taggers. These sets of named entities together with the remaining content terms form the basis of news representation...
Journal
Journal title: Cognitive Technologies
Year: 2022
ISSN: 2197-6635, 1611-2482
DOI: https://doi.org/10.1007/978-3-031-17258-8_22