A Tool for Multi-Word Collocation Extraction and Visualization in Multilingual Corpora

Authors

  • Violeta Seretan
  • Luka Nerima
  • Eric Wehrli
Abstract

This document describes an implemented collocation extraction system that is designed as an aid to translation and will be used in a real translation environment. Its main functionalities are: retrieving multi-word collocations from an existing corpus of documents in a given language (only French and English are supported for the time being); visualizing the list of extracted terms and their contexts with a concordance tool; retrieving the translation equivalents of the sentences containing the collocations from the existing parallel corpora; and enabling the user to create a sublist of validated collocations to be used as a reference in translation. The approach underlying this system is hybrid: the extraction method combines syntactic analysis of the texts (for selecting the collocation candidates) with a statistical measure for the relevance test (i.e., for ranking the candidates according to their collocational strength). We present the underlying approach and methodology and the architecture of the system, describe the main system components, and provide several experimental results.
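The ranking half of the hybrid approach can be illustrated with a minimal sketch. The abstract does not say which statistical measure the system uses, so the log-likelihood ratio below is an assumption, chosen only because it is a common association measure for collocations. The syntactic stage is not reproduced: the input `pairs` is a hypothetical list of (word, word) candidates that a parser is assumed to have already selected, and the names `llr` and `rank_candidates` are illustrative.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: count of the pair (a, b); k12: a with another right word;
    k21: b with another left word; k22: all remaining pairs.
    """
    def h(*counts):
        # Sum of c * log(c / total); zero cells contribute nothing.
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)

    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)
                - h(k11 + k21, k12 + k22))


def rank_candidates(pairs):
    """Rank candidate pairs by association strength, strongest first.

    `pairs` is assumed to come from a syntactic filter (hypothetical here),
    with one entry per occurrence of a candidate in the corpus.
    """
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)
    n = len(pairs)

    scored = []
    for (a, b), k11 in pair_counts.items():
        k12 = left_counts[a] - k11
        k21 = right_counts[b] - k11
        k22 = n - k11 - k12 - k21
        scored.append(((a, b), llr(k11, k12, k21, k22)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy input standing in for parser output (invented for illustration).
candidates = ([("heavy", "rain")] * 6 + [("heavy", "box")] * 2
              + [("big", "box")] * 5 + [("big", "rain")] * 1
              + [("light", "rain")] * 3)
for pair, score in rank_candidates(candidates):
    print(pair, round(score, 2))
```

Note that the log-likelihood ratio is two-sided: it scores pairs that co-occur much less often than expected as highly as pairs that co-occur much more often, so a real system would typically also check the direction of the association.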


Similar resources

SWIFT Aligner, A Multifunctional Tool for Parallel Corpora: Visualization, Word Alignment, and (Morpho)-Syntactic Cross-Language Transfer

It is well known that word aligned parallel corpora are valuable linguistic resources. Since many factors affect automatic alignment quality, manual post-editing may be required in some applications. While there are several state-of-the-art word-aligners, such as GIZA++ and Berkeley, there is no simple visual tool that would enable correcting and editing aligned corpora of different formats. We...


A Tool for Multi-Word Expression Extraction in Modern Greek Using Syntactic Parsing

This paper presents a tool for extracting multi-word expressions from corpora in Modern Greek, which is used together with a parallel concordancer to augment the lexicon of a rule-based machine translation system. The tool is part of a larger extraction system that relies, in turn, on a multilingual parser developed over the past decade in our laboratory. The paper reviews the various NLP module...


Vocabulary Lists for EAP and Conversation Students

Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...


Term Extraction + Term Clustering: An Integrated Platform for Computer-Aided Terminology

A novel technique for automatic thesaurus construction is proposed. It is based on the complementary use of two tools: (1) a Term Extraction tool that acquires term candidates from tagged corpora through a shallow grammar of noun phrases, and (2) a Term Clustering tool that groups syntactic variants (insertions). Experiments performed on corpora in three technical domains yield clusters of term...


Creating a Multilingual Collocation Dictionary from Large Text Corpora

This paper describes a system of terminological extraction capable of handling multi-word expressions, using a powerful syntactic parser. The system includes a concordancing tool enabling the user to display the context of the collocation, i.e. the sentence or the whole document where the collocation occurs. Since the corpora are multilingual, the system also offers an alignment mechanism for t...



Publication date: 2016