A Refinement Framework for Cross Language Text Categorization

نویسندگان

  • Ke Wu
  • Bao-Liang Lu
چکیده

Cross language text categorization is the task of exploiting labelled documents in a source language (e.g. English) to classify documents in a target language (e.g. Chinese). In this paper, we focus on investigating the use of a bilingual lexicon for cross language text categorization. To this end, we propose a novel refinement framework for cross language text categorization. The framework consists of two stages. In the first stage, a cross language model transfer is proposed to generate initial labels of documents in target language. In the second stage, expectation maximization algorithm based on naive Bayes model is introduced to yield resulting labels of documents. Preliminary experimental results on collected corpora show that the proposed framework is effective.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization

Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g. English) while the system is trained using labeled documents in a source language (e.g. Italian). In this work we present many solutions according to the availability of bilingual resources, and we show that it is possible to deal with the problem even when no such resour...

متن کامل

Cross Language Text Categorization by Acquiring Multilingual Domain Models from Comparable Corpora

In a multilingual scenario, the classical monolingual text categorization problem can be reformulated as a cross language TC task, in which we have to cope with two or more languages (e.g. English and Italian). In this setting, the system is trained using labeled examples in a source language (e.g. English), and it classifies documents in a different target language (e.g. Italian). In this pape...

متن کامل

Cross Language Text Categorization Using a Bilingual Lexicon

With the popularity of the Internet at a phenomenal rate, an ever-increasing number of documents in languages other than English are available in the Internet. Cross language text categorization has attracted more and more attention for the organization of these heterogeneous document collections. In this paper, we focus on how to conduct effective cross language text categorization. To this en...

متن کامل

The Quality Assessment of the Persian Translation of “The Graveyard” Based on House's Translation Quality Assessment Model

The notion of Translation Quality Assessment has been the object of many studies. TQA as the compre- hensive treatment of translation evaluation, focuses on the relationship between the source text and target text and also emphasizes that translation is a linguistic operation. To accomplish this research, House‟s model of TQA was chosen as the framework for the investigation ...

متن کامل

Automatic Generation of Language-Independent Features for Cross-Lingual Classification

Many applications require categorization of text documents using predefined categories. The main approach to performing text categorization is learning from labeled examples. For many tasks, it may be difficult to find examples in one language but easy in others. The problem of learning from examples in one or more languages and classifying (categorizing) in another is called cross-lingual lear...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008