Cross-Language Pseudo-Relevance Feedback Techniques for Informal Text
Abstract
Previous work has shown that pseudo-relevance feedback (PRF) can be effective for cross-lingual information retrieval (CLIR). However, this research was primarily based on corpora, such as news articles, that are written in relatively formal language. In this paper, we revisit CLIR with a focus on the problems that arise with informal text, such as blogs and forums. To address the two major sources of “noisy” text, namely translation and the informal nature of the documents, we propose selecting between inter- and intra-language PRF based on the properties of the language of the query and of the corpora being searched. Experimental results show that this approach can significantly outperform state-of-the-art results reported for monolingual and cross-lingual environments. Further analysis indicates that inter-language PRF is particularly helpful for queries with poor translation quality, whereas intra-language PRF is more useful for high-quality translated queries, as it reduces the impact of any potential translation errors in
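The general shape of the selection idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's method, because the abstract does not give the retrieval model, the expansion rule, or the selection criterion. Everything here is a hypothetical placeholder: the term-count scoring in `score`, the top-k/top-m expansion in `prf_expand`, the `translation_quality` score, and the `threshold=0.5` cutoff. The two toy collections stand in for the inter- and intra-language feedback sources. The sketch also skips translating queries and expansion terms between languages, a step that real CLIR would require.

```python
from collections import Counter
from typing import List, Tuple

Doc = List[str]


def score(query: List[str], doc: Doc) -> int:
    """Toy relevance score (hypothetical): total occurrences of query terms in the document."""
    counts = Counter(doc)
    return sum(counts[t] for t in query)


def retrieve(query: List[str], collection: List[Doc], k: int) -> List[Doc]:
    """Return the k highest-scoring documents; ties keep collection order (stable sort)."""
    ranked = sorted(collection, key=lambda d: score(query, d), reverse=True)
    return ranked[:k]


def prf_expand(query: List[str], collection: List[Doc], k: int = 2, m: int = 2) -> List[str]:
    """Pseudo-relevance feedback: treat the top-k documents as relevant and append
    the m most frequent terms in them that are not already in the query."""
    feedback = retrieve(query, collection, k)
    counts = Counter(t for doc in feedback for t in doc if t not in query)
    expansion = [t for t, _ in counts.most_common(m)]
    return query + expansion


def select_prf(
    query: List[str],
    translation_quality: float,
    inter_collection: List[Doc],
    intra_collection: List[Doc],
    threshold: float = 0.5,
    k: int = 2,
    m: int = 2,
) -> Tuple[str, List[str]]:
    """Pick the feedback source from a hypothetical translation-quality score:
    inter-language feedback for poorly translated queries, intra-language otherwise."""
    if translation_quality < threshold:
        return "inter", prf_expand(query, inter_collection, k, m)
    return "intra", prf_expand(query, intra_collection, k, m)


if __name__ == "__main__":
    intra_docs = [["blog", "post", "phone", "battery"], ["phone", "battery", "life"], ["recipe", "cake"]]
    inter_docs = [["phone", "charger"], ["phone", "charger", "cable"], ["weather"]]
    print(select_prf(["phone"], 0.9, inter_docs, intra_docs, m=1))  # ('intra', ['phone', 'battery'])
    print(select_prf(["phone"], 0.2, inter_docs, intra_docs, m=1))  # ('inter', ['phone', 'charger'])
```

The feedback routine stays the same in both branches. Only the feedback source changes, which matches the abstract's framing of PRF as a choice between sources. A real system would replace the threshold rule with whatever query- and corpus-language properties the paper uses.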
Similar resources
University of Chicago at CLEF2004: Cross-language Text and Spoken Document Retrieval
The University of Chicago participated in the Cross-Language Evaluation Forum 2004 (CLEF2004) cross-language multilingual, bilingual, and spoken language tracks. Cross-language experiments focused on meeting the challenges of new languages with freely available resources. We found that modest effectiveness could be achieved with the additional application of pseudo-relevance feedback to overcome...
Structured queries, language modeling, and relevance modeling in cross-language information retrieval
Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in a...
Highly Relevant Documents Lost in CLIR: Experiments with Dictionary Translation and Pseudo-Relevance Feedback
Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this paper, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. First, mon...
The Effect of Pseudo Relevance Feedback on MT-Based CLIR
In this paper, we identify factors that affect machine translation (MT) of a source query for cross-language information retrieval (CLIR) and empirically evaluate the effect of pseudo relevance feedback on cross-language retrieval performance. Our experiments demonstrate that, by using pseudo relevance feedback, we can significantly improve cross-language retrieval performance and achieve the le...
Notes on Experiments with Pseudo Relevance Feedback and Data Merging In Cross-Language Retrieval
In the TREC-8 cross-language information retrieval (CLIR) track, we adopted the approach of using machine translation to prepare a source-language query for use in a target-language retrieval task. We empirically evaluated (1) the effect of pseudo relevance feedback on retrieval performance with two feedback vector length control methods in CLIR, and (2) the effect of multilingual data merging ...