Cross-Language Information Retrieval Based on Category Matching of Web Directories
Authors
Abstract
With the popularity of the Internet, Web documents are being written in more and more languages. Accordingly, Cross-Language Information Retrieval (CLIR), which retrieves documents written in one or more languages using a query written in another language, has been actively studied. A variety of methods, including the use of corpus statistics to translate terms and to disambiguate translated terms, have been studied with some success. However, because corpus-based methods depend heavily on the domain of the corpus content, their retrieval effectiveness may be poor in domains that do not match that content. In this paper, we propose a CLIR method for Web documents that employs a Web directory available in multiple language versions, such as Yahoo!. Feature terms are extracted from the Web documents in each category, and one or more corresponding categories are determined by comparing the similarities of categories across languages. By limiting retrieval to these categories, we aim to resolve the ambiguities of dictionary-based translation and to improve retrieval effectiveness.
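The category-matching idea can be illustrated with a minimal sketch. It assumes tokenized toy documents per category, TF-IDF computed across the categories of one language version as the feature-term weighting, a bilingual dictionary that splits a term's weight evenly over its candidate translations, cosine similarity for matching categories, and cosine similarity against the query to estimate the query's category. None of these choices, nor the helper names (`category_vectors`, `clir_translate`, etc.) or the English/German toy data, come from the paper; they are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy Web directory: category -> tokenized documents.
EN = {
    "Finance": [["bank", "loan", "interest"], ["bank", "stock", "market"]],
    "Nature": [["river", "bank", "fish"], ["river", "forest", "bird"]],
}
DE = {
    "Finanzen": [["bank", "kredit", "zins"], ["aktie", "markt", "bank"]],
    "Natur": [["fluss", "ufer", "fisch"], ["wald", "vogel", "fluss"]],
}
# Hypothetical bilingual dictionary (English -> German candidate translations).
EN_DE = {
    "bank": ["bank", "ufer"], "loan": ["kredit"], "interest": ["zins"],
    "stock": ["aktie"], "market": ["markt"], "river": ["fluss"],
    "fish": ["fisch"], "forest": ["wald"], "bird": ["vogel"],
}


def category_vectors(categories, top_k=10):
    """Weight each category's terms by TF-IDF across categories; keep the top_k."""
    n = len(categories)
    tf = {c: Counter(t for doc in docs for t in doc) for c, docs in categories.items()}
    df = Counter(t for counts in tf.values() for t in counts)
    vectors = {}
    for c, counts in tf.items():
        weights = {t: f * math.log(n / df[t]) for t, f in counts.items()}
        top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        vectors[c] = {t: w for t, w in top if w > 0}  # terms in every category score 0
    return vectors


def translate_vector(vec, dictionary):
    """Project a feature vector into the other language, splitting weight evenly."""
    out = Counter()
    for term, w in vec.items():
        cands = dictionary.get(term, [])
        for cand in cands:
            out[cand] += w / len(cands)
    return out


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def match_categories(src_vecs, tgt_vecs, dictionary):
    """Pair each source category with its most similar target category."""
    return {
        s: max(tgt_vecs, key=lambda t: cosine(translate_vector(v, dictionary), tgt_vecs[t]))
        for s, v in src_vecs.items()
    }


def clir_translate(query, src_cats, tgt_cats, dictionary):
    """Estimate the query's category, match it across languages, and pick the
    candidate translation of each query term that weighs most in that category."""
    src_vecs = category_vectors(src_cats)
    tgt_vecs = category_vectors(tgt_cats)
    matches = match_categories(src_vecs, tgt_vecs, dictionary)
    src_cat = max(src_vecs, key=lambda c: cosine(Counter(query), src_vecs[c]))
    tgt_cat = matches[src_cat]
    vec = tgt_vecs[tgt_cat]
    terms = [max(dictionary[q], key=lambda c: vec.get(c, 0.0)) for q in query if q in dictionary]
    return tgt_cat, terms


print(clir_translate(["bank", "loan"], EN, DE, EN_DE))   # ('Finanzen', ['bank', 'kredit'])
print(clir_translate(["river", "bank"], EN, DE, EN_DE))  # ('Natur', ['fluss', 'ufer'])
```

Both queries contain the ambiguous term `bank`. Restricting the candidate translations to the matched category selects `bank` (financial institution) in one case and `ufer` (river bank) in the other, which is the kind of dictionary disambiguation described above. A realistic system would work with much larger categories and a proper tokenizer, and would keep every target category above a similarity threshold rather than only the single best one.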
Similar resources
Analysis of Appropriate Category Level of Web Directory for Cross-Language Information Retrieval
In this paper, we analyze the appropriate category level of a Web directory for Cross-Language Information Retrieval (CLIR). Our proposed CLIR method estimates the domain of a query using the hierarchical structures of Web directories, so correct domain estimation requires detecting the appropriate category level of the Web directory. We conducted retrieval experiments using ...
Cross-Language Information Retrieval based on category matching between language versions of a web directory
Since the Web consists of documents in various domains or genres, the method for Cross-Language Information Retrieval (CLIR) of Web documents should be independent of a particular domain. In this paper, we propose a CLIR method which employs a Web directory provided in multiple language versions (such as Yahoo!). In the proposed method, feature terms are first extracted from Web documents for e...
Impact of Controlled and Free Language Use in Retrieving Articles from the ProQuest and Science Direct Databases
Abstract. Introduction: The growth and expansion of the Internet has changed the way information is accessed, and many facilities have been created on the Web to make locating information easier and faster. Objective: To identify the impact of using medical thesaurus keywords on the retrieval of articles from the ProQuest and Science Direct databases. Materials and Methods: The pr...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes that generate a list of relevant responses to a query. The document processor, the matching function, and the query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, or language models. In this paper, a new methodology for mat...