Applying Multiple Characteristics and Techniques in the NICT Information Retrieval System in NTCIR-5
Abstract
Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective, are used in the system. Characteristics of newspapers, such as locational information, are also used. We present our application of Fujita’s method, in which longer terms are used in retrieval but are given less weight than the shortest terms; this allows us to use both compound and single-word terms. We describe the statistical test used to expand queries through an automatic feedback process. The test yields terms that have been statistically shown to be related to the top-ranked documents obtained in the first retrieval. We also used a numerical term, QIDF, which is an IDF term for queries. It decreases the scores of stop words that occur in many queries, and it can be very useful for foreign languages for which we cannot determine stop words. We participated in three monolingual information retrieval tasks (Korean, Japanese, and English) and two bilingual information retrieval tasks (Japanese-English and English-Japanese) in NTCIR-5. Compared to other participants, we obtained high precision in all the tasks in which we participated. In particular, we obtained the best precision in the Korean title-based monolingual information retrieval and the Japanese-English bilingual information retrieval.
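As a rough illustration of the QIDF idea described in the abstract, the sketch below weights each term by how rarely it appears across the set of queries. The abstract does not give the formula, so the form log(N / qf), the function name `qidf_weights`, and the sample queries are all assumptions made by analogy with ordinary IDF, not the authors' definition.

```python
import math
from collections import Counter


def qidf_weights(queries):
    """Return an assumed QIDF weight for every term seen in `queries`.

    queries: list of token lists, one list per query.
    Assumed form (not from the paper): log(N / qf), where N is the number
    of queries and qf is the number of queries that contain the term.
    """
    n_queries = len(queries)
    query_freq = Counter()
    for tokens in queries:
        # Count each term at most once per query.
        query_freq.update(set(tokens))
    return {term: math.log(n_queries / qf) for term, qf in query_freq.items()}


# Hypothetical queries: "find", "documents", and "about" occur in every query.
queries = [
    ["find", "documents", "about", "earthquake", "damage"],
    ["find", "documents", "about", "stock", "prices"],
    ["find", "documents", "about", "rice", "imports"],
]
weights = qidf_weights(queries)
```

Under this assumed form, a term that occurs in every query receives a weight of zero, so it behaves like a stop word without needing a language-specific stop list, while a term that occurs in only one query keeps the largest weight.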
Similar resources
Applying Multiple Characteristics and Techniques in the NICT Information Retrieval System at NTCIR-6
Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. It uses Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective. Characteristics of newspapers such as locational information are used. We present our application of Fujita’s method, where longer terms are used in retrieval by...
Applying Multiple Characteristics and Techniques to Obtain High Levels of Performance in Information Retrieval
We describe our information retrieval system, which achieves its goals by taking advantage of numerous characteristics of the information and applying numerous sophisticated techniques. Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective, have been applied in the system. Characteristics of newspapers such as locational information were applied. We give exam...
Applying Multiple Characteristics and Techniques to Obtain High Levels of Performance in Information Retrieval at NTCIR-4
Our information retrieval system takes advantage of numerous characteristics of the information and applies numerous sophisticated techniques. Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective, have been applied in the system. Characteristics of newspapers such as locational information were applied. We present our application of Fujita’s method, where l...
Experiments on Chinese-English Cross-language Retrieval at NTCIR-4
The AI Lab group participated in the cross-language retrieval task at NTCIR-4. Aiming at a practical retrieval system, we applied a dictionary-based approach incorporated with phrasal translation, co-occurrence disambiguation and query expansion techniques. Although experimental results were not as good as we expected, our study demonstrated the feasibility of applying CLIR techniques in real-wo...
A Patent Retrieval Method Using a Hierarchy of Clusters at TUT
To retrieve relevant documents from an enormous document collection, we usually utilize the similarity or distance measure between a query and the documents, or apply document clustering techniques to the document collection and partition it into relevant document groups. For patent retrieval, however, it is difficult to retrieve documents by using query terms only, because complex terminologie...