Applying Multiple Characteristics and Techniques in the NICT Information Retrieval System at NTCIR-6

Authors

  • Masaki Murata
  • Jong-Hoon Oh
  • Qing Ma
  • Hitoshi Isahara
Abstract

Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. It uses Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective. It also uses characteristics of newspapers, such as locational information. We present our application of Fujita’s method, in which longer terms are used in retrieval but are de-emphasized relative to the shortest terms. This allows us to use both compound and single-word terms. We describe the statistical test used to expand queries through an automatic feedback process. The method yields terms that have been statistically shown to be related to the top-ranked documents obtained in the first retrieval. We also use a numerical term, QIDF, which is an IDF term for queries. QIDF decreases the scores of stop words that occur in many queries. It can be very useful for foreign languages for which we cannot determine stop words. We also use web-based unknown word translation for bilingual information retrieval. We participated in two monolingual information retrieval tasks (Korean and Japanese) and five bilingual information retrieval tasks (Chinese-Japanese, English-Japanese, Japanese-Korean, Korean-Japanese, and English-Korean) at NTCIR-6. We obtained good results in all the tasks.
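
The abstract does not give the QIDF formula, so the following is only a minimal sketch of the idea, assuming QIDF mirrors ordinary IDF with queries in place of documents: log(N / qf), where N is the number of queries and qf is the number of queries containing the term. The function name `qidf_weights`, the formula, and the sample queries are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def qidf_weights(queries):
    """Return an assumed QIDF weight per term: log(N / qf).

    N is the number of queries and qf is the number of queries that
    contain the term. This formula is an assumption made by analogy
    with IDF; the source does not state the exact definition.
    """
    n_queries = len(queries)
    query_freq = Counter()
    for terms in queries:
        query_freq.update(set(terms))  # count each term once per query
    return {term: math.log(n_queries / qf) for term, qf in query_freq.items()}


# Hypothetical tokenized queries; "find", "documents", "about" act like stop words.
queries = [
    ["find", "documents", "about", "earthquake", "damage"],
    ["find", "documents", "about", "election", "results"],
    ["find", "documents", "about", "typhoon", "damage"],
]
weights = qidf_weights(queries)
```

Under this assumed formula, a term that appears in every query (such as "find" here) gets weight zero, while a term that appears in a single query gets the largest weight, which matches the described effect of lowering scores for stop-word-like terms without needing a stop-word list.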

Similar resources

Applying Multiple Characteristics and Techniques in the NICT Information Retrieval System in NTCIR-5

Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective, are used in the system. Characteristics of newspapers such as locational information are used. We present our application of Fujita’s method, where longer terms are used ...

Applying Multiple Characteristics and Techniques to Obtain High Levels of Performance in Information Retrieval

We describe our information retrieval system, which achieves its goals by taking advantage of numerous characteristics of the information and applying numerous sophisticated techniques. Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective, have been applied in the system. Characteristics of newspapers such as locational information were applied. We give exam...

Applying Multiple Characteristics and Techniques to Obtain High Levels of Performance in Information Retrieval at NTCIR-4

Our information retrieval system takes advantage of numerous characteristics of the information and applies numerous sophisticated techniques. Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective, have been applied in the system. Characteristics of newspapers such as locational information were applied. We present our application of Fujita’s method, where l...

Experiments on Chinese-English Cross-language Retrieval at NTCIR-4

The AI Lab group participated in the cross-language retrieval task at NTCIR-4. Aiming at a practical retrieval system, we applied a dictionary-based approach combined with phrasal translation, co-occurrence disambiguation and query expansion techniques. Although experimental results were not as good as we expected, our study demonstrated the feasibility of applying CLIR techniques in real-wo...

NTCIR-6 CLIR-J-J Experiments at Yahoo! Japan

This paper describes NTCIR-6 experiments on the CLIR-J-J task, i.e., the Japanese monolingual retrieval subtask, by the Yahoo group, focusing on parameter optimization in information retrieval (IR). Unlike regression approaches, we optimized parameters completely independently of retrieval models so that the optimized parameter set can illustrate the characteristics of the target test collections...

Publication date: 2007