NTCIR-2 Experiments Using Long Gram Based Indices

Authors

Takashi Sato

Nao Hatta

Koji Hiraiwa

Kihei Kobata

Akihiro Furusho

Koto Han

Abstract

We experimented with long gram-based indices at NTCIR-2. Building gram-based indices requires no linguistic analysis such as morphological analysis. The accession number, title, abstract and keywords are extracted from each NTCIR-2 document. The total index size is 1.43 GB, and building the indices takes about 100 minutes. Retrieval takes 21 seconds per topic on average, because documents are ranked by a simple Perl program that is not optimized for speed. The ranking algorithm is based on a traditional probabilistic model, and the resulting average precision is standard.
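To illustrate why a gram-based index needs no morphological analysis, here is a minimal sketch of a character n-gram inverted index. It is not the authors' implementation: the function names (`char_ngrams`, `build_index`, `search`), the sample documents and the gram length `n=2` are all assumptions made for illustration, and the abstract does not state the gram lengths or index layout actually used.

```python
from collections import defaultdict


def char_ngrams(text, n):
    """Return every overlapping character n-gram of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs, n=2):
    """Map each n-gram to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents that contain every n-gram of the query.

    Positions are not stored, so the result is a candidate set:
    a document may contain all the grams without containing the
    query as one contiguous string.
    """
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical sample documents; no word segmentation is applied.
docs = {
    "d1": "情報検索の実験",
    "d2": "検索システムの評価",
    "d3": "information retrieval",
}
index = build_index(docs, n=2)
print(search(index, "検索"))
print(search(index, "retrieval"))
```

Because the text is cut into fixed-length character windows, the same code handles Japanese and English input without a tokenizer or dictionary. A production index would also store positions and would be kept on disk rather than in memory.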

Similar resources

NTCIR-3 CLIR Experiments at Osaka Kyoiku University - Comparison of Gram-based Indices

We experimented with long gram-based indices in the NTCIR-3 CLIR task. Building gram-based indices requires no linguistic analysis such as morphological analysis. Indices were built for three languages (Japanese, English and Chinese) in this task. They differ considerably in some respects; for example, the difference in index overhead comes from the difference in character encoding.

NTCIR-3 WEB Experiments at Osaka Kyoiku University - Towards Index Partitioning and Parallel Retrieval

We experimented with long gram-based indices in the NTCIR-3 WEB task. Building gram-based indices requires no linguistic analysis such as morphological analysis. Two-byte characters are extracted from the 'cooked' version of the NTCIR-3 WEB task corpus. The total index size is 26 GB, and building the indices takes about 18 hours. The median search time per word in the index is 197 ms. Ranking algorithm used is based on a traditional...

NTCIR-3 PAT Experiments at Osaka Kyoiku University: Long Gram-based Index and Essential Words

We experimented with long gram-based indices in the NTCIR-3 patent task. Building gram-based indices requires no linguistic analysis such as morphological analysis. The docno, abj, clj and dej tag fields are extracted from the NTCIR-3 patent corpus. The total index size is 11.4 GB, and building the indices takes about 8.7 hours. The median search time per word is 9.8 ms for the abj index and 91.8 ms for the dej index. Av...

NTCIR-4 PATENT Experiments at Osaka Kyoiku University - Gram-Based Passage Index and Essential Words

We experimented with long gram-based indices in the NTCIR-4 patent task. Building gram-based indices requires no morphological analysis. The ABJ and DEJ tag fields are extracted from the NTCIR-4 patent corpus and indexed. Passages are also extracted and indexed. The total index size is 240 GB, and building the indices takes about 86 hours. By merging the result of passage retrieval with the result of docu...

NTCIR-4 WEB Experiments at Osaka Kyoiku University - Static/Dynamic Scoring Using Link Structure Analysis and Web Page Grouping

We performed gram-based indexing and retrieval in the NTCIR-4 WEB task. Building the indices takes 34.7 hours, and the indices occupy 30.2 GB. The median retrieval time per word is 26 ms. The algorithm for ranking retrieval results is based on a traditional probabilistic model. We report on the result of gram-based indexing and the retrieval, and propose a scoring method based on li...

Publication date: 2001
