POSTECH at NTCIR-5 Patent Retrieval: Smoothing Experiments in a Language Modeling Approach to Patent Retrieval

نویسندگان

  • In-Su Kang
  • Seung-Hoon Na
  • Jun-Ki Min
  • Jong-Hyeok Lee
چکیده

This report describes the experimental results of our participation at the Document Retrieval Subtask of NTCIR-5 Patent Retrieval Task. Unlike newspaper articles which belong to the main document type handled in previous information retrieval experiments, patent documents have many different characteristics in terms of length, technicality, structureness, etc. Among these, we focus on the length of patent documents. Since patent documents are long and cover a diverse spectrum of topicality, document models estimated from them by the language modeling approach are expected to show different characteristics from those estimated from short document samples such as newspaper articles. Based on such contemplation, this report investigates the effect of smoothing in the language modeling approach on retrieving patent documents.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

POSTECH at NTCIR-6 English Patent Retrieval Subtask

This paper reports our experimental results at the NTCIR-6 English Patent Retrieval Subtask. Our previous participation at the patent retrieval Subtask revealed that the long length of the patent applications require less smoothing of the document model than general documents such as news paper articles. We setup the initial baseline retrieval system for U.S. patent applications and compare the...

متن کامل

Revisiting Document Length Hypotheses: NTCIR-4 CLIR and Patent Experiments at Patolis

NTCIR-4 experiments of CLIR J-J and Patent tasks, focusing on comparative studies of two testcollections and two retrieval approaches in view of document length hypotheses are described. TF*IDF outperformed the language modeling approach in the CLIR J-J task while two approaches performed similarly in the Patent task. Two different document length hypotheses behind two tasks/collections are ass...

متن کامل

Experiments on Cross-language and Patent Retrieval at NTCIR-3 Workshop

The Berkeley group participated in the crosslanguage retrieval task and the patent retrieval task at the third NTCIR workshop. This paper describes our experiments on cross-language and patent retrieval. We present an automatic relevance feedback procedure for document ranking formula based on logistic regression, and a procedure for automatically extracting Chinese/Japanese translations of Eng...

متن کامل

NTCIR-5 Patent Retrieval Experiments at Hitachi

In NTCIR-5, we used five retrieval methods proposed in NTCIR-4: (1) query term weighting using only document frequency, (2) stopword deletion, (3) two-stage patent retrieval, (4) term weighting considering “measurement terms”, and (5) related term expansion. In this paper, we compare the retrieval accuracy for two test sets: 34 main queries in NTCIR-4 and 1189 new queries in NTCIR-5. Then, we e...

متن کامل

Overview of Patent Retrieval Task at NTCIR-5

In the Fifth NTCIR Workshop, we organized the Patent Retrieval Task and performed three subtasks; Document Retrieval, Passage Retrieval, and Classification. This paper describes the Document Retrieval Subtask and Passage Retrieval Subtask, both of which were intended for patent-to-patent invalidity search task. We show the evaluation results of the groups participating in those subtasks.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005