POSTECH at NTCIR-5 Patent Retrieval: Smoothing Experiments in a Language Modeling Approach to Patent Retrieval
نویسندگان
چکیده
This report describes the experimental results of our participation at the Document Retrieval Subtask of NTCIR-5 Patent Retrieval Task. Unlike newspaper articles which belong to the main document type handled in previous information retrieval experiments, patent documents have many different characteristics in terms of length, technicality, structureness, etc. Among these, we focus on the length of patent documents. Since patent documents are long and cover a diverse spectrum of topicality, document models estimated from them by the language modeling approach are expected to show different characteristics from those estimated from short document samples such as newspaper articles. Based on such contemplation, this report investigates the effect of smoothing in the language modeling approach on retrieving patent documents.
منابع مشابه
POSTECH at NTCIR-6 English Patent Retrieval Subtask
This paper reports our experimental results at the NTCIR-6 English Patent Retrieval Subtask. Our previous participation at the patent retrieval Subtask revealed that the long length of the patent applications require less smoothing of the document model than general documents such as news paper articles. We setup the initial baseline retrieval system for U.S. patent applications and compare the...
متن کاملRevisiting Document Length Hypotheses: NTCIR-4 CLIR and Patent Experiments at Patolis
NTCIR-4 experiments of CLIR J-J and Patent tasks, focusing on comparative studies of two testcollections and two retrieval approaches in view of document length hypotheses are described. TF*IDF outperformed the language modeling approach in the CLIR J-J task while two approaches performed similarly in the Patent task. Two different document length hypotheses behind two tasks/collections are ass...
متن کاملExperiments on Cross-language and Patent Retrieval at NTCIR-3 Workshop
The Berkeley group participated in the crosslanguage retrieval task and the patent retrieval task at the third NTCIR workshop. This paper describes our experiments on cross-language and patent retrieval. We present an automatic relevance feedback procedure for document ranking formula based on logistic regression, and a procedure for automatically extracting Chinese/Japanese translations of Eng...
متن کاملNTCIR-5 Patent Retrieval Experiments at Hitachi
In NTCIR-5, we used five retrieval methods proposed in NTCIR-4: (1) query term weighting using only document frequency, (2) stopword deletion, (3) two-stage patent retrieval, (4) term weighting considering “measurement terms”, and (5) related term expansion. In this paper, we compare the retrieval accuracy for two test sets: 34 main queries in NTCIR-4 and 1189 new queries in NTCIR-5. Then, we e...
متن کاملOverview of Patent Retrieval Task at NTCIR-5
In the Fifth NTCIR Workshop, we organized the Patent Retrieval Task and performed three subtasks; Document Retrieval, Passage Retrieval, and Classification. This paper describes the Document Retrieval Subtask and Passage Retrieval Subtask, both of which were intended for patent-to-patent invalidity search task. We show the evaluation results of the groups participating in those subtasks.
متن کامل