Passage Retrieval Starting from Patent Claims: A CLEF-IP 2013 Task Overview

Authors

  • Florina Piroi
  • Mihai Lupu
  • Allan Hanbury
Abstract

Most of the searches a patent expert at a patent office performs use Boolean methods to query large databases of patent data. The CLEF-IP evaluation track is designed to experiment with information retrieval techniques in the patent domain. The data corpus in the CLEF-IP Lab consists of patent documents published by the European Patent Office. One of the main tasks in the Lab has been related to the Prior Art type of search performed by the patent experts at patent offices. The task has gone through various changes over the years. It started with virtual patent documents as topics (in 2009), then moved to actual patent application documents and to sets of claims from patent application documents (2012 and 2013). Relevance assessments for this task were based on search reports published by the European Patent Office. In this overview we report on the work we have done in organizing this retrieval task in 2013.

1 The CLEF-IP Passage Retrieval Task

The technological developments of our time are closely coupled with the patent system. This system encourages inventors to make their ideas public in exchange for a monopoly on the invention for a limited period of time, up to 20 years. A patent can be seen as a contract between a government and the patent owner, by which the latter can exclude other parties from manufacturing and exploiting the invention without permission. To obtain a patent, one of the main requirements is that the invention is new. To verify this, extensive searches must be carried out thoroughly. These searches cover patent repositories and also specialized literature, conference publications, etc. The amount of data to be searched, together with the fact that many publications are now digitized, means that these searches cannot be done without the help of computers. With the tasks organized in CLEF-IP over the years, we investigate how current IR solutions can serve the needs of patent experts doing novelty searches.
This task, in particular, is meant to explore the approaches that IR systems may offer when faced with finding specific pieces of text that are relevant to a given patent claim. We briefly present here the process of obtaining a patent, with a focus on the European Patent Office (EPO [2]). To obtain a patent, a patent application must be registered with a patent office. A patent application contains an abstract, a title, a detailed description of the invention, drawings (if necessary), and a set of claims that define the extent of the protection sought. An applicant will also cite previously published patents that are considered relevant to the described invention. At the EPO, applications can be filed in any language. The official languages of the EPO are English, French, and German, so whenever another language is used in an application, a translation into one of these three languages must be provided. Once the application is registered at the patent office, it is examined to establish that it is novel, that it involves an inventive step, and that it is realizable. During these examinations, the EPO prepares a European search report, which lists all the relevant documents found (called patent citations). The EPO publishes patent applications together with their search reports within a time limit of 18 months from the filing date. The applicant may decide, based on the search report, to pursue the patent. In that case, a sequence of communications between the applicant and the patent office takes place. Usually, during this process, the claims are adjusted so as not to conflict with existing patents. The European search report is mainly based on the application claims. More often than not, it specifies both the documents relevant to the (various) claims and the passages of particular importance to them.
Knowing this, we designed the Passage Retrieval Task Starting from Claims to investigate the effectiveness of Information Retrieval (IR) methods. The task covers finding relevant documents and marking the passages particularly pertinent to a set of claims.

2 The CLEF-IP Corpus

The CLEF-IP corpus was distributed as a collection of over 3 million XML documents. These pertain to over 1.5 million patents published by the EPO and the World Intellectual Property Organization (WIPO) prior to 2002 [8]. The CLEF-IP corpus is an extract of the larger MAREC collection. MAREC uses a common normalized XML data format to represent patent documents published by the EPO, WIPO, the US Patent and Trademark Office, and the Japan Patent Office. We do not describe the collection content here. Instead, we direct the reader to the previous publications that detail it ([7,9]).
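The Passage Retrieval Starting from Claims task pairs document retrieval with passage marking. The Python code below is a purely illustrative sketch of one common way to approach this; it is not the method of this overview or of any participant. It first ranks whole documents against the concatenated claims, then ranks the passages inside the top documents. It assumes plain TF-IDF term matching, a naive ASCII tokenizer, and documents already split into lists of passage strings. The functions `tokenize`, `idf_table`, `tfidf_score`, and `retrieve` and the toy documents are hypothetical. A real run would work on the corpus XML documents and use a proper inverted index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(texts):
    """Smoothed inverse document frequency, ln((n + 1) / (df + 1)) + 1, for each term."""
    n = len(texts)
    df = Counter()
    for text in texts:
        df.update(set(tokenize(text)))
    return {term: math.log((n + 1) / (count + 1)) + 1 for term, count in df.items()}


def tfidf_score(query_terms, text, idf):
    """Sum of tf * idf over the distinct query terms; terms unseen in the collection add 0."""
    tf = Counter(tokenize(text))
    return sum(tf[term] * idf.get(term, 0.0) for term in set(query_terms))


def retrieve(claims, documents, top_docs=2, top_passages=1):
    """Hypothetical two-step retrieval: rank documents, then passages inside the best ones.

    `claims` is a list of claim strings; `documents` maps a document id to its list of
    passage strings. Returns a list of (doc_id, doc_score, [(passage_index, passage_score)]).
    """
    query = tokenize(" ".join(claims))

    # Step 1: score each document as a whole (its passages concatenated).
    full_texts = {doc_id: " ".join(psgs) for doc_id, psgs in documents.items()}
    doc_idf = idf_table(list(full_texts.values()))
    ranked_docs = sorted(
        ((doc_id, tfidf_score(query, text, doc_idf)) for doc_id, text in full_texts.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )[:top_docs]

    # Step 2: score the passages of the retained documents, with IDF computed over all passages.
    psg_idf = idf_table([p for psgs in documents.values() for p in psgs])
    results = []
    for doc_id, doc_score in ranked_docs:
        scored = [(i, tfidf_score(query, p, psg_idf)) for i, p in enumerate(documents[doc_id])]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results.append((doc_id, doc_score, scored[:top_passages]))
    return results


if __name__ == "__main__":
    claims = ["A pump comprising a rotor and a sealed housing."]
    documents = {
        "DOC-A": ["The rotor is mounted inside a sealed housing.", "Figure 2 shows the inlet."],
        "DOC-B": ["A method for brewing coffee.", "The filter is replaced daily."],
        "DOC-C": ["A pump with an impeller.", "The housing is cast aluminium."],
    }
    for doc_id, score, passages in retrieve(claims, documents):
        print(doc_id, round(score, 2), passages)
```

The toy data in the `__main__` block keeps DOC-A (score 5.67) and DOC-C (score 3.98) and ranks the first passage of each highest. Splitting the work into two steps keeps passage scoring cheap, because only the passages of a few candidate documents are examined.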

Similar resources

Query Formulation for Prior Art Search - Georgetown University at CLEF-IP 2013

Our group participated in the CLEF-IP 2013 Passage Retrieval starting from Claims task. We focus on formulating representative queries from various metadata that is embedded in a patent document. We then submit the queries to a state-of-the-art search engine to perform document level retrieval. For passage level retrieval, we implement a TF-IDF algorithm that calculates the sum of query keyword...

Patent Terminlogy Analysis: Passage Retrieval Experiments for the Intellecutal Property Track at CLEF

In 2012, the University of Hildesheim participated in the CLEF-IP claims-to-passage task. 4 runs were submitted and different approaches tested. The tested approaches included a language independent trigram search approach, one approach formulating a query in the source language only and another approach with queries translated to English, German, French and Spanish. The results were not satisfa...

Report on the CLEF-IP 2013 Experiments: Multilayer Collection Selection on Topically Organized Patents

This technical report presents the work which has been carried out using Distributed Information Retrieval methods for federated search of patent documents for the passage retrieval starting from claims (patentability or novelty search) task. Patent documents produced worldwide have manually-assigned classification codes which in our work are used to cluster, distribute and index patents throug...

Chemnitz at CLEF IP 2012: Advancing Xtrieval or a Baseline Hard to Crack

For the 2012 CLEF-IP Claims to passage task we reused and improved our Xtrieval framework. Our two-step approach comprises creating two Lucene indexes: one containing the whole patent application documents and one containing the same documents split into passages. We prepared three setups and conducted each with a translated and an untranslated topic set, which was just applied to the claims. T...

BiTeM site report for the Claims to Passage task in CLEF-IP

In CLEF-IP 2012, we participated in the Claims to Passage task where the goal was to return relevant passages according to sets of claims, for patentability or novelty search purposes. The collection contained 2.3M of documents, corresponding to an estimated volume of 250M of passages. To cope with the problems induced by this large dataset, we designed a two-step retrieval system. In the first...

Publication date: 2013