Integrating the Probabilistic Models BM25/BM25F into Lucene

Authors

  • Joaquín Pérez-Iglesias
  • José R. Pérez-Agüera
  • Víctor Fresno-Fernández
  • Yuval Z. Feinstein
Abstract

This document describes an implementation of BM25 and BM25F built on the Lucene Java framework. The implementation described here can be downloaded from [Pérez-Iglesias 08a]. Both models have stood out at TREC for their performance and are considered state of the art in the IR community. BM25 is applied to retrieval on plain-text documents, that is, documents that do not contain fields, while BM25F is applied to structured documents.
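As an illustration of the two models named in the abstract, the following is a minimal sketch of per-term BM25 and BM25F scoring in their commonly cited textbook forms. It is not the code distributed with the paper and does not use any Lucene API; the class name `Bm25Sketch`, the method names, and the parameter values `k1 = 1.2` and `b = 0.75` are assumptions chosen for the example.

```java
public class Bm25Sketch {

    // Classic Robertson/Sparck Jones style IDF (an assumption; other variants exist).
    // It becomes negative when a term occurs in more than half of the documents.
    static double idf(long numDocs, long docFreq) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 contribution of one query term for a document without fields.
    static double bm25Term(double tf, double docLen, double avgDocLen,
                           double idf, double k1, double b) {
        // Length normalisation: documents longer than average are penalised.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1.0) / (tf + norm);
    }

    // BM25F contribution of one query term for a document with several fields.
    // All arrays are indexed by field: term frequency, field length,
    // average field length, field boost, and per-field b parameter.
    static double bm25fTerm(double[] tf, double[] fieldLen, double[] avgFieldLen,
                            double[] boost, double[] b, double idf, double k1) {
        double weightedTf = 0.0;
        for (int f = 0; f < tf.length; f++) {
            // Each field is length-normalised and boosted before saturation.
            double norm = 1.0 - b[f] + b[f] * fieldLen[f] / avgFieldLen[f];
            weightedTf += boost[f] * tf[f] / norm;
        }
        // Saturation is applied once, to the combined pseudo-frequency.
        return idf * weightedTf / (k1 + weightedTf);
    }

    public static void main(String[] args) {
        double termIdf = idf(1000, 10);
        // Plain-text document: term occurs 3 times in a 120-token document.
        System.out.println(bm25Term(3, 120, 100, termIdf, 1.2, 0.75));
        // Structured document: field 0 is a title (boost 2.0), field 1 is a body.
        System.out.println(bm25fTerm(
                new double[] {1, 2}, new double[] {8, 110}, new double[] {10, 100},
                new double[] {2.0, 1.0}, new double[] {0.75, 0.75}, termIdf, 1.2));
    }
}
```

The point of the sketch is the structural difference: BM25 saturates the frequency of a term in a single text, while BM25F first combines boosted, length-normalised field frequencies and only then applies saturation once.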

Similar resources

The Probabilistic Relevance Framework: BM25 and Beyond

The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970–1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Aga...

A Short Note on Proximity-based Scoring of Documents with Multiple Fields

The BM25 ranking function is one of the most well known query relevance document scoring functions and many variations of it are proposed. The BM25F function is one of its adaptations designed for modeling documents with multiple fields. The Expanded Span method extends a BM25-like function by taking into consideration the proximity between term occurrences. In this note, we combine these two var...

When Simple is (more than) Good Enough: Effective Semantic Search with (almost) no Semantics

  • Baseline retrieval
  • Flat text representation
  • Standard retrieval models (LM, BM25)
  • Fielded representation
  • Predicates holding title values are put in a separate field
  • Fielded variants of retrieval models (LMF, BM25F)
  • Entity importance
  • Weigh trusted, high-quality sources higher (DBpedia)
  • Extended preprocessing
  • Content extraction from URIs ...

A Comparative Study of Probabilistic and Language Models for Information Retrieval

Language models for information retrieval have received much attention in recent years, with many claims being made about their performance. However, previous studies evaluating the language modelling approach for information retrieval used different query sets and heterogeneous collections, which make reported results difficult to compare. This research is a broad-based study that evaluates la...

A Probabilistic Model of Learning Fields in Islamic Economics and Finance

In this paper an epistemological model of learning fields of probabilistic events is formalized. It is used to explain resource allocation governed by pervasive complementarities as the sign of unity of knowledge. Such an episteme is induced epistemologically into interacting, integrating and evolutionary variables representing the problem at hand. The end result is the formalization of a p...

Journal title:
  • CoRR

Volume abs/0911.5046

Publication date 2009