document weight

Semantic Search: Document Ranking and Clustering Using Computer Science Ontology and N-Grams

Journal: JDIM 2014

Thanyaporn Boonyoung Anirach Mingkhwan

Semantic similarity has become an important tool and is widely used to solve traditional Information Retrieval problems. This study adopts a computer science ontology and proposes an ontology indexing weight based on Wu and Palmer’s edge counting measure, and uses the N-grams method for computing a family of word similarity. The study also compares the subsumption weight between Hliaoutakis a...

Learning Term Weights for Ad-hoc Retrieval

Journal: CoRR 2016

Benjamin Piwowarski

Most Information Retrieval models compute the relevance score of a document for a given query by summing term weights specific to a document or a query. Heuristic approaches, like TF-IDF, or probabilistic models, like BM25, are used to specify how a term weight is computed. In this paper, we propose to leverage learning-to-rank principles to learn how to compute a term weight for a given docume...
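The "sum of term weights" scoring that this abstract describes can be sketched with the BM25 baseline it mentions. This is a minimal illustration of the standard Okapi BM25 formula, not the learned weighting the paper proposes; the function name `bm25_score`, the default parameters `k1=1.2` and `b=0.75`, and the non-negative IDF variant are assumptions made for the example.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query.

    The score is a sum of per-term weights, as in the abstract above.
    k1 and b are conventional defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query:
        # Document frequency: how many documents contain the term.
        df = sum(1 for d in docs if term in d)
        if df == 0 or tf[term] == 0:
            continue  # the term contributes nothing to this document
        # Non-negative IDF variant (assumption; other variants exist).
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Term-frequency saturation with document-length normalization.
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


docs = [
    ["term", "weight", "retrieval"],
    ["document", "ranking"],
    ["term", "term", "query"],
]
scores = [bm25_score(["term"], d, docs) for d in docs]
```

In this toy corpus the second document does not contain the query term and scores zero, while the third document, which repeats the term, scores higher than the first.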

A Bayesian Approach for Learning Document Type Relevance

2007

Peter C. K. Yeung Stefan Büttcher Charles L. A. Clarke Maheedhar Kolla

Retrieval accuracy can be improved by considering which document type should be filtered out and which should be ranked higher in the result list. Hence, document type can be used as a key factor for building a re-ranking retrieval model. We take a simple approach for considering document type in the retrieval process. We adapt the BM25 scoring function to weight term frequency based on the doc...

A Framework for Sampling-Based XML Data Pricing

Journal: Trans. Large-Scale Data- and Knowledge-Centered Systems 2016

Ruiming Tang Antoine Amarilli Pierre Senellart Stéphane Bressan

While price and data quality should define the major tradeoff for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document,...

Another expression of the MacWilliams identities and its applications

Journal: Advances in Mathematics of Communications 2022

Let $\mathcal{C}$ be a maximum distance separable (MDS) linear code over the finite field $\mathbb{F}_q$. In this paper, we present a new formula for its weight distribution, which can be seen as another expression of the Ma...

Effect of weight assignment in data fusion based information retrieval

Journal: Int. Arab J. Inf. Technol. 2011

Batri Krishnan Murugesh Veerasamy Gopalan Nagammapudur

Variation in the performance of an Information Retrieval system, which merges results from a number of retrieval schemes possessing equal and unequal weights, is studied in this paper. The weight of a retrieval scheme for a particular document is derived from the relevance scores of that document. Since the relevance scores vary from document to document and corpus to corpus, ...

Comparative Analysis of IDF Methods to Determine Word Relevance in Web Document

2014

Jitendra Nath Singh Sanjay K. Dwivedi

Inverse document frequency (IDF) is one of the most useful and widely used concepts in information retrieval. When it is used in combination with the term frequency (TF), the result is a very effective term weighting scheme (TF-IDF) that has been applied in information retrieval to determine the weight of the terms. Terms with high TF-IDF values imply a strong relationship with the document the...
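The TF-IDF combination described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: length-normalized term frequency and the plain `log(N / df)` form of IDF, which is only one of the IDF variants such a comparison might cover. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes TF = count / document length and IDF = log(N / df);
    other TF and IDF variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
weights = tf_idf(docs)
```

With this corpus, a term that occurs in every document ("the") gets weight zero, and a term found in only one document ("mat") outweighs one shared by two documents ("cat").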

Nonexistence of some ternary linear codes with minimum weight -2 modulo 9

Journal: Advances in Mathematics of Communications 2023

One of the fundamental problems in coding theory is to find $n_q(k,d)$, the minimum length $n$ for which a linear code of dimension $k$ and weight ...

Filtering Search Document using Weight-Distance Transformation

2009

Searching for information on the Internet via available search engines often returns a myriad of documents that are mostly irrelevant. The problem lies in the use of proper keywords arranged in the right order. This paper proposes an effective filtering approach that exploits various existing techniques through a sequence of transformations. The proposed approach employs ontolog...

A Feature Weight Adjustment Algorithm for Document Categorization

2000

Shrikanth Shankar George Karypis

In recent years we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. Automatic text categorization, which is the task of assigning text documents to pre-specified classes (topics or themes) of documents, is an important task that can help both in organizing as well as in finding information on thes...
