Fusion of Multiple Corrupted Transmissions and its effect on Information Retrieval
نویسندگان
چکیده
Much previous work has focused on correction of OCR degraded text with little work addressing the possibility of fusing the generated text from different OCR systems, which are assumed to produce different types of errors. This paper explores text fusion, which involves the use of language modeling to determine which OCR system (if any) properly recognized individual words. The technique was applied on Arabic text that was synthetically degraded using different models of OCR degradation. The different degraded versions were consequently fused leading to a new version with significantly fewer errors than any of the original versions. Also, the effect of fusion on retrieval was examined. Index Terms – Text Fusion, OCR, Language Modeling, and Information Retrieval
منابع مشابه
Two Experiments on Retrieval With Corrupted Data and Clean Queries in the TREC-4 Adhoc Task Environment: Data Fusion and Pattern Scanning
We report on several experiments in using data fusion to improve information retrieval, and in approximate text and 5-gram mathcing methods for retrieval of corrupted text, in the TREC context.
متن کاملFactors Affecting Student's Scientific Information Retrieval based on Fuzzy Logic Method Compared to Traditional Method
Background and aim: The aim of this study was to identify the factors affecting on students' performance in information retrieval based on fuzzy logic method compared to traditional method. Materials and methods: This survey-descriptive study was performed using quantitative approach. The research population was 34 PhD students, and the researcher-made questionnaire was used. Data were analyzed...
متن کاملبررسی تأثیرات ریشهیابی در بازیابی اطلاعات در زبان فارسی
Using the language-specific behavior in information retrieval systems can improve the quality of the retrieved results significantly. Part of the word that remains after removing its affixes is called stem. Stemming process can be used for improving the relevancy of the results in information retrieval system. Different morphological variants of words (plural, past tense…) will be mapped into t...
متن کاملImproved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کاملImproved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کامل