Document Detection Data Preparation
Abstract
The document collection needed to reflect the corpus that analysts were expected to see. This meant that a very large collection was needed to test how the algorithms scale, and that it had to include documents from many different domains to test the algorithms' domain independence. Additionally, the selected documents needed to mirror the different types of documents used in the TIPSTER application. Specifically, they had to vary in length, writing style, level of editing, and vocabulary. As a final requirement, the documents had to cover different timeframes to show the effects of document date on the routing task.