Semi-Automatic Reconstruction of Cross-Cut Shredded Documents
Abstract
We propose a new approach for cross-cut shredded document reconstruction and evaluate it on the DARPA Shredder Challenge dataset. We begin by pre-processing the chads. We then calculate a set of costs based on shape (gaps, overlaps, edge similarity), graphical content (ruling line alignment, text line alignment), and semantic content (character and letter combinations), and use them to rank putative chad matches. Documents are then reconstructed chad by chad. We introduce the concept of an oracle that knows the ground truth for puzzles one and two of the DARPA Shredder Challenge dataset. The oracle replaces human verification of matches and makes it possible to evaluate the efficiency of algorithms for reassembling cross-cut shredded documents in a standard, quantitative way. Previous attempts at solving this problem have addressed neither issue.
Keywords: Document Image Analysis, Shredded Document Reconstruction, Optical Character Recognition
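The ranking and oracle steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` class, the cost names, the weighted-sum combination, and the use of oracle query counts as an efficiency measure are all assumptions made here for illustration, since the abstract does not say how the costs are combined or how efficiency is scored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """A putative match between two chads (hypothetical structure)."""
    chad_a: int
    chad_b: int
    costs: Dict[str, float]  # per-cue costs; lower means a better fit


def total_cost(cand: Candidate, weights: Dict[str, float]) -> float:
    # Assumption: cues are combined as a weighted sum.
    return sum(weights.get(name, 0.0) * value
               for name, value in cand.costs.items())


def rank_candidates(cands: Iterable[Candidate],
                    weights: Dict[str, float]) -> List[Candidate]:
    # Best (lowest-cost) candidate first.
    return sorted(cands, key=lambda c: total_cost(c, weights))


def make_oracle(true_pairs: Iterable[Tuple[int, int]]) -> Callable[[int, int], bool]:
    # The oracle stands in for a human verifier: it only answers
    # whether a proposed pair is adjacent in the ground truth.
    truth = {frozenset(pair) for pair in true_pairs}
    return lambda a, b: frozenset((a, b)) in truth


def queries_until_match(ranked: List[Candidate],
                        oracle: Callable[[int, int], bool]
                        ) -> Tuple[int, Optional[Candidate]]:
    # Walk the ranked list and count oracle queries until one is accepted.
    for queries, cand in enumerate(ranked, start=1):
        if oracle(cand.chad_a, cand.chad_b):
            return queries, cand
    return len(ranked), None


weights = {"shape": 1.0, "text_line": 1.0}
candidates = [
    Candidate(0, 1, {"shape": 0.9, "text_line": 0.8}),
    Candidate(0, 2, {"shape": 0.2, "text_line": 0.1}),
    Candidate(0, 3, {"shape": 0.4, "text_line": 0.5}),
]
ranked = rank_candidates(candidates, weights)
oracle = make_oracle([(0, 3)])
queries, accepted = queries_until_match(ranked, oracle)
```

In this toy run the true neighbour of chad 0 is ranked second, so the oracle is queried twice before a match is accepted. Under the assumption above, fewer queries per accepted match would indicate a better cost function.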
Related resources
Reconstructing Shredded Documents
This project looks at the challenges involved in the automatic reconstruction of strip (vertically cut) and cross (both vertically and horizontally cut) shredded documents. The unshredding problem is of interest in the fields of forensics, investigative sciences, and archaeology. All stages of the unshredding pipeline are analysed, starting from scanned images of shreds and ending with reconstr...
A Memetic Algorithm for Reconstructing Cross-Cut Shredded Text Documents
The reconstruction of destroyed paper documents has attracted growing interest in recent years. On the one hand, documents are often destroyed by mistake; on the other hand, this type of application is relevant in the fields of forensics and archeology, e.g., for evidence or restoring ancient documents. In this paper, we present a new approach for restoring cross-cut shre...
Enhancing a Genetic Algorithm with a Solution Archive to Reconstruct Cross Cut Shredded Text Documents
In this work the concept of a trie-based complete solution archive in combination with a genetic algorithm is applied to the Reconstruction of Cross-Cut Shredded Text Documents (RCCSTD) problem. This archive is able to detect and subsequently convert duplicates into new yet unvisited solutions. Cross-cut shredded documents are documents that are cut into rectangular pieces of equal size and sha...
An alternative clustering approach for reconstructing cross cut shredded text documents
In this paper, we propose a clustering approach for solving the problem of reconstructing cross-cut shredded documents. This problem is important in the field of forensic science. Unlike other clustering approaches which are applied as a preprocessing step before the actual reconstruction algorithms, our clustering approach is part of the reconstruction process itself. We define a new cost func...
Reconstructing Cross Cut Shredded Documents with a Genetic Algorithm with Solution Archive
The reconstruction of shredded documents is of high interest not only in forensic science but also when documents are destroyed unintentionally. Reconstructing cross-cut shredded documents (RCCSTD) is particularly difficult since the documents are cut into rectangular pieces of equal size. Since shape information along the edges, in contrast to hand-torn pieces, cannot be exploited, the reconstru...