منابع مشابه
Classification of Errors in Text
This paper presents two classifications of errors in Czech texts. As a basic resource we use the corpus (Chyby – Errors) which has been continuously developed from 1999–2000 ([1]). The corpus text contains various kinds of errors such as spelling, typographical, grammatical, semantic, lexical, and stylistic ones. They have been corrected manually and annotated according to the classification of...
متن کاملText Indexing with Errors
In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the numbe...
متن کاملLarge Text Searching Allowing Errors
We present a full inverted index for exact and approximate string matching in large texts. The index is composed of a table containing the vocabulary of words of the text and a list of positions in the text corresponding to each word. The size of the table of words is usually much less than 1% of the text size and hence can be kept in main memory, where most query processing takes place. The te...
متن کاملCorrecting ‘Wrong-Column’ Errors in Text Databases
We present a novel data-driven approach for detecting and correcting errors in text databases. We focus on information that was accidentally entered in an incorrect column. Unlike machine-learning approaches to data cleaning that assume the database cells to contain atomic or numeric content, our method takes into account substrings of textual cells, and treats error detection and correction as...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Journal of the Royal Society of Medicine
سال: 2008
ISSN: 0141-0768,1758-1095
DOI: 10.1258/jrsm.2008.08k016