Pattern Recognition of Strings with Substitutions, Insertions, Deletions and Generalized Transpositions1

نویسندگان

  • B. J. Oommen
  • R. K. S. Loke
چکیده

We study the problem of recognizing a string Y which is the noisy version of some unknown string X* chosen from a finite dictionary, H. The traditional case which has been extensively studied in the literature is the one in which Y contains substitution, insertion and deletion (SID) errors. Although some work has been done to extend the traditional set of edit operations to include the straightforward transposition of adjacent characters2 [LW75] the problem is unsolved when the transposed characters are themselves subsequently substituted, as is typical in cursive and typewritten script, in molecular biology and in noisy chain-coded boundaries. In this paper we present the first reported solution to the analytic problem of editing one string X to another, Y using these four edit operations. A scheme for obtaining the optimal edit operations has also been given. Both these solutions are optimal for the infinite alphabet case. Using these algorithms we present a syntactic pattern recognition scheme which corrects noisy text containing all these types of errors. The paper includes experimental results involving sub-dictionaries of the most common English words which demonstrate the superiority of our system over existing methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Noisy Subsequence Recognition Using Constrained String Editing Involving Substitutions, Insertions, Deletions and Generalized Transpositions1

We consider a problem which can greatly enhance the areas of cursive script recognition and the recognition of printed character sequences. This problem involves recognizing words/strings by processing their noisy subsequences. Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We study the problem of estimating X* by processing Y, a noisy version o...

متن کامل

String editing under a combination of constraints

Let X and Y be any two strings of finite lengths N and M , respectively, over a finite alphabet. An edit distance between X and Y is defined as the minimum sum of elementary edit distances associated with edit operations of substitutions, deletions, and insertions needed to transform X to Y . In this paper, the problem of efficient computation of such a distance is considered under the assumpti...

متن کامل

Structured Motifs Recognition in DNA sequences

In this paper is presented a methodology for structured motifs recognition (SMR) in DNA sequences. The SMR problem consists of finding all instances of a triple-pattern PL − PC − PR in a DNA sequence, where PL, PC and PR are based on the IUPAC alphabet, and PL and PR are both separated from PC by a distance no greater than ”n” characters, which is provided as input. In this problem an inexact a...

متن کامل

Peptide classification using optimal and information theoretic syntactic modeling

We consider the problem of classifying peptides using the information residing in their syntactic representations. This problem, which has been studied for more than a decade, has typically been investigated using distance-based metrics that involve the edit operations required in the peptide comparisons. In this paper, we shall demonstrate that the Optimal and Information Theoretic (OIT) model...

متن کامل

Investigation of Polymorphisms in Non-Coding Region of Human Mitochondrial DNA in 31 Iranian Hypertrophic Cardiomyopathy (HCM) Patients

The D-loop region is a hot spot for mitochondrial DNA (mtDNA) alterations, containing two hypervariable segments, HVS-I and HVS-II. In order to identify polymorphic sites and potential genetic background accounting for Hypertrophic CardioMyopathy (HCM) disease, the complete non-coding region of mtDNA from 31 unrelated HCM patients and 45 normal controls were sequenced. The sequences were aligne...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1995