Languages with mismatches
Authors
Abstract
In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with at most k mismatches in every window of size r. The study of this class of languages focuses mainly on two objects: a parameter called the repetition index, and the set of minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that every string of this length occurs, up to errors, in at most one position of the text S. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching that has average size O(|S| · log^(k+1) |S|) and answers queries in time O(|x| + |occ(x)|) for any word x, where occ(x) is the list of all occurrences of x in S up to errors. © 2007 Elsevier B.V. All rights reserved.
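The error model, the repetition index and the minimal forbidden words described above can be made concrete with a brute-force sketch. The Python below is illustrative only and is not the authors' indexing structure. All function names are hypothetical, and it assumes that a word shorter than r is compared as a single window (at most k mismatches overall), that r ≥ 1, and that a word "occurs" at position i when it is aligned with the factor of S of the same length starting at i. Because it enumerates every word over the alphabet, it is exponential and suitable only for tiny inputs.

```python
from itertools import product
from typing import List


def mismatches_ok(u: str, v: str, k: int, r: int) -> bool:
    """True if equal-length words u and v differ in at most k positions
    inside every window of r consecutive positions (r >= 1).

    Assumption: a word shorter than r is treated as a single window.
    """
    diff = [a != b for a, b in zip(u, v)]
    w = min(r, len(diff))
    # Slide a window of width w over the True/False mismatch vector.
    return all(sum(diff[i:i + w]) <= k for i in range(len(diff) - w + 1))


def occurrences(u: str, s: str, k: int, r: int) -> List[int]:
    """All start positions where u occurs in s under the error model above."""
    m = len(u)
    return [i for i in range(len(s) - m + 1)
            if mismatches_ok(u, s[i:i + m], k, r)]


def repetition_index(s: str, alphabet: str, k: int, r: int) -> int:
    """Smallest h such that every word of length h over `alphabet`
    occurs in at most one position of s (brute force, exponential in h)."""
    for h in range(len(s) + 1):
        if all(len(occurrences("".join(t), s, k, r)) <= 1
               for t in product(alphabet, repeat=h)):
            return h
    return len(s)  # not reached: h = len(s) leaves a single position


def minimal_forbidden_words(s: str, alphabet: str, k: int, r: int,
                            max_len: int) -> List[str]:
    """Words up to max_len that are not in the language of factors of s
    with errors, while their longest proper prefix and suffix are.
    Checking only those two factors suffices because, under this model,
    the language is closed under taking factors."""
    def in_lang(u: str) -> bool:
        return bool(occurrences(u, s, k, r))

    result = []
    for m in range(1, max_len + 1):
        for t in product(alphabet, repeat=m):
            w = "".join(t)
            if not in_lang(w) and in_lang(w[:-1]) and in_lang(w[1:]):
                result.append(w)
    return result


if __name__ == "__main__":
    print(repetition_index("aaabbb", "ab", 0, 3))  # exact matching: 3
    print(repetition_index("aaabbb", "ab", 1, 3))  # 1 mismatch per window of 3: 6
    print(minimal_forbidden_words("abaab", "ab", 0, 1, 6))  # ['bb', 'aaa', 'bab', 'aaba']
```

Under these assumptions, "abab" and "abba" both differ from "aaaa" in two positions, yet only "abab" passes with k = 1 and r = 2, because "abba" packs both mismatches into one window of size 2. On the text "aaabbb", allowing one mismatch per window of size 3 raises the repetition index from 3 to 6, since longer words can still align with several overlapping positions.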
Similar resources
Assessing the Quality of Persian Translation of Orwell’s Nineteen Eighty-Four Based on House’s Model: Overt-Covert Translation Distinction
This study aimed to assess the quality of Persian translation of Orwell's (1949) Nineteen Eighty-Four by Balooch (2004) based on House's (1997) model of translation quality assessment. To do so, 23 pages (about 10 percent) of the source text were randomly selected. The profile of the source text register was produced and the genre was realized. The source text profile was compared to t...
Valency mismatches and the coding of reciprocity in Australian languages
Reciprocals are characterized by a crossover of thematic roles within a single clause. Their peculiar semantics often creates special argument configurations not found in other clause types. While some languages either encode reciprocals by clearly divalent, transitive clauses, or clearly monovalent, intransitive clauses, others adopt a more ambivalent solution. We develop a typology of valency...
The Effect of L1 Persian on the Acquisition of English L2 Orthographic System on the Shared Grounds
This paper elaborates on Persian and English orthographic shared aspects to study the effects of L1 Persian on learning English as a foreign language. While there are some examples of letter and sound mismatches in the orthographic system of both languages, those of English are more complex than Persian. In order to see the effect of the mismatch between orthography and transcription, 40 Persia...
Mismatches and Divergences: the Continuum Perspective
In this paper, we address the issue of resolving divergences (such as he swam across the river translates into French as il a traversé la rivière à la nage) and mismatches (such as fish translates into Spanish as pez and pescado) in a uniform way. First, we present empirical evidence that only a continuum perspective on divergences and mismatches can help translate them in different languages. ...
How to Overcome Translation Mismatches - An Inference Driven Mapping between Meaning Representations
This paper deals with issues that a bidirectional German-Russian machine translation system faces when the meaning of spatial prepositions in these languages does not line up. A uniform representation language is used to define the meaning of spatial prepositions in a language independent way. This formal language makes it possible to compare monolingual meaning representations and allows for th...
Journal title:
- Theor. Comput. Sci.
Volume 385, Issue
Pages: -
Publication date: 2007