Levenshtein algorithm

Search results for: Levenshtein algorithm

Number of results: 22948

A Dialect Distance Metric Based on String and Temporal Alignment

2013

Thomas Kisler Uwe D. Reichel

The Levenshtein distance is an established metric to represent phonological distances between dialects. So far, this metric has usually been applied on manually transcribed word lists. In this study we introduce several extensions of the Levenshtein distance by incorporating probabilistic edit costs as well as temporal alignment costs. We tested all variants for compliance with the axioms that ...


New Upper Bounds for Some Spherical Codes

2009

Peter Boyvalenkov Peter Kazakov R. Hill

The maximal cardinality of a code W on the unit sphere in n dimensions with (x, y) ≤ s whenever x, y ∈ W, x ≠ y, is denoted by A(n, s). We use two methods for obtaining new upper bounds on A(n, s) for some values of n and s. We find new linear programming bounds by suitable polynomials of degrees which are higher than the degrees of the previously known good polynomials due to Levenshtein [11,...


A Cognitively Grounded Measure of Pronunciation Distance

2014

Martijn Wieling John Nerbonne Jelke Bloem Charlotte Gooskens Wilbert Heeringa R. Harald Baayen

In this study we develop pronunciation distances based on naive discriminative learning (NDL). Measures of pronunciation distance are used in several subfields of linguistics, including psycholinguistics, dialectology and typology. In contrast to the commonly used Levenshtein algorithm, NDL is grounded in cognitive theory of competitive reinforcement learning and is able to generate asymmetrica...


A Hybrid Algorithm for Data Clustering Using the K-means and Electromagnetism Algorithms

Journal: Civil Engineering 2017


EACL - Expansion of Abbreviations in CLinical text

2014

Lisa Tengstrand Beáta Megyesi Aron Henriksson Martin Duneld Maria Kvist

In the medical domain, especially in clinical texts, non-standard abbreviations are prevalent, which impairs readability for patients. To ease the understanding of the physicians’ notes, abbreviations need to be identified and expanded to their original forms. We present a distributional semantic approach to find candidates of the original form of the abbreviation, and combine this with Levensh...


Searching for repeated words in a

1995

Marie-France Sagot Vincent Escalier Alain Viari Henri Soldano

We present in this paper an algorithm that locates similar words common to a set of strings defined over an alphabet, where the similarity is stated in terms of a Levenshtein edit distance. The comparison of the words in the strings is realized by using a reference object called a model, which is a word over the same alphabet. This allows us to perform a multiple comparison of the strings as opposed to pairwise c...


A New Phonetic Candidate Generator for Improving Search Query Efficiency

2011

Bo Peng Yao Qian Frank K. Soong Bo Zhang

Misspelled queries caused by homophones or mispronunciation are difficult to correct with conventional spelling correction methods. In phonetic candidate generation, the generator produces candidates which are phonetically similar to a given query. In this paper, we present a new phonetic candidate generator for improving the search efficiency of a query. The proposed generator consists o...


The Relative Divergence of Dutch Dialect Pronunciations from their Common Source: An Exploratory Study

2007

Wilbert Heeringa Brian Joseph

In this paper we use the Reeks Nederlandse Dialectatlassen as a source for the reconstruction of a ‘proto-language’ of Dutch dialects. We used 360 dialects from locations in the Netherlands, the northern part of Belgium and French-Flanders. The density of dialect locations is about the same everywhere. For each dialect we reconstructed 85 words. For the reconstruction of vowels we used knowledg...


Construction of optimal codes in deletion and insertion metric

Journal: CoRR 2010

Hyun Kwang Kim Joon Yop Lee Dong Yeol Oh

We improve Levenshtein’s upper bound for the cardinality of a code of length four that is capable of correcting single deletions over an alphabet of even size. We also illustrate that the new upper bound is sharp. Furthermore we construct an optimal perfect code that is capable of correcting single deletions for the same parameters.


Phylogeny and geometry of languages from normalized Levenshtein distance

Journal: CoRR 2011

Maurizio Serva

The idea that the distance among pairs of languages can be evaluated from lexical differences seems to have its roots in the work of the French explorer Dumont D’Urville. He collected comparative word lists of various languages during his voyages aboard the Astrolabe from 1826 to 1829 and, in his work about the geographical division of the Pacific, he proposed a method to measure the degree of...

