A Double Metaphone Encoding for Approximate Name Searching and Matching in Bangla
نویسندگان
چکیده
Almost any word can be a Bangali name, and the name in turn is often spelled in many different ways, all of which are considered correct and interchangeable. The reason for the spelling complication is two-fold: (1) there is a large gap between the script and pronunciation in Bangla, largely attributed to the large scale Sanskritization process that started in the 12 century and continued throughout the middle ages, and (2) typical Bangla names have very different origins, from the indigenous names derived primarily from Sanskrit, to the imported Muslim names from Persian and Arabic, Christian names from Portuguese, and even the names from popular Western TV soap-operas. However, there is always a large degree of phonetic similarity in the spelling variants of a name, which is the key to searching and matching names in records. We present a Double Metaphone encoding for Bangla names, taking into account the various spelling and phonetic rules in use, which can be used by applications to search for and match names. We encode the spelling variants of a large number of names found in the literature to demonstrate that the encoding does indeed show that the variants of a name are equivalent. A name searching algorithm may employ various figures of merit to narrow the list of possibilities when searching for similar names; we demonstrate one such figure of merit using name encoding and edit distance that has shown good promise. Keyword: Name Searching, Name Encoding, Phonetic Encoding, Double Metaphone Encoding, Bangla, Bengali
منابع مشابه
High speed data retrieval from national data center (ndc) reducing time and ignoring spelling error in search key based on double metaphone algorithm
Fast and efficient data management is one of the demanding technologies of today’s aspect. This paper proposes a system which makes the working procedures of present manual system of storing and retrieving huge citizen’s information of Bangladesh automated and increases its effectiveness. The implemented search methodology is user friendly and efficient enough for high speed data retrieval igno...
متن کاملC o nt e xt Bas ed W ord Predi ction f or Text ing 1 L a ng ua ge
The use of digital mobile phones has led to a tremendous increase in communication using SMS. On a phone keypad, multiple words are mapped to same numeric code. We propose a Context Based Word Prediction system for SMS messaging in which context is used to predict the most appropriate word for a given code. We extend this system to allow informal words (short forms for proper English words). Th...
متن کاملA Comprehensive Roman (english)-to-bangla Transliteration Scheme
A transliteration scheme from Roman (English) to Bangla can help increase the use of Bangla in essential and diverse computing areas such as word processing, Internet and mobile communication and information query and retrieval. The Bangla script’s irregular phonetic nature and its large repertoire of consonant clusters (juktakkhors) create a large gap between the pronunciation and the orthogra...
متن کاملIndexing Methods for Approximate String Matching
Indexing for approximate text searching is a novel problem receiving much attention because of its applications in signal processing, computational biology and text retrieval, to name a few. We classify most indexing methods in a taxonomy that helps understand their essential features. We show that the existing methods, rather than completely diierent as they are regarded, form a range of solut...
متن کاملAdaptive Approximate Record Matching
Typographical data entry errors and incomplete documents, produce imperfect records in real world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...
متن کامل