Phonetic Models for Generating Spelling Variants
Abstract
Proper names, whether English or non-English, can have several different spellings when transliterated from a non-English source language into English. Knowing these variants can significantly improve the results of name searches over source texts, especially when recall is important. In this paper we propose two novel phonetic models that generate numerous candidate variant spellings of a name. Our methods show a threefold improvement over the baseline and generate four times as many good name variants as a human, while maintaining a respectable precision of 0.68.
Similar Resources
HMM-based Pronunciation Dictionary Generation
In this paper, we discuss automatically generating a phonetic pronunciation from an orthographic spelling of words. The letter-sequence to phoneme-sequence mapping is useful in a variety of contexts, including text-to-speech applications, automatic spelling correction, and generating a pronunciation lexicon for a new training dataset which contains out-of-vocabulary words. A system based on hid...
Phonetic Spelling Filter for Keyword Selection in Drug Mention Mining from Social Media
Social media postings are rich in information that often remains hidden and inaccessible for automatic extraction due to inherent limitations of the site's APIs, which mostly limit access via specific keyword-based searches (and limit both the number of keywords and the number of postings that are returned). When mining social media for drug mentions, one of the first problems to solve is how to...
A Double Metaphone Encoding for Approximate Name Searching and Matching in Bangla
Almost any word can be a Bangali name, and the name in turn is often spelled in many different ways, all of which are considered correct and interchangeable. The reason for the spelling complication is two-fold: (1) there is a large gap between the script and pronunciation in Bangla, largely attributed to the large-scale Sanskritization process that started in the 12th century and continued throu...
A Framework for Computational Processing of Spelling Variation
Many languages like Hindi, especially those which have been used as (spoken) link languages and don’t have a long history of standardization, allow variant spellings of the same words. One of the reasons for this is the influence of the writer’s first language or dialect. In this paper we present a computational framework for predicting spelling variations based on the speaker’s dialect. This m...
Analysis of Phonetic Matching Approaches for Indic Languages
Phonetic matching plays an important role in multilingual information retrieval, where data is manipulated in multiple languages. Users need information in their local language, which may differ from the language in which the data is maintained. In such an environment, we need a system that matches strings phonetically, either exactly or approximately, irrespective of errors. There are m...
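The entries above all rely on matching names by sound rather than by exact spelling. As a rough illustration of that idea, the sketch below implements classic American Soundex, a well-known phonetic key. It is not the method of any paper listed here; the function name `soundex` and the sample names are chosen only for illustration.

```python
# Letter groups that share a Soundex digit (American Soundex rules).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex key of `name` ('' if it has no ASCII letters)."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    key = first.upper()
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
        if len(key) == 4:
            break
    return key.ljust(4, "0")  # pad short keys with zeros


if __name__ == "__main__":
    # Spelling variants of one name collapse to the same key.
    for variant in ("Muhammad", "Mohammed", "Mohamed"):
        print(variant, soundex(variant))
```

Soundex was designed for English surnames and only groups existing spellings under a shared key; generating new candidate spellings, as the abstract describes, is a different and harder task.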