Improving Precision and Recall for Soundex Retrieval
Authors
Abstract
We present a phonetic algorithm that fuses existing techniques and introduces new features. This combination offers improved precision and recall.
Similar resources
Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique
This paper presents an algorithm for Thai-English cross-language transliterated word retrieval. The algorithm enables retrieval of documents containing either the English keywords or the corresponding English-to-Thai transliterated words. This is done by retrieving documents based on phonetic codes of keywords rather than the keywords themselves. The phonetic coding is based on the Soundex codin...
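To make the idea of retrieving by phonetic code rather than by keyword concrete, the following is a minimal sketch that assumes the standard American Soundex rules for English text only. It is not the Thai-English coding described in the paper, whose details are truncated above, and the names `soundex`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict

# Standard American Soundex digit groups (assumption: English letters only).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of word, or "" if it has no letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a digit
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit is coded again
    return code.ljust(4, "0")


def build_index(docs):
    """Map each Soundex code to the ids of the documents containing a matching token."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            code = soundex(token)
            if code:
                index[code].add(doc_id)
    return index


def search(index, keyword):
    """Return ids of documents with a token that sounds like keyword."""
    return index.get(soundex(keyword), set())


docs = {1: "Robert went home", 2: "Rupert stayed", 3: "Alice"}
index = build_index(docs)
print(search(index, "Robert"))  # both documents 1 and 2 share the code R163
```

Because the index is keyed by code, a query matches any spelling that maps to the same code, which is what raises recall and also what admits false matches.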
Finding Content in File-Sharing Networks When You Can't Even Spell
The query success rate in current file-sharing systems is low, for example, only 7-10% in Gnutella. An often-overlooked cause for this low recall is simply that keywords in queries and document descriptions are misspelled. Although many sophisticated approximate matching techniques have been developed by the Information Retrieval community, to our knowledge, they have not been used in popular P2...
Transliterated Arabic name search
We address name search for transliterated Arabic given names. In previous work, we addressed similar problems with English and Arabic surnames. In each previous case, we used a variant of Soundex and n-grams to improve precision and recall of name matching compared against well-known approaches such as the Russell Soundex algorithm. Unlike prior work, the proposed approach does not rely upon So...
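As a rough illustration of n-gram name matching, the sketch below scores two names by the Dice coefficient over padded character bigrams. This is a generic textbook measure chosen for illustration, not the authors' method, and the function names and the `#` padding character are assumptions.

```python
from collections import Counter


def bigrams(name):
    """Character bigrams of name, padded with '#' so first and last letters count."""
    padded = f"#{name.lower()}#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def dice(a, b):
    """Dice coefficient over bigram multisets: 1.0 for identical names, 0.0 for disjoint ones."""
    grams_a, grams_b = Counter(bigrams(a)), Counter(bigrams(b))
    overlap = sum((grams_a & grams_b).values())  # multiset intersection size
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * overlap / total


print(round(dice("Mohammed", "Muhammad"), 3))  # 0.556: 5 shared bigrams out of 9 + 9
```

Unlike a Soundex code, which either matches or does not, this score is graded, so a threshold can trade precision against recall.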
Cross-language Phonetic Similarity Measure on Terms Appeared in Asian Languages
This study aims to develop a phonetic similarity measurement method across Asian languages. The method, a cross-language similarity algorithm, aggregates the transcription of language-specific Romanization, the International Phonetic Alphabet, the Soundex algorithm, and Levenshtein distance. To evaluate the proposed algorithm, this study involves an experiment using ninety-two chemical element nam...
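Of the components listed, Levenshtein distance is the easiest to show in isolation. The sketch below is the standard dynamic-programming formulation plus a length-normalized similarity; how the study combines it with the Romanization, IPA, and Soundex components is not recoverable from the truncated abstract, so the `similarity` normalization here is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map the distance to [0, 1], where 1.0 means identical strings (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

Applying the distance to phonetic transcriptions rather than raw spellings is one plausible way such components could be combined, but that is a guess about the design rather than something the abstract states.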
Efficient Name Variation Detection
Semantic integration, link analysis and other forms of evidence detection often require recognition of multiple occurrences of a single name. However, names frequently occur in orthographic variations resulting from phonetic variations and transcription errors. The computational expense of similarity assessment algorithms usually precludes application to all pairs of strings. Instead, it is typ...