On searching misspelled collections
Authors
Abstract
Over two-thirds of misspelled queries are caused by transformation errors (insertion, deletion, replacement, and inversion; Li, Duan, & Zhai, 2012; Pollock & Zamora, 1984). Spelling-correction approaches must address these common transformation errors, but many cannot do so without training data. For example, the USHMM has a document collection comprising 13 languages. The volume of queries is too low, relative to the size of the collection, to be used for training. Worse, should a supervised approach be deployed, the model might overfit to frequently queried languages, biasing results against minority languages. Although there are many kinds of spelling-correction algorithms, three prominent types are:
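The four transformation errors named in the abstract can be illustrated with a small edit-distance sketch. The code below is not the method of the paper; it is a standard optimal string alignment (restricted Damerau-Levenshtein) distance, and it assumes that "inversion" means swapping two adjacent characters. The function name `osa_distance` and the sample words are hypothetical, chosen only for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Count the minimum number of insertions, deletions, replacements,
    and adjacent-character swaps needed to turn `a` into `b`.

    Assumption: "inversion" in the abstract is treated as a swap of two
    adjacent characters (optimal string alignment distance).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement (or match)
            )
            # Inversion: the last two characters appear swapped.
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


if __name__ == "__main__":
    # One example per error type, each one edit away from "museum".
    for misspelling in ("museeum", "musem", "musaum", "musuem"):
        print(misspelling, osa_distance(misspelling, "museum"))
```

A distance of this kind needs no training data, which is why it is a common baseline in the setting the abstract describes; whether it matches any of the approaches evaluated in the paper is not stated in the text above.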
Similar resources
A Fast and Accurate Method for Approximate String Search
This paper proposes a new method for approximate string search, specifically candidate generation in spelling error correction, which is a task as follows. Given a misspelled word, the system finds words in a dictionary, which are most “similar” to the misspelled word. The paper proposes a probabilistic approach to the task, which is both accurate and efficient. The approach includes the use of...
The Effect of Specialized Multimedia Collections on Web Searching
Multimedia Web searching is a significant information activity for many people. Major Web search engines are critical resources in people’s efforts to locate relevant online multimedia information. It is therefore important that we understand how searchers are utilizing these Web information systems in their quest to retrieve multimedia information to design effective Web systems in support of ...
字形相似別字之自動校正方法 (Automatic Correction for Graphemic Chinese Misspelled Words) [In Chinese]
Whether Chinese is learned as a first or second language, misspelled words are an important issue that needs to be addressed. Many studies have offered suggestions for correcting the misspelled words of students who are still in school, as well as suggestions for teachers on strategies for teaching and learning Chinese characters. Although in schooling, it does to prevent students who...
Methods and Procedures of Sampling, Preservation and Identification for Fish Taxonomy Studies
Taxonomy has two important roles: to name organisms and to classify them. Classifications are useful because they contain information about relationships. All species in the same genus should share many behavioral, biochemical, ecological and biological properties because they are closely related evolutionarily. The effect of pollution on a species at one location should be similar to the effect ...
Applying Inference Networks to Multiple Collection Searching
The paper describes how to use inference networks to solve two problems in searching multiple collections: collection selection and result merging. The effectiveness of the approaches is demonstrated with the INQUERY system and 3-gigabyte TREC collections.
Journal title:
- JASIST
Volume 66, Issue
Pages -
Publication date 2015