A Machine Learning Approach to Building Domain-Specific Search Engines
Abstract
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification and information extraction that enables efficient spidering, populates topic hierarchies, and identifies informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science research papers available at www.cora.justresearch.com.
Similar resources
Keyword Spices: A New Method for Building Domain-Specific Web Search Engines
This paper presents a new method for building domain-specific web search engines. Previous methods eliminate irrelevant documents from the pages accessed using heuristics based on human knowledge about the domain in question. Accordingly, they are hard to build and cannot be applied to other domains. The keyword spice method, in contrast, improves search performance by adding domain-specific k...
A Machine Learning Approach to Building Domain-Specific Search Engines
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new...
Building Domain-Specific Search Engines with Machine Learning Techniques
Domain-specific search engines are growing in popularity because they offer increased accuracy and extra functionality not possible with the general, Web-wide search engines. For example, www.campsearch.com allows complex queries by age-group, size, location and cost over summer camps. Unfortunately these domain-specific search engines are difficult and time-consuming to maintain. This paper pr...
Building Domain-Specific Search Engines with Machine Learning Techniques
Domain-specific search engines are growing in popularity because they offer increased accuracy and extra functionality not possible with the general, Web-wide search engines. For example, www.campsearch.com allows complex queries by age-group, size, location and cost over summer camps. Unfortunately these domain-specific search engines are difficult and time-consuming to maintain. This paper proposes...
Learning to rank documents with support vector machines via active learning
Navigating through the debris of the information explosion requires powerful, flexible search tools. These tools must be both useful and useable; that is, they must do their jobs effectively without placing too many burdens on the user. While general interest search engines, such as Google, have addressed this latter challenge well, more topic-specific search engines, such as PubMed, have not. ...