A Method for Creating a High Quality Collection of Researchers' Homepages from the Web
Abstract
This paper proposes a method for creating a high quality collection of researchers’ homepages. The proposed method consists of three phases: rough filtering of the possible web pages, accurate evaluation of the web pages and precise selection of the correct homepages. For the rough filtering, the authors first define content-based keyword-lists, then generate filtering rules and relax the rules with heuristics. For the evaluation and the selection, they use a support vector machine with the feature sets derived from the content words of the web pages and propose an approach utilizing web-specific properties for improving the measures.
Keywords: Web Mining, Web Information Retrieval, Machine Learning, Web Page Classification
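The three phases named in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the keyword lists, the `min_lists` rule, the term weights, and the function names are hypothetical and do not come from the paper. In particular, a hand-set linear scoring function stands in for the trained support vector machine, and the web-specific properties mentioned in the abstract are not modeled.

```python
import re

# Hypothetical content-based keyword lists (not the paper's actual lists).
KEYWORD_LISTS = {
    "role": {"professor", "researcher", "lecturer", "phd"},
    "content": {"publications", "research", "papers", "projects"},
    "org": {"university", "institute", "laboratory", "department"},
}

# Hypothetical term weights and bias. A trained linear SVM would learn
# such values from labeled pages; here they are set by hand.
WEIGHTS = {
    "professor": 1.0,
    "publications": 1.2,
    "research": 0.6,
    "university": 0.4,
    "shop": -1.5,
    "price": -1.5,
}
BIAS = -1.0


def tokenize(text):
    """Return the set of lowercase alphabetic words in the page text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rough_filter(text, min_lists=2):
    """Phase 1: keep a page that matches at least `min_lists` keyword lists.

    Lowering `min_lists` relaxes the rule, trading precision for recall.
    """
    words = tokenize(text)
    hits = sum(1 for keywords in KEYWORD_LISTS.values() if words & keywords)
    return hits >= min_lists


def evaluate(text):
    """Phase 2: linear score over content words (stand-in for an SVM)."""
    words = tokenize(text)
    return BIAS + sum(w for term, w in WEIGHTS.items() if term in words)


def select_homepages(pages, threshold=0.0):
    """Phase 3: keep filtered pages whose score exceeds the threshold."""
    selected = []
    for url, text in pages.items():
        if rough_filter(text) and evaluate(text) > threshold:
            selected.append(url)
    return sorted(selected)
```

In this sketch a page can pass the rough filter and still be rejected by the scoring step, which mirrors the split between a recall-oriented first phase and precision-oriented later phases.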
Similar resources
Study on Building a High-Quality Homepage Collection from the Web Considering Page Group Structures
This dissertation investigates methods for efficiently building a high-quality homepage collection from the web by considering page group structures. We mainly investigate researchers’ homepages, and partly homepages of other categories. A web page collection with a guaranteed high quality (i.e., high recall and high precision) is required for implementing high quality web-...
Iranian Gateway for Scientific, Research, and Technological Information: A New Service for Iranian Researchers
Information subject gateways provide access to quality-controlled databases among the vast resources of the web, sparing users the confusion of sorting through its many sources. The main objective of this research is to create the Iranian Gateway for Scientific, Research, and Technological Information as a valuable source for use by academics and researche...
Researcher affiliation extraction from homepages
Our paper discusses the potential use of Web Content Mining techniques for gathering scientific social information from the homepages of researchers. We will introduce our system which seeks [affiliation, position, start year, end year] information tuples on these homepages along with preliminary experimental results. We believe that the lessons learnt from these experiments may be useful for f...
A Practical Classification of the Functions of Intelligent Software Agents and Their Mapping to the Features of Digital Library Websites
Purpose: Web services are currently considered among the most widely applied technologies for providing automatic, high-quality, and fast information interactions. The aim of this paper is therefore to provide a comprehensive framework for a collection of significant services offered by Farsi library websites, to be used in future designs. It also aims to classif...
The Status of Medical Education Studies by Iranian Researchers Among the Educational Publications Indexed in Web of Science
Introduction: Medical education is a broad field of study that, as a subset of educational research, examines inputs, processes and outputs associated with teaching and learning medical sciences. The purpose of this study was to investigate the status of medical education studies by Iranian researchers among the educational publications indexed in Web of Science from 1990 to 2015. Methods: Th...