A Method for Creating a High Quality Collection of Researchers' Homepages from the Web

Authors

  • Yuxin Wang
  • Keizo Oyama
Abstract

This paper proposes a method for creating a high quality collection of researchers’ homepages. The proposed method consists of three phases: rough filtering of the possible web pages, accurate evaluation of the web pages, and precise selection of the correct homepages. For the rough filtering, the authors first define content-based keyword-lists, then generate filtering rules and relax the rules with heuristics. For the evaluation and the selection, they use a support vector machine with the feature sets derived from the content words of the web pages and propose an approach utilizing web-specific properties for improving the measures.

Keywords: Web Mining, Web Information Retrieval, Machine Learning, Web Page Classification
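
The three phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword lists, the relaxation heuristic, the feature weights, and the function names (`rough_filter`, `svm_like_score`, `select_homepages`) are all hypothetical. A hand-set linear decision function stands in for a trained support vector machine, and the web-specific properties mentioned in the abstract are not modeled.

```python
import re

# Hypothetical keyword lists; the abstract does not give the actual lists.
ROLE_WORDS = {"professor", "researcher", "lecturer", "phd"}
TOPIC_WORDS = {"publications", "research", "laboratory", "papers"}

# Hypothetical weights standing in for a trained linear SVM (score = w.x + b).
WEIGHTS = {
    "professor": 1.0,
    "publications": 0.8,
    "research": 0.5,
    "shop": -1.5,
    "price": -1.0,
}
BIAS = -0.6


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rough_filter(text, relaxed=True):
    """Phase 1: cheap keyword rule that favors recall.

    Strict rule: a role word AND a topic word must both occur.
    Relaxed rule (an assumed heuristic): either kind of word is enough.
    """
    words = set(tokenize(text))
    has_role = bool(words & ROLE_WORDS)
    has_topic = bool(words & TOPIC_WORDS)
    if relaxed:
        return has_role or has_topic
    return has_role and has_topic


def svm_like_score(text):
    """Phase 2: linear decision value over binary content-word features."""
    words = set(tokenize(text))
    return sum(w for term, w in WEIGHTS.items() if term in words) + BIAS


def select_homepages(pages, threshold=0.0):
    """Phase 3: keep filtered pages whose score exceeds the threshold."""
    selected = []
    for url, text in pages.items():
        if not rough_filter(text):
            continue  # rejected by the rough filter, never scored
        if svm_like_score(text) > threshold:
            selected.append(url)
    return sorted(selected)


if __name__ == "__main__":
    pages = {
        "a": "Professor Jane Doe - Research and Publications",
        "b": "Research equipment shop: best price",
        "c": "Holiday photos",
    }
    print(select_homepages(pages))
```

In this sketch the rough filter is deliberately permissive, so that the more expensive scoring step only has to separate true homepages from the false positives the filter lets through.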

Similar resources

Study on Building a High-Quality Homepage Collection from the Web Considering Page Group Structures

This dissertation investigates a method for efficiently building a high-quality homepage collection from the web by considering page group structures. We mainly investigate researchers’ homepages and, in part, homepages of other categories. A web page collection with a guaranteed high quality (i.e., high recall and high precision) is required for implementing high quality web-...

Iranian Gateway for Scientific, Research, and Technological Information: A New Service for Iranian Researchers

Information subject gateways give web users access to quality-controlled databases among the web's vast resources and spare them the confusion of sorting through its many sources. The main objective of this research is to create the Iranian Gateway for Scientific, Research, and Technological Information as a valuable source for use by academics and researche...

Researcher affiliation extraction from homepages

Our paper discusses the potential use of Web Content Mining techniques for gathering scientific social information from the homepages of researchers. We will introduce our system which seeks [affiliation, position, start year, end year] information tuples on these homepages along with preliminary experimental results. We believe that the lessons learnt from these experiments may be useful for f...

A Functional Classification of the Functions of Intelligent Software Agents and Their Mapping to the Features of Digital Library Websites

Purpose: Web services are presently considered the technologies with the highest number of applications for providing automatic, high-quality, and fast information interactions. The aim of this paper is therefore to provide a comprehensive framework for a collection of significant services offered by Farsi websites in libraries to be used in future designs. It also aims to classif...

The Status of Medical Education Studies by Iranian Researchers Among the Educational Publications Indexed in Web of Science

Introduction: Medical education is a broad field of study that, as a subset of educational research, examines inputs, processes and outputs associated with teaching and learning medical sciences. The purpose of this study was to investigate the status of medical education studies by Iranian researchers among the educational publications indexed in Web of Science from 1990 to 2015. Methods: Th...


Publication date: 2005