Website Identification: DEA Internship Report

Authors

  • Pierre P. Senellart
  • Grégory Cobéna
Abstract

I present in this paper a method to discover the set of webpages contained in a logical website, based on the link structure of the Web graph. Such a method is useful to identify the boundaries of what to crawl, in the context of Web archiving. For this purpose, I combine the use of an online version of the preflow-push algorithm, an algorithm for the maximum flow problem in traffic networks, and of the Markov CLuster (MCL) algorithm. The latter is used on a crawled portion of the Web graph in order to build a seed of initial webpages, a seed which is extended by the former. Experiments on subsites of the INRIA Website, which give satisfactory results, are described.
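As a rough illustration of the flow component mentioned in the abstract, the sketch below implements the generic, offline preflow-push (push-relabel) algorithm for maximum flow. It is not the online variant used in the report, and it does not include the MCL seeding step. The function name `max_flow_push_relabel`, the edge-dictionary input format, and the example graph are assumptions made for illustration only.

```python
from collections import defaultdict


def max_flow_push_relabel(capacity, source, sink):
    """Generic preflow-push maximum flow (illustrative sketch).

    capacity: dict mapping (u, v) -> non-negative capacity of edge u -> v.
    Assumes no self-loops. Returns the value of a maximum flow.
    """
    residual = defaultdict(int)    # residual capacity per directed pair
    neighbours = defaultdict(set)  # adjacency in the residual graph
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        neighbours[u].add(v)
        neighbours[v].add(u)       # reverse edge, residual capacity 0

    nodes = set(neighbours) | {source, sink}
    height = {n: 0 for n in nodes}
    excess = {n: 0 for n in nodes}
    height[source] = len(nodes)

    # Initial preflow: saturate every edge leaving the source.
    for v in neighbours[source]:
        c = residual[(source, v)]
        if c > 0:
            residual[(source, v)] -= c
            residual[(v, source)] += c
            excess[v] += c
            excess[source] -= c

    active = [n for n in nodes
              if n not in (source, sink) and excess[n] > 0]
    while active:
        u = active.pop()
        # Discharge u: push while possible, relabel otherwise.
        while excess[u] > 0:
            pushed = False
            for v in neighbours[u]:
                if residual[(u, v)] > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], residual[(u, v)])
                    residual[(u, v)] -= delta
                    residual[(v, u)] += delta
                    excess[u] -= delta
                    excess[v] += delta
                    if v not in (source, sink) and v not in active:
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in neighbours[u]
                                    if residual[(u, v)] > 0)
    return excess[sink]


# Hypothetical example graph (node names are placeholders, not real pages).
example = {
    ("s", "a"): 16, ("s", "b"): 13, ("a", "c"): 12, ("b", "a"): 4,
    ("b", "d"): 14, ("c", "b"): 9, ("c", "t"): 20, ("d", "c"): 7,
    ("d", "t"): 4,
}
flow_value = max_flow_push_relabel(example, "s", "t")
```

By the max-flow/min-cut theorem, the returned value equals the capacity of a minimum cut separating the source from the sink; how the report turns such a cut into a website boundary is not specified in the abstract.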

Similar resources

English - Journalism - Communications-Publishing - Advertising

English Journalism Communications-Publishing Advertising Career/Job Resources BookJobs.com [1] -Bookjobs.com is an online job/internship board with publishing opportunities throughout the US Corporation For Public Broadcasting [2] -This is a nationwide job/internship website with postings for positions in the public media sector. Council for PR Firms [3] – Career Center – This website has natio...

Internship Report: Metaobject Protocols For Distributed Programming

This DEA internship report proposes a study and a classification of best known Metaobject Protocols (MOPs). Far from being totally exhaustive, it explains the nowadays motivation and use of MOPs by practical examples in many application areas. These examples naturally lead to distinguish different kinds of use of MOPs techniques and also, as one can expect, different kinds of implementation. These...

Trajectories of depressive symptoms in response to prolonged stress in medical interns.

OBJECTIVE The high degree of heterogeneity in the development of depression under stress is unaccounted for in traditional statistical modeling. We employ growth mixture modeling to identify classes of individuals at highest risk of depression under stress. METHOD Medical internship was used as a prospective stress model. Interns from US residency programs completed demographic, psychological...

Identifying E-commerce Website Design Inefficiencies: a Business Value-driven Approach Using DEA

Managers at e-commerce firms are in need of proven methods for website evaluation. So, one of the most pressing issues is whether the design of their online storefronts is effective, and if not, which areas require attention and improvements. However, current approaches (e.g., user testing, inspection, inquiry) are not well suited to the task at hand. This paper proposes a new business value-dr...

Program Disparities in Unmatched Internship Applicants

Predoctoral internship represents an important capstone in the training of clinical and counseling psychologists. However, in the past decade there has been growing concern over the number of applicants to internship who have not been matched to an internship site. We investigated the scope of the internship match problem by assessing program-level contributions to the number of unmatched inter...

Publication date: 2003