Search results for: web wrapper generation

Number of results: 567401

2000
Zoé Lacroix

Nowadays scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data analyzing and visualization tools. Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or documents retrieved from the...

2008
Saurabh Mittal, Bernard P. Zeigler, Jose L. Risco Martin, Jesús M. de la Cruz

This research work provides a methodology to use Discrete Event Systems Specification (DEVS) to design and evaluate the performance of web services within a Service Oriented Architecture (SOA). We will show how a Web Service Description Language (WSDL) document can be mapped to a DEVS model in an automated manner through a DEVS abstract service wrapper. This work will describe the underlying ar...

Journal: PVLDB 2011
Aditya G. Parameswaran, Nilesh N. Dalvi, Hector Garcia-Molina, Rajeev Rastogi

In this paper, we consider the problem of constructing wrappers for web information extraction that are robust to changes in websites. We consider two models to study robustness formally: the adversarial model, where we look at the worst-case robustness of wrappers, and the probabilistic model, where we look at the expected robustness of wrappers, as web pages evolve. Under both models, we present ...

1999
Zoé Lacroix

Web datasources usually allow restricted access (through CGI calls) and their output consists of generated HTML documents. Unfortunately, in many cases the data they provide happens to be available only on the Web. In this paper, we describe a system based on a Web wrapper combined with an object multidatabase system that enables the user to query Web datasources as well as other datasources ...

2001
Heekyoung Seo, Jaeyoung Yang, Joongmin Choi

Previous research on automatic information extraction has had difficulty acquiring and representing useful domain knowledge and coping with the structural heterogeneity among different information sources. As a result, many real-world information sources with complex document structures could not be correctly analyzed. In order to resolve these problems, this paper presents a meth...

1999
William W. Cohen

We present general-purpose methods for recognizing certain types of structure in HTML documents. The methods are implemented using WHIRL, a "soft" logic that incorporates a notion of textual similarity developed in the information retrieval community. In an experimental evaluation on 82 Web pages, the structure ranked first by our method is "meaningful"--i.e., a structure that was used in a han...

2005
Christian Schindler, Pranjal Arya, Andreas Rath, Wolfgang Slany

The htmlButler project aims at enhancing the usability of visual wrapper technology while preserving versatility. htmlButler will allow an untrained user who has only the most basic web knowledge to visually specify simple but useful wrappers, and a more tech-savvy user to visually or otherwise specify more complex wrappers. htmlButler was started 2005/2 and is based on visual wrappi...

1999
Arnaud Sahuguet, Fabien Azavant

The Web so far has been incredibly successful at delivering information to human users. So successful, actually, that there is now an urgent need to go beyond a browsing human and make information accessible to applications, in order to offer automation, inter-operation and Web-awareness among services. To do so, information from Web sources needs to be accessible in a structured way. XML and it...
