web data record extraction

Logic-based web information extraction

Journal: ACM SIGMOD Record, 2004

SHELDON: Semantic Holistic FramEwork for LinkeD ONtology Data

2014

Diego Reforgiato Recupero, Andrea Giovanni Nuzzolese, Sergio Consoli, Aldo Gangemi, Valentina Presutti

SHELDON is the first true hybridization of NLP machine reading and the Semantic Web. It is a framework that builds upon a machine reader for extracting RDF graphs from text so that the output is compliant with Semantic Web and Linked Data patterns. It extends the current human-readable web by using Semantic Web practices and technologies in a machine-processable form. Given a sentence in any language...

Research on Model of Network Information Extraction Based on Improved Topic-focused Web Crawler Key Technology

2016

Mo Chen, Xiao-Ping Yang

Original scientific paper. With the arrival of the big data era, characterized by semi-structured or unstructured text, the accurate extraction of network information has attracted wide attention from researchers. This paper proposes a model of network information extraction based on improved topic-focused web crawler key technology, taking Web news as the object of extraction. The authors elaborate ...

Identifying Web Tables: Supporting a Neglected Type of Content on the Web

2015

Michael Galkin, Dmitry Mouromtsev, Sören Auer

The abundance of data on the Internet facilitates the improvement of extraction and processing tools. The trend in open data publishing encourages the adoption of structured formats like CSV and RDF. However, there is still a plethora of unstructured data on the Web which we assume contains semantics. For this reason, we propose an approach to derive semantics from web tables which are s...

Web Entities Extraction Based on Semi-Structured Semantic Database

Journal: JNW, 2013

Fang Dong, Mengchi Liu, Kun Ma

The Web is the biggest source of information and contains many entities and relationships between them. Extracting these data from massive numbers of Web pages and integrating them into semi-structured data with rich semantics will be more conducive to the management and use of these Web data. On this premise, a comprehensive method is proposed to extract the entities and relationships from the webpages...

An Effective and Efficient Web News Extraction Technique for an Operational NewsIR System

2007

Javier Parapar, Álvaro Barreiro

Web information extraction, and in particular web news extraction, is an open research problem and a key point in NewsIR systems. Current techniques fall short in the quality of the results, the high computational cost or the necessity of human intervention, all of them critical issues in a real system. We present an automated approach to news recognition and extraction based on a set of heuristics ...

Web scraping technologies in an API world

Journal: Briefings in Bioinformatics, 2014

Daniel Glez-Peña, Anália Lourenço, Hugo López-Fernández, Miguel Reboiro-Jato, Florentino Fernández Riverola

Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents,...

Infrastructure for quality transformation: measurement and reporting in veterans administration intensive care units.

Journal: BMJ Quality & Safety, 2011

Marta L Render, Ron W Freyberg, Rachael Hasselbeck, Timothy P Hofer, Anne E Sales, James Deddens, Odette Levesque, Peter L Almenoff

BACKGROUND: Veterans Health Administration (VA) intensive care units (ICUs) develop an infrastructure for quality improvement using information technology and recruiting leadership. METHODS: Setting: Participation by the 183 ICUs in the quality improvement program is required. Infrastructure includes measurement (electronic data extraction, analysis), quarterly web-based reporting and implementati...

What can I do there? Towards the automatic discovery of place-related services and activities

Journal: International Journal of Geographical Information Science, 2012

Ahmed N. Alazzawi, Alia I. Abdelmoty, Christopher B. Jones

The current Web is rich in geographically referenced data. Mining, retrieving, and sharing this data raises the need for rich geographical place name resources that record spatial and thematic elements of geographical places. Here, possible services offered at a place and human activities that can be practised there are considered useful concepts to discover and encode in place name resources. ...

Towards Comparative Mining of Web Document Objects with NFA: WebOMiner System

Journal: IJDWM, 2012

Christie I. Ezeife, Titas Mutsuddy

The process of extracting comparative heterogeneous web content data, both derived and historical, from related web pages is still in its infancy and undeveloped. Discovering potentially useful and previously unknown information or knowledge from web contents such as “list all articles on ‘Sequential Pattern Mining’ written between 2007 and 2011 including title, authors, volume, abstract, ...
