web documents

Improving Classification of Multi-Lingual Web Documents using Domain Ontologies

2005

Marina Litvak, Mark Last, Slava Kisilevich

In this paper, we deal with the problem of analyzing and classifying web documents into several major categories (classes) in a given domain using a domain ontology. We present an ontology-based web content mining methodology whose main stages include collecting a training set of labeled documents from a given domain, building a classification model above this domain given the domain ontolog...

Adaptive Distribution Support for Co-authored Documents on the Web

2005

Sonia Mendoza, Dominique Decouchant, Alberto L. Morán, Ana María Martínez Enríquez, Jesús Favela

To facilitate and improve collaboration among co-authors working in the Web environment, documents must be made seamlessly available to them. Web documents may contain multimedia resources, whose management raises important issues due to the constraints and limits imposed by Web technology. This paper proposes an adaptive support for distributing shared Web documents and multimedia re...

Ontology Based Pivoted normalization using Vector Based Approach for information Retrieval

Journal: CoRR 2013

Vishal Jain, Mayank Singh

Research Scholar, Computer Science and Engineering Department, Lingaya’s University, Faridabad; Associate Professor, Computer Science and Engineering Department, Lingaya’s University, Faridabad

The large number of documents on the Web leaves users in a dilemma: they are unsure which documents are relevant. Relevance means ...

Translating Documents into Semantic Documents using Semantic Web and Web2.0

2006

Hak Lae Kim, Hong-Gee Kim, Jaehwa Choi, Stefan Decker

Managing document metadata is difficult and slippery for desktop users. A wide variety of technologies have been applied to support the requirements of metadata management, ranging from the acquisition, creation, and maintenance of metadata to its retrieval, reuse, and publishing. We introduce essential concepts of a semantic document and implement the necessary functionality of metadata managing p...

Detecting Vital Documents in Massive Data Streams

Journal: OJWT 2015

Shun Kawahara, Kazuhiro Seki, Kuniaki Uehara

Existing knowledge bases, including Wikipedia, are typically written and maintained by a group of volunteer editors. Meanwhile, numerous web documents are being published, partly owing to the popularity of online news and social media. Some of these web documents, called “vital documents”, contain novel information that should be taken into account when updating knowledge-base articles. Ho...

Spatio-Temporal Web Sensors Using Web Queries vs. Documents

2013

Shun Hattori

In the Web world, one user creates and uploads a Web document about a phenomenon or event in the physical world, while another user retrieves and consumes Web documents by submitting a Web query. There has been much research on mining Web documents in the exploding Web world, especially User Generated Content such as weblogs and microblogs, for knowledge about various phenomena and events in t...

Selective web information retrieval

2006

Vasileios Plachouras

One of the main challenges in Web information retrieval is the number of different retrieval approaches that can be used for ranking Web documents. In addition to the textual content of Web documents, evidence from the structure of Web documents, or the analysis of the hyperlink structure of the Web, can be used to enhance the retrieval effectiveness. However, not all the queries benefit equall...

Sampling, information extraction and summarisation of Hidden Web databases

Journal: Data Knowl. Eng. 2006

Yih-Ling Hedley, Muhammad Younas, Anne E. James, Mark Sanderson

Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users’ queries. The majority of these documents are generated through Web page templates, which contain information that is often irrelevant to queries. In this paper, we present a system designed to detect and extract query-related information from documents sampled from database...

Relation based Measuring of Semantic Similarity for Web Documents

2015

Poonam Chahal, Manjeet Singh, Suresh Kumar, Hongzhe Liu

The World Wide Web (WWW) is an information resource centre in which information is organized as interlinked web pages. Given the huge amount of information on the WWW, it is difficult to extract the information relevant to a user's query. This is because the information on the Web exists in natural language. The layere...

Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives

2015

Tarcisio Souza, Elena Demidova, Thomas Risse, Helge Holzmann, Gerhard Gossen, Julian Szymanski

Long-term Web archives comprise Web documents gathered over longer time periods and can easily reach hundreds of terabytes in size. Semantic annotations such as named entities can facilitate intelligent access to the Web archive data. However, the annotation of the entire archive content on this scale is often infeasible. The most efficient way to access the documents within Web archives is pro...