An Embedded Tree Data Model for Web Content Adaptation
Abstract
This thesis presents a novel data model, the Content Container Tree (CCT), that enables dynamic web content adaptation for heterogeneous terminal devices. The rationale for the CCT data model is as follows: a web page can be visually divided into several segments, and each segment can be iteratively divided into smaller sub-segments until a certain granularity level is reached; the hierarchy of these segments can be expressed as a tree; thus, by traversing this segmentation tree, the user can locate a particular segment of interest whose dimensions are comparable to the screen size of the terminal device. The CCT data model organizes the semantics of the hierarchical content segments as a tree and embeds this semantic tree into the content DOM tree, so that the user can navigate through the content segments at different levels and retrieve the target segment according to its semantics. Based on the CCT data model, the high-level structural adaptation can be explicitly separated from the low-level presentational adaptation, resulting in a highly modular and extensible framework for web content adaptation, which this thesis also proposes. The CCT approach differs from other web content adaptation approaches in the following respects: first, it is data oriented rather than process oriented; second, it supports multilevel granularity of the content; finally, it is dynamic, in that result pages can be generated on the fly on request. As an empirical evaluation, a content adaptation application is built according to the CCT adaptation framework to show the feasibility and effectiveness of the approach.
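The segmentation-tree idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the Segment class, its fields, the pixel sizes, and the functions find_segment and largest_fitting are all hypothetical names chosen for illustration, and the rule that an oversized leaf is returned as-is is an assumption. The sketch only shows how a tree of labeled segments could be navigated by semantics and cut at a granularity that matches a given screen size.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One node of a hypothetical segmentation tree (all names are illustrative)."""
    label: str    # semantic label of the segment
    width: int    # rendered width in pixels (assumed to be known)
    height: int   # rendered height in pixels (assumed to be known)
    children: List["Segment"] = field(default_factory=list)


def find_segment(root: Segment, path: List[str]) -> Optional[Segment]:
    """Follow a sequence of labels down from root; return None if a label is missing."""
    node = root
    for label in path:
        node = next((c for c in node.children if c.label == label), None)
        if node is None:
            return None
    return node


def largest_fitting(seg: Segment, screen_w: int, screen_h: int) -> List[Segment]:
    """Return the coarsest segments that fit the screen.

    Descent stops as soon as a segment fits. Assumption: a leaf that still
    does not fit is returned anyway, because it cannot be divided further;
    shrinking it would be left to a separate presentational step.
    """
    fits = seg.width <= screen_w and seg.height <= screen_h
    if fits or not seg.children:
        return [seg]
    result: List[Segment] = []
    for child in seg.children:
        result.extend(largest_fitting(child, screen_w, screen_h))
    return result


# A made-up page layout used only to exercise the functions above.
page = Segment("page", 960, 1500, [
    Segment("header", 960, 100),
    Segment("main", 960, 1200, [
        Segment("article", 640, 1200, [
            Segment("summary", 640, 300),
            Segment("details", 640, 900),
        ]),
        Segment("sidebar", 320, 400),
    ]),
    Segment("footer", 960, 200),
])

if __name__ == "__main__":
    for w, h in [(1024, 1280), (640, 960)]:
        labels = [s.label for s in largest_fitting(page, w, h)]
        print(f"{w}x{h}: {labels}")
```

On the larger screen the traversal stops at the coarse "main" segment, while on the smaller one it descends to "summary", "details", and "sidebar", which is the multilevel granularity the abstract refers to.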
Similar resources
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Analyzing new features of infected web content in detection of malicious web pages
Recent improvements in web standards and technologies enable attackers to hide and obfuscate infectious code with new methods and thus evade security filters. In this paper, we study the application of machine learning techniques to the detection of malicious web pages. To detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...
VIPS: A VIsion based Page Segmentation Algorithm
A new web content structure analysis based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure ...
Development of a Combined System Based on Data Mining and Semantic Web for the Diagnosis of Autism
Introduction: Autism is a nervous system disorder, and since there is no direct diagnosis for it, data mining can help diagnose the disease. Ontology, as a backbone of the semantic web and a shareable, reusable knowledge base, can help confirm the correctness of disease diagnosis systems. This study aimed to provide a system for diagnosing autistic children with a combinat...
CSI in the Web 2.0 Age: Data Collection, Selection, and Investigation for Knowledge Discovery
The growing popularity of various Web 2.0 media has created massive amounts of user-generated content such as online reviews, blog articles, shared videos, forum threads, and wiki pages. Such content provides insights into web users’ preferences and opinions, online communities, knowledge generation, etc., and presents opportunities for many knowledge discovery problems. However, several chall...