A Case-Based Semi-automatic Transformation from HTML Documents to XML Ones — Using the Similarity between HTML Documents Constituting a Series —
Related resources
Keyword based Automatic Summarization of HTML Documents
Automatic summarization [5] can be defined as the creation of a short version of a text by a computer program. The product still contains the most important points of the original text. Multi-document summarization [6] can be defined as an automatic procedure that extracts information from multiple texts written about the same topic. Resulting summary report allows individual u...
Transformation-Based Learning for Automatic Translation from HTML to XML
Format tags implicitly represent content information in the same ambiguous, context-dependent manner that words represent semantics in natural language. Translation from format to content markup shares many characteristics with tagging and parsing tasks in computational linguistics. The transformation-based learning (TBL) paradigm has recently been applied to numerous computational linguistics ...
Content Extraction from HTML Documents
In recent times, the way people access information from the web has undergone a transformation. The demand for information to be accessible from anywhere, anytime, has resulted in the introduction of Personal Digital Assistants (PDAs) and cellular phones that are able to browse the web and can be used to find information using wireless connections. However, the small display form factor of thes...
Template-Based Information Mining from HTML Documents
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form, extracting useful information from a document usually involves information retrieval techniques or manual processing. This paper presents a novel approach to mining information from HTML documents using tree-structur...
Web Ecology: Recycling HTML Pages as XML Documents Using W4F
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to extract information from HTML pages in a structured way, a mapping to export it as XML documents and some visual tools to assist the user during wrapper creation. Moreover, the entire description of wrappers is fully d...
Journal
Journal title: Transactions of the Japanese Society for Artificial Intelligence
Year: 2001
ISSN: 1346-0714,1346-8030
DOI: 10.1527/tjsai.16.408