A Case-Based Semi-automatic Transformation from HTML Documents to XML Ones — Using the Similarity between HTML Documents Constituting a Series —

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Keyword based Automatic Summarization of HTML Documents

Automatic summarization [5] can be defined as the procedure to create a short version of a text by a computer program. Its product still contains the most important points of the existing text. Multi-document summarization [6] can be defined as an automatic procedure which extracts information from multiple texts that is written about the same topic. Resulting summary report allows individual u...

متن کامل

Transformation-Based Learning for Automatic Translation from HTML to XML

Format tags implicitly represent content information in the same ambiguous, context dependent manner that words represent semantics in natural language. Translation from format to content markup shares many characteristics with tagging and parsing tasks in computational linguistics. The transformation-based learning (TBL) paradigm has recently been applied to numerous computational linguistics ...

متن کامل

Content Extraction from HTML Documents

In recent times, the way people access information from the web has undergone a transformation. The demand for information to be accessible from anywhere, anytime, has resulted in the introduction of Personal Digital Assistants (PDAs) and cellular phones that are able to browse the web and can be used to find information using wireless connections. However, the small display form factor of thes...

متن کامل

Template-Based Information Mining from HTML Documents

Tools for mining information from data can create added value for the Iqternet. As the majority of electronic documents available over the network are in unstructured textual form, extracting useful information from a document usually involves information retrieval techniques or manual processing. This paper presents a novel approach to mining information from HTML documents using tree-structur...

متن کامل

Web Ecology: Recycling HTML Pages as XML Documents Using W4F

In this paper we present the World-Wide WebWrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to extract information from HTML pages in a structured way, a mapping to export it as XML documents and some visual tools to assist the user during wrapper creation. Moreover, the entire description of wrappers is fully d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Transactions of the Japanese Society for Artificial Intelligence

سال: 2001

ISSN: 1346-0714,1346-8030

DOI: 10.1527/tjsai.16.408