Automating XML mark-up using a two stage machine learning technique
Authors
Abstract
We introduce a novel two-stage automatic XML mark-up system that combines the WEBSOM approach to document categorisation with the C5 inductive learning algorithm. The WEBSOM method clusters XML marked-up documents so that semantically similar documents lie close together on a Self-Organising Map (SOM). The C5 algorithm automatically learns and applies mark-up rules derived from the nearest SOM neighbours of an unmarked document. The system learns from mark-up errors to improve accuracy. The automatically marked-up documents produced by the system are also categorised on the Self-Organising Map, to further refine the SOM's document coverage.
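The two stages described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: a tiny hand-written SOM stands in for WEBSOM, and a majority-vote rule learner over surface features stands in for C5. The names `MARKED`, `features`, `learn_rules`, `mark_up` and `to_xml`, the tag set, and the toy documents are all assumptions made for illustration, and the error-feedback loop is left out.

```python
import math
from collections import Counter
import random
from xml.sax.saxutils import escape

def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vectorise(tokens, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(t.lower() for t in tokens)
    return [float(counts[w]) for w in vocab]

def best_unit(grid, vec):
    """Map position whose weight vector is closest to vec."""
    return min(grid, key=lambda pos: dist(grid[pos], vec))

def train_som(vectors, rows=3, cols=3, epochs=40, seed=0):
    """Stage 1: fit a small SOM with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        rate, radius = 0.5 * decay, 0.5 + 1.5 * decay
        for vec in vectors:
            bmu = best_unit(grid, vec)
            for pos, weights in grid.items():
                pull = rate * math.exp(-dist(pos, bmu) ** 2 / (2 * radius ** 2))
                for i in range(dim):
                    weights[i] += pull * (vec[i] - weights[i])
    return grid

def features(token):
    """Hypothetical surface features: (is a number, starts with a capital)."""
    return (token.isdigit(), token[:1].isupper())

def learn_rules(neighbour_docs):
    """Stage 2 stand-in for C5: majority tag for each feature pattern."""
    votes = {}
    for doc in neighbour_docs:
        for token, tag in doc:
            votes.setdefault(features(token), Counter())[tag] += 1
    return {pattern: c.most_common(1)[0][0] for pattern, c in votes.items()}

def mark_up(tokens, marked_docs, k=2):
    """Tag tokens using rules learned from the k nearest documents on the map."""
    vocab = sorted({t.lower() for doc in marked_docs for t, _ in doc})
    vectors = [vectorise([t for t, _ in doc], vocab) for doc in marked_docs]
    grid = train_som(vectors)
    target = best_unit(grid, vectorise(tokens, vocab))
    ranked = sorted(range(len(marked_docs)),
                    key=lambda i: dist(best_unit(grid, vectors[i]), target))
    rules = learn_rules([marked_docs[i] for i in ranked[:k]])
    return [(t, rules.get(features(t), "text")) for t in tokens]

def to_xml(tagged):
    """Serialise (token, tag) pairs as a flat run of XML elements."""
    return "".join(f"<{tag}>{escape(tok)}</{tag}>" for tok, tag in tagged)

# Toy marked-up corpus (illustrative only): each document is (token, tag) pairs.
MARKED = [
    [("Invoice", "heading"), ("total", "text"), ("42", "number")],
    [("Invoice", "heading"), ("due", "text"), ("2001", "number")],
    [("Letter", "heading"), ("regards", "text"), ("7", "number")],
    [("Letter", "heading"), ("dear", "text"), ("12", "number")],
]

print(to_xml(mark_up(["Invoice", "payment", "99"], MARKED)))
```

Restricting rule learning to the map neighbours of the unmarked document is the point of the two-stage design: the rules are induced only from documents that the map places near it, rather than from the whole corpus.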
Similar resources
Automating XML Markup using Machine Learning Techniques
In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents having similar content are placed close to each other. The C5.0 algori...
Fault diagnosis in a distillation column using a support vector machine based classifier
Fault diagnosis has always been an essential aspect of control system design. It is necessary because of the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...
Traitements automatiques pour la migration de documents numériques vers XML
More and more companies are migrating their legacy document management systems toward XML format, the industrial standard for data exchange. In order to reduce the migration cost we propose an approach aimed at automating the conversion of layout-oriented documents to semantic-oriented annotations. The conversion module uses supervised machine learning techniques to learn a conversion model for...
Argumentation Mark-Up: A Proposal
This is a proposal for an XML mark-up of argumentation. The annotation can be used to help the reader (e.g. by means of selective highlighting or diagramming), and for further processing (summarization, critique, use in information retrieval). The article proposes a set of markers derived from manual corpus annotation, exemplifies their use, describes a way to assign them using surface cues a...
Forecasting the Tehran Stock market by Machine Learning Methods using a New Loss Function
Stock market forecasting has attracted so many researchers and investors that many studies have been done in this field. These studies have led to the development of many predictive methods, the most widely used of which are machine learning-based methods. In machine learning-based methods, the loss function has a key role in determining the model weights. In this study a new loss function is ...