Effective Metadata Extraction from Irregularly Structured Web Content
Authors
Baoyao Zhou, Wei Liu, Yu Yang, Weichun Wang, Ming Zhang
Similar resources
Automatic metadata mining from multilingual enterprise content
Personalization is increasingly vital especially for enterprises to be able to reach their customers. The key challenge in supporting personalization is the need for rich metadata, such as metadata about structural relationships, subject/concept relations between documents and cognitive metadata about documents (e.g. difficulty of a document). Manual annotation of large knowledge bases with suc...
OntoExtractor: A Fuzzy-Based Approach to Content and Structure-Based Metadata Extraction
This paper describes OntoExtractor, a tool for extracting metadata from heterogeneous sources of information that produces a “quick-and-dirty” hierarchy of knowledge. The tool is specifically tailored for quick classification of semi-structured data, which makes it convenient for dealing with web-based data sources.
Users as Crawlers: Exploiting Metadata Embedded in Web Pages for User Profiling
In recent years we have witnessed the rapid growth of a broad range of Semantic Web technologies that have been successfully employed to enhance information retrieval, data mining and user experience in real-world applications. Several authors have proposed approaches towards ontological user modelling in order to address different issues of personalized systems, such as the cold start proble...
DBpedia Commons: Structured Multimedia Metadata from the Wikimedia Commons
The Wikimedia Commons is an online repository of over twenty-five million freely usable audio, video and still image files, including scanned books, historically significant photographs, animal recordings, illustrative figures and maps. Being volunteer-contributed, these media files have different amounts of descriptive metadata with varying degrees of accuracy. The DBpedia Information Extracti...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications, including search engines, question-answering systems, recommender systems, and machine translation. An information extraction system aims to identify the entities from the text and extr...