Effective Metadata Extraction from Irregularly Structured Web Content

Authors

Baoyao Zhou

Wei Liu

Yu Yang

Weichun Wang

Ming Zhang

Similar resources

Automatic metadata mining from multilingual enterprise content

Personalization is increasingly vital especially for enterprises to be able to reach their customers. The key challenge in supporting personalization is the need for rich metadata, such as metadata about structural relationships, subject/concept relations between documents and cognitive metadata about documents (e.g. difficulty of a document). Manual annotation of large knowledge bases with suc...

OntoExtractor: A Fuzzy-Based Approach to Content and Structure-Based Metadata Extraction

This paper describes OntoExtractor, a tool for extracting metadata from heterogeneous sources of information, producing a “quick-and-dirty” hierarchy of knowledge. The tool is specifically tailored for quick classification of semi-structured data, which makes it convenient for dealing with web-based data sources.

Users as Crawlers: Exploiting Metadata Embedded in Web Pages for User Profiling

In recent years we have witnessed the rapid growth of a broad range of Semantic Web technologies that have been successfully employed to enhance information retrieval, data mining and user experience in real-world applications. Several authors have proposed approaches towards ontological user modelling in order to address different issues of personalized systems, such as the cold start proble...

DBpedia Commons: Structured Multimedia Metadata from the Wikimedia Commons

The Wikimedia Commons is an online repository of over twenty-five million freely usable audio, video and still image files, including scanned books, historically significant photographs, animal recordings, illustrative figures and maps. Being volunteer-contributed, these media files have different amounts of descriptive metadata with varying degrees of accuracy. The DBpedia Information Extracti...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Publication date: 2008
