Wikipedia Mining for an Association Web Thesaurus Construction
Abstract
Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has several impressive characteristics: a huge number of articles, live updates, a dense link structure, brief link texts, and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency Inversed Backward link Frequency), and an extension, “forward / backward link weighting (FB weighting)”, for constructing a huge-scale association thesaurus. We show the effectiveness of the proposed methods by comparing them with conventional methods such as co-occurrence analysis and TF-IDF.
Similar resources
Extracting Structured Knowledge for Semantic Web by Mining Wikipedia
Since Wikipedia has become a huge-scale database storing a wide range of human knowledge, it is a promising corpus for knowledge extraction. A considerable number of studies on Wikipedia mining have been conducted, and they have confirmed that Wikipedia is an invaluable corpus. Wikipedia’s impressive characteristics are not limited to its scale, but also include the dense link structure...
A Search Engine for Browsing the Wikipedia Thesaurus
Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has several impressive characteristics: a huge number of articles, live updates, a dense link structure, brief link texts, and URL identification for concepts. In our previous work, we proposed link structure mining algorithms to extract a huge-scale and accurate association thesaurus from Wikipedi...
Association Thesaurus Construction for Interactive Query Expansion Based on Association Rule Mining
This paper presents an interactive query expansion method that uses an association thesaurus mined from the ‘selected web pages’ of users in the query logs. The ‘selected web pages’ of users are converted into ‘sets of query terms’ and then used for term correlation mining. Accordingly, various association thesauruses concerning different query terms are constructed from these term correlat...
Mining Enterprise Websites for Association Thesaurus Construction
Enterprise websites are useful resources for obtaining information about the products and services of companies. Typically on these websites, a product is associated with a Web page, and related products are connected by hyperlinks. As a result, the connectivity graph of an enterprise website exposes the company’s products (nodes) and how they are associated (links). This paper presents a novel appro...
Automatic Topic Ontology Construction Using Semantic Relations from WordNet and Wikipedia
Due to the explosive growth of web technology, a huge amount of information is available as web resources over the Internet. Therefore, to access relevant content from these web resources effectively, considerable attention is being paid to the semantic web for efficient knowledge sharing and interoperability. Topic ontology is a hierarchy of a set of topics that are interconnected using s...