wikipedia mining

Modeling Heterogeneous Networks for Information Ranking, Enrichment and Resolution on Microblogs

2015

Hongzhao Huang

Microblogging, a new type of online information-sharing platform based on short messages of up to 140 characters, has grown quickly and received increasing attention in recent years. A microblogging platform (e.g., Twitter) enables both individuals and organizations to disseminate information, from current affairs to breaking news, in a timely fashion, which makes it a valuable knowledge sour...

Wikipedia and Medicine: Quantifying Readership, Editors, and the Significance of Natural Language

2015

James M Heilman, Andrew G West

BACKGROUND: Wikipedia is a collaboratively edited encyclopedia. One of the most popular websites on the Internet, it is known to be a frequently used source of health care information for both professionals and the lay public. OBJECTIVE: This paper quantifies the production and consumption of Wikipedia's medical content along 4 dimensions. First, we measured the amount of medical content in both...

Directions for Exploiting Asymmetries in Multilingual Wikipedia

2009

Elena Filatova

Multilingual Wikipedia has been used extensively for a variety of Natural Language Processing (NLP) tasks. Many Wikipedia entries (people, locations, events, etc.) have descriptions in several languages. These descriptions, however, are not identical. On the contrary, descriptions in different languages created for the same Wikipedia entry can vary greatly in terms of description length and inform...

Mining Large-Scale Knowledge Sources for Case Adaptation Knowledge

2007

David B. Leake, Jay H. Powell

Making case adaptation practical is a longstanding challenge for case-based reasoning. One of the impediments to widespread use of automated case adaptation is the adaptation knowledge bottleneck: the adaptation process may require extensive domain knowledge, which may be difficult or expensive for system developers to provide. This paper advances a new approach to addressing this problem, propo...

Analysis of Textual Data based on multiple 2-class Classification Models

2008

Shigeaki Sakurai, Ryohei Orihara

This paper proposes a new method for analyzing textual data. The method deals with items of textual data, where each item is described from various viewpoints. The method acquires 2-class classification models of the viewpoints by applying an inductive learning method to items with multiple viewpoints. The method infers whether or not the viewpoints are assigned to new items by using the...

Extending DBpedia with Wikipedia List Pages

2013

Heiko Paulheim, Simone Paolo Ponzetto

Thanks to its wide coverage and general-purpose ontology, DBpedia is a prominent dataset in the Linked Open Data cloud. DBpedia’s content is harvested from Wikipedia’s infoboxes, based on manually created mappings. In this paper, we explore the use of a promising source of knowledge for extending DBpedia, i.e., Wikipedia’s list pages. We discuss how a combination of frequent pattern mining and ...

Adaptive Concept Resolution for document representation and its applications in text mining

Journal: Knowl.-Based Syst. 2015

Lidong Bing, Shan Jiang, Wai Lam, Yan Zhang, Shoaib Jameel

It is well known that synonymous and polysemous terms often introduce noise when we calculate the similarity between documents. Existing ontology-based document representation methods are static, so the semantic concepts selected for representing a document have a fixed resolution. Therefore, they are not adaptable to the characteristics of the document collection and the text mining problem...

A language-independent method for the extraction of RDF verbalization templates

2014

Basil Ell, Andreas Harth

With the rise of the Semantic Web, more and more data become available encoded using the Semantic Web standard RDF. RDF is geared towards machines: designed to be easily processed by machines, it is difficult for casual users to understand. Transforming RDF data into human-comprehensible text would help non-experts assess this information. In this paper we present a language-independen...

Query classification using Wikipedia

Journal: IJIIDS 2011

Richard Khoury

Identifying the intended topic that underlies a user’s query can benefit a large range of applications, from search engines to question-answering systems. However, query classification remains a difficult challenge due to the variety of queries a user can ask, the wide range of topics users can ask about, and the limited amount of information that can be mined from the query. In this paper, we ...

Combining Dictionary- and Corpus-Based Concept Extraction

2016

Joan Codina-Filbà, Leo Wanner

Concept extraction is an increasingly popular topic in deep text analysis. Concepts are individual content elements. Their extraction thus offers an overview of the content of the material from which they were extracted. In the case of domain-specific material, concept extraction boils down to term identification. The most straightforward strategy for term identification is a lookup in existin...
