نتایج جستجو برای: wikipedia mining
تعداد نتایج: 92181 فیلتر نتایج به سال:
Biomedical synonyms are important resources for Natural Language Processing in Biomedical domain. Existing synonym resources (e.g., the UMLS) are not complete. Manual efforts for expanding and enriching these resources are prohibitively expensive. We therefore develop and evaluate approaches for automated synonym extraction from Wikipedia. Using the inter-wiki links, we extracted the candidate ...
Previous attempts in extracting parallel data from Wikipedia were restricted by the monotonicity constraint of the alignment algorithm used for matching possible candidates. This paper proposes a method for exploiting Wikipedia articles without worrying about the position of the sentences in the text. The algorithm ranks the candidate sentence pairs by means of a customized metric, which combin...
The automatic generation of entity profiles from unstructured text, such as Knowledge Base Population, if applied in a multi-lingual setting, generates the need to align such profiles from multiple languages in an unsupervised manner. This paper describes an unsupervised and language-independent approach to mine name translation pairs from entity profiles, using Wikipedia Infoboxes as a stand-i...
ABSTRACT Wikipedia is the biggest ever created encyclopedia and the fifth most visited website in the world. Tens of millions of people surf it every day, seeking answers to various questions. Collective user activity on the pages leaves publicly available footprints of human behavior, making Wikipedia a great source of the data for largescale analysis of collective dynamical patterns. The dyna...
We present a corpus of time-aligned spoken data of Wikipedia articles as well as the pipeline that allows to generate such corpora for many languages. There are initiatives to create and sustain spoken Wikipedia versions in many languages and hence the data is freely available, grows over time, and can be used for automatic corpus creation. Our pipeline automatically downloads and aligns this d...
Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge amount of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In this paper, we propose an efficient link mining method pfibf (Path Frequency Inversed Backward link Frequency) and the extension method ...
We present in this paper a method to extract geospatial entities and relationships from the unstructured text of the English language Wikipedia. Using a novel approach that applies SVMs trained from purely structural features of text strings, we extract candidate geospatial entities and relationships. Using a combination of further techniques, along with an external gazetteer, the candidate ent...
Mined Semantic Analysis (MSA) is a novel distributional semantics approach which employs data mining techniques. MSA embraces knowledge-driven analysis of natural languages. It uncovers implicit relations between concepts by mining for their associations in target encyclopedic corpora. MSA exploits not only target corpus content but also its knowledge graph (e.g., "See also" link graph of Wikip...
The report hereby is to represent the principle, the searching process and experiment results. We report our systems and experiments in the intent task of NTCIR 9. The research aims at evaluating the effectiveness of the proposed methods on query intent mining and results diversification in terms of web search. In the subtopic mining subtask, we combine the extracted candidates from search logs...
We describe an approach for mining parallel sentences in a collection of documents in two languages. While several approaches have been proposed for doing so, our proposal differs in several respects. First, we use a document level classifier in order to focus on potentially fruitful document pairs, an understudied approach. We show that mining less, but more parallel documents can lead to bett...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید