WeSeE-Match results for OEAI 2012

Author

  • Heiko Paulheim
Abstract

WeSeE-Match is a simple, element-based ontology matching tool. Its basic technique is to issue a web search engine request for each concept and to determine element similarity from the similarity of the search results obtained. Multi-lingual ontologies are translated using a standard web-based translation service. The results show that the approach, despite its simplicity, is competitive with the state of the art.

1 Presentation of the system

1.1 State, Purpose, and General Statement

The idea of WeSeE-Match is to use information on the web for matching ontologies. When developing the algorithm, we were guided by the way a human would possibly solve a matching task. Consider the following example from the OAEI anatomy track (http://oaei.ontologymatching.org/2012/anatomy/): one element of the reference alignment is the pair of classes with the labels eyelid tarsus and tarsal plate, respectively. A person not trained in anatomy might assume that the two have something in common, but could not tell without doubt. For a human, the most straightforward strategy in the internet age would be to search for both terms with a search engine, look at the results, and try to figure out whether the websites returned by both searches talk about the same thing. Implicitly, what a human does is identify relevant sources of information on the web and analyze their contents for similarity with respect to the search term given. This naive algorithm is implemented in WeSeE-Match.

1.2 Specific Techniques Used

The core idea of our approach is to use a web search engine for retrieving web documents that are relevant for the concepts in the ontologies to match. For getting search terms from ontology concepts (i.e., classes and properties), we use the labels, comments, and URI fragments of those concepts. The search results of all concepts are then compared to each other. The more similar the search results are, the higher the concepts' similarity score.

To search for websites, we use the Microsoft Bing Search API (http://www.bing.com/toolbox/bingdeveloper/). We use URI fragments, labels, and comments of each concept as search strings, and perform some preprocessing, i.e., splitting camel case and underscore-separated words into single words, and omitting stop words. While the approach itself is independent of the actual search engine used (although the results might differ), we have chosen Bing to evaluate our approach because of the larger number of queries that can be posed in the free version (compared to, e.g., Google).

For every search result, all the titles and summaries of web pages provided by the search engine are put together into one describing document. This approach allows us to parse only the search engine's answer, while avoiding the computational burden of retrieving and parsing all websites in the result sets. The answer provided by the Bing search engine contains titles and excerpts from the website (i.e., some sentences surrounding the occurrence of the search term in the website). Therefore, we do not use whole websites, but ideally only the relevant parts of those websites, i.e., we exploit the search engine both for information retrieval and for information extraction.

For each concept c, we perform a single search each for the fragment, the label, and the comment (if present). Thus, we generate up to three documents doc_fragment(c), doc_label(c), and doc_comment(c). The similarity score for each pair of concepts is then computed as the maximum similarity over all of the documents generated for those concepts:

$$\mathrm{sim}(c_1, c_2) := \max_{i,j \in \{\mathrm{fragment},\, \mathrm{label},\, \mathrm{comment}\}} \mathrm{sim}^{*}\big(\mathrm{doc}_i(c_1), \mathrm{doc}_j(c_2)\big) \tag{1}$$

For computing the similarity sim* of two documents, we compute a TF-IDF score, based on the complete set of documents retrieved for all concepts in both ontologies. Using the TF-IDF measure for computing the similarity of the documents has several advantages. First, stop words like and, or, and so on are inherently filtered, because they occur in the majority of documents. Second, terms that are common in the domain and thus have little value for disambiguating mappings are also weighted lower. For example, the word anatomy will occur quite frequently in the anatomy track, so it has only little value for determining mappings there. In the library track, on the other hand, it will be a useful topic identifier and thus help to identify mappings. The TF-IDF measure guarantees that the word anatomy gets weighted accordingly in each track.

The result is a score matrix with elements between 0 and 1 for each pair of concepts from both ontologies. For each row and each column where there is a score exceeding a threshold τ, we return the pair of concepts with the highest score as a mapping. Since most ontology matching problems only look for 1:1 mappings, we optionally use edit distance for tie breaking if there is more than one candidate sharing the maximum score. This happens, for example, for pairs like Proceedings/Proceedings and Proceedings/InProceedings in the conference track, which get very similar scores. Using the edit distance as a mechanism for tie breaking ensures that Proceedings is mapped to Proceedings and not to InProceedings.

Figure 1 shows the entire process using the introductory example from the OAEI anatomy dataset, computing the similarity score for tarsal plate and eyelid tarsus.

For multi-lingual ontologies, we first translate the fragments, labels, and comments to English as a pivot language [2], using the Bing Search API's translation capabilities. The translated concepts are then processed as described above. The whole process is illustrated in Fig. 2.

[Figure text, sample search result; truncated in the source]
Title: tarsal plate definition of tarsal plate in the Medical ...
Abstract: plate (plāt) 1. a flat structure or layer, as a ... ...
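The matching procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names, the small stop word list, and the key layout `(concept, kind)` are hypothetical. The text only says that a TF-IDF score is computed, so the use of cosine similarity over TF-IDF vectors is an assumption, as is the reading that tie breaking applies to exactly equal scores. The search step itself is left out: the sketch expects the describing documents (concatenated titles and summaries) to be available already.

```python
import math
import re
from collections import Counter

# Illustrative stop word list (an assumption; the paper does not list one).
STOP_WORDS = {"a", "an", "and", "of", "or", "the", "to", "is"}

def tokenize(text):
    """Split camel case and underscore-separated words, drop stop words."""
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    words = re.split(r"[^A-Za-z0-9]+", text)
    return [w.lower() for w in words if w and w.lower() not in STOP_WORDS]

def tfidf_vectors(docs):
    """docs maps (concept, kind) -> token list; returns sparse TF-IDF vectors."""
    n = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    return {
        key: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for key, tokens in docs.items()
    }

def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed choice for sim*)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def concept_sim(c1, c2, vectors):
    """Equation (1): maximum similarity over all document pairs of c1 and c2."""
    keys1 = [k for k in vectors if k[0] == c1]
    keys2 = [k for k in vectors if k[0] == c2]
    return max((cosine(vectors[a], vectors[b]) for a in keys1 for b in keys2),
               default=0.0)

def levenshtein(a, b):
    """Plain edit distance, used only for tie breaking."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def select_mappings(scores, tau):
    """scores maps (c1, c2) -> similarity; returns the selected pairs.

    For each row and each column, the best pair above tau is kept.
    Equal scores are resolved in favor of the smaller edit distance.
    """
    def best(pairs):
        return max(pairs, key=lambda p: (scores[p], -levenshtein(p[0], p[1])))

    mappings = set()
    for axis in (0, 1):  # 0: rows (first ontology), 1: columns (second)
        for concept in {p[axis] for p in scores}:
            cand = [p for p in scores if p[axis] == concept and scores[p] > tau]
            if cand:
                mappings.add(best(cand))
    return mappings
```

To use the sketch, one would build `docs` from the search engine answers, call `tfidf_vectors(docs)` once over the documents of both ontologies, fill a score dictionary with `concept_sim` for every pair of concepts, and pass it to `select_mappings` together with a chosen threshold. Because the inverse document frequency is computed over all retrieved documents, a term that occurs in every document receives weight zero, which mirrors the filtering effect described in the text.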


Similar resources

WeSeE-Match results for OAEI 2013

WeSeE-Match is a simple, element-based ontology matching tool. Its basic technique is invoking a web search engine request for each concept and determining element similarity based on the similarity of the search results obtained. Multi-lingual ontologies are translated using a standard web based translation service. Furthermore, it implements a simple strategy for selecting candidate mappings ...


WikiMatch results for OEAI 2012

WikiMatch is a matching tool which makes use of Wikipedia as an external knowledge resource. The overall idea is to search Wikipedia for a given concept and retrieve all pages describing the term. If there is a large number of common pages for two terms, then the concepts will have similar semantics. We also make use of the inter-language links between Wikipedias in different languages to match...


The Impact of the First Goal in the Final Result of the Futsal Match

Among the many technical and tactical aspects of player behavior, goals are the most studied. The goal is the key to success for teams, and analyzing goals across all matches of a major futsal tournament (the World Cup) allows multiple assessments. The aim of this study was to analyze the impact of the first goal on the final result of the futsal match, identifying the team that scored th...


Optimization of 3D Planning Dosimetry in a Breast Phantom for the Match Region of Supraclavicular and Tangential Fields

Introduction: The complex geometry of breast and also lung and heart inhomogeneities near the planning target volume (PTV) result in perturbations in dose distribution. This problem can result in overdosage or underdosage in the match region of the three treatment fields. The purpose of this study is to create a homogeneous dose distribution in the match region between the supraclavicular and t...


The Use of Fundamental Color Stimulus to Improve the Performance of Artificial Neural Network Color Match Prediction Systems

In the present investigation attempts were made for the first time to use the fundamental color stimulus as the input for a fixed optimized neural network match prediction system. Four sets of data having different origins (i.e. different substrate, different colorant sets and different dyeing procedures) were used to train and test the performance of the network. The results showed that th...




Publication date: 2012