A Similarity Measure for the ALN Description Logic

Authors

  • Nicola Fanizzi
  • Claudia d’Amato
Abstract

This work presents a similarity measure (and a derived dissimilarity measure) for Description Logics, which are the theoretical counterpart of the standard representations for ontological knowledge. The focus is on the definition of a similarity measure for ALN concept descriptions, based both on the syntax and on the semantics of the descriptions elicited from the current state of the world. An extension of the measure is proposed that involves individuals and evaluates their (dis-)similarity, which makes it suitable for several (inductive) tasks.

1 Assessing the Similarity in Concept Languages

In the Semantic Web perspective [3], similarity plays an important role in several tasks, such as classification, clustering, retrieval and knowledge integration. Nevertheless, we are still at an initial phase in the definition of measures for assessing the similarity or the dissimilarity of concepts as described in the standard ontology languages [5].

Various distance measures for concept representations have been proposed in the literature (see a survey in [20]); they can essentially be categorized into three types. Path distance measures are defined as a function of the distance between terms in the hierarchical structure underlying an ontology [6]. The feature matching approach [24] uses both common and discriminant features among concepts and/or concept instances to compute the semantic similarity. Finally, there are methods founded on the information content [19, 10], where a similarity measure for concepts within a hierarchy is defined in terms of the variation between the information content conveyed by the concepts and the one conveyed by their immediate common super-concept. This is a measure of the variation of the information from a description level to a more general one.

Other measures compute the similarity among concepts belonging to different ontologies (e.g. see [25]).
In [21] a similarity function determines similar classes through a matching process that makes use of synonym sets, semantic neighborhood, and discriminating features that are classified into parts, functions, and attributes (see a recent survey in [23]). However, for the moment, this topic is beyond the scope of our work.

As pointed out in [5], most of the measures proposed so far are applicable to the assessment of the similarity of atomic concepts (within a hierarchy) rather than of composite ones, or they refer to very simple ontologies, built only using simple relations such as is-a and part-of (typical of lexical ontologies). Nevertheless, the standard ontology languages (e.g., OWL [18]) are founded on Description Logics (henceforth DLs), since they borrow the typical DL constructors. Thus, it becomes necessary to investigate the similarity of more complex concept descriptions expressed in DLs. However, it has been observed that the structure of the descriptions becomes much less important when richer representations are adopted, due to the expressive operators that can be employed.

An approach intended for information retrieval purposes on DL knowledge bases [16] aims at finding commonalities among concepts or among assertions, employing the Least Common Subsumer (LCS) operator [7], which computes the most specific generalization of the input concepts (with respect to subsumption). Given a knowledge base and a query concept, a filter mechanism selects another concept from the knowledge base that is relevant for the query concept. Then the LCS of the two concepts is computed, and finally all concepts subsumed by the LCS are returned.

Most of the measures defined in the cited works are suitable for very simple languages and not for the composite descriptions that can be obtained using the operators of DLs. Hence the semantics of these descriptions derives almost straightforwardly from their simple structures.
We decided to focus our attention on measures which are essentially founded on semantics. Initially, we defined dissimilarity measures between concept descriptions that may work for virtually any representation [9], being based exclusively on semantics. But this falls short when individuals come into play. Indeed, in the tasks which represent the final aim of our investigation on these measures, such as clustering, classification and retrieval, it is necessary to compute distances between individuals and concepts or between individuals. By resorting to the notion of most specific concept (MSC) of an individual with respect to an ABox [1], measures based both on the structure of the concepts and on their semantics can be extended to such cases.

On the grounds of these ideas, we could define measures which are suitable for composite DL descriptions and in particular for ALC [8, 10]. These measures elicit the underlying semantics by querying the knowledge base to assess the information content of concept descriptions with respect to the knowledge base, as proposed also in [2]. In the perspective of defining a measure for more expressive ontology languages endowed with more constructors, with this work we intend to investigate and extend these ideas to languages endowed with numeric restrictions, starting from ALN.

The remainder of this paper is organized as follows. In Sect. 2 the representation language ALN is presented. The similarity measure is illustrated and discussed in Sect. 3, with the extension to the cases involving individuals. Final remarks and possible applications and developments of the measure are examined in Sect. 4.

2 Background: The ALN Description Logic

ALN is a DL language which allows for the expression of universal features and numeric constraints [1]. It has been adopted because of the tractability of the main related reasoning services [11].
Furthermore, it has already been adopted in other frameworks for learning in hybrid representations, such as CARIN-ALN [22] or IDLP [13]. In order to keep this paper self-contained, the syntax and semantics of the reference representation are briefly recalled, together with the characterization of the descriptions in terms of concept graphs.

2.1 Syntax and Semantics

In DLs, primitive concepts N_C = {A, ...} are interpreted as subsets of a certain domain of objects and primitive roles N_R = {R, S, ...} are interpreted as binary relations on such a domain. In ALN, composite concept descriptions are built using atomic concepts and primitive roles by means of the constructors presented in Table 1. The meaning of such descriptions is defined by means of an interpretation I = (∆^I, ·^I), where ∆^I is the domain of the interpretation and the functor ·^I (the interpretation function) maps concept and role descriptions to their extension: ∀C ∈ N_C : C^I ⊆ ∆^I and ∀R ∈ N_R : R^I ⊆ ∆^I × ∆^I.

Table 1. Constructors and related interpretations for ALN.

Name                   Syntax     Semantics
top concept            ⊤          ∆^I
bottom concept         ⊥          ∅
primitive concept      A          A^I ⊆ ∆^I
primitive negation     ¬A         ∆^I \ A^I
concept conjunction    C1 ⊓ C2    C1^I ∩ C2^I
value restriction      ∀R.C       {x ∈ ∆^I | ∀y ((x, y) ∈ R^I → y ∈ C^I)}
at-most restriction    ≤ n.R      {x ∈ ∆^I | |{y ∈ ∆^I | (x, y) ∈ R^I}| ≤ n}
at-least restriction   ≥ n.R      {x ∈ ∆^I | |{y ∈ ∆^I | (x, y) ∈ R^I}| ≥ n}

A knowledge base K = 〈T, A〉 contains two components: a T-box T and an A-box A. T is a set of concept definitions C ≡ D, meaning C^I = D^I, where C is the concept name and D is a description given in terms of the language constructors. Unlike in ILP, each (non-primitive) concept has a single definition. Moreover, the DL definitions are assumed not to be recursive, i.e. concepts cannot be defined in terms of themselves. The A-box A contains extensional assertions on concepts and roles, e.g. C(a) and R(a, b), meaning, respectively, that a^I ∈ C^I and (a^I, b^I) ∈ R^I.
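The semantics in Table 1 can be made concrete on a finite interpretation. The following is a minimal Python sketch, not part of the original paper: the tuple encoding of concepts, the function names `fillers` and `extension`, and the toy domain with `Person` and `marriedTo` are all assumptions introduced only for illustration.

```python
def fillers(x, role, roles):
    """Return the set of R-fillers of x, i.e. {y | (x, y) in R^I}."""
    return {y for (s, y) in roles.get(role, set()) if s == x}


def extension(c, domain, concepts, roles):
    """Compute C^I over a finite interpretation, one case per row of Table 1.

    Concepts use a hypothetical tuple encoding:
    ("top",), ("bottom",), ("atom", A), ("not", A), ("and", C1, C2),
    ("all", R, C), ("atmost", n, R), ("atleast", n, R).
    """
    kind = c[0]
    if kind == "top":
        return set(domain)
    if kind == "bottom":
        return set()
    if kind == "atom":
        return set(concepts.get(c[1], set()))
    if kind == "not":
        return set(domain) - concepts.get(c[1], set())
    if kind == "and":
        return (extension(c[1], domain, concepts, roles)
                & extension(c[2], domain, concepts, roles))
    if kind == "all":
        target = extension(c[2], domain, concepts, roles)
        # Vacuously true for elements with no fillers at all.
        return {x for x in domain if fillers(x, c[1], roles) <= target}
    if kind == "atmost":
        return {x for x in domain if len(fillers(x, c[2], roles)) <= c[1]}
    if kind == "atleast":
        return {x for x in domain if len(fillers(x, c[2], roles)) >= c[1]}
    raise ValueError(f"unknown constructor: {kind!r}")


# Toy interpretation (illustrative data only).
domain = {"ann", "bob", "carl", "rex"}
concepts = {"Person": {"ann", "bob", "carl"}}
roles = {"marriedTo": {("ann", "bob"), ("ann", "carl")}}

# Person ⊓ ≤ 0.marriedTo
single = ("and", ("atom", "Person"), ("atmost", 0, "marriedTo"))
# Person ⊓ ∀marriedTo.Person ⊓ ≥ 2.marriedTo
polygamist = ("and", ("atom", "Person"),
              ("and", ("all", "marriedTo", ("atom", "Person")),
               ("atleast", 2, "marriedTo")))

print(sorted(extension(single, domain, concepts, roles)))
print(sorted(extension(polygamist, domain, concepts, roles)))
```

On this toy interpretation the first description is satisfied by bob and carl, and the second only by ann. Note that this evaluates a description in one given interpretation; reasoning services such as subsumption quantify over all interpretations and need a different procedure.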
Note that, unlike in the ILP setting, the concept description C in an assertion can be more complex than an LP fact. For instance, an assertion could state a universal property of an individual: (∀R.(A ⊓ ¬B))(a), that is, role R relates a exclusively to individuals that are instances of the concept A ⊓ ¬B.

Example 2.1. Examples of ALN descriptions are:

Single ≡ Person ⊓ ≤ 0.marriedTo
Polygamist ≡ Person ⊓ ∀marriedTo.Person ⊓ ≥ 2.marriedTo
Bigamist ≡ Person ⊓ ∀marriedTo.Person ⊓ = 2.marriedTo
MalePolygamist ≡ Male ⊓ Person ⊓ ∀marriedTo.Person ⊓ ≥ 2.marriedTo

The notion of subsumption between DL concept descriptions can be given in terms of the interpretations defined above:

Definition 2.1 (subsumption). Given two concept descriptions C and D, C subsumes D iff it holds that C^I ⊇ D^I for every interpretation I. This is denoted by C ⊒ D. The induced equivalence relationship, denoted C ≡ D, amounts to C ⊒ D and D ⊒ C.

Note that this notion is merely semantic and independent of the particular DL language adopted. It is easy to see that this definition also applies to the case of role descriptions. A related inference used in the following is instance checking, that is, deciding whether an individual is an instance of a concept [12, 1]. Conversely, it may be necessary to solve the realization problem, which requires finding the concepts which an individual belongs to, especially the most specific one:

Definition 2.2. Given an ABox A and an individual a, the most specific concept of a w.r.t. A is the concept C, denoted MSC_A(a), such that A ⊨ C(a) and, for every D such that A ⊨ D(a), it holds that D ⊒ C.

2.2 Structural Characterizations

Semantically equivalent (yet syntactically different) descriptions can be given for the same concept. However, they can be reduced to a canonical form by means of equivalence-preserving rewriting rules, e.g. ∀R.C1 ⊓ ∀R.C2 ≡ ∀R.(C1 ⊓ C2) (see [17, 1]).
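As an illustration of the single rewriting rule quoted above, ∀R.C1 ⊓ ∀R.C2 ≡ ∀R.(C1 ⊓ C2), the following Python sketch merges value restrictions on the same role. It is not the full normalization procedure of [17, 1]; the tuple encoding of concepts and the helper names `conjuncts`, `conj` and `merge_value_restrictions` are assumptions made only for this example.

```python
def conjuncts(c):
    """Flatten nested ("and", C1, C2) terms into a list of conjuncts."""
    if c[0] == "and":
        return conjuncts(c[1]) + conjuncts(c[2])
    return [c]


def conj(cs):
    """Rebuild a right-nested conjunction; an empty list yields the top concept."""
    if not cs:
        return ("top",)
    result = cs[-1]
    for c in reversed(cs[:-1]):
        result = ("and", c, result)
    return result


def merge_value_restrictions(c):
    """Apply  ∀R.C1 ⊓ ∀R.C2  ->  ∀R.(C1 ⊓ C2)  at every nesting level.

    Concepts use a hypothetical tuple encoding: ("atom", A), ("not", A),
    ("and", C1, C2), ("all", R, C), ("atmost", n, R), ("atleast", n, R).
    """
    others = []    # conjuncts that are not value restrictions
    by_role = {}   # role name -> list of filler conjuncts collected so far
    for d in conjuncts(c):
        if d[0] == "all":
            by_role.setdefault(d[1], []).extend(conjuncts(d[2]))
        else:
            others.append(d)
    merged = [("all", role, merge_value_restrictions(conj(fs)))
              for role, fs in by_role.items()]
    return conj(others + merged)


c = ("and", ("all", "R", ("atom", "C1")), ("all", "R", ("atom", "C2")))
print(merge_value_restrictions(c))
```

The design choice here is to flatten the conjunction first, so that value restrictions on the same role are merged regardless of where they occur among the conjuncts; other rules needed for a canonical form (for example, handling inconsistent number restrictions) are deliberately left out.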
The normal form employs the notation needed to access the different parts (sub-descriptions) of a concept description C:

– prim(C) denotes the set of all (negated) concept names occurring at the top level of the description C;
– val_R(C) denotes the conjunction of concepts C1 ⊓ ··· ⊓ Cn in the value restriction of role R, if any (otherwise val_R(C) = ⊤);
– min_R(C) = max{n ∈ ℕ | C ⊑ (≥ n.R)} (always a finite number);
– max_R(C) = min{n ∈ ℕ | C ⊑ (≤ n.R)} (if unlimited then max_R(C) = ∞).

1 It holds even in case no such R-filler is given.
2 Here (= n.R) is an abbreviation for (≤ n.R ⊓ ≥ n.R).

Definition 2.3 (ALN normal form). A concept description C is in ALN normal form iff C = ⊤ or C = ⊥ or

Publication date: 2006