Inducing Constraint-Based Grammars using a Domain Ontology
Author
Abstract
In many knowledge-intensive applications, there is a critical need to populate knowledge bases rapidly and to keep them up to date. Since the World Wide Web is a large source of information that is continuously being updated, one solution is to acquire knowledge automatically from text, which requires language understanding to a greater or lesser degree. The need for "rapid" text-to-knowledge acquisition imposes two critical conditions on the methods used: scalability and adaptability. Thus, there is a need to move from handcrafted grammars and hand-built systems to learning methods. However, most statistical and learning techniques have been applied only to restricted domains (e.g., the air travel domain) or tasks (e.g., information extraction, where the knowledge is limited to a priori relations and entities), reducing the variety of the acquired knowledge. This thesis presents a framework for domain-specific text-to-knowledge acquisition, with a focus on the medical domain. The main challenge of this domain is the abundance of linguistic phenomena that require both syntactic and semantic information in order to "understand" the meaning of the text, and thus to acquire knowledge. Examples include prepositional phrases, coordinations, noun-noun compounds, and nominalizations, phenomena that are not well covered by existing syntactic or semantic parsers. In my thesis, I propose a relational learning framework for the induction of a constraint-based grammar able to capture both syntax and aspects of meaning in an interleaved manner from a small number of semantically annotated examples. The novelty of this framework is the learning method based on an ordered set of examples. This approach to learning follows the argument that language acquisition is an incremental process, in which simpler rules are acquired prior to complex ones.
Several new theoretical concepts need to be tied together in order to make the approach feasible and theoretically sound: 1) a type of constraint-based grammar, called lexicalized well-founded grammar, which is learnable and able to capture large fragments of natural language; 2) a semantic representation, which we call the semantic molecule, that can be linked to the grammar and is simple enough to allow relational learning of the grammar; 3) a small ordered set of semantically annotated examples, called representative examples, which is used as our training data; and 4) an ontology-based semantic interpretation encoded as a constraint at the grammar-rule level (Φ_onto), which refrains from full logical analysis of meaning, known to be intractable. On the application side, the grammar learning is used for rapid acquisition of medical terminological knowledge from text.

Semantic molecule. Given a natural language expression w, we denote by w′ = (h ⋈ b) the semantic molecule of w, where h is the head, acting as a valence for semantic composition, and b is the body, acting as the semantic representation of w. The head is represented as a one-level feature structure (i.e., feature values are atomic), while the body is a Canonical Logical Form given as a flat semantic representation, similar to Minimal Recursion Semantics (MRS). Unlike MRS, it uses as semantic primitives a set of frame-based atomic predicates of the form concept.attr = concept, suitable for interpretation on the ontology: concept corresponds to a frame in the ontology, and attr is a slot of the frame, encoding either a property or a relation. For example, for the adjective "chronic" we have the following semantic molecule:

[cat = a, head = X, mod = Y] ⋈ [X.isa = chronic, Y.Has_prop = X]
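The head/body pairing described above can be sketched as a small data structure: a flat feature structure for the head and a list of frame-based atomic predicates for the body. This is an illustrative sketch, not code from the thesis; the class and field names are my own.

```python
from dataclasses import dataclass

@dataclass
class SemanticMolecule:
    """Hypothetical sketch of a semantic molecule w' = (h >< b).

    head: one-level feature structure (feature values are atomic)
    body: flat list of frame-based predicates concept.attr = concept
    """
    head: dict
    body: list  # list of (concept, attr, value) triples

    def __str__(self):
        h = ", ".join(f"{k}={v}" for k, v in self.head.items())
        b = ", ".join(f"{c}.{a}={v}" for c, a, v in self.body)
        return f"[{h}] >< [{b}]"

# The semantic molecule for the adjective "chronic" from the text:
# head: category a (adjective), head variable X, modified variable Y
# body: X.isa = chronic, Y.Has_prop = X
chronic = SemanticMolecule(
    head={"cat": "a", "head": "X", "mod": "Y"},
    body=[("X", "isa", "chronic"), ("Y", "Has_prop", "X")],
)
print(chronic)
# → [cat=a, head=X, mod=Y] >< [X.isa=chronic, Y.Has_prop=X]
```

Keeping the body flat (a conjunction of atomic predicates rather than a nested logical form) is what makes the representation simple enough for relational learning while still supporting interpretation against the ontology's frames and slots.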
Similar resources
Inducing Constraint-based Grammars from a Small Semantic Treebank
We present a relational learning framework for grammar induction that is able to learn meaning as well as syntax. We introduce a type of constraint-based grammar, lexicalized well-founded grammar (lwfg), and we prove that it can always be learned from a small set of semantically annotated examples, given a set of assumptions. The semantic representation chosen allows us to learn the constraints...
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the growth of the Web, there are many challenges in establishing a general framework for mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...
Lexicalized Well-Founded Grammars: Learnability and Merging
This paper presents the theoretical foundation of a new type of constraint-based grammars, Lexicalized Well-Founded Grammars, which are adequate for modeling human language and are learnable. These features make the grammars suitable for developing robust and scalable natural language understanding systems. Our grammars capture both syntax and semantics and have two types of constraints at the ...
Generating LTAG grammars from a lexicon/ontology interface
This paper shows how domain-specific grammars can be automatically generated from a declarative model of the lexicon-ontology interface and how those grammars can be used for question answering. We show a specific implementation of the approach using Lexicalized Tree Adjoining Grammars. The main characteristic of the generated elementary trees is that they constitute domains of locality that sp...
A Logic Programming Based Approach to QA@CLEF05 Track
In this paper the methodology followed to build a question-answering system for the Portuguese language is described. The system modules are built using computational linguistic tools such as: a Portuguese parser based on constraint grammars for the syntactic analysis of the document sentences and the user questions; a semantic interpreter that rewrites the sentences' syntactic analysis into discour...