Informatik Hybrid Methods of Natural Language Analysis
Abstract
The automatic retrieval of syntactic structure has been a longstanding goal of computer science, but one that still does not seem attainable. Two different approaches are commonly employed:

1. A method of grammatical description that is deemed adequate is implemented in a way that allows the necessary generalizations to be expressed while remaining computationally feasible. General principles of the working language are expressed within this model. Analysing unknown input then consists in computing a structure that conforms to all principles that constitute the model. Often it is also necessary to select the desired solution from several possible ones.
2. Instead of linguistic assumptions, a large set of problems already solved is used to induce a probability model that defines a total ordering on all possible structures of the working language. The parsing task is thus transformed into an optimization problem that always selects the structure that is most similar to the previous solutions in some way. The exact nature of this similarity is defined by the algorithm used for extracting the model.

There are obvious and important upsides and downsides to both approaches. Theory-driven parsers often cannot cover arbitrary input satisfactorily because not all occurring structures were anticipated. Among the analyzable sentences, the ambiguity of the results is often very high; thousands of analyses for a sentence of a dozen words are not uncommon. Worse yet, each of these two problems, limited coverage and high ambiguity, can be solved only at the expense of the other. Finally, this kind of grammar development requires enormous effort by experts qualified in a particular language.

The currently predominant statistical approaches exhibit largely complementary features (automatic extraction of grammar rules, ambiguity resolution via numeric scores, robustness through smoothing and interpolation). However, they lack the perspicuity of the rule-based approach:
• Acceptable and less acceptable structures are processed without distinction. Unusual or wrong turns of phrase are only detected insofar as analysis sometimes fails altogether.
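The contrast between the two approaches can be made concrete with a toy sketch. It is not taken from the work summarized here: the candidate analyses, the two constraints, the miniature "treebank" and the smoothing constant `vocab_size` are all hypothetical, chosen only to show that hard constraints may leave several analyses standing while a probability model induced from solved examples always picks exactly one.

```python
from collections import Counter
from math import log

# An analysis is a set of (dependent, head) attachments.
# Two hypothetical readings of "saw the man with the telescope".
CANDIDATES = {
    "verb_attach": frozenset({("man", "saw"), ("with", "saw"), ("telescope", "with")}),
    "noun_attach": frozenset({("man", "saw"), ("with", "man"), ("telescope", "with")}),
}

# Approach 1: a model made of hard principles (illustrative constraints only).
def every_word_has_one_head(analysis):
    dependents = [dep for dep, _ in analysis]
    return len(dependents) == len(set(dependents))

def no_self_attachment(analysis):
    return all(dep != head for dep, head in analysis)

CONSTRAINTS = [every_word_has_one_head, no_self_attachment]

def admissible(candidates, constraints):
    """Return the names of all analyses that satisfy every constraint."""
    return sorted(name for name, analysis in candidates.items()
                  if all(check(analysis) for check in constraints))

# Approach 2: a probability model induced from already solved examples.
# Hypothetical attachment observations standing in for a treebank.
TREEBANK = [("with", "saw"), ("with", "saw"), ("with", "man"),
            ("man", "saw"), ("telescope", "with")]

def train(pairs):
    """Count attachments and how often each dependent was seen at all."""
    return Counter(pairs), Counter(dep for dep, _ in pairs)

def log_score(analysis, pair_counts, dep_counts, vocab_size=10):
    """Sum of add-one smoothed log probabilities; vocab_size is an assumption."""
    return sum(log((pair_counts[(dep, head)] + 1) / (dep_counts[dep] + vocab_size))
               for dep, head in analysis)

def best(candidates, model):
    """Select the single highest-scoring analysis (the optimization view)."""
    pair_counts, dep_counts = model
    return max(candidates,
               key=lambda name: log_score(candidates[name], pair_counts, dep_counts))

if __name__ == "__main__":
    print(admissible(CANDIDATES, CONSTRAINTS))  # both readings survive
    print(best(CANDIDATES, train(TREEBANK)))    # one reading is selected
```

In this sketch the constraint model accepts both readings and offers no way to choose between them, whereas the scored model commits to the attachment seen more often in its training data without being able to say why the other reading is worse.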
Similar resources
Powerful semantics can make language processing more robust
Human language is an inferential coding system, which means that not all information needed to interpret an utterance is explicitly communicated; it must be inferred. Moreover, the meaning of human language is mediated by rich conceptualizations of the world, which are culture and language dependent. These two features make natural language processing very difficult and introduce a glass ceili...
Integrating deep and shallow natural language processing components: representations and hybrid architectures
We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is improving robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processin...
Symbolic Parsing and Probabilistic Decision Making. the Speech and Language Experience with Hybrid Information Processing
In natural language technology up to now most projects were based on either logical and linguistic methods or they were strictly based on stochastic techniques alone borrowed from pattern recognition. This article discusses hybrid symbolic and stochastic techniques in natural language processing as they are currently explored in many projects and in particular in our work within the Verbmobil p...
Automated Analysis of Reasoning and Argumentation Structures in Texts
In many application areas of intelligent systems, natural language communication is considered a major source for substantial progress, even for systems whose pure reasoning capabilities are exceptional. Unfortunately, it turns out to be extremely difficult to build adequate natural language processing facilities for the interaction with such systems. In this talk, I will expose some fundamenta...
A hybrid approach robust text analysis
This paper addresses the problem of performing structural and semantic analysis of data where the syntactic and semantic models of the domain are inadequate, and robust methods must be employed to perform a “best approximation” to a complete analysis. This problem is particularly pertinent in the domain of text analysis. The ability to deal with large amounts of possibly ill-formed or unforeseen...
HERALD Hybrid Environment for Robust Analysis of Language Data
This project addresses the problem of performing structural and semantic analysis of data where the syntactic and semantic models of the domain are inadequate, and robust methods must be employed to perform a “best approximation” to a complete analysis. This problem is particularly pertinent in the domain of text analysis. The ability to deal with large amounts of possibly ill-formed or unfores...