Robust deep linguistic processing
Abstract
This dissertation addresses the robustness problem of deep linguistic processing. Hand-crafted deep linguistic grammars model human languages precisely, but handle ill-formed or extra-grammatical input poorly. We argue that a series of robust processing techniques can improve coverage without sacrificing the efficiency or specificity of deep linguistic processing. An overview of the robustness problem in state-of-the-art deep linguistic processing systems reveals that an insufficient lexicon and over-restricted constructions are the major sources of the lack of robustness. Targeting both, we propose several robust processing techniques as add-on modules to existing deep processing systems. For the lexicon, we propose a deep lexical acquisition model that automatically detects and acquires missing lexical entries online. The model is further extended to acquire multiword expressions that are syntactically and/or semantically idiosyncratic. The evaluation shows that the acquired lexical entries significantly improve grammar coverage without noticeable degradation in accuracy. For the constructions, we propose a partial parsing strategy that recovers as much of the intermediate results as possible when a full analysis is not available. Partial parse selection models are proposed and evaluated. Experimental results show that the fragment semantic outputs recovered from the partial parses are of good quality and high value for practical use. Efficiency issues are also addressed with new extensions to existing efficient processing algorithms.
Similar resources
Lexical Entry Templates for Robust Deep Parsing
We report on the development and employment of lexical entry templates in a large-coverage unification-based grammar of Spanish. The aim of the work reported in this paper is to provide robust deep linguistic processing in order to make the grammar more adequate for industrial NLP applications.
A Robust And Hybrid Deep-Linguistic Theory Applied To Large-Scale Parsing
Modern statistical parsers are robust and quite fast, but their output is relatively shallow when compared to formal grammar parsers. We suggest extending statistical approaches to a deeper linguistic analysis while keeping the speed and low complexity of a statistical parser. The resulting parsing architecture suggested, implemented and evaluated here is highly robust and h...
Combining Shallow and Deep Processing for a Robust, Fast, Deep-Linguistic Dependency Parser
This paper describes Pro3Gres, a fast, robust, broad-coverage parser that delivers deep-linguistic grammatical relation structures as output, which are closer to predicate-argument structures and more informative than pure constituency structures. The parser stays as shallow as is possible for each task, combining shallow and deep-linguistic methods by integrating chunking and by expressing the...
Language Processing for Spoken Dialogue Systems: Is Shallow Parsing Enough?
With maturing speech technology, spoken dialogue systems are increasingly moving from research prototypes to fielded systems. The fielded systems, however, generally employ much simpler linguistic and dialogue processing strategies than the research prototypes. We describe an implemented spoken-language dialogue system for a travel planning domain which supports a mixed initiative dialogue strate...
Answer Validation Through Robust Logical Inference
The paper features MAVE, a knowledge-based system for answer validation through deep linguistic processing and logical inference. A relaxation loop is used to determine a robust indicator of logical entailment. The system not only validates answers directly, but also gathers evidence by proving the original question and comparing results with the answer candidate. This method boosts recall by up...