Restarting Automata: Motivations and Applications
Authors
Abstract
Automata with the restart operation (henceforth RA) constitute an interesting class of devices that are acceptors and reduction systems at the same time, and they display a remarkable ‘explicative power’ with respect to the analysis of formal and natural languages. This paper focuses on RA as a formal-theoretical tool for modelling the analysis by reduction of natural languages and on the role of analysis by reduction (i.e. of analysis by RA) in the dependency approach to syntax, and it also outlines some practical applications.

1. Restarting automata – bottom up analysers

The aim of this contribution is to present, in an informal way, the main motivations for the study of restarting automata and to outline some of their applications in linguistics. All the formal notions can be found in [1] or in [3], which also contains the bibliographic data.

A restarting automaton M can be described as a non-deterministic device with a finite control unit and a head with a lookahead window attached to a linear list. The automaton works in cycles. Within one cycle, it performs the following sequence of actions:
• it moves the head along the word on the list (possibly in both directions);
• it rewrites (a limited number of times within one cycle) the contents of the list within the scope of its lookahead by another (usually shorter) string defined by the instructions of the automaton;
• it continues by scanning some further symbols;
• it “restarts”, i.e. resets the control unit into the initial state and places the head on the left end of the (shortened) word.
The computation halts when M enters an accepting or a rejecting state.

M. Plátek, M. Lopatková, K. Oliva

M can be viewed both as a recognizer and as a (regulated) reduction system. A reduction by M is defined through the rewriting operations performed by M during one individual cycle, in which M distinguishes two alphabets: the input alphabet and the working alphabet. For any input word w, M induces a set (due to non-determinism) of sequences of reductions.
We call this set the complete analysis of w by M, denoted CA(w,M). CA(w,M) is composed of accepting sequences and rejecting sequences of reductions. If, for a given w, CA(w,M) contains at least one accepting sequence, w is accepted. Otherwise, w is rejected.

From the techniques summarized in [1] and quoted in [3] it follows that restarting automata define a taxonomy of the types of regulation of bottom-up syntactic analysers (parsers) for several types of (originally unregulated) grammars (e.g., CFGs, pure grammars, Marcus grammars, categorial grammars, and variants of these allowing for discontinuous constituency).

Restarting automata enable direct modelling of correct syntactic phenomena as well as of syntactic incorrectness. This is crucial since in the current techniques of computational linguistics positive and negative observations (constraints) play equally important roles. Let us explain this in more detail. Syntactic observations about natural languages are collected stepwise, in a slow and laborious way. At any moment in the development of a realistic syntactic description, the collection of positive syntactic observations is far from complete and precise. The same then holds for negative observations. Because of this initial lack of completeness and precision, both types of observations have to be collected iteratively and gradually integrated into the analysers (grammars), and their formal descriptions then individually verified and debugged. Such an approach, in turn, can be efficiently supported by restarting automata (i.e. by implementing the analysers by means of them).

2. Restarting automata – analysis by reduction

A special (restricted) type of restarting automata is the RLW-automaton, cf. [3]. The operation of an RLW-automaton can be viewed as the implementation of a set of meta-instructions, each of which describes the actions to be performed within an individual cycle of the automaton.
Each meta-instruction has the form (R1, u → v, R2), where R1, R2 are regular expressions, u, v are strings over the input alphabet (|v| < |u|), and u → v is a rewriting rule. In one cycle, the RLW-automaton rewrites (according to the meta-instruction) the contents u of its lookahead by the string v (only one rewriting operation can be performed within one cycle). An important property of a meta-instruction is that it can be verified (debugged) individually.

A very important feature of the operation of RLW-automata is the “error preserving property”: if an input sentence is incorrect, then the reduced sentence is also incorrect (after any number of cycles). In addition, deterministic RLW-automata have the “correctness preserving property”: if a sentence is correct, then the reduced sentence is correct as well. Obviously, both these properties are important for the description (and build-up of syntactic structures) of natural languages.

Due to the above, RLW-automata can serve as a framework for the formalization of the analysis method called analysis by reduction (AR), which constitutes a natural and above all solid basis for building dependency grammars of natural languages. In particular, AR helps to find (define) the relations of:
• vertical dependency, which holds when one of the two words constituting the dependency pair can be omitted without harming the grammaticality of the sentence, while the other word cannot be omitted;
• horizontal dependency, which holds if neither of the two words constituting the dependency pair can be omitted without making the result ungrammatical;
• coordination and other non-dependency relations;
as well as to state which pairs of words have no syntactic relation to each other.

Let it be remarked that in the visualization of dependency trees (grammars), both vertical and horizontal dependencies are usually represented as one type, e.g., as oblique dependencies (see the figures
below for examples of dependency trees). The transformation of vertical dependencies into oblique dependencies is trivial. On the other hand, in order to obtain oblique dependencies from horizontal ones (i.e. to determine the ‘governors’ of these dependencies), finer methods have to be used (e.g., analogy).

Let us outline some principles of the analysis by reduction and its relation to syntactic (in)dependencies and coordinations:
• AR consists of the stepwise simplification of a complete sentence in such a way that its syntactic correctness is preserved; at the end, the ‘core’ of the sentence is obtained.
• Reduction is the basic operation of AR: in each step (cycle) the simplification is performed by deleting at least one word of the sentence and possibly rewriting other words in the sentence. The deletion is ‘justified’ by linguistically motivated rules.
• The process is non-deterministic: mutually independent words (parts) can be deleted in any order (preserving syntactic correctness is the only criterion for the success of a reduction).
• All the vertically dependent words (parts) must be deleted before their governor (from the corresponding dependency structure) is deleted; all the horizontally dependent words must be deleted simultaneously (i.e. within one cycle).
• Two parts of a sentence can be considered a coordination only if these parts are not (syntactically) dependent on each other.

The following three examples illustrate how dependency structures are obtained via AR. Note the contrast between the (superficial) similarity of the three sample sentences and the differences among the resulting types of AR.

[Figure 1. Left part, sentences of the reduction diagram: Er starb eine Woche vor den Ferien. / Er starb vor den Ferien. / *Er starb eine Woche. / Er starb. Right part, words of the dependency structure: Er starb eine Woche vor den Ferien]
Figure 1: First example of AR and the corresponding dependency structure.
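The reduction process pictured in Figure 1 can be mimicked by a small Python sketch. It is only an illustration under stated assumptions, not the authors' formalism and not a restarting automaton: the names `candidate_reductions`, `analysis_by_reduction`, `CHUNKS` and `CORRECT` are hypothetical, the groups of words that are deleted together are given by hand, and a plain set of sentences stands in for the linguist's judgement of syntactic correctness.

```python
from typing import Callable, Iterator

Sentence = tuple[str, ...]


def words(text: str) -> Sentence:
    """Split a sentence into a tuple of words (final punctuation omitted)."""
    return tuple(text.split())


def candidate_reductions(sentence: Sentence, chunks: list[Sentence]) -> Iterator[Sentence]:
    """Yield every sentence obtained by deleting one contiguous occurrence of a chunk."""
    for chunk in chunks:
        n = len(chunk)
        for i in range(len(sentence) - n + 1):
            if sentence[i:i + n] == chunk:
                yield sentence[:i] + sentence[i + n:]


def analysis_by_reduction(
    sentence: Sentence,
    chunks: list[Sentence],
    is_correct: Callable[[Sentence], bool],
) -> tuple[list[list[Sentence]], list[list[Sentence]]]:
    """Explore all reduction sequences that start from `sentence`.

    Returns (successful, failed): `successful` holds the sequences that stay
    correct until no correct reduction is left (the last sentence is the core);
    `failed` holds the sequences whose last step produced an incorrect sentence.
    """
    successful: list[list[Sentence]] = []
    failed: list[list[Sentence]] = []

    def explore(path: list[Sentence]) -> None:
        extended = False
        for reduced in candidate_reductions(path[-1], chunks):
            if is_correct(reduced):
                extended = True
                explore(path + [reduced])
            else:
                failed.append(path + [reduced])
        if not extended:
            successful.append(path)

    explore([sentence])
    return successful, failed


# Assumption: word groups that are deleted together (horizontally dependent words).
CHUNKS = [words("eine Woche"), words("vor den Ferien")]

# Assumption: hand-written oracle listing the sentences judged correct in Figure 1.
CORRECT = {
    words("Er starb eine Woche vor den Ferien"),
    words("Er starb vor den Ferien"),
    words("Er starb"),
}

successful, failed = analysis_by_reduction(
    words("Er starb eine Woche vor den Ferien"), CHUNKS, CORRECT.__contains__
)

for sequence in successful:
    print(" → ".join(" ".join(s) for s in sequence))
for sequence in failed:
    steps = [" ".join(s) for s in sequence]
    print(" → ".join(steps[:-1] + ["*" + steps[-1]]))
```

Under these assumptions the sketch finds one successful sequence (deleting eine Woche first, then vor den Ferien) and one failed reduction (deleting vor den Ferien first). A hand-written oracle was chosen because AR, as described above, relies on linguistically motivated judgements of correctness rather than on a fixed grammar.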
The sentence in Fig. 1 can undergo two different reductions in the first reduction step (in the left part of Fig. 1): either the deletion of the string of horizontally dependent words vor den Ferien or the deletion of the horizontally dependent pair eine Woche. In the first case the syntactic correctness of the sentence is violated (which is marked by an asterisk and italics; in this case the reduction fails), in the second case the correctness is preserved. The second reduction step is then determined unambiguously: it consists in the deletion of the words vor den Ferien. In this way we have obtained only one correct sequence of reductions and one incorrect reduction. This result determines unambiguously the dependency structure in the right part of Fig. 1. The oblique (in this example, originally vertical) dependencies are represented by arrows, the horizontal dependencies are represented by the remaining strings of words in the ‘nodes’.

Let us consider the second example. In this case only the first reduction is correct, while the second one yields an incorrect string. In this way we obtain only one arrow denoting a vertical dependency in the resulting dependency structure (right half of the picture). It is the one between the nodes with Angst and vor den Ferien. The other one (originally horizontal), between Er hatte and Angst, is obtained by analogy, i.e. via the observation that the parts with finite verbs “usually” occur at the top of a dependency structure, similarly as in the previous example.

[Figure, second example: *Er hatte vor den Ferien. / Er hatte Angst vor den Ferien. / Er hatte Angst. / Er hatte