Automatically Constructing Control Systems by Observing Human Behaviour
Abstract
We describe experiments to devise machine learning methods for the construction of control systems by observing how humans perform control tasks. The present technique uses a propositional learning system to discover rules for flying an aircraft in a flight simulation program. We discuss the problems encountered and present them as a challenge for researchers in Inductive Logic Programming. Overcoming these problems will require ILP methods that go beyond our current knowledge, including induction over noisy numeric domains, dealing with time and causality, and complex predicate invention.

1. Learning Control Rules

Almost all applications of inductive learning, so far, have been in classification tasks such as medical diagnosis. For example, medical records of patients’ symptoms and accompanying diagnoses made by physicians are entered into an induction program, which constructs rules that will automatically diagnose new patients on the basis of the previous data. The output is a classification.

We are interested in automatically building control rules that output an action. That is, when a state of a dynamic system arises that requires some corrective action, the rules should be able to recognise the state and output the appropriate action. Just as diagnostic rules can be learned by observing a physician at work, we should be able to learn how to control a system by watching a human operator at work. In this case, the data provided to the induction program are logs of the actions taken by the operator in response to changes in the system.

In a preliminary study (Sammut, Hurst, Kedzier and Michie, 1992), we have been able to synthesise rules for flying an aircraft in a flight simulator. The rules are able to make the plane take off, fly to a specified height and distance from the runway, turn around and land safely on the runway.
While control systems have been the subject of much research in machine learning in recent years, we know of few attempts to learn control rules by observing human behaviour. Michie, Bain and Hayes-Michie (1990) used an induction program to learn rules for balancing a pole (in simulation), and earlier work by Donaldson (1960), Widrow and Smith (1964) and Chambers and Michie (1969) demonstrated the feasibility of learning by imitation, also for pole-balancing. To our knowledge, the autopilot described here is the most complex control system constructed by machine learning methods. However, there are still many research issues to be investigated, and they are the subject of this paper. The main problems we discuss are listed below.

• The difference between learning classifications and learning actions is that the learning algorithm must recognise that actions are performed in response to, and result in, changes in the system being controlled. Classification algorithms only deal with static data and do not have to cope with temporal and causal relations.

• In our preliminary study we were able to demonstrate the feasibility of learning a specific control task. The next challenge is to build a generalised method that can learn basic skills that can be used in a variety of tasks. These skills become building blocks that can be assembled into a complete new controller to meet the demands of a specified task.

• One of the limitations we have encountered with existing learning algorithms is that they can only use the primitive attributes supplied in the data. This results in control rules that cannot be understood by a human expert. Constructive induction (or predicate invention) may be necessary to build higher-level attributes that simplify the rules.
We believe it is important that machine learning research should be directed towards acquiring control knowledge, since this will give us a way of describing human subcognitive skills and it will result in useful engineering tools. One of the outstanding problems our research addresses is that subcognitive skills are inaccessible to introspection. For example, if you are asked by what method you ride a bicycle, you will not be able to provide an adequate answer because that skill has been learned and is executed at a subconscious level. By monitoring the performance of a subcognitive skill, we are able to construct a functional description of that skill in the form of symbolic rules. This not only reveals the nature of the skill but may also be used as an aid to training, since the student can be explicitly shown what he or she is doing.

Learning control rules by induction provides a new way of building complex control systems quickly and easily. For example, the need in aerospace for pilots to control airplanes close to the margin of instability is putting increasing pressure on present techniques both of pilot training and of flight automation. We claim that it will be possible to build a pilot’s assistant using inductive methods. A control engineer is only able to supply automated modules, such as autolanders, provided that envisaged meteorological or other conditions are not too abnormal. There are specialised manoeuvres that the pilot would be relieved to see encapsulated into an automated subtask, but which cannot, for reasons of complexity and unpredictability, be tackled with standard control-theoretic tools. Yet they can be tackled, often at the expense of effectiveness or safety, by a trained pilot’s skills that have been acquired by practice but which the pilot cannot explain. Control engineers and programmers, much as they might wish to, at present have no way to capture these procedures so as to solve the flight automation problem.
In this context, the industry requires a convenient, and not too expensive, means of automatically constructing models of individual piloting skills. While our experiments have been primarily concerned with flight automation, inductive methods can be applied to a wide range of related problems. For example, an anaesthetist can be seen as controlling a patient in an operating theatre in much the same way as a pilot controls an aircraft. The anaesthetist monitors the patient’s condition just as a pilot monitors the aircraft’s instruments. The anaesthetist changes dosages of drugs and gases to alter the state of a system (the patient) in the same way that a pilot alters thrust and attitude to control the state of a system (the aircraft). A flight plan can be divided into stages where different control strategies are required, e.g. take-off, straight and level flight, landing, etc. So too, the administration of anaesthetics can be divided into stages: putting the patient to sleep, maintaining a steady state during the operation and revival after the procedure has been completed.

In the next section, we will describe our preliminary experiments using a decision tree induction program. While we were able to meet our initial goals, we believe that we are reaching the limits of the descriptive power of propositional learning algorithms and will have to move to a first-order system. Unfortunately, no existing Inductive Logic Programming algorithm is suitable for use in control applications. Section 3 describes some of the problems that we face and section 4 suggests a number of avenues of research for ILP.

2. Preliminary Study

This section provides a brief description of our preliminary study into constructing rules for an autopilot by logging the flights of human pilots. The reader is referred to (Sammut, Hurst, Kedzier and Michie, 1992) for more detail. The source code to a flight simulation program was made available to us by Silicon Graphics Incorporated (SGI).
Our task was to log actions taken by ‘pilots’ during a number of ‘flights’ on the simulator. These logs were then used to construct, by induction, a set of rules that could fly the aircraft through the same flight plan that the pilots flew. The results presented below are derived from the logs of three subjects who each ‘flew’ 30 times. We will refer to the performance of a control action as an ‘event’. During a flight, up to 1,000 events can be recorded. With three pilots and 30 flights each, the complete data set consists of about 90,000 events. An autopilot has been constructed for each of the three subjects. Each pilot is treated separately because different pilots can fly the same flight plan in different ways.

The central control mechanism of the simulator is a loop that interrogates the aircraft controls and updates the state of the simulation according to a set of equations of motion. Before repeating the loop, the instruments in the display are updated. The display update has been modified so that when the pilot performs a control action by moving the mouse or changing the thrust or flaps settings, the action and the state of the simulation are written to a log file. The data recorded are:

on_ground        boolean: is the plane on the ground?
g_limit          boolean: have we exceeded the plane’s g limit?
wing_stall       boolean: has the plane stalled?
twist            integer: 0 to 360° (in tenths of a degree, anti-clockwise)
elevation        integer: 0 to 360° (in tenths of a degree, anti-clockwise)
azimuth          integer: 0 to 360° (in tenths of a degree, anti-clockwise)
roll_speed       integer: 0 to 360° (in tenths of a degree per second)
elevation_speed  integer: 0 to 360° (in tenths of a degree per second)
azimuth_speed    integer: 0 to 360° (in tenths of a degree per second)
airspeed         integer: (in knots)
climbspeed       integer: (in feet per second)
E/W distance     real: E/W distance from centre of runway (in feet)
altitude         real: (in feet)
N/S distance     real: N/S distance from northern end of runway (in feet)
fuel             integer: (in pounds)
rollers          real: ±4.3
elevator         real: ±3.0
rudder           real: not used
thrust           integer: 0 to 100%
flaps            integer: 0°, 10° or 20°
spoilers         integer: not relevant for a Cessna

Most of the attributes of an event are numeric, including real numbers, sub-ranges and circular measures. Since there can be an enormous amount of variation in the way pilots fly, the data are very noisy. Note also that the output value of induction is a control setting such as the position of the flaps, rollers or elevator. Thus, the output values are also required to be numeric.

At the start of a flight, the aircraft is pointing North, down the runway. The subject is required to fly a well-defined flight plan that consists of the following manoeuvres:

1. take off and fly to an altitude of 2,000 feet;
2. level out and fly to a distance of 32,000 feet from the starting point;
3. turn right to a compass heading of approximately 330°;
4. at a North/South distance of 42,000 feet, turn left to head back towards the runway;
5. line up on the runway;
6. descend;
7. land on the runway.

The data from each flight were segmented into the stages listed above. For each stage we construct four separate decision trees, for the elevator, rollers, thrust and flaps. The rudder is not used. A program filters the flight logs, generating four input files for the induction program.
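The logging step can be sketched as below. The record layout mirrors the attribute list above, but the struct and the textual log format are assumptions for illustration; the actual code inside the SGI simulator is not reproduced here.

```c
#include <stdio.h>

/* One logged 'event': the simulator state plus the control settings at
   the moment the pilot acts.  Field names follow the attribute list in
   the text; rudder and spoilers are omitted since they are unused. */
typedef struct {
    int on_ground, g_limit, wing_stall;             /* booleans (0/1)      */
    int twist, elevation, azimuth;                  /* tenths of a degree  */
    int roll_speed, elevation_speed, azimuth_speed; /* tenths of deg/sec   */
    int airspeed, climbspeed;                       /* knots, feet/sec     */
    double ew_distance, altitude, ns_distance;      /* feet                */
    int fuel;                                       /* pounds              */
    double rollers, elevator;                       /* stick positions     */
    int thrust, flaps;                              /* percent, degrees    */
} Event;

/* Append one event to the flight log as a whitespace-separated line.
   Returns the number of characters written (as fprintf does). */
int log_event(FILE *log, const Event *e)
{
    return fprintf(log,
        "%d %d %d %d %d %d %d %d %d %d %d %.1f %.1f %.1f %d %.2f %.2f %d %d\n",
        e->on_ground, e->g_limit, e->wing_stall,
        e->twist, e->elevation, e->azimuth,
        e->roll_speed, e->elevation_speed, e->azimuth_speed,
        e->airspeed, e->climbspeed,
        e->ew_distance, e->altitude, e->ns_distance,
        e->fuel, e->rollers, e->elevator, e->thrust, e->flaps);
}
```

A filter over such lines can then select the stage-specific rows and project out one control attribute per file, giving the four induction inputs mentioned above.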
The attributes of a training example are the flight parameters of the simulator, listed above. The dependent variable, or class value, is the attribute describing a control action. The reason for segmenting the data is that each stage requires a different manoeuvre. If we combined the data from all stages, we would be expecting the induction program to construct seven sets of rules for controlling the aircraft, one for each of the seven stages. This makes the program’s task more difficult than is necessary, since we have already defined the sub-tasks and have told the human subjects what they are. It is reasonable that the learning program should have the same information as the pilots.

For the preliminary study, we used the decision tree induction program C4.5 (Quinlan, 1987). To test the induced rules, the original autopilot code in the simulator is replaced by the rules. A post-processor converts C4.5’s decision trees into if-statements in C so that they can be incorporated into the flight simulator easily. Hand-crafted C code determines which stage the flight has reached and decides when to change stages. The appropriate rules for each stage are then selected in a switch statement. Each stage has four independent if-statements, one for each action.

We demonstrate how these rules operate by describing the controllers for the first stage. The critical rule at take-off is the elevator rule:

    elevation > 4 : level_pitch
    elevation <= 4 :
    |   airspeed <= 0 : level_pitch
    |   airspeed > 0 : pitch_up_5

This states that, as thrust is applied and the elevation is level, the stick is pulled back until the elevation increases to 4°. Because the controls take some time to respond, the final elevation usually reaches 11°, which is close to the values obtained by the pilot. pitch_up_5 indicates a large elevator action, whereas pitch_up_1 would indicate a gentle elevator action.
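The post-processor's output for the elevator tree above might look like the following. The action names and function signature are assumptions; only the tests and leaf actions come from the tree itself.

```c
/* Hypothetical elevator actions; the symbolic names mirror the leaf
   labels in the decision tree shown in the text. */
enum elevator_action { LEVEL_PITCH, PITCH_UP_1, PITCH_UP_5 };

/* The take-off elevator tree, rendered as nested if-statements in the
   style the text attributes to the post-processor. */
enum elevator_action elevator_rule_takeoff(int elevation, int airspeed)
{
    if (elevation > 4)
        return LEVEL_PITCH;  /* nose is already up: hold it level      */
    if (airspeed <= 0)
        return LEVEL_PITCH;  /* not yet rolling: wait                  */
    return PITCH_UP_5;       /* rolling and still level: pull back hard */
}
```

Note that the tree is purely reactive: each call maps the current state to an action, so the control loop simply re-evaluates it on every iteration.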
The other significant control at this stage is flaps:

    elevation <= 6 : full_flaps
    elevation > 6 : no_flaps

Once the aircraft has reached an elevation angle of 6°, the flaps are raised.

The rules we have synthesised are successful in the sense that the plane follows the flight plan just as the human trainer would and lands safely on the runway. Because induction over a large set of data has an averaging effect, the autopilot actually flies more smoothly than the trainer. Figure 1 shows a profile of the trainer’s flight, plotting the E/W distance travelled as a function of the N/S distance away from the runway. Each point represents an action being taken by the pilot. This flight can be compared with the autopilot’s flight shown in Figure 2. A similar comparison can be made between the altitude profiles for the trainer and the autopilot, shown in Figures 3 and 4.
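The flaps rule and the hand-crafted stage dispatch described earlier can be sketched together as follows. The stage names and the dispatch function are assumptions; only the take-off flaps tree comes from the text, and the remaining stages are deliberately left out of the sketch.

```c
/* Flap settings in degrees; symbolic names are assumptions. */
enum flap_setting { NO_FLAPS = 0, FULL_FLAPS = 20 };

/* The take-off flaps tree from the text, as generated C. */
enum flap_setting flaps_rule_takeoff(int elevation)
{
    return (elevation <= 6) ? FULL_FLAPS : NO_FLAPS;
}

/* Hypothetical names for the seven flight-plan stages. */
enum stage { TAKE_OFF, LEVEL_OUT, RIGHT_TURN, LEFT_TURN,
             LINE_UP, DESCEND, LAND };

/* Hand-crafted dispatch: one switch selects the current stage's rules,
   as described in the text.  Each case would call that stage's four
   independent rules; only flaps at take-off is sketched here. */
int select_flaps(enum stage s, int elevation)
{
    switch (s) {
    case TAKE_OFF:
        return flaps_rule_takeoff(elevation);
    default:
        return NO_FLAPS;  /* remaining stages omitted in this sketch */
    }
}
```

Separate, equally simple switch statements would choose the elevator, rollers and thrust rules, keeping the four controls independent within each stage.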