Learning to Fly

Claude Sammut, Scott Hurst, Dana Kedzier
School of Computer Science and Engineering
University of New South Wales
Sydney, Australia

Donald Michie
The Turing Institute
36 North Hanover Street
Glasgow, G1 2AD
United Kingdom

Abstract

This paper describes experiments in applying inductive learning to the task of acquiring a complex motor skill by observing human subjects. A flight simulation program has been modified to log the actions of a human subject as he or she flies an aircraft. The log file is used to create the input to an induction program. The output from the induction program is tested by running the simulator in autopilot mode where the autopilot code is derived from the decision tree formed by induction. The autopilot must fly the plane according to a strictly defined flight plan.

1. THE PROBLEM

In this paper, we report on experiments that demonstrate machine learning of a reactive strategy to control a dynamic system by observing a controller that is already skilled in the task. We have modified a flight simulation program to log the actions taken by a human subject as he or she flies an aircraft. The log file is used to create the input to an induction program. The quality of the output from the induction program is tested by running the simulator in autopilot mode where the autopilot code is derived from the decision tree formed by induction.

A practical motivation for trying to solve this problem is that it is often difficult to construct controllers for complex systems using classical methods. Anderson and Miller (1991) describe a problem with present-day autolanders, namely that they are not designed to handle large gusts of wind when close to landing. Similar problems occur for helicopter pilots who must manoeuvre their aircraft in high winds while there is a load slung beneath the helicopter. Learning by trial-and-error could be used in simulation, but if we already have a skilled controller, namely, a human pilot, then it is more economical to learn by observing the pilot.

While control systems have been the subject of much research in machine learning in recent years, we know of few attempts to learn control rules by observing human behaviour. Michie, Bain and Hayes-Michie (1990) used an induction program to learn rules for balancing a pole (in simulation) and earlier work by Donaldson (1960), Widrow and Smith (1964) and Chambers and Michie (1969) demonstrated the feasibility of learning by imitation, also for pole-balancing. To our knowledge, the autopilot described here is the most complex control system constructed by machine learning methods. The task we set ourselves was to teach the autopilot how to take off; fly to a set altitude and distance; turn around and land. We describe our experiments with a particular aircraft simulation and discuss the problems encountered and how they were solved. We also discuss some of the remaining difficulties.

2. THE FLIGHT SIMULATOR

The source code to a flight simulator was made available to us by Silicon Graphics Incorporated. The central control mechanism of the simulator is a loop that interrogates the aircraft controls and updates the state of the simulation according to a set of equations of motion. Before repeating the loop, the instruments in the display are updated. The simulator gives the user a choice of aircraft to fly. We have restricted all of our experiments to the simulation of a Cessna, being easier for our subjects to learn to fly than the various fighters or larger aircraft available.

One feature of the flight simulator that has had a significant effect on our experiments is that it is non-deterministic. The simulator runs on a multi-tasking Unix system, not on a dedicated real-time system. Thus, it is not possible to give a guaranteed real-time response because the flight simulator can be interrupted by other processes or I/O traffic. If nothing is done to compensate for these interruptions, a person operating the simulator would notice that the program’s response to control actions would change. If no other processes were stealing CPU time it would respond quickly but it could become very sluggish when other processes were competing for the CPU.

To minimise the effects of variations in execution speed, the simulator regularly interrogates a real-time clock. This is used to calculate the number of main control loops being executed each second. If the simulation has slowed down since the last interrogation, the time interval used in solving the equations of motion is altered to allow the simulation to ‘catch up’. The time interval is also changed in response to an increase in execution speed. To a human operator, who has a sense of time, this approximates uniform response. However, these adjustments do not ensure a perfectly uniform response. Therefore, to an autopilot that has no external sense of time, the effects of its control actions will be somewhat different from one run to the next and even during one flight.

We have chosen to treat this problem as a challenge. If we are able to devise rules that can control a noisy system, we will have done well and in fact, the rules that have been generated can handle considerable variation. Thus we can be optimistic that the methods we are developing can be extended to more complex systems that have real disturbances such as wind and genuinely noisy controls.

Another ‘feature’ that we discovered about the Silicon Graphics flight simulator is that the rudder does not have a realistic effect on the aircraft. Fortunately this did not affect us since none of our pilots used the rudder. While a real pilot would frown upon this practice, it is possible to fly a real airplane without using the rudder (the rudder is used in turns to stop the plane from ‘sliding’ with the result that the g-forces are not directed towards the floor as they should be).

3. LOGGING FLIGHT INFORMATION

The display update has been modified so that when the pilot performs a control action by moving the control stick (the mouse) or changing the thrust or flaps settings, the state of the simulation is written to a log file. Initially, we obtained the services of 20 volunteers, believing that the more logs we had from a variety of subjects the more robust would be our rules. As we discuss later, we found that it was better to collect many logs from a small number of pilots. All the results presented below are derived from the logs of three subjects who each ‘flew’ 30 times.

At the start of a flight, the aircraft is pointing North, down the runway. The subject is required to fly a well-defined flight plan that consists of the following manoeuvres:

1. Take off and fly to an altitude of 2,000 feet.

2. Level out and fly to a distance of 32,000 feet from the starting point.

3. Turn right to a compass heading of approximately 330°. The subjects were actually told to head toward a particular point in the scenery that corresponds to that heading.

4. At a North/South distance of 42,000 feet, turn left to head back towards the runway. The scenery contains grid marks on the ground. The starting point for the turn is when the last grid line was reached. This corresponds to about 42,000 feet. The turn is considered complete when the azimuth is between 140° and 180°.

5. Line up on the runway. The aircraft was considered to be lined up when the aircraft's azimuth is less than 5° off the heading of the runway and the twist is less than ±10° from horizontal.

6. Descend to the runway, keeping in line. The subjects were given the hint that they should have an ‘aiming point’ near the beginning of the runway.

7. Land on the runway.

We will refer to the performance of a control action as an ‘event’. During a flight, up to 1,000 events can be recorded. With three pilots and 30 flights each the complete data set consists of about 90,000 events. The data recorded in each event are:

on_ground        boolean: is the plane on the ground?
g_limit          boolean: have we exceeded the plane’s g limit
wing_stall       boolean: has the plane stalled?
twist            integer: 0 to 360° (in tenths of a degree, see below)
elevation        integer: 0 to 360° (in tenths of a degree, see below)
azimuth          integer: 0 to 360° (in tenths of a degree, see below)
roll_speed       integer: 0 to 360° (in tenths of a degree per second)
elevation_speed  integer: 0 to 360° (in tenths of a degree per second)
azimuth_speed    integer: 0 to 360° (in tenths of a degree per second)
airspeed         integer: (in knots)
climbspeed       integer: (feet per second)
E/W distance     real: E/W distance from centre of runway (in feet)
altitude         real: (in feet)
N/S distance     real: N/S distance from northern end of runway (in feet)
fuel             integer: (in pounds)
rollers          real: ± 4.3
elevator         real: ± 3.0
rudder           real: not used
thrust           integer: 0 to 100%
flaps            integer: 0°, 10° or 20°

The elevation of the aircraft is the angle of the nose relative to the horizon. The azimuth is the aircraft’s compass heading and the twist is the angle of the wings relative to the horizon. The elevator angle is changed by pushing the mouse forward (positive) or back (negative). The rollers are changed by pushing the mouse left (positive) or right (negative). Thrust and flaps are incremented and decremented in fixed steps by keystrokes. The angular effects of the elevator and rollers are cumulative. For example, in