Lesson 5 – Low-level control and learning
Anders Lyhne Christensen, D6.05, anders.christensen@iscte.pt
INTRODUCTION TO AUTONOMOUS MOBILE ROBOTS
Overview
• Low-level control
  - Ad hoc
  - Sense-think-act loop
  - Event-driven control
• Learning and adaptation I
  - Types of learning
  - Issues in learning
  - Example: Q-learning applied to a real robot
(Next time, we will discuss an interesting approach to learning called evolutionary robotics in more detail.)
Low-level control
We will cover three types of low-level control:
• Stream of instructions
• Classic control loop
• Event-driven languages
Other approaches, such as logic programming, exist, but we will not cover those in this course.
Stream of instructions
Example:
  // move forward for 2 seconds:
  moveForward(speed = 10)
  sleep(2000)
  if (obstacleAhead()) {
      turnLeft(speed = 10)
      sleep(1000)
  } else {
      …
  }
• Suitable for industrial, assembly-line robots
• Easy to describe a fixed, predefined task as a recipe
• Little branching
Classic control loop
Sense → Think → Act
The loop usually has a fixed duration, e.g. 100 ms, and is called repeatedly.
Classic control loop
while (!Button.ESCAPE.isPressed()) {
    long startTime = System.currentTimeMillis();
    sense();   // read sensors
    think();   // plan next action
    act();     // do next action
    try {
        // sleep for the remainder of the 100 ms cycle
        Thread.sleep(100 - (System.currentTimeMillis() - startTime));
    } catch (Exception e) {}
}
Event-driven languages
URBI script – examples:
Ball tracking:
  whenever (ball.visible) {
      headYaw.val   += camera.xfov * ball.x &
      headPitch.val += camera.yfov * ball.y
  };
Interaction:
  at (speech.hear("hello")) {
      voice.say("How are you?") & robot.standup();
  };
Distributed and event-driven control
[Diagram: proximity-sensor, left-motor, right-motor and other microcontrollers communicating over a shared event bus]
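A minimal sketch of this idea in Java (the EventBus, ProximityEvent and handler names below are illustrative assumptions, not a specific robot API): the sensor microcontroller publishes events on the bus, and the motor controllers react only when an event arrives, instead of polling sensors in a fixed-duration loop.

// Minimal event-driven control sketch. EventBus, ProximityEvent and the handlers
// are hypothetical names used for illustration only.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ProximityEvent {
    final double distanceCm;
    ProximityEvent(double distanceCm) { this.distanceCm = distanceCm; }
}

class EventBus {
    private final List<Consumer<ProximityEvent>> subscribers = new ArrayList<>();

    void subscribe(Consumer<ProximityEvent> handler) { subscribers.add(handler); }

    void publish(ProximityEvent event) {
        for (Consumer<ProximityEvent> handler : subscribers) {
            handler.accept(event);  // each subscribed controller reacts independently
        }
    }
}

public class EventDrivenDemo {
    public static void main(String[] args) {
        EventBus bus = new EventBus();

        // The "left motor" and "right motor" controllers react only when an event arrives.
        bus.subscribe(e -> {
            if (e.distanceCm < 20) System.out.println("Left motor: slow down");
        });
        bus.subscribe(e -> {
            if (e.distanceCm < 20) System.out.println("Right motor: turn away");
        });

        // The "proximity sensor" controller publishes an event when it detects an obstacle.
        bus.publish(new ProximityEvent(15.0));
    }
}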
LEARNING AND ADAPTATION
(Based on slides from Prof. Lynn E. Parker)
What is Learning/Adaptation?
• Many definitions:
  - "Modification of behavioral tendency by experience." (Webster 1984)
  - "A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson 1965)
  - "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
  - "An improvement in information processing ability that results from information processing activity." (Tanimoto 1990)
• Our operational definition:
  - Learning produces changes within an agent that over time enable it to perform more effectively within its environment.
What is the Relationship between Learning and Adaptation?
• Evolutionary adaptation: Descendants change over long time scales based on the success or failure of their ancestors in the environment
• Structural adaptation: Agents adapt their morphology with respect to the environment
• Sensor adaptation: An agent’s perceptual system becomes more attuned to its environment
• Behavioral adaptation: An agent’s individual behaviors are adjusted relative to one another
• Learning: Essentially anything else that results in a more ecologically fit agent (can include adaptation)
Habituation and Sensitization
• Adaptation may produce habituation or sensitization
• Habituation:
  - An eventual decrease in or cessation of a behavioral response when a stimulus is presented numerous times
  - Useful for eliminating spurious or unnecessary responses
  - Generally associated with relatively insignificant stimuli, such as loud noise
• Sensitization:
  - The opposite – an increase in the probability of a behavioral response when a stimulus is repeated frequently
  - Generally associated with “dire” stimuli, like electrical shocks
Learning
Learning, on the other hand, can improve performance in additional ways:
• Introducing new knowledge (facts, behaviors, rules) into the system
• Generalizing concepts from multiple examples
• Specializing concepts for particular instances that are in some way different from the mainstream
• Reorganizing the information within the system to be more efficient
• Creating or discovering new concepts
• Creating explanations of how things function
• Reusing past experiences
AI Research has Generated Several Learning Approaches
• Reinforcement learning: Rewards and/or punishments are used to alter numeric values in a controller
• Evolutionary learning: Genetic operators such as crossover and mutation are used over populations of controllers, leading to more efficient control strategies
• Neural networks: A form of reinforcement learning that uses specialized architectures in which learning occurs as the result of alterations in synaptic weights
• Learning from experience:
  - Memory-based learning: Myriad individual records of past experiences are used to derive function approximators for control laws
  - Case-based learning: Specific experiences are organized and stored as a case structure, then retrieved and adapted as needed based on the current situational context
Learning Approaches (cont’d.)
• Inductive learning: Specific training examples are used, each in turn, to generalize and/or specialize concepts or controllers
• Explanation-based learning: Specific domain knowledge is used to guide the learning process
• Multistrategy learning: Multiple learning methods compete and cooperate with each other, each specializing in what it does best
Challenges with Learning
• Credit assignment problem: How is credit or blame assigned to a particular piece or pieces of knowledge in a large knowledge base, or to the components of a complex system, responsible for either the success or failure of an attempt to accomplish a task?
• Saliency problem: What features in the available input stream are relevant to the learning task?
• New term problem: When does a new representational construct (concept) need to be created to capture some useful feature effectively?
• Indexing problem: How can a memory be efficiently organized to provide effective and timely recall to support learning and improved performance?
• Utility problem: How does a learning system determine that the information it contains is still relevant and useful? When is it acceptable to forget things?
Example: Q-Learning Algorithm
• Provides the ability to learn by determining which behavioral actions are most appropriate for a given situation
• State-action table: one entry Q(x,a) for each state x and action a; executing action a in state x leads to a next state y
• E(y) = utility of state y
Update function for Q(x,a)
• Q(x,a) ← Q(x,a) + β (r + λ E(y) – Q(x,a))
• Where:
  - β is the learning rate parameter
  - r is the payoff (reward or punishment)
  - λ is a parameter, called the discount factor, ranging between 0 and 1
  - E(y) is the utility of the state y that results from the action, computed by E(y) = max(Q(y,a)) over all actions a
• Rewards are propagated across states so that rewards from similar states can facilitate learning, too.
• What is a “similar state”? One approach: weighted Hamming distance
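A small worked example with assumed values (not from the slides): with β = 0.5, λ = 0.9, r = 1, a current estimate Q(x,a) = 0, and E(y) = 2, the update gives Q(x,a) ← 0 + 0.5 × (1 + 0.9 × 2 − 0) = 1.4.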
Utility Function Used to Modify Robot’s Behavioral Responses
Initialize all Q(x,a) to 0.
Do Forever
  • Determine current world state x via sensing
  • 90% of the time, choose the action a that maximizes Q(x,a); else pick a random action
  • Execute a
  • Determine reward r
  • Update Q(x,a) as described
  • Update Q(x’,a) for all states x’ similar to x
End Do
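A minimal, self-contained sketch of this tabular Q-learning loop in Java (the Environment interface, class names and parameter values are illustrative assumptions, not the original Obelix controller):

import java.util.Random;

// Minimal tabular Q-learning sketch. Environment and all parameter values
// are illustrative assumptions, not the original implementation.
interface Environment {
    int currentState();            // sense: index of the current world state
    double execute(int action);    // act: perform the action and return reward r
}

public class QLearner {
    private final double[][] q;        // state-action table Q(x, a)
    private final double beta = 0.5;   // learning rate (assumed value)
    private final double lambda = 0.9; // discount factor (assumed value)
    private final Random rng = new Random();

    public QLearner(int numStates, int numActions) {
        q = new double[numStates][numActions];   // initialize all Q(x,a) to 0
    }

    // Utility of state y: E(y) = max over actions a of Q(y, a)
    private double utility(int y) {
        double best = q[y][0];
        for (double v : q[y]) best = Math.max(best, v);
        return best;
    }

    // 90% of the time pick the action that maximizes Q(x,a), otherwise explore randomly
    private int chooseAction(int x) {
        if (rng.nextDouble() < 0.1) return rng.nextInt(q[x].length);
        int best = 0;
        for (int a = 1; a < q[x].length; a++) if (q[x][a] > q[x][best]) best = a;
        return best;
    }

    public void step(Environment env) {
        int x = env.currentState();      // determine current world state via sensing
        int a = chooseAction(x);
        double r = env.execute(a);       // execute a and determine reward r
        int y = env.currentState();      // resulting state
        // Q(x,a) <- Q(x,a) + beta * (r + lambda * E(y) - Q(x,a))
        q[x][a] += beta * (r + lambda * utility(y) - q[x][a]);
        // Updating Q(x',a) for states x' similar to x (e.g. by Hamming distance) would go here.
    }
}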
Example of Using Q-Learning: Teaching Box-Pushing
• Robot (Obelix):
  - 8 sonar sensors (4 look forward, 2 look right, 2 look left)
  - Sonar readings quantized into two ranges: NEAR (from 9–18 inches) and FAR (from 18–30 inches)
  - Forward-looking infrared (IR): binary response at 4 inches to indicate when the robot is in the BUMP state
  - Current to the drive motors monitored to determine if the robot is STUCK (i.e., input current exceeds a threshold)
• Total of 18 bits of sensor information available: 16 sonar bits (NEAR, FAR), two for BUMP and STUCK
• Motor control outputs – five choices:
  - Moving forward
  - Turning left 22 degrees
  - Turning right 22 degrees
  - Turning more sharply left at 45 degrees
  - Turning more sharply right at 45 degrees
[Photo: Obelix robot and box, 1991]
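A sketch of how such an 18-bit perceptual state could be packed into an integer (the bit layout, class name and range checks are illustrative assumptions; the original Obelix encoding may differ):

// Illustrative packing of the 18-bit perceptual state into an int.
// Bit layout and names are assumptions, not the original encoding.
public class StateEncoder {
    // Each of the 8 sonars contributes 2 bits: NEAR (9-18 in) and FAR (18-30 in).
    // The final two bits encode BUMP and STUCK.
    public static int encode(double[] sonarInches, boolean bump, boolean stuck) {
        int state = 0;
        for (int i = 0; i < 8; i++) {
            boolean near = sonarInches[i] >= 9 && sonarInches[i] < 18;
            boolean far  = sonarInches[i] >= 18 && sonarInches[i] <= 30;
            if (near) state |= 1 << (2 * i);
            if (far)  state |= 1 << (2 * i + 1);
        }
        if (bump)  state |= 1 << 16;
        if (stuck) state |= 1 << 17;
        return state;   // 18 bits -> up to 2^18 = 262,144 distinct states
    }
}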
Robot’s Learning Problem
• Learning problem: Deciding, for any of the approximately 250,000 perceptual states (2^18 = 262,144), which of the 5 possible actions will enable the robot to find and push boxes around a room without getting stuck
• 250,000 perceptual states × 5 actions = 1,250,000 state/action pairs to explore!
State Diagram of Behavior Transitions
[Diagram: transitions between Finder, Pusher and Unwedger, triggered by BUMP, STUCK, STUCK + Δt and BUMP + Δt; anything else returns control to Finder]
• Finder: moves the robot toward possible boxes
• Pusher: takes over after a BUMP that results from a box find
• Unwedger: frees the robot when the box is no longer pushable
Measurement of “State Nearness”
• Use the 18-bit representation of state (16 bits for sonar, two for BUMP and STUCK)
• Compute the Hamming distance between states
• Recall: Hamming distance = number of bits in which the two states differ
• For this example, states were considered “near” if Hamming distance < 3
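A small sketch of this nearness test in Java, assuming states are packed into ints as in the encoding sketch above (the threshold of 3 comes from the slide; class and method names are illustrative):

// Hamming distance between two 18-bit states stored as ints:
// XOR leaves a 1 in every bit position where the states differ,
// and Integer.bitCount counts those positions.
public class StateNearness {
    static int hammingDistance(int stateA, int stateB) {
        return Integer.bitCount(stateA ^ stateB);
    }

    // States are "near" if they differ in fewer than 3 bits (threshold from the slide).
    static boolean isNear(int stateA, int stateB) {
        return hammingDistance(stateA, stateB) < 3;
    }

    public static void main(String[] args) {
        int x  = 0b000000000000000011;   // example state
        int x2 = 0b000000000000000111;   // differs in one bit
        System.out.println(hammingDistance(x, x2));  // prints 1
        System.out.println(isNear(x, x2));           // prints true
    }
}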