15-382 COLLECTIVE INTELLIGENCE - S18
LECTURE 19: SWARM INTELLIGENCE 5 / ANT COLONY OPTIMIZATION 1
INSTRUCTOR: GIANNI A. DI CARO
SHORTEST PATHS WITH PHEROMONE LAYING-FOLLOWING
[Figure: four snapshots (t = 0, 1, 2, 3) of the paths between Nest and Food, shaded according to a pheromone intensity scale.]
• Amount of pheromone on a branch ∝ frequency of forward/backward crossings ∝ quality of the path (shorter paths are crossed more often)
LET’S ABSTRACT A MORE COMPLEX SCENARIO
[Figure: graph of decision nodes 𝒚_1, …, 𝒚_9 between the Nest (source) and the Food (target); outgoing edges are labeled with actions 𝒃_11, 𝒃_12, 𝒃_13, 𝒃_21, 𝒃_22, 𝒃_31, 𝒃_32 and shaded according to a pheromone intensity scale.]
• Multiple decision nodes: n decision states/nodes, 𝒚_1, 𝒚_2, …, 𝒚_n ∈ 𝒀
• Set 𝑩 of decisions / actions, 𝒃_1, 𝒃_2, …, 𝒃_m, such that at each state 𝒚 a subset 𝑩(𝒚) ⊆ 𝑩 of actions is available or feasible
• A path (ant solution) is constructed through a sequence of decisions, one for each visited state
• Multiple ants iterate path construction (i.e., foraging) in parallel
• A traveling cost is associated with each state transition: the colony’s goal is to have the ants move along the minimum-cost path between nest and food
LET’S ABSTRACT A MORE COMPLEX SCENARIO
[Figure: graph from Source to Destination; edges such as (1,2), (1,3), (1,4), (5,8), (5,9) carry a pheromone value τ_ij and a heuristic value η_ij (terrain morphology); at each node a stochastic decision rule π(τ, η) selects the next edge. Pheromone intensity scale.]
• Distributed optimization problem
• At each state 𝒚_k, only local information / constraints (plus some ant memory) are available for taking a (possibly optimized) decision 𝒃 ∈ 𝑩(𝒚_k)
• Pheromone information (dynamic), parametrized as a vector τ_k (stigmergic variables)
• Heuristic information (static, scenario-related), parametrized as a vector η_k
• Ant behavior: stochastic decision policy π_ε(𝒚_k; τ_k, η_k), π_ε : 𝒀 ⟼ 𝑩
How do ant colonies solve the distributed minimum-cost path (MCP) problem? By exploiting pheromone to learn the best parameters of the decision policy.
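The slide leaves the exact form of the stochastic decision policy open. As a minimal sketch, assuming the random-proportional rule commonly used in ACO (each feasible action weighted by pheromone^alpha · heuristic^beta), a single local decision could be sampled as follows. The function names and the exponents `alpha` and `beta` are illustrative assumptions, not taken from the lecture.

```python
import random

def action_probabilities(feasible, tau, eta, alpha=1.0, beta=2.0):
    """Return {action: probability} over the feasible actions of one state.

    tau[a] is the (dynamic) pheromone value of action a,
    eta[a] is its (static) heuristic value.
    Assumption: weight(a) = tau[a]**alpha * eta[a]**beta, with a positive total.
    """
    weights = {a: (tau[a] ** alpha) * (eta[a] ** beta) for a in feasible}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def choose_action(feasible, tau, eta, alpha=1.0, beta=2.0, rng=random):
    """Sample one feasible action according to the probabilities above."""
    probs = action_probabilities(feasible, tau, eta, alpha, beta)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions])[0]

if __name__ == "__main__":
    # Hypothetical state with two feasible actions and equal heuristic values.
    tau = {"b1": 1.0, "b2": 3.0}
    eta = {"b1": 1.0, "b2": 1.0}
    print(action_probabilities(["b1", "b2"], tau, eta))
    print(choose_action(["b1", "b2"], tau, eta, rng=random.Random(0)))
```

Only quantities local to the current state enter the decision, which is what makes the scheme distributed: no ant needs a global view of the graph.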
ANT COLONIES: INGREDIENTS FOR SHORTEST PATHS
[Figure: graph from Source to Destination traversed in the Forward and Backward directions; each edge (i, j) carries a pheromone value τ_ij and a heuristic value η_ij. Pheromone intensity scale.]
• A number of concurrent autonomous (simple?) agents (ants)
• Forward-backward constructive path sampling based on the stochastic policy π_ε
• Local laying and sensing of pheromone → pheromone is dynamically updated
• Step-by-step stochastic decisions biased by local pheromone intensity and by other local heuristic aspects (e.g., terrain)
• Multiple paths are concurrently tried out and implicitly evaluated
• Positive feedback effect (local reinforcement of good decisions)
• Iteration over time of the path sampling actions
• Persistence (exploitation) and evaporation (exploration) of pheromone
FROM ANTS TO ACO:
• Let’s mimic ant colonies, with some pragmatic modifications…
• Once a solution / path is completed:
• The sampled solution is evaluated (e.g., as the sum of the individual costs)
• “Credit” is assigned to each individual decision belonging to the solution
• Pheromone updating: the values of the pheromone variables τ_k associated with each decision in the solution are modified according to the “credit”
• Pheromone values can also decay/change for other reasons (e.g., evaporation)
• Pheromone values locally encode how good it is to take decision i vs. j, as collectively estimated/learned by the agent population through repeated solution sampling
[Figure: feedback loop between pheromone τ and paths π on the Source-Destination graph: the pheromone distribution biases path construction, and the outcomes of path construction modify the pheromone distribution.]
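The credit-assignment and evaporation steps above can be sketched as two small functions. This is only an illustration under stated assumptions: the credit is taken to be `q / cost` (lower-cost paths earn more pheromone) and evaporation is a multiplicative decay with rate `rho`. Both choices and all names are assumptions rather than the lecture's definitions.

```python
def evaporate(tau, rho=0.1):
    """Decay every pheromone variable by a factor (1 - rho), in place."""
    for edge in tau:
        tau[edge] *= (1.0 - rho)

def deposit(tau, path_edges, cost, q=1.0):
    """Reinforce each decision (edge) of a completed path, in place.

    Assumption: the credit is q / cost, shared equally by all edges
    that belong to the sampled solution.
    """
    credit = q / cost
    for edge in path_edges:
        tau[edge] += credit

if __name__ == "__main__":
    # Hypothetical 3-edge graph with uniform initial pheromone.
    tau = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 1.0}
    evaporate(tau, rho=0.5)
    deposit(tau, [(1, 2), (2, 3)], cost=4.0)
    print(tau)
```

Edges that are never part of a sampled solution only evaporate, so their pheromone fades over time, while edges on good solutions accumulate credit.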
ANT COLONY OPTIMIZATION METAHEURISTIC: (VERY) GENERAL ARCHITECTURE
• Solution construction
• Monte Carlo path sampling from N (number of states) joint probability distributions, parametrized by the τ and η variable arrays
• Sequential learning by Generalized Policy Iteration (GPI)
PHEROMONE AND HEURISTIC ARRAYS
[Figure: graph from Source to Destination; each edge (i, j) carries a pair of values τ_ij; η_ij, e.g. τ_12; η_12, τ_13; η_13, τ_14; η_14, τ_58; η_58, τ_59; η_59. Pheromone intensity scale.]
ACO FOR THE TRAVELING SALESMAN PROBLEM (TSP)
Given G(V, E), find the Hamiltonian tour of minimal cost: NP-hard
[Figure: weighted graph on 7 nodes; edge labels are the distances d_ij.]
• Every cyclic permutation of n integers is a feasible solution, e.g. π_1 = (1, 3, 4, 2, 6, 5, 7, 1), π_2 = (2, 3, 4, 5, 6, 7, 1, 2)
• c(π_2) = d_23 + d_34 + d_45 + d_56 + d_67 + d_71 + d_12 = 93
• A tour can also be read as a set of edges: {(2,3), (3,4), (4,5), (5,6), (6,7), (7,1), (1,2)}
• It’s easier to consider fully connected graphs, |E| = |V|(|V| - 1): if two nodes are not connected, d is infinite
• “Related” combinatorial optimization problems: VRPs, SOP, TO, QAP, …
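To make the tour-cost definition concrete, here is a small sketch that evaluates a closed tour and lists its edges. The distance table below is hypothetical, since the edge weights of the slide's figure are not recoverable from the text. Missing edges are treated as having infinite distance, as suggested above.

```python
import math

def tour_edges(tour):
    """Return the list of consecutive edges of a closed tour such as (1, 2, 3, 4, 1)."""
    return list(zip(tour, tour[1:]))

def tour_cost(tour, d):
    """Sum the distances along a closed tour.

    d maps an edge (i, j) to its distance; the graph is assumed symmetric,
    and an edge absent from d is assumed to have infinite distance.
    """
    total = 0.0
    for i, j in tour_edges(tour):
        total += d.get((i, j), d.get((j, i), math.inf))
    return total

if __name__ == "__main__":
    # Hypothetical 4-city instance (not the graph from the slide).
    d = {(1, 2): 2, (2, 3): 3, (3, 4): 4, (4, 1): 5, (1, 3): 10}
    print(tour_edges((1, 2, 3, 4, 1)))
    print(tour_cost((1, 2, 3, 4, 1), d))
    print(tour_cost((1, 3, 2, 4, 1), d))  # uses the missing edge (2, 4)
```

A tour that uses a non-existent edge gets infinite cost, so it can never be preferred over any feasible tour.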
ACO FOR THE TRAVELING SALESMAN PROBLEM (TSP)
• Pheromone variables: τ_ij ∈ ℝ+ expresses how beneficial it is (as estimated up to now) to have edge (i, j) in the solution in order to optimize the final tour length → |E| variables
• Heuristic values η_ij ∈ ℝ+: problem costs c_ij ∈ ℝ+ for traveling from i to j → |E| variables
[Figure: weighted graph on 7 nodes; edge labels are the travel costs.]
Solution construction strategies (no repair, no look-ahead)
• Extension: when ant k is in city i, how good is it expected to be to include (feasible) city j next in the solution sequence x_k(t)? → f(τ_ij, η_ij)
• Insertion: how good is it expected to be to insert (feasible) edge (m, p) in the partial solution x_k(t)? → f(τ_mp, η_mp)
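A possible way to hold the |E| pheromone and |E| heuristic variables is as two n × n arrays indexed by edge. The sketch below makes two assumptions that the slide does not state: the heuristic value is taken as the inverse cost, 1 / c_ij, so that cheaper edges look more attractive, and the pheromone starts from a uniform value `tau0`. All names are illustrative.

```python
import math

def init_arrays(cost, tau0=0.1):
    """Build pheromone and heuristic arrays for a TSP cost matrix.

    Assumptions: eta[i][j] = 1 / cost[i][j] for i != j (0.0 for missing,
    i.e. infinite-cost, edges), and tau[i][j] = tau0 for every i != j.
    """
    n = len(cost)
    tau = [[tau0 if i != j else 0.0 for j in range(n)] for i in range(n)]
    eta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and cost[i][j] > 0 and not math.isinf(cost[i][j]):
                eta[i][j] = 1.0 / cost[i][j]
    return tau, eta

if __name__ == "__main__":
    inf = math.inf
    # Hypothetical 3-city instance; cities 0 and 2 are not connected.
    cost = [[0, 2, inf],
            [2, 0, 4],
            [inf, 4, 0]]
    tau, eta = init_arrays(cost)
    print(tau)
    print(eta)
```

With this convention an edge of infinite cost has heuristic value 0, so any decision rule that multiplies by η never selects it.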
(META-)ACO FOR CO PROBLEMS (CENTRALIZED SCHEDULE)
Initialize τ_ij(0) to small random values and let t = 0;
repeat
    Place n_k ants on randomly chosen origin nodes;
    foreach ant k = 1, ..., n_k do
        Construct a tour (path) x_k(t) [Update pheromone step-by-step];
        Evaluate tour (path) x_k(t);
    end
    foreach [selected] edge (i, j) of the graph do
        Pheromone evaporation;
    end
    foreach [selected] ant k = 1, ..., n_k do
        foreach [selected] edge (i, j) of x_k(t) do
            Update τ_ij using tour (path) evaluation results;
        end
    end
    Daemon actions [Local search];
    t = t + 1;
until stopping condition is true;
return best solution generated;
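The schedule above can be turned into a short runnable sketch for a symmetric TSP. This is one possible instantiation, not the lecture's reference implementation. It assumes a random-proportional decision rule with exponents `alpha` and `beta`, a heuristic of 1 / distance, multiplicative evaporation with rate `rho`, a deposit of `q / cost` by every ant, and a fixed iteration count as the stopping condition. The optional step-by-step pheromone update and the daemon actions are omitted.

```python
import math
import random

def aco_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
            rho=0.5, q=1.0, seed=0):
    """Return (best_tour, best_cost) for a symmetric distance matrix."""
    rng = random.Random(seed)
    n = len(dist)
    # Initialize pheromone to small random values.
    tau = [[rng.uniform(0.01, 0.02) for _ in range(n)] for _ in range(n)]
    best_tour, best_cost = None, math.inf

    for _ in range(n_iters):  # stopping condition: fixed number of iterations
        tours = []
        for _ in range(n_ants):
            # Place the ant on a random origin and build a tour by extension.
            start = rng.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                cities = sorted(unvisited)
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cities]
                j = rng.choices(cities, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            # Evaluate the closed tour (return to the origin).
            cost = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, cost))
            if cost < best_cost:
                best_tour, best_cost = tour, cost

        # Pheromone evaporation on every edge.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        # Pheromone update from every ant's tour evaluation.
        for tour, cost in tours:
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                tau[i][j] += q / cost
                tau[j][i] += q / cost

    return best_tour, best_cost

if __name__ == "__main__":
    # Hypothetical instance: four cities on the corners of a unit square.
    pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
    dist = [[math.dist(a, b) for b in pts] for a in pts]
    print(aco_tsp(dist))
```

On the unit-square instance the shortest tour follows the perimeter and has length 4, which gives a quick sanity check. Larger instances typically need tuning of the number of ants, `rho`, and the exponents.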