Architecture and Design Methodology for Autonomic Systems-on-Chip (ASoC)
A. Bernauer, A. Bouajila, J. Zeppenfeld, W. Stechele, O. Bringmann, A. Herkersdorf, W. Rosenstiel
Universität Tübingen, Technische Universität München, FZI - Forschungszentrum Informatik
Project Reminder
[Figure: ASoC architecture and design flow. Labels: functional SoC elements (FE) in the FUNCTIONAL layer; autonomic SoC elements (AE) in the AUTONOMIC layer; application requirements and characteristics; architecture characteristics; FE/AE parameter selection; architecture FE/AE model; evaluation; optimization for performance, reliability, and power.]
October 7, 2010 ASoC - Architecture and Design Methodology for SoC
Phase 3 Work Packages
• Distributed XCS
• Demonstrator
• Combining XCS and LCT
• Estimating minimum population size
• Run-time reliability optimization
Distributed XCS
• Cooperating XCS instances: benefits for self-adaptation?
• Simplified Cell processor, different cooling properties for each core
• Varying ambient temperature and activity
• Goal: maximum performance, but no timing errors
• Configurations
  – Topology: unidirectional, bidirectional, complete graph
  – Emigration strategy: random, numerosity, prediction error
  – Deletion strategy: by fitness, prediction error
  – Don’t-care probability: 0.1, 0.3, 0.6, 0.9
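A minimal sketch of what the don't-care probability controls, assuming the standard XCS covering step in which each bit of a newly created condition is generalized to '#' with probability P#. The slides do not show the project's implementation; the names `cover` and `matches`, the example state, and the sample count are illustrative only.

```python
import random


def cover(state: str, p_hash: float, rng: random.Random) -> str:
    """Build a condition matching `state`; each bit becomes '#' with probability p_hash."""
    return "".join("#" if rng.random() < p_hash else bit for bit in state)


def matches(condition: str, state: str) -> bool:
    """A condition matches when every position is '#' or equals the state bit."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state)
    )


rng = random.Random(0)   # fixed seed so runs are repeatable
state = "10110010"       # hypothetical 8-bit monitor reading

# The four don't-care probabilities listed on the slide.
for p_hash in (0.1, 0.3, 0.6, 0.9):
    conditions = [cover(state, p_hash, rng) for _ in range(1000)]
    mean_hashes = sum(c.count("#") for c in conditions) / len(conditions)
    print(f"P#={p_hash}: mean don't-cares per condition = {mean_hashes:.2f} of {len(state)}")
```

A higher P# yields more general initial classifiers, which match more situations but start out less specific.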
Isolated XCS
[Figure: results for isolated XCS. Panels: P# = 0.3 and P# = 0.9. Label: SXU2.]
Distributed XCS
Topology: complete
Emigration: selecting by fitness
Deletion: selecting by prediction error
[Figure: results for distributed XCS. Panels: P# = 0.3 and P# = 0.9. Label: SXU2.]
Phase 3 Work Packages
• Combining XCS and LCT
Combining XCS and LCT [BICC10]
[Figure: learning flow. Labels: initial rule set; design-time rule set; rule-set translation; run-time rule set; fitness update, rule set update, and action on each side; design-time learning (XCS) in software (SW); run-time learning (LCT) in hardware (HW).]
Goals:
• retain capability to self-adapt to unforeseen events
• low hardware overhead
Configurations and benchmarks [BICC10]
• Rule translation
  – all-XCS: translate all XCS rules to LCT rules
  – top-XCS: translate only top XCS rules
• Action selection
  – roulette-wheel: randomly, reward-weighted
  – winner-takes-all: highest reward
• Benchmarks
  – Multiplexer
  – Task allocation
  – SoC component parameterization
[Figure: task allocation on 9 cores, numbered 0 to 8.]
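The two action selection strategies can be sketched as follows. This is an illustration under stated assumptions, not the implementation from [BICC10]: the function names and the reward values are hypothetical, and rewards are assumed to be non-negative with a positive sum.

```python
import random
from typing import Mapping


def winner_takes_all(rewards: Mapping[str, float]) -> str:
    """Pick the action with the highest reward."""
    return max(rewards, key=lambda action: rewards[action])


def roulette_wheel(rewards: Mapping[str, float], rng: random.Random) -> str:
    """Pick an action at random, with probability proportional to its reward."""
    actions = list(rewards)
    weights = [rewards[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical rewards for three 4-bit actions.
REWARDS = {"1001": 11.0, "1010": 3.0, "1100": 15.0}

rng = random.Random(0)
print("winner-takes-all:", winner_takes_all(REWARDS))
draws = [roulette_wheel(REWARDS, rng) for _ in range(1000)]
for action in REWARDS:
    print(f"roulette-wheel chose {action} in {draws.count(action)} of {len(draws)} draws")
```

Winner-takes-all always exploits the best-known action, while roulette-wheel keeps trying lower-rewarded actions occasionally, which leaves room for exploration.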
Task allocation benchmark [BICC10]
[Figure: task allocation results. L: number of cores; i: number of tasks.]
Task allocation benchmark [BICC10]
[Figure: task allocation results with a single core failure and a double core failure in LCT. L: number of cores; i: number of tasks.]
Self-adaptation at chip level with all-XCS rule translation and winner-takes-all action selection strategy
Phase 3 Work Packages
• Estimating minimum population size
Estimating minimum population size N [IJCNN10]
• State of the art: calculate N for “regular” problems
  – “Regular” problem: constant number of relevant bits per problem instance
  – Example: multiplexer problem, 01 1101 → 0 (3 relevant bits)
• Our extension: calculate N for “complex” problems
  – “Complex” problem: variable number of relevant bits per problem instance
  – Example: 2-out-of-4 task allocation problem
    – 1100 → 2 (2 relevant bits): allocation possible
    – 1110 → 0 (3 relevant bits): allocation impossible
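To make "relevant bits" concrete for the regular case, here is a small sketch of a multiplexer with k address bits: only the address bits and the one data bit they select influence the output, so every instance has k + 1 relevant bits. The bit-ordering convention (address read as a binary number, data bits indexed from the right) is an assumption chosen so that the slide's instance 01 1101 yields 0; the slides do not state the convention, and the function names are illustrative.

```python
from typing import List


def selected_position(bits: str, k: int) -> int:
    """Position in `bits` of the data bit selected by the k address bits.

    Assumption: data bits are indexed from the right.
    """
    index = int(bits[:k], 2)
    return len(bits) - 1 - index


def multiplexer(bits: str, k: int) -> str:
    """Return the selected data bit."""
    return bits[selected_position(bits, k)]


def relevant_positions(bits: str, k: int) -> List[int]:
    """Address positions plus the selected data position."""
    return list(range(k)) + [selected_position(bits, k)]


def flip(bits: str, pos: int) -> str:
    """Return `bits` with the bit at `pos` inverted."""
    return bits[:pos] + ("1" if bits[pos] == "0" else "0") + bits[pos + 1:]


instance = "011101"  # address 01, data 1101
relevant = relevant_positions(instance, 2)
print("output:", multiplexer(instance, 2))
print("relevant positions:", relevant)
for pos in range(len(instance)):
    if pos not in relevant:
        # Flipping an irrelevant bit leaves the output unchanged.
        print(f"flip position {pos}: output {multiplexer(flip(instance, pos), 2)}")
```

In a complex problem such as the task allocation example, the number of positions returned by a function like `relevant_positions` would vary from instance to instance.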
Estimating minimum population size N [IJCNN10]
Experimental results:
→ Estimated N is an upper bound; using a smaller N incurs a performance penalty (e.g., with only 0.75N, only 70% of the problem instances reach a correctness rate >90%)
Trading classifiers for accuracy [IJCNN10]
• Idea: subsume classifiers during rule translation, after learning
• Higher correctness rate than when subsuming during learning (SASO [9])
• Population size comparable to SASO [9]
Phase 3 Work Packages
• Run-time reliability optimization
Run-time reliability optimization
[Figure: reliability structures. Labels: modular redundancy, series.]
Phase 3 Work Packages
• Demonstrator
Demonstrator – Hardware
• Monitors
  – Frequency (3 bit)
  – Utilization (3 bit)
  – Workload difference (2 bit)
• Actuators
  – Frequency (4 bit)
  – Task migration (1 bit)
• Evaluator
  – Learning classifier system adapted for efficient HW implementation
• Communicator
  – Sharing of global information
  – Migration of tasks

Learning Classifier Table (with fitness update)
Condition : Action   Fitness
1 X X 0   : 1001     11
X 0 1 X   : 1010     3
. . .       . . .    . . .
X 1 0 X   : 1100     15

[Figure: Core1, Core2, and Core3 attached to a bus with UART, MEM, and MAC.]
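A software sketch of a lookup in a learning classifier table like the one on this slide: each rule's ternary condition (0, 1, or X for don't care) is compared against the monitor bits, and among the matching rules the action with the highest fitness is chosen. The three rules are the rows visible on the slide; the elided rows are left out. Choosing the highest-fitness match is an assumption here, and the names `Rule`, `matches`, and `lookup` are illustrative, not the hardware design.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    condition: str  # one character per monitor bit: '0', '1', or 'X'
    action: str     # actuator bits
    fitness: int


# The three rows visible on the slide.
TABLE = [
    Rule("1XX0", "1001", 11),
    Rule("X01X", "1010", 3),
    Rule("X10X", "1100", 15),
]


def matches(condition: str, monitors: str) -> bool:
    """True when every non-X condition bit equals the corresponding monitor bit."""
    return len(condition) == len(monitors) and all(
        c == "X" or c == m for c, m in zip(condition, monitors)
    )


def lookup(table: List[Rule], monitors: str) -> Optional[Rule]:
    """Return the matching rule with the highest fitness, or None if no rule matches."""
    matching = [rule for rule in table if matches(rule.condition, monitors)]
    return max(matching, key=lambda rule: rule.fitness) if matching else None


for monitors in ("1010", "0101", "1100", "0000"):
    rule = lookup(TABLE, monitors)
    print(monitors, "->", rule.action if rule else "no matching rule")
```

A hardware table would typically evaluate all conditions in parallel; the sequential loop here only illustrates the matching logic.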
Demonstrator – Tasks
• Test tasks 1-N
  – Adjustable, synthetic workloads
  – After completion, pass data to next task
• User interface
  – Allow for user interaction
• Data generation
  – Replaces packet reception for standalone demonstration
  – Generates new data packet every 100 µs
• UART
  – Pass buffered output data to the UART
[Figure: tasks T1 to T5 distributed across Core1, Core2, and Core3; Learning Classifier Table with condition, action, and fitness columns and fitness update; bus with UART, MEM, and MAC.]
Demonstration
Hardware Overheads
Component   Flip-Flops   LUTs    BRAMs   Mult.   Overhead
Leon3       1749         8936    28      1       –
Leon3 AE    2122         10213   29      2       14.3%
LCT         66           116     1       1       1.4%
Act Task.   57           299     0       0       3.5%
Act Freq.   7            19      0       0       0.2%
Mon Util.   35           74      0       0       0.8%
Mon Load    20           40      0       0       0.5%
AE IF       173          399     0       0       4.5%
Synthesis results for Xilinx Virtex 4 VLX100
Phase 3 Progress
• 3rd LCS-Workshop, Tübingen, July 1-2, 2010
Future work
• Continue work on Autonomic Layer reliability
• Complete dependability assessment
• Reduce the simulation time caused by temperature estimation
• Extend theoretical analysis to run-time system
• Complete work on demonstrator
Summary
• Distributed XCS solves problems that were previously difficult to solve
• Combining software XCS with hardware LCT for lightweight on-chip learning
• Estimating minimum population size N
• Trading classifiers for accuracy
• Demonstrator runs 3 Leon3 cores, distributes tasks, and adjusts frequency autonomously
Recent Publications
• [ARCS10] J. Zeppenfeld, A. Herkersdorf. Autonomic Workload Management for Multi-Core Processor Systems, ARCS, Hannover, Germany, February 22-25, 2010.
• [SORT10] J. Zeppenfeld, A. Bouajila, A. Herkersdorf, W. Stechele. Towards Scalability and Reliability of Autonomic Systems on Chip, 1st IEEE Workshop on Self-Organizing Real-Time Systems, Carmona, Spain, May 4, 2010.
• [IJCNN10] B. Rakitsch, A. Bernauer, O. Bringmann, W. Rosenstiel. Pruning population size in XCS for complex problems, International Joint Conference on Neural Networks at the World Congress on Computational Intelligence (WCCI), Barcelona, Spain, July 18-23, 2010.
• [BICC10] A. Bernauer, J. Zeppenfeld, O. Bringmann, A. Herkersdorf, W. Rosenstiel. Combining software and hardware LCS for lightweight on-chip learning, DIPES/BICC 2010, IFIP AICT 329, pp. 279-290, Brisbane, Australia, September 20-23, 2010.