
Models of Architecture


  1. Models of Architecture
     Maxime Pelcat, INSA Rennes, IETR, Institut Pascal
     Nokia Bell Labs 2018
     This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732105: CERBERO.

  2. INSA Rennes – IETR VAADER
     • INSA Rennes
     • IETR VAADER
     • Institut Pascal

  3. Models of Architecture
     • Abstracting computational architecture to
       – Predict performance
       – Support current hardware evolutions

  4. Motivation: architecture evolution
     • Hardware architectures are becoming
       – More complex
       – More heterogeneous
       – More oriented toward High Performance embedded Computing (HPeC)
         • Embedded deep learning, near-sensor computing, fog computing, edge computing, many-cores, etc.
         • Real-time constraints, stream processing applications

  5. Motivation: HPeC architectures
     • Let’s look at ARM-based HPeC
       – Let us consider 4 heterogeneous solutions
         • ARM = control path + some of the data path
         • In red: data path
     [Figure: Multi-ARM, Multi-ARM + FPGA, Multi-ARM + GPGPU, Multi-ARM + DSP]

  6. Motivation: HPeC architectures
     • Let’s look at ARM-based HPeC
     [Figure: Multi-ARM, Multi-ARM + FPGA, Multi-ARM + GPGPU, Multi-ARM + DSP]

  7. Motivation: HPeC architectures
     • ARM big.LITTLE: Samsung Exynos 5422
     [Figure: 4 × A15 high-performance cores (2 GHz, 2 MB) and 4 × A7 low-energy cores (1.4 GHz, 0.5 MB), each cluster with an SCU, linked by ACE; 2 GB DDR (PoP)]
     • Easy to program: Linux SMP, thread migration
     • 12 Gflops, < 10 W

  8. Motivation: HPeC architectures
     • Let’s look at ARM-based HPeC
     [Figure: Multi-ARM, Multi-ARM + FPGA, Multi-ARM + GPGPU, Multi-ARM + DSP]

  9. Motivation: HPeC architectures
     • Multi-ARM + GPGPU: Nvidia Jetson TX1 module
     [Figure: control path: 4 × A57 with SCU (1.6 GHz); data path: 256-core Maxwell GPGPU, 32 cores/warp; H.264, 4K 60 Hz; 4 GB external DDR on module]
     • Less easy to program: Linux SMP + CUDA/OpenCL

  10. Motivation: HPeC architectures
      • Let’s look at ARM-based HPeC
      [Figure: Multi-ARM, Multi-ARM + FPGA, Multi-ARM + GPGPU, Multi-ARM + DSP]

  11. Motivation: HPeC architectures
      • Multi-ARM + DSP: Texas Instruments Keystone II TCI6638K2K
      [Figure: data path: 8 × C66 cores (1.2 GHz, 1 MB each); control path: 4 × A15 with SCU (1.4 GHz, 4 MB); 6 MB MSMC, TeraNet, FFTC]
      • Difficult to program (well): Linux SMP + Open Event Machine
      • 160 Gflops, < 15 W

  12. Motivation: HPeC architectures
      • Let’s look at ARM-based HPeC
      [Figure: Multi-ARM, Multi-ARM + FPGA, Multi-ARM + GPGPU, Multi-ARM + DSP]

  13. Motivation: HPeC architectures
      • Multi-ARM + FPGA: Xilinx Zynq UltraScale+
      [Figure: control path: 4 × A53 with SCU (1.5 GHz, 1 MB), 2 × R5 (600 MHz), GPU (not GPGPU), switch; data path: FPGA fabric (1 M FF, 0.5 M LUT, up to 4 MB)]
      • More difficult to program (well): Linux SMP + HLS or HDL

  14. Motivation: HPeC architectures
      • Current trends
        – FPGAs are gaining importance: what about flops?
        – Adding video/image accelerators
          • Video compression: H.264/AVC, H.265/HEVC, etc.
          • AI: for tensor applications → reach 1 Tops/W
        – RISC-V as an open HW competitor to ARM

  15. Motivation: architecture evolution
      • Towards more complexity
        – More cores, hierarchies of clusters
        – Heterogeneity, interconnect complexity
      • Reminiscent of the intra-core modifications of the 20th century
      [Figure: SIMD (ALU: + + + +) and VLIW (Ld, ×, +, Str), each driven by clk]

  16. Motivation: architecture evolution
      • But there are some differences between intra-core and inter-core parallelism
        – At coarse grain, processing elements (PEs) communicate asynchronously
        – There is no (or less) centralized processing decision
        – There is no performance portability (nothing equivalent to C-to-VLIW compilers)
      • How can/should we manage this HW complexity?
        – Can we predict performance at design time? How?

  17. System Objectives
      [Figure: Reliability, Peak Power, Temperature (T °C), Energy, Performance, Memory, Unit Cost ($), Security, Maintenance Cost ($)]

  18. System Design: Y-Chart
      [Figure: Y-chart with Application, Algorithm, Architecture, Design, and System Prototype, plus two Redesign feedback arrows]

  19. Model-Based Design
      [Figure: the Algorithm Model conforms to a Model of Computation (MoC); the Architecture Model conforms to a Model of Architecture (MoA); Algorithm and Architecture feed KPI Evaluation, which outputs KPIs; two Redesign feedback arrows]

  20. On MoC Side: Many Results
      • #EdwardALee, #ProgrammingParadigms
      • Discrete Event MoCs
      • Finite State Machines → imperative languages
      • Functional MoCs
      • Petri Nets
      • Dataflow MoCs: SDF, CSDF, IDF, IBSDF, PSDF, SPDF, PiSDF, etc. (PREESM)

  21. Dataflow MoCs Case
      And they are not all here…
      [Table: features of dataflow MoCs; marks are listed per row, their column positions are not recoverable]
      MoCs compared: SDF, ADF, IBSDF, DSSF, PSDF, PiSDF, SADF, SPDF, DPN, KPN
      Expressivity: Low, Med., Turing complete
      Hierarchical: X X X X
      Compositional: X X X
      Reconfigurable: X X X X X X
      Statically schedulable: X X X X
      Decidable: X X X X (X) (X) X (X)
      Variable rates: X X X X X X X
      Non-determinism: X X X
      Abbreviations
      • SDF: Synchronous Dataflow
      • ADF: Affine Dataflow
      • IBSDF: Interface-Based Dataflow
      • DSSF: Deterministic SDF with Shared FIFOs
      • PSDF: Parameterized SDF
      • PiSDF: Parameterized and Interfaced SDF
      • SADF: Scenario-Aware Dataflow
      • SPDF: Schedulable Parametric Dataflow
      • DPN: Dataflow Process Network
      • KPN: Kahn Process Network

  22. But Still a Lot to Do
      • On real-time (RT) multicore systems especially
      • Usually, RT application specification =
        – Multiple tasks sharing resources
        – Activation periods or triggering events
      • Objective = keeping resources busy
      [Figure: tasks T1, T2, T3]

  23. MoCs are not sufficient
      [Figure: the Algorithm Model conforms to a Model of Computation (MoC); the Algorithm feeds Energy Evaluation, which outputs Energy]

  24. Models of Architecture
      [Figure: Algorithm Model; the Architecture Model conforms to a Model of Architecture (MoA); Algorithm and Architecture feed KPI Evaluation, which outputs KPIs; two Redesign feedback arrows]

  25. Problem: Predict System Quality
      • How to predict a system's "quality"?
        – Efficiently (simple procedure)
        – Early (from abstract models)
        – Accurately (with a good fidelity)
        – With reproducibility (same models = same prediction)

  26. Model of Architecture
      • Definition
        – Model of a system Non-Functional Property (NFP)
        – Application-independent
        – Abstract
        – Reproducible
      Pelcat, M; Mercat, A; Desnos, K; Maggiani, L; Liu, Y; Heulot, J; Nezan, J-F; Hamidouche, W; Ménard, D; Bhattacharyya, S (2017) "Reproducible Evaluation of System Efficiency with a Model of Architecture: From Theory to Practice", IEEE TCAD.

  27. Model of Architecture
      [Table: models rated on three properties (Reproducible, Application-independent, Abstract); the per-cell marks are not recoverable]
      Models compared: AADL, MCA SHIM, UML MARTE, AAA, CHARMED, S-LAM, MAPS, LSLA

  28. Model of Architecture
      [Figure labels: Model G conforms to MoC; Model H conforms to MoA; Activity; MoA depends on MoC]
      • One and always the same quality evaluation: NFP = MoA(activity(MoC(application)))
      [Figure: Reliability, Power, Energy, Performance, Memory, Security, Cost, Temperature (T °C)]

  29. Model of Architecture
      [Figure: MoC, MoA, Act (activity), KPI]

  30. LSLA: First MoA
      • LSLA = Linear System-Level Architecture Model
      • Motivated by the additive nature of energy consumption

  31. System Objectives
      [Figure: Reliability, Peak Power, Temperature (T °C), Energy, Performance, Memory, Unit Cost ($), Security, Maintenance Cost ($)]

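  32. Energy/Power Define Architecture
      [Figure: power scale marked 0, 2 W, 7 W, 20 W, 20 kW, 20 MW, with the thresholds "Need a heat sink" and "Need a fan"; classes shown: Embedded system, HPeC, HPC; other labels: Dedicated system, influence, or conventional system]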

  33. LSLA Model of Architecture
      [Figure: graph of Task1 to Task5 with unit rates and two signal edges; labels: token, quantum]
      • Architecture: PE1 (2x+0), PE2 (3x+0), CN (10x+1)
      • Cost: 16+12+22=50
      • Compositional

  34. LSLA Model of Architecture
      [Figure: graph of Task1 to Task5 with unit rates and two signal edges, labeled "SDF: Model of Computation"; Activity; architecture labeled "LSLA: Model of Architecture"]
      • Architecture: PE1 (2x+0), PE2 (3x+0), CN (10x+1)
      • Cost: 16+12+22=50
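
The additive cost on this slide can be reproduced with a minimal Python sketch. The three linear functions and the total of 50 come from the slide; reading 10x+1 as the CN's function is an assumption about the figure layout. The activity used here (8 unit-size tokens on PE1, 4 on PE2, 2 unit-size quanta on the CN) is also an assumption, chosen only because it reproduces the slide's terms 16, 12 and 22. The names `LinearCost` and `lsla_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCost:
    """Linear cost of one token or quantum of a given size: alpha * size + beta."""
    alpha: float
    beta: float

    def of(self, size: float) -> float:
        return self.alpha * size + self.beta


# Cost functions read from the slide (10x+1 on the CN is an assumed reading).
ARCHITECTURE = {
    "PE1": LinearCost(2, 0),
    "PE2": LinearCost(3, 0),
    "CN": LinearCost(10, 1),
}

# ASSUMED activity: sizes of the tokens (on PEs) and quanta (on the CN).
ACTIVITY = {
    "PE1": [1] * 8,
    "PE2": [1] * 4,
    "CN": [1] * 2,
}


def lsla_cost(architecture, activity):
    """Return the cost per architecture element and their sum."""
    per_element = {
        name: sum(architecture[name].of(size) for size in sizes)
        for name, sizes in activity.items()
    }
    return per_element, sum(per_element.values())


per_element, total = lsla_cost(ARCHITECTURE, ACTIVITY)
print(per_element, total)  # {'PE1': 16, 'PE2': 12, 'CN': 22} 50
```

Because the total is a plain sum over elements, the cost of two activities is the sum of their costs, which is one way to read the "Compositional" label on the previous slide.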

  35. LSLA MoA for Energy Prediction
      • 86% fidelity on an octo-core ARM

  36. LSLA MoA for Energy Prediction
      • The model is learnt from energy measurements
      [Figure: eight PEs and three CNs]

  37. LSLA MoA for Energy Prediction
      • The model is learnt from energy measurements
      [Figure: eight PEs and three CNs; four PEs labeled 1.5 W and four labeled 0.3 W; parameters α, β, γ]

  38. LSLA: MoA, not MoHW
      • LSLA models HW + communication libraries + scheduler + OSs + …
      • LSLA models the service the platform offers to the applications
      • Top-down approach
        – Learning parameters from experiments
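
The slides state that the LSLA parameters are learnt from energy measurements but do not say how. One possible way, shown here purely as an illustration, is an ordinary least-squares fit of a single element's linear cost αx+β. The measurement values below are synthetic (generated from 2x+0, the PE1 function shown on the earlier slides), not data from the talk, and `fit_linear_cost` is a hypothetical helper.

```python
def fit_linear_cost(sizes, costs):
    """Ordinary least-squares fit of cost = alpha * size + beta."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, costs))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


# SYNTHETIC measurements (assumption): costs generated from 2x + 0.
sizes = [1, 2, 4, 8]
costs = [2.0, 4.0, 8.0, 16.0]

alpha, beta = fit_linear_cost(sizes, costs)
print(alpha, beta)
```

With real measurements the points would not lie exactly on a line, and the residual of the fit would give a first indication of how well a linear model suits the platform.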

  39. System Objectives
      [Figure: Reliability, Peak Power, Temperature (T °C), Energy, Memory, Latency, Unit Cost ($), Security, Maintenance Cost ($)]

  40. MoAs: Limits of LSLA
      • Energy → linear model OK
      • Latency!
      • Latency does not have an additive nature
      [Figure: Task1 followed by Task2: Latency = sum; Task1 and Task2 side by side: Latency = max]

  41. Activity & MoA for Latency
      [Figure: SDF graph of Task1 to Task5 with unit rates and two signal edges; parts a), b), c)]

  42. Activity & MoA for Latency
      [Figure: parts a) and b): Σ → 12+12+11=35 and Σ → 8+6+11=25, combined as max(35,25)=35 (MaxPlus)]
      • Architecture: PE1 (2x+0), PE2 (3x+0), CN (10x+1)

  43. Activity & MoA for Latency
      [Figure: part c): Σ → 24]
      • Architecture: PE1 (2x+0), PE2 (3x+0), CN (10x+1)
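
The sum/max rule from the latency slides can be sketched in a few lines of Python. The cost terms (12, 12, 11 and 8, 6, 11) and the result 35 are copied from slide 42; treating the two sums as two concurrent paths is an assumption about the figure, and `sequential` and `parallel` are hypothetical helper names.

```python
def sequential(*latencies):
    """Chained steps: each one waits for the previous, so latencies add up."""
    return sum(latencies)


def parallel(*latencies):
    """Concurrent paths: the slowest path determines the overall latency."""
    return max(latencies)


# Cost terms copied from slide 42 (their mapping to PE1, PE2 and CN is not modeled).
path_one = sequential(12, 12, 11)
path_two = sequential(8, 6, 11)

latency = parallel(path_one, path_two)
additive = path_one + path_two  # what a purely additive model would report

print(path_one, path_two, latency, additive)  # 35 25 35 60
```

The gap between 35 and 60 illustrates why a purely linear model, adequate for energy, overestimates latency as soon as parts of the activity overlap in time.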
