Advanced Topics on Heterogeneous System Architectures
Performance and Cost (Hennessy & Patterson, chapter 1)
Politecnico di Milano, Seminar Room @ DEIB, 30 November 2017
Antonio R. Miele, Marco D. Santambrogio
Politecnico di Milano
2 Lectures
• Agenda (the number in parentheses refers to the location)
– (1) L1: Course introduction – 29 Nov, @ 1.30pm (3h)
– (1) L2: Computer Architecture – 30 Nov, @ 1.30pm (3h)
– (1) L3: FPGA – 4 Dec, @ 1.30pm (3h)
– (1) L4: FPGA – 5 Dec, @ 1.30pm (3h)
– (1) L5: GPU – 11 Dec, @ 1.30pm (3h)
– (2) L6: OpenCL – 12 Dec, @ 2.30pm (3h)
– (3) L7: OpenCL/Runtime management – 14 Dec, @ 9am (3h)
– (1) L8: Runtime management – 18 Dec, @ 9am (3h)
• Location
1. Seminar Room, Bld 20
2. N11
3. Seminar Room A. Alario, Bld 21
3 Outline
• Measures to evaluate performance
• Quantifying the design process
– Amdahl's law
– CPU time and CPI
• Other metrics: MIPS and MFLOPS
• Summarize performance
• Energy/Power
• Cost
4 Computer Technology
• Performance improvements:
– Improvements in semiconductor technology
  • Feature size, clock speed
– Improvements in computer architectures
  • Enabled by HLL compilers, UNIX
  • Led to RISC architectures
– Together have enabled:
  • Lightweight computers
  • Productivity-based managed/interpreted programming languages
5 Single Processor Performance
[Figure: single-processor performance, annotated with "RISC" and "Move to multi-processor"]
6 Nowadays…
• Tegra 2 (pictured: ASUS Eee Pad Slider tablet, Motorola Photon 4G)
– Dual-core ARM Cortex-A9
– ULP GeForce, 8 cores
• A6
– Dual-core based on ARMv7
– Triple-core PowerVR SGX 543MP3 GPU
• Tegra 3 (pictured: HTC One X)
– Quad-core
– ULP GeForce, 12 cores
7 Moreover… Different Classes of Computers
• Personal Mobile Device (PMD)
– e.g. smart phones, tablet computers
– Emphasis on energy efficiency and real-time
• Desktop Computing
– Emphasis on price-performance
• Servers
– Emphasis on availability, scalability, throughput
• Clusters / Warehouse Scale Computers
– Used for "Software as a Service (SaaS)"
– Emphasis on availability and price-performance
– Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
• Embedded Computers
– Emphasis: price
8 Issues as new opportunities
• Programming has become very difficult
– Impossible to balance all constraints manually
• More computational horse-power than ever before
– Cores are free
• Energy is the new constraint
– Software must become energy and space aware
11 Performance Evaluation
• When we say that one computer is faster than another, what do we mean?
– It depends on what is important
• Two metrics:
– Computer system user: minimize the elapsed time for program execution
  response time: execution time = time_end – time_start
– Computer center manager: maximize the completion rate = #jobs/sec
  throughput: total amount of work done in a given time
12 Response time vs throughput
• Is throughput = 1/average response time?
– YES, but only if there is NO overlap
– Otherwise throughput > 1/average response time
• Example:
– A lunch buffet with 5 stations
– Each person takes 2 minutes at each station
– Time per person to fill up the tray is 10 minutes
– BUT throughput is 1 person every 2 minutes
– WHY? Overlap: 5 people are filling their trays simultaneously
– Without overlap, throughput = 1/10 (1 person every 10 minutes)
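The buffet example can be checked with a few lines of Python. This is only a sketch: the function names and the simple steady-state model (every station always busy when overlap is allowed) are assumptions made for illustration, not part of the slides.

```python
# Illustrative model of the buffet example; names are hypothetical.
STATIONS = 5             # stations in the buffet line
MINUTES_PER_STATION = 2  # time each person spends at one station


def response_time(stations=STATIONS, step=MINUTES_PER_STATION):
    """Minutes one person needs to fill the tray."""
    return stations * step


def throughput(overlap, stations=STATIONS, step=MINUTES_PER_STATION):
    """Steady-state people served per minute."""
    if overlap:
        # Every station is busy at once, so one person finishes every `step` minutes.
        return 1 / step
    # Only one person is in the line at a time.
    return 1 / response_time(stations, step)


print(response_time())            # 10
print(throughput(overlap=True))   # 0.5
print(throughput(overlap=False))  # 0.1
```

With overlap the throughput (0.5 people/minute) is five times 1/response time (0.1 people/minute), even though each person still waits 10 minutes.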
13 Overview of Factors Affecting Performance
• Algorithm complexity and data set
• Compiler
• Instruction set
• Available operations
• Operating system
• Clock rate
• Memory system performance
• I/O system performance and overhead
14 The Bottom Line: Performance (and Cost)
• Time to run the task (ExTime)
– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
– Throughput, bandwidth
15 The Bottom Line: Performance
"X is n times faster than Y" means
$$ n = \frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)} $$
• Speed of Concorde vs. Boeing 747 (mph): 1350/610 = 2.2
• Throughput of Boeing 747 vs. Concorde (passengers × mph): 286700/178200 = 1.6
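As a quick check of the two ratios, here is a minimal Python sketch. The function names and the trip length are hypothetical; only the four numbers come from the slide.

```python
import math


def times_faster(perf_x, perf_y):
    """Return n such that X is n times faster than Y, from performance values."""
    return perf_x / perf_y


def times_faster_from_time(ex_time_x, ex_time_y):
    """Same n, computed from execution times: ExTime(Y) / ExTime(X)."""
    return ex_time_y / ex_time_x


# Figures from the slide.
speed_ratio = times_faster(1350, 610)            # Concorde vs. Boeing 747
throughput_ratio = times_faster(286700, 178200)  # Boeing 747 vs. Concorde
print(round(speed_ratio, 1))       # 2.2
print(round(throughput_ratio, 1))  # 1.6

# Hypothetical trip length, used only to turn speeds into execution times.
distance = 3000.0
n_from_time = times_faster_from_time(distance / 1350, distance / 610)
print(math.isclose(n_from_time, speed_ratio))  # True
```

The same aircraft pair gives opposite winners depending on whether latency (speed) or throughput is the metric that matters.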
16 Speedup
"X is n% faster than Y" ⇒
$$ \frac{\mathrm{execution\_time}(y)}{\mathrm{execution\_time}(x)} = 1 + \frac{n}{100} $$
Since
$$ \mathrm{performance}(x) = \frac{1}{\mathrm{execution\_time}(x)} $$
"X is n% faster than Y" ⇒
$$ \frac{\mathrm{performance}(x)}{\mathrm{performance}(y)} = 1 + \frac{n}{100} $$
$$ \mathrm{Speedup}(x, y) = \frac{\mathrm{performance}(x)}{\mathrm{performance}(y)} $$
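The "n% faster" definition is easy to get backwards, so a small Python sketch may help. The function names and the 10 s / 15 s execution times are made up for illustration.

```python
def speedup(ex_time_x, ex_time_y):
    """Speedup(x, y) = performance(x) / performance(y) = ex_time_y / ex_time_x."""
    return ex_time_y / ex_time_x


def percent_faster(ex_time_x, ex_time_y):
    """Return n such that X is n% faster than Y, i.e. speedup = 1 + n/100."""
    return (speedup(ex_time_x, ex_time_y) - 1) * 100


# Hypothetical measurements: X runs the task in 10 s, Y in 15 s.
print(speedup(10, 15))         # 1.5
print(percent_faster(10, 15))  # 50.0
```

Note that X being 50% faster than Y does not mean X takes 50% less time: here X takes one third less time (10 s instead of 15 s).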
18 Focus on the Common Case
• Common sense guides computer design
– Since it's engineering, common sense is valuable (my personal view)
• In making a design trade-off, favor the frequent case over the infrequent case
– E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first
– E.g., if a database server has 50 disks per processor, storage dependability dominates system dependability, so optimize it first
22 Frequent case
• The frequent case is often simpler and can be done faster than the infrequent case
• What is the frequent case, and how much is performance improved by making that case faster?
23 How to do it?
⇒ Amdahl's Law
25 Amdahl's Law
Speedup due to enhancement E:
$$ \mathrm{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E} $$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
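26 Amdahl's Law
$$ \mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right] $$
$$ \mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}} $$
Best you could ever hope to do:
$$ \mathrm{Speedup}_{maximum} = \frac{1}{1 - \mathrm{Fraction}_{enhanced}} $$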
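The formulas translate directly into code. The sketch below is one possible Python rendering; the function names and the example values (a fraction of 0.4 sped up 10 times) are assumptions chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the task runs `speedup_enhanced` times faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_max_speedup(fraction_enhanced):
    """Upper bound on the overall speedup (the enhanced part takes no time at all)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if fraction_enhanced == 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


# Hypothetical example: 40% of the task is accelerated 10x.
print(round(amdahl_speedup(0.4, 10), 4))  # 1.5625
print(round(amdahl_max_speedup(0.4), 2))  # 1.67
```

Even a 10x enhancement yields only about 1.56x overall here, because the untouched 60% of the task dominates the new execution time.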
27 Exercise on Amdahl's Law
Let's assume that we can make the CPU 5X faster (at 5X the cost).
Suppose that the CPU is used 50% of the time and that the base CPU cost is 1/3 of the cost of the entire system.
Is it worth upgrading the CPU?
Compare speedup and costs!
28 Solution
• Speedup = 1/(0.5 + 0.5/5) = 1.67
• Cost increase = (2/3) + (1/3) × 5 = 2.33
The system becomes 2.33 times more expensive but only 1.67 times faster, so it is not worth upgrading the CPU!
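The solution can be reproduced numerically. In this sketch the variable names are hypothetical, and the decision rule (upgrade only if the speedup is at least as large as the cost increase) is an assumption that matches the slide's "compare speedup and costs" hint.

```python
# Data from the exercise.
cpu_time_fraction = 0.5   # fraction of the time the CPU is used
cpu_speedup = 5.0         # the new CPU is 5x faster
cpu_cost_fraction = 1 / 3 # share of the system cost due to the CPU
cpu_cost_factor = 5.0     # the new CPU costs 5x as much

# Amdahl's law for the overall speedup.
overall_speedup = 1 / ((1 - cpu_time_fraction) + cpu_time_fraction / cpu_speedup)

# New system cost relative to the old one: the rest of the system is unchanged.
cost_increase = (1 - cpu_cost_fraction) + cpu_cost_fraction * cpu_cost_factor

# Assumed decision rule: the upgrade pays off only if speedup >= cost increase.
worth_it = overall_speedup >= cost_increase

print(f"{overall_speedup:.2f}")  # 1.67
print(f"{cost_increase:.2f}")    # 2.33
print(worth_it)                  # False
```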
30 Amdahl's Law
• Expresses the law of diminishing returns
• Corollary: if an enhancement is only usable for a fraction of a task, we can't speed up the task by more than the reciprocal of 1 minus that fraction
• Serves as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost/performance
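Diminishing returns show up clearly when the enhancement factor grows while the enhanced fraction stays fixed. The sketch below uses a fraction of 0.5 and an arbitrary list of enhancement factors, both chosen only for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup according to Amdahl's law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


fraction = 0.5                      # illustrative enhanced fraction
factors = [2, 5, 10, 100, 1000]     # illustrative enhancement factors
limit = 1.0 / (1.0 - fraction)      # corollary: upper bound on the speedup

speedups = [amdahl_speedup(fraction, s) for s in factors]
for s, overall in zip(factors, speedups):
    # Each step buys less: the overall speedup approaches, but never reaches, the limit.
    print(f"enhancement {s:>4}x -> overall {overall:.3f}x (limit {limit:.1f}x)")
```

Going from 2x to 5x on the enhanced part helps noticeably; going from 100x to 1000x barely changes the overall result.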
31 Breaking down performance
• A program is broken into instructions
– Hardware is aware of instructions, not programs
• At a lower level, hardware breaks instructions into clock cycles
– Lower-level state machines change state every cycle
• For example:
– 500 MHz P-III runs 500M cycles/sec, 1 cycle = 2 ns
– 2 GHz P-IV runs 2G cycles/sec, 1 cycle = 0.5 ns
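The cycle times above are just the reciprocal of the clock rate. A minimal Python sketch (the function name is hypothetical):

```python
def cycle_time_ns(clock_hz):
    """Clock cycle time in nanoseconds for a clock rate given in Hz."""
    return 1e9 / clock_hz


print(cycle_time_ns(500e6))  # 2.0  (500 MHz P-III)
print(cycle_time_ns(2e9))    # 0.5  (2 GHz P-IV)
```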