Rethinking Power, Resilience, and Sustainability Issues for Large-scale Computing and Storage Systems
Devesh Tiwari, Assistant Professor, Northeastern University (tiwari@northeastern.edu)
Data-intensive applications
Actionable analytical tools, runtime systems and libraries, and a resource manager for improving application and system efficiency under power, temperature, performance, resilience, and operating cost constraints
Large-scale compute & storage systems
Selected Recent Publications: DSN’16, DSN’15, DSN’14; SC’16 (2), SC’15 (4), SC’14; MICRO’16, HPCA’16, HPCA’15; IPDPS’16, IPDPS’14, IPDPS’12; USENIX FAST’13, HPCA’11. 3 Best Paper Nominations.
Selected Recent PC Service: DSN’17, ICDCS’17, IPDPS’17, CCGrid’17, HotStorage’16, ICDCS’16, IPDPS’16, SC’15
How to provision, manage, and utilize resources in a data center?
Research Goal: Improving cost-efficiency of data centers. If you can’t convert something into dollars, it’s probably worth nothing.
Power-capping, workload performance, reliability, data center power/cooling, capex & opex cost optimizations
Improving operational efficiency and reliability of heterogeneous data-center systems
[Figure (A): peak power (relative to 1 core) for CoMD+MPI, MiniFE+MPI, and Snap+MPI on i7, Sandy Bridge, and Xeon Phi.]
[Figure (C): CDF of peak power and average power prediction error on i7 and on Sandy Bridge.]
Need to know how a large-scale system is designed, built, and operated? What are the design trade-offs? What are practical operational issues?
Effective Management and Utilization of Large-scale Systems
Accurate peak power prediction for different workloads across platforms
[Figure (A): peak power (relative to 1 core) versus # of active cores, with floor and ceil curves for i7, Sandy Bridge, and Xeon Phi; and peak power (relative to 1 core) for CoMD+MPI, MiniFE+MPI, and Snap+MPI on i7, Sandy Bridge, and Xeon Phi.]
[Figure: instantaneous power (W) versus normalized runtime for OMP miniFE, MPI miniFE, MPI CoMD, and OMP CoMD on Sandy Bridge and Xeon Phi.]
[Figure: panels for Sandy Bridge MPI FT and Sandy Bridge MPI LU; labels: peak power, prediction error, normalized runtime.]
Effective Management and Utilization of Large-scale Systems
Power, temperature, and reliability driven optimizations for workloads in data centers (guided by machine learning models)
[Figure, panels (a) and (b): supercomputer temperature (°C) versus time (hour), Real versus PRACTISE.]