Data Center Networks Are in My Way
Stanford Clean Slate CTO Summit
James Hamilton, 2009.10.23
VP & Distinguished Engineer, Amazon Web Services
e: James@amazon.com
web: mvdirona.com/jrh/work
blog: perspectives.mvdirona.com
Work with Albert Greenberg, Srikanth Kandula, Dave Maltz, Parveen Patel, Sudipta Sengupta, Changhoon Kim, Jagwinder Brar, Justin Pietsch, Tyson Lamoreaux, Dhiren Dedhia, Alan Judge, & Dave O'Meara
Agenda
• Where Does the Money Go?
  – Is net gear really the problem?
• Workload Placement Restrictions
• Hierarchical & Over-Subscribed
• Net Gear: SUV of the Data Center
• Mainframe Business Model
• Manually Configured & Fragile at Scale
• Problems on the Border
• Summary
Where Does the Money Go?
• Assumptions:
  – Facility: ~$200M for a 15MW facility, 82% of which is power distribution & mechanical (15-year amortization)
  – Servers: ~$2k each, roughly 50,000 (3-year amortization)
  – Average server power draw at 30% utilization: 80% of peak
  – Server to networking equipment cost ratio: 2.5:1 ("Cost of a Cloud" data)
  – Commercial power: ~$0.07/kWh
• Monthly costs (3-yr server & 15-yr infrastructure amortization): Servers 44%, Networking Equipment 18%, Power Distribution & Cooling 19%, Power 15%, Other Infrastructure 4%
• Observations (worked through in the sketch below):
  – 62% per month goes to IT gear, of which 44% is servers & storage
  – Networking is 18% of overall monthly infrastructure spend
• Details at:
  – http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx
  – http://perspectives.mvdirona.com/2009/03/07/CostOfACloudResearchProblemsInDataCenterNetworks.aspx
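A back-of-envelope sketch of how those shares fall out of the stated assumptions. Straight-line amortization, an equal 3-year term for networking gear, no cost of money, and a simple average-draw estimate are simplifications of mine, so the computed split lands near, but not exactly on, the 44/18/19/15/4 breakdown above.

```python
# Rough monthly-cost model built only from the slide's assumptions, plus the
# simplifications noted above (straight-line amortization, no cost of money,
# networking amortized over 3 years like servers).
servers_capex  = 50_000 * 2_000        # ~$100M of servers, 3-yr amortization
network_capex  = servers_capex / 2.5   # 2.5:1 server : networking capital ratio
facility_capex = 200e6                 # 15MW facility, 15-yr amortization

monthly = {
    "servers":              servers_capex / (3 * 12),
    "networking":           network_capex / (3 * 12),
    "power dist & cooling": facility_capex * 0.82 / (15 * 12),
    "other infrastructure": facility_capex * 0.18 / (15 * 12),
}

# Power bill: 15MW facility at PUE 1.7 gives ~8.8MW of critical load; servers
# average 80% of peak draw; commercial power at $0.07/kWh.
avg_facility_kw  = 15_000 / 1.7 * 0.80 * 1.7
monthly["power"] = avg_facility_kw * 730 * 0.07   # ~730 hours per month

total = sum(monthly.values())
for item, cost in monthly.items():
    print(f"{item:>22}: ${cost / 1e6:4.2f}M/month ({cost / total:5.1%})")
```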
Where Does the Power Go?
• Assuming a conventional data center with PUE ~1.7
  – Each watt delivered to servers loses ~0.7W to power distribution losses & cooling
  – IT load (servers): 1/1.7 => ~59%
  – Networking equipment => 3.4% (part of the 59% above)
• Power losses are easier to track than cooling:
  – Power transmission & switching losses: 8%
  – Cooling losses are the remainder: 100 - (59 + 8) => 33%
• Observations:
  – Server efficiency & utilization improvements are highly leveraged
  – Cooling costs unreasonably high
  – Networking power small at <4%
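The same split as arithmetic, using only numbers from the slide: at PUE 1.7 the IT load is 1/1.7 of total facility power, 8% goes to transmission & switching, cooling is the remainder, and networking sits inside the IT share at 3.4% of the total.

```python
# Power split at PUE ~1.7, using only figures quoted on the slide.
pue          = 1.7
it_load      = 1 / pue                        # ≈ 59% of total facility power
distribution = 0.08                           # transmission & switching losses
cooling      = 1 - (it_load + distribution)   # remainder ≈ 33%
networking   = 0.034                          # networking gear, inside the IT share

print(f"IT load {it_load:.0%}, distribution {distribution:.0%}, "
      f"cooling {cooling:.0%}, networking {networking:.1%} (within IT load)")
```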
Is Net Gear Really the Problem?
• Networking represents only:
  – 18% of the monthly cost
  – 3.4% of the power
• Much room for improvement, but not dominant
  – Do we care?
• Servers: ~55% of power & 44% of monthly cost
  – Server utilization: 30% is good & 10% is common
• Networking is in the way of the most vital optimizations:
  – Improving server utilization
  – Supporting data-intensive analytic workloads
Workload Placement Restrictions
• Workload placement is an over-constrained problem
  – Near storage, near app tiers, distant from redundant instances, near the customer, same subnet (LB & VM migration restrictions), …
• Goal: all data center locations equidistant
  – High bandwidth between servers anywhere in the DC
  – Any workload, any place
  – Need to exploit non-correlated growth/shrinkage in workloads through dynamic over-provisioning
• Resource consumption shaping
  – Optimize for server utilization rather than locality (toy illustration in the sketch below)
• We are allowing the network to constrain optimization of the most valuable assets
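A toy placement model, my own construction rather than anything from the talk, to make the utilization point concrete: when an oversubscribed network forces each job onto the rack that holds its data, hot racks overflow and reject work while other racks sit idle; with a flat network any free server will do. The rack counts, capacities, and demand distribution below are invented.

```python
# Toy model (invented numbers): locality-constrained vs. any-rack placement.
import random

random.seed(1)
RACKS, SLOTS_PER_RACK = 50, 40                            # 2,000 server slots total
jobs = [random.randrange(RACKS) for _ in range(1_800)]    # each job's "home" rack (where its data lives)

def utilization(jobs, constrained):
    free = [SLOTS_PER_RACK] * RACKS
    placed = 0
    for home in jobs:
        # Oversubscribed net: the job must stay on its home rack.
        # Flat net: every rack is equidistant, so any free slot works.
        candidates = [home] if constrained else range(RACKS)
        for rack in candidates:
            if free[rack] > 0:
                free[rack] -= 1
                placed += 1
                break
    return placed / (RACKS * SLOTS_PER_RACK)

print(f"locality-constrained placement: {utilization(jobs, True):.0%} utilization")
print(f"any-rack placement:             {utilization(jobs, False):.0%} utilization")
```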
Hierarchical & Over-Subscribed
[Figure: conventional hierarchical data center network, from the Internet through L3 core routers (CR) and L3 access routers (AR) to load balancers (LB) and L2 switches (S) down to racks of servers, with 80 to 240:1 oversubscription between the layers]
• Key: CR = L3 core router, AR = L3 access router, S = L2 switch, LB = load balancer, A = rack of 20 servers with a top-of-rack switch
• Poor net gear price/performance forces 80 to 240:1 oversubscription (see the sketch below)
• Constrains workload placement & poorly supports data-intensive workloads
  – MapReduce, data warehousing, HPC, analysis, …
  – MapReduce often moves an entire multi-PB dataset during a single job
  – MapReduce code often not executing on the node where the data resides
• Conclusion: need cheap, non-oversubscribed 10Gbps
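Where a figure like 80 to 240:1 can come from: oversubscription compounds multiplicatively as traffic climbs the hierarchy. The per-tier port counts below are illustrative assumptions of mine; the slide states only the end-to-end range.

```python
# Back-of-envelope: end-to-end oversubscription is the product of per-tier ratios.
# Port and uplink counts are illustrative assumptions, not from the slide.
def tier_ratio(down_gbps, up_gbps):
    return down_gbps / up_gbps

tor  = tier_ratio(down_gbps=20 * 1,  up_gbps=2 * 1)    # 20 x 1G servers, 2 x 1G uplinks  -> 10:1
agg  = tier_ratio(down_gbps=48 * 1,  up_gbps=2 * 10)   # 48 x 1G rack links, 2 x 10G up   -> 2.4:1
core = tier_ratio(down_gbps=80 * 10, up_gbps=8 * 10)   # 80 x 10G agg links, 8 x 10G core -> 10:1

print(f"end-to-end oversubscription ≈ {tor * agg * core:.0f}:1")   # 240:1 with these counts
```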
Net Gear: SUV of the Data Center
• Net gear is incredibly power inefficient
• Using the Juniper EX8216 (priced on the next slide) as the example:
  – Power consumption: 19.2kW per redundant pair
  – Entire server racks are commonly 8kW to 10kW
• But at 128 ports per switch pair, that is 150W/port
• Typically used as an aggregation switch
  – Assume a pair, each with 110 ports "down" & 40 servers/rack
  – Only ~4.4W per server in the pair configuration (arithmetic below)
• Far from the dominant data center issue, but still conspicuous consumption
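The per-port and per-server power figures follow directly from the slide's numbers:

```python
# Arithmetic from the slide: per-port and per-server power of an EX8216 aggregation pair.
pair_power_w     = 19_200   # 19.2kW per switch pair
ports            = 128      # ports counted across the pair
racks_down       = 110      # "down" ports feeding racks, per the slide's assumption
servers_per_rack = 40

print(f"{pair_power_w / ports:.0f} W per port")                               # ≈ 150 W
print(f"{pair_power_w / (racks_down * servers_per_rack):.1f} W per server")   # ≈ 4.4 W
```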
Mainframe Business Model
• Central Logic Manufacture
  – Net equipment: proprietary & closely guarded; single source
  – Commodity server: standard design (x86); multiple sources (AMD, Intel, Via, …)
• Finished Hardware Supply
  – Net equipment: proprietary & closely guarded; single source
  – Commodity server: standard design; multiple sources (Dell, SGI, HP, IBM, …)
• System Software Supply
  – Net equipment: proprietary & closely guarded; single source
  – Commodity server: Linux (many distros/support); Windows & other proprietary offerings
• Application Stack
  – Net equipment: not supported; no programming tools; no 3rd-party ecosystem
  – Commodity server: public/published APIs; high-quality programming tools; rich 3rd-party ecosystem
• Example: Juniper EX8216 (used in core or aggregation layers)
  – Fully configured list: $716k without optics, $908k with optics
• Solution: merchant silicon, H/W independence, open source protocol/mgmt stack
Manually Configured & Fragile at Scale
• The unaffordable, scale-up model leads to 2-way redundancy
  – Recovery-oriented computing (ROC) works better beyond 2-way
• Brownout & partial failure are common
• Neither false positives nor false negatives are acceptable & perfect detection is really hard
• Unhealthy equipment continues to operate & drop packets
• Complex protocol stacks, proprietary extensions, and proprietary management
  – The norm is error-prone manual configuration
• Networking uses a distributed management model
  – Complex & slow to converge
  – Central, net- & app-aware management is practical even in large DCs (50k+ servers; back-of-envelope below)
  – Want application input (priorities, requirements, …)
• Scale-up reliability gets expensive faster than it gets reliable
  – Asymptotically approaches "unaffordable" but never "good enough"
  – ROC management techniques work best with more than 2-way redundancy
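A back-of-envelope argument, with numbers I am assuming rather than taking from the talk, for why central, network- and application-aware management is practical at 50k+ servers: the state a central controller must track is tiny by server standards.

```python
# Back-of-envelope controller state for a ~50k-server data center.
# Switch counts, uplink counts, and bytes-per-link are assumptions.
servers          = 50_000
servers_per_rack = 40
tor_switches     = servers // servers_per_rack            # 1,250 top-of-rack switches
agg_core         = 100                                    # assumed aggregation/core boxes

links = servers + tor_switches * 4 + agg_core * 32        # assumed uplinks per tier
state_bytes_per_link = 256                                # assumed config/health state per link

total_mb = links * state_bytes_per_link / 1e6
print(f"{tor_switches + agg_core} switches, {links} links, "
      f"~{total_mb:.0f} MB of controller state")
```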
Problems on the Border
• All the problems of the internal network, plus more:
  – Need large routing tables (FIBs in the 512k to 1M range)
  – "Need" large packet buffers (power & cost)
  – Mainframe router price point (arithmetic below)
    • Example: Cisco 7609
    • A fairly inexpensive border router
    • List price ~$350k for 32 ports, or ~$11k/port
  – Mainframe DWDM optical price point
    • Example: Cisco 15454
    • List ~$489k for 8 ports, or ~$61k/lambda (10Gbps)
    • Better at higher lambda counts, but those are usually not needed
• High cost of WAN bandwidth is a serious industry issue
• DNS & routing fragility (attacks & errors common)
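The border price points quoted above reduce to simple per-port division:

```python
# Arithmetic from the slide: list-price cost per port at the border.
border_router_per_port = 350_000 / 32    # Cisco 7609, ~$350k list, 32 ports
dwdm_per_lambda        = 489_000 / 8     # Cisco 15454, ~$489k list, 8 x 10Gbps lambdas

print(f"border router: ~${border_router_per_port / 1e3:.0f}k per port")    # ≈ $11k/port
print(f"DWDM optical:  ~${dwdm_per_lambda / 1e3:.0f}k per lambda")         # ≈ $61k/lambda
```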
Summary
• We are learning (again) that scale-up doesn't work
  – Costly
  – Insufficiently robust
• We are learning (again) that a single-source, vertically integrated supply chain is a bad idea
• The ingredients for a solution are near:
  – Merchant silicon broadly available
  – Distributed systems techniques
    • Central control is not particularly hard even at 10^5 servers
  – Standardized H/W platform layer (OpenFlow)
• Need an open source protocol & management stack
More Information
• This slide deck:
  – I will post these slides to http://mvdirona.com/jrh/work later this week
• VL2: A Scalable and Flexible Data Center Network
  – http://research.microsoft.com/pubs/80693/vl2-sigcomm09-final.pdf
• Cost of a Cloud: Research Problems in Data Center Networks
  – http://ccr.sigcomm.org/online/files/p68-v39n1o-greenberg.pdf
• PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric
  – http://cseweb.ucsd.edu/~vahdat/papers/portland-sigcomm09.pdf
• OpenFlow Switch Consortium
  – http://www.openflowswitch.org/
• Next Generation Data Center Architecture: Scalability & Commoditization
  – http://research.microsoft.com/en-us/um/people/dmaltz/papers/monsoon-presto08.pdf
• A Scalable, Commodity Data Center Network Architecture
  – http://cseweb.ucsd.edu/~vahdat/papers/sigcomm08.pdf
• Data Center Switch Architecture in the Age of Merchant Silicon
  – http://www.nathanfarrington.com/pdf/merchant_silicon-hoti09.pdf
• Berkeley Above the Clouds
  – http://perspectives.mvdirona.com/2009/02/13/BerkeleyAboveTheClouds.aspx
• James' blog: http://perspectives.mvdirona.com
• James' email: James@amazon.com