IDC Update on How Big Data Is Redefining High Performance Computing


  1. IDC Update on How Big Data Is Redefining High Performance Computing Earl Joseph – ejoseph@idc.com Steve Conway – sconway@idc.com Chirag Dekate – cdekate@idc.com Bob Sorensen – bsorensen@idc.com

  2. IDC Has Over 1,000 Analysts In 62 Countries

  3. Agenda
  • A Short HPC Market Update
  • Big Data Challenges and Shortcomings
  • The High End of Big Data: Examples of Very Large Big Data
  • Examples of How Big Data Is Redefining High Performance Computing

  4. HPC Market Update

  5. What Is HPC?
  • IDC uses these terms to cover all technical servers used by scientists, engineers, financial analysts, and others: HPC, HPTC, technical servers, highly computational servers
  • HPC covers all servers used for computational or data-intensive tasks, from a $5,000 deskside server up to a supercomputer costing over $550 million

  6. Top Trends in HPC
  • The market declined by $800 million overall in 2013, to $10.3 billion, mainly because a few very large system sales in 2012 were not repeated in 2013; we expect growth from 2015 to 2018
  • Software issues continue to grow
  • The worldwide petascale race is at full speed
  • GPUs and accelerators are hot new technologies
  • Big data combined with HPC is creating new solutions in new areas

  7. IDC HPC Competitive Segments: 2013 HPC Servers, $10.3B
  • Supercomputers (over $500K): $4.0B
  • Divisional ($250K–$500K): $1.4B
  • Departmental ($100K–$250K): $3.4B
  • Workgroup (under $100K): $1.6B

  8. HPC WW Market Trends: A 17-Year Perspective

  9. HPC Market Forecasts

  10. HPC Forecasts • Forecasting 7.4% yearly growth from 2013 to 2018 • 2018 should reach $14.7 billion

  11. HPC Forecasts: By Industry/Applications

  12. The Broader HPC Market

  13. Big Data Challenges And Shortcomings

  14. Defining Big Data

  15. HPDA Market Drivers
  • More input data (ingestion): more powerful scientific instruments and sensor networks; more transactions and higher scrutiny (fraud, terrorism)
  • More output data for integration/analysis: more powerful computers, more realism, more iterations in the available time
  • The need to pose more intelligent questions: smarter mathematical models and algorithms
  • Real-time and near-real-time requirements: catch fraud before it hits credit cards; catch terrorists before they strike; diagnose patients before they leave the office; provide insurance quotes before callers leave the phone

  16. Top Drivers For Implementing Big Data

  17. Organizational Challenges With Big Data: Government Compared To All Others

  18. Big Data Meets HPC And Advanced Simulation

  19. High Performance Data Analysis
  • Simulation & analytics that need HPC resources (on premise or in the cloud): search, pattern discovery; high complexity (algorithms); high time-criticality; iterative methods; high variability; established HPC users plus new commercial users
  • Data of all kinds: the 4 V's (volume, variety, velocity, value); structured and unstructured; partitionable and non-partitionable; regular and irregular patterns

  20. HPC Adoption Timeline (Examples), 1960–2012

  21. Very Large Big Data Examples

  22. NASA

  23. Square Kilometre Array – Radio Astronomy for Astrophysics

  24. CERN
  • LHC: the world’s leading accelerator; multiple Nobel Prizes for particle physics work
  • Innovation driven by the need to distribute massive data sets and the accompanying applications
  • ATLAS, one of CERN’s two detectors, generates 1PB of data per second when running (not all of this is distributed)
  • Private cloud distribution to scientists in 20 EU member states plus observer states (the single largest user is the U.S.)
  • Today only 0.0000005% of the data is used

  25. NOAA


  28. HPC Will Be Used More for Managing Mega-IT Infrastructures

  29. Examples of Big Data Redefining HPC

  30. Use Case: PayPal Fraud Detection • The problem: finding suspicious patterns that we don’t even know exist in related data sets

  31. What Kind of Volume? PayPal’s Data Volumes And HPDA Requirements

  32. Where PayPal Used HPC

  33. The Results • In the first year, $710 million saved in fraud that PayPal wouldn’t have been able to detect before
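The slides do not describe PayPal’s actual pipeline, so as a rough illustration of what “finding suspicious patterns in related data sets” can look like, the sketch below links accounts that reuse the same identifiers (device fingerprints, cards, addresses) across data sets and flags unexpectedly large clusters. All names and data here are hypothetical.

```python
# Minimal sketch (not PayPal's actual method): join records from related data
# sets on shared identifiers and flag unexpectedly large account clusters.
from collections import defaultdict

def find(parent, x):
    # Path-compressing find for union-find.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def suspicious_clusters(records, min_size=3):
    """records: iterable of (account_id, shared_key) pairs, where shared_key is
    any identifier that can tie accounts together (device, card, address).
    Returns clusters of accounts joined by shared identifiers."""
    parent = {}
    by_key = defaultdict(list)
    for account, key in records:
        parent.setdefault(account, account)
        by_key[key].append(account)
    # Accounts sharing any identifier are merged into one cluster.
    for accounts in by_key.values():
        for other in accounts[1:]:
            union(parent, accounts[0], other)
    clusters = defaultdict(set)
    for account in parent:
        clusters[find(parent, account)].add(account)
    # Large clusters of nominally unrelated accounts warrant a closer look.
    return [c for c in clusters.values() if len(c) >= min_size]

# Example with made-up data: three accounts reusing one device fingerprint.
demo = [("acct1", "device:abc"), ("acct2", "device:abc"),
        ("acct3", "device:abc"), ("acct4", "card:999")]
print(suspicious_clusters(demo))
```

At PayPal’s scale the same idea runs over billions of records, which is why it lands in HPDA territory rather than on a single enterprise server.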

  34. There Are New Technologies That Will Likely Cause A Mass Explosion In Data – Requiring HPDA Solutions

  35. GEICO: Real-Time Insurance Quotes
  • Problem: Need accurate automated phone quotes in 100ms; they couldn’t do these calculations nearly fast enough on the fly
  • Solution: Each weekend, use a new HPC cluster to pre-calculate quotes for every American adult and household (60-hour run time)

  36. Something To Think About – GEICO: Changing the Way One Approaches Solving a Problem
  • Instead of processing each event one at a time, process it for everyone on a regular basis
    - It can be dramatically cheaper and faster, and offers additional ways to be more accurate
    - Most of all, it can create new and more powerful capabilities
  • Examples:
    - For home loan applications: calculate for every adult and every home in the US
    - For health insurance fraud: track every procedure done on every US person by every doctor, and find patterns

  37. Something To Think About – GEICO: Changing the Way One Approaches Solving a Problem
  • Future examples (continued): if you add in large-scale data collection via sensors such as GPS, drones, and RFID tags:
    - New car insurance rules: the insurance company doesn’t have to pay if you break the law, such as speeding and having an accident
    - You could track every car at all times, then charge $2 to see where the in-laws are in traffic if they are late for a wedding
    - Google Maps could show in real time where every letter and package is located
    - But crooks could also use it in many ways, e.g. watching ATM machines, looking for when guards are on break, …
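A minimal sketch of the precompute-then-look-up pattern behind the GEICO approach in slides 35–36: a long-running batch job prices every household ahead of time, so the call-time path is just a key lookup that easily fits the 100ms budget. The rating function and field names below are made-up placeholders, not GEICO’s model.

```python
# Sketch of batch precomputation vs. on-the-fly rating (hypothetical model).
import time

def quote_model(profile):
    # Placeholder rating function; the real model would be far richer.
    return 500.0 + 25.0 * profile["risk_points"] - 2.0 * profile["years_licensed"]

def weekend_batch(households):
    """Long-running batch job (hours on a cluster): price every household
    up front so no rating math is needed during the phone call."""
    return {h["id"]: quote_model(h) for h in households}

def phone_quote(precomputed, household_id):
    """Call-time path: a single key lookup, well inside a 100 ms budget."""
    return precomputed.get(household_id)

# Toy run with made-up data.
households = [{"id": i, "risk_points": i % 7, "years_licensed": i % 30}
              for i in range(1_000_000)]
table = weekend_batch(households)          # the "60-hour run", shrunk to a toy
start = time.perf_counter()
quote = phone_quote(table, 123_456)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"quote={quote:.2f}, lookup took {elapsed_ms:.3f} ms")
```

The trade-off is classic batch-versus-online: spend cheap off-peak compute on answers nobody has asked for yet, in exchange for guaranteed latency when they do ask.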

  38. U.S. Postal Service

  39. U.S. Postal Service

  40. U.S. Postal Service MCDB = memory-centric database

  41. CMS: Government Health Care Fraud
  • 5 separate databases for the big USG health care programs under the Centers for Medicare and Medicaid Services (CMS)
  • Estimated fraud: $150B–$450B (less than $5B caught today)
  • ORNL and SDSC have evaluation contracts to unify the databases and perform fraud detection on various architectures

  42. Schrödinger: Cloud-Based Lead Discovery for Drug Design (Novartis/Schrödinger)
  • Pharmaceutical company Novartis increased the resolution of a drug discovery algorithm 10x and wanted to use it to test 21 million small molecules as drug candidates
  • Novartis used the Schrödinger drug discovery application in the AWS public cloud, with the help of Cycle Computing
  • The initial run used 51,000 AWS cores, cost $14,000, and took less than 4 hours
  • … and it’s getting cheaper: a later run used 156,000 AWS cores with comparable cost and time

  43. Schrödinger: Cloud-based Lead Discovery for Drug Design
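The Novartis run described above is an embarrassingly parallel screen: millions of candidate molecules scored independently across tens of thousands of cloud cores, with only the best hits kept. The sketch below shows that pattern in miniature; score_molecule() is a made-up stand-in for the Schrödinger docking code, and Python’s multiprocessing stands in for the AWS/Cycle Computing scheduling layer.

```python
# Embarrassingly parallel virtual screening, shrunk to a toy example.
import heapq
from multiprocessing import Pool

def score_molecule(mol_id):
    # Placeholder "docking score"; lower is treated as better here.
    return (mol_id * 2654435761 % 10_000) / 10_000.0

def score_chunk(chunk):
    # One independent work unit -- in the real run, one cloud core's share.
    return [(score_molecule(m), m) for m in chunk]

def screen(library, chunk_size=10_000, top_n=100, workers=8):
    chunks = [library[i:i + chunk_size] for i in range(0, len(library), chunk_size)]
    with Pool(workers) as pool:
        results = pool.map(score_chunk, chunks)
    # Merge per-chunk results and keep the strongest candidates.
    return heapq.nsmallest(top_n, (hit for chunk in results for hit in chunk))

if __name__ == "__main__":
    # Toy library; the real screen covered ~21 million small molecules.
    print(screen(list(range(1_000_000)), top_n=5))
```

Because the chunks share nothing, the same job scales from a laptop to 156,000 rented cores simply by cutting the library into more pieces, which is what makes the per-run cloud pricing in slide 42 possible.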

  44. Global Financial Services: Company X
  • One of the most respected firms in the global financial services industry updates detailed information daily on several million companies around the world.
  • Clients use the firm's credit ratings and other company information in making lending decisions and for other planning, marketing, and business decision making.
  • The firm uses statistical models to develop a company's scores and ratings, and for years the ratings have been prepared and analyzed locally in near real time by the firm's personnel around the world.
    - This practice is a major competitive advantage but resulted in the creation of hundreds of distinct databases and more than a dozen scoring environments.
    - Several years ago, the company established a goal of centralizing these resources and chose SAS as the centralization mechanism, including SAS Grid Manager as part of the software stack.

  45. Global Financial Services: Company X
  • The centralized IT infrastructure created using SAS preserves the advantages of the company's locally created ratings and reports. The new infrastructure provides an effective environment for analytics development and accommodates multiple testing, development, and production environments in a single stack.
  • It is flexible enough to allow dynamic prioritization among these environments, according to a company executive. With help from SAS Grid Manager, the company can maximize the use of its computing resources. The software automatically assigns jobs to server nodes with available capacity, instead of having users wait in a queue for time on fully utilized nodes.
  • The company executive estimates that it might cost 30% more to purchase servers with enough capacity to handle these peak workloads on their own.

  46. Global Financial Services: Company X
  • Several million clients use the firm’s credit ratings to help make lending decisions
  • Goal: increase efficiency for updating ratings
  • Result: an HPC multi-cluster grid boosted efficiency 30%; no need to buy additional clusters yet
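As a rough illustration of the capacity-aware placement attributed to SAS Grid Manager in slide 45 (send each job to a node with free slots rather than queue it behind a fully used one), here is a small sketch. It is not SAS’s actual scheduling algorithm, and the node names and slot counts are invented.

```python
# Capacity-aware job placement: prefer the least-loaded node, queue only if
# every node is full. Illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    capacity: int                       # concurrent job slots
    running: List[str] = field(default_factory=list)

    def free_slots(self) -> int:
        return self.capacity - len(self.running)

def place(job: str, nodes: List[Node]) -> str:
    """Pick the node with the most free capacity; queue only if all are full."""
    best = max(nodes, key=lambda n: n.free_slots())
    if best.free_slots() <= 0:
        return f"{job}: queued (all nodes busy)"
    best.running.append(job)
    return f"{job}: dispatched to {best.name}"

# Toy cluster with uneven load.
cluster = [Node("node-a", capacity=4, running=["j1", "j2", "j3", "j4"]),
           Node("node-b", capacity=4, running=["j5"]),
           Node("node-c", capacity=4)]
for job in ["report-1", "score-batch", "model-fit"]:
    print(place(job, cluster))
```

The 30% figure in slides 45–46 follows from exactly this effect: spreading peak work across existing nodes defers the purchase of extra capacity that would otherwise sit idle most of the week.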

  47. Real Estate
  • Worldwide vacation exchange & rental leader
  • Goal: update property valuations several times per day (not possible with enterprise servers)
  • Results: HPC technology enabled all updates in 8–9 hours; avoided a move to heuristics; allowed the company to focus on the rental side
