Next Generation Cloud Computing: Advanced Services, Architecture, and Technologies Joe Mambretti, Director (j-mambretti@northwestern.edu), International Center for Advanced Internet Research (www.icair.org), Northwestern University; Director, Metropolitan Research and Education Network (www.mren.org); Partner, StarLight/STAR TAP; PI, OMNInet (www.icair.org/omninet) Technische Universität Carolo-Wilhelmina zu Braunschweig, Braunschweig, July 1-3, 2009
Introduction to iCAIR: Accelerating Leading Edge Innovation and Enhanced Global Communications through Advanced Internet Technologies, in Partnership with the Global Community • Creation and Early Implementation of Advanced Networking Technologies - The Next Generation Internet All Optical Networks, Terascale Networks, Networks for Petascale Science • Advanced Applications, Middleware, Large-Scale Infrastructure, NG Optical Networks and Testbeds, Public Policy Studies and Forums Related to NG Networks • Three Major Areas of Activity: a) Basic Research b) Design and Implementation of Prototypes c) Operations of Specialized Communication Facilities (e.g., StarLight)
Paradigm Shift – Ubiquitous Services Based on a Large Scale Distributed Facility vs Isolated Services Based on Separate Component Resources • Traditional Provider Services: Invisible, Static Resources; Centralized Management; Highly Layered; Invisible Nodes and Elements; Hierarchical, Centrally Controlled, Fairly Static; Limited Services, Functionality, Flexibility • Next Generation Services: Distributed Programmable Resources; Dynamic Services; Visible and Accessible Resources; Integrated As Required, Non-Layered; Unlimited Services, Functionality, Flexibility
A Next Generation Architecture: Distributed Facility Enabling Many Types of Network/Services Environments – e.g., FinancialNet, SensorNet, HPCNet, TransLight, Commodity Internet, Intelligent Power Grid Control, R&DNet, GovNet1, RFIDNet, MedNet, Control Plane, PrivNet, BioNet, Large Scale System Control, MediaGridNet, International Gaming Fabric – serving environments such as virtual organizations, sensor networks, real organizations, government agencies, bio organizations, labs, global applications, and financial organizations
Cloud Context (1) • In General, Clouds Are A Means To Support Large Scale Computing and Data Capabilities for Distributed On-Demand Resources Using Data Networks (WANs) • There Are Many Different Types of Clouds • Some Are Oriented Toward Services (e.g., Web 2.0 Based) • Some Are Oriented Toward Resources – For Example, On-Demand Computing Instances Using Infrastructure As A Service Techniques (IaaS; Amazon EC2, S3, etc.; Eucalyptus) • Some Provide Large Scale On-Demand Computing Capacity (GFS/MapReduce/Bigtable, Hadoop, Sector, etc.) • Some Support Public Services Provided By Global Corporations, Some Support Private Enterprises Through External Resources, Some Support Private Organizations Through Internal Resources – and There Are Many Variations and Hybrids
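The MapReduce pattern named on the slide (GFS/MapReduce/Bigtable, Hadoop) can be sketched in a few lines. This is a minimal single-machine illustration of the map/shuffle/reduce phases, not any real framework's API; the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain

# Minimal sketch of the MapReduce pattern: map each record to
# (key, value) pairs, shuffle by key, then reduce each group.

def map_phase(record):
    # Emit a (word, 1) pair for each word in one input record.
    return [(word, 1) for word in record.split()]

def reduce_phase(key, values):
    # Combine all values emitted for one key.
    return key, sum(values)

def map_reduce(records):
    # Shuffle: group intermediate pairs by key, then reduce each group.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_phase(r) for r in records):
        groups[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())

print(map_reduce(["cloud data", "data networks", "data"]))
# {'cloud': 1, 'data': 3, 'networks': 1}
```

In a real cluster the map and reduce phases run on many nodes and the shuffle moves data between them; the single-process version above only shows the data flow.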
Cloud Context (2) • To Date, Clouds Have Been Successful • However, Current Clouds Have Limitations • For Example, They Do Not Necessarily Provide Optimal Performance • Also, Current Clouds Are Oriented Toward Supporting Many Billions of Small Data Transactions Over Commodity “Best Effort” Routed Networks • Current Clouds Do Not Provide Optimal Support for Large Capacity Data Flows and Extremely Large Individual Data Components • They Have Not Been Integrated With Next Generation Networking Capabilities • They Do Not Handle Specialized Data Well – e.g., Digital Media
Cloud Context (3) • If Clouds Are Successful, Why Improve Them Despite Limitations? • They Are Successful for Today’s Consumer and Enterprise Services – • However, They Will Not Meet Future Challenges Using Current Architectures and Technologies • Illustration – And Also Motivation For Improving the State-of-the-Art – Large Scale Science • Why Large Scale Science? • Large Scale Science Provides a Looking Glass Into the Future
A Scientific Perspective • Scientific Research Requires The Resolution of Extremely Complex Problems • Scientific Research Requires The Design And Creation Of Specialized Tools • Increasingly, These Tools Are Being Created Using Digital Technologies • Because of the Complexity and Scale of Major Scientific Problems, Many Areas of Research Encounter Technical Barriers Years Before They Are Recognized By Other Domains • Technical Solutions That Are Created Later Migrate To Wider Communities • Can Large Scale Science Use Clouds? Not Until the Limitations Described Earlier Are Addressed
Motivation: Data-Intensive Science & Engineering – e-Science Community Resources: ALMA, LHC, Sloan Digital Sky Survey, ATLAS
Magnetic Fusion Energy – New Sources Of Power Source: DOE
Spallation Neutron Source (SNS) at ORNL Source: DOE
USGS Images: 10,000 Times More Data than Landsat7 • Landsat7 Imagery: 100 foot resolution, draped on elevation data (Shane DeGross, Telesis; USGS) • New USGS Aerial Imagery: 6-inch resolution Source: EVL, UIC
Today’s Aerial Imaging is >500,000 Times More Detailed than Landsat7 • SDSU Campus: 30 meter pixels (Shane DeGross) vs 4 centimeter pixels (Laurie Cooper, SDSU) Source: Eric Frost, SDSU
iGrid 2005 UCSD 4K Digital Media – Ultra High Definition Digital Communications • NTT, Japan: NTT’s digital communications using SHD transmit extra-high-quality, digital, full-color, full-motion images • 4K pixels horizontal, 2K vertical (4× HDTV, 24× DVD) • www.onlab.ntt.co.jp/en/mn/shd
High-Performance Digital Media For Interactive Remote Visualization (2006) • Interactive visualization coupled with computing resources and data storage archives over optical networks enhances the study of complex problems, such as the modeling of black holes and other sources of gravitational waves. • HD video teleconferencing is used to stream the generated images in real time from Baton Rouge to Brno and other locations • Center for Computation and Technology, Louisiana State University (LSU), USA • Northwestern University • MCNC, USA • NCSA, USA • Lawrence Berkeley National Laboratory, USA • Masaryk University/CESNET, Czech Republic • Zuse Institute Berlin, Germany • Vrije Universiteit, NL • www.cct.lsu.edu/Visualization/iGrid2005 • http://sitola.fi.muni.cz/sitola/igrid/
OptIPuter JuxtaView Software for Viewing High Resolution BioImages on Tiled Displays 30 Million Pixel Display Source: David Lee, Jason Leigh, EVL, UIC NCMIR Lab UCSD
Components Comprising Environment • Overall Architecture • Compute Nodes • Storage Performance • Network Architecture, Protocols, Performance and Technology Note=>Ultra High Performance Networks Can Make Remote Data Appear To Be Local • Proof of Concept – Large Scale Testbed/Prototype • Integration With Emerging Technologies, e.g., Massive Multicore, FPGAs, Customized Integrated Components, etc.
Architecture (1) Super-Computer Model: • Expensive • Data IO between storage and compute is a bottleneck Alternative Model: • Inexpensive • Parallel data IO • Examples: Hadoop, Sphere/Sector Source: NCDM, UIC
Architecture (2) Parallel/Distributed Programming With MPI, etc.: • Flexible and powerful. • BUT Very Complicated • No Data Locality Sector/Sphere Model: • Very Simple to Apply UDF to All Data in Parallel; • Exploits Data Locality • Limited to Certain Data Parallel Applications. Source: NCDM, UIC
What is Sector/Sphere? • Sector/Sphere = Wide Area Cloud Providing On-Demand Computing Capacity • Sector: Distributed Storage System • Sphere: Run-time Middleware Applies User Defined Functions (UDF) to Sector Datasets. • Open Source Software, GPL/LGPL, written in C++. • Initiated 2006, Current Version 1.19 • http://sector.sf.net Source: NCDM, UIC
Sector • Sector: Provides Long Term Persistent Storage to Large Datasets Managed as Distributed Indexed Files. • File Segments Are Placed Throughout Distributed Storage Managed by Sector. • Sector Generally Replicates Data To • Ensure Longevity, • Decrease the Latency When Retrieving It, • Provide Opportunities for Parallelism. • Sector is Designed to Take Advantage of Wide Area High Performance Networks When Available. • Sector Can Address Issues Of Extremely Large Data Sets, Including Very Large Scale Science Data Sets Source: NCDM, UIC
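The replication the slide describes can be sketched with a simple deterministic placement rule: each file segment is copied to several distinct storage nodes so that data survives node loss and can be read in parallel. The node names and the rendezvous-hash style rule below are illustrative assumptions, not Sector's actual placement algorithm.

```python
import hashlib

# Hypothetical pool of Sector-style storage nodes (illustrative names).
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]

def place_replicas(segment_id, replication=3):
    # Rank nodes by a hash of (segment, node) and take the top few.
    # This deterministically spreads segments across distinct nodes,
    # giving longevity (replicas) and read parallelism.
    ranked = sorted(
        NODES,
        key=lambda n: hashlib.sha256(f"{segment_id}:{n}".encode()).hexdigest(),
    )
    return ranked[:replication]

placement = place_replicas("dataset.part-0001")
assert len(set(placement)) == 3  # three distinct replica locations
```

A reader of one replica can fall back to another on failure, and several readers can fetch different replicas at once, which is the latency and parallelism benefit the slide lists.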
Sphere • Sphere: Designed To Execute User Defined Functions (UDF) In Parallel Using a Stream Processing Pattern for The Data That Is Managed By Sector • UDFs Are Applied To Every Data Record In a Data Set Managed by Sector • Each Data Segment Is Processed Independently Providing a Natural Parallelism • The Sector/Sphere Design Results in Allowing Data To Be Frequently Processed in Place Without Moving It • If Data Must be Moved, It Can Be Transported Over High Performance Channels With High Performance Protocols Source: NCDM, UIC
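The processing model above can be sketched as follows: a user defined function (UDF) is applied independently to each data segment, giving the natural parallelism the slide describes. The segments, the UDF, and the worker pool here are toy stand-ins, not Sector/Sphere's real C++ interfaces.

```python
from multiprocessing import Pool

def udf(segment):
    # Example UDF: count the records in one segment that match a
    # predicate (here, even-valued records).
    return sum(1 for record in segment if record % 2 == 0)

def run_sphere_job(segments, workers=4):
    # Each segment is processed independently by a worker and the
    # per-segment results are collected by the client, mirroring the
    # stream-processing pattern applied over Sector-managed segments.
    with Pool(workers) as pool:
        return pool.map(udf, segments)

if __name__ == "__main__":
    segments = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
    print(run_sphere_job(segments))  # [1, 2, 2]
```

In Sphere the workers run on the storage nodes holding the segments, so the UDF moves to the data rather than the data to the UDF; the local pool above only shows the independence of the per-segment computations.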