Grid/Cloud Computing - PowerPoint PPT Presentation

Grid/Cloud Computing over Optical Networks - Opportunities & Research Issues
Chunming Qiao
Lab for Advanced Network Design, Evaluation and Research (LANDER)
University at Buffalo (SUNY)

Outline
• Optical Grid Computing for Petascale Science
• Federated Computing and Networking as Next Generation Cloud Computing

Petascale Science
• Sharing of large amounts of data (in PB range) generated by big experiment instruments and observatories
• Supporting thousands of collaborators worldwide
• Distributed data processing
• Distributed simulation, visualization, and computational steering
• Distributed data management

Petascale Science

| Science Areas | End2End Reliability | Connectivity | Today | 5 years | Network Services / Facilities |
| Advanced Light Source | - | DOE sites; US Universities; Industry | 1 TB/day, 300 Mbps | 5 TB/day, 1.5 Gbps | Guaranteed bandwidth; PKI / Grid |
| Bioinformatics | - | DOE sites; US Universities | 625 Mbps | 250 Gbps | Guaranteed bandwidth; High-speed multicast |
| Chemistry / Combustion | - | DOE sites; US Universities; Industry | - | 10s of Gigabits per second | Guaranteed bandwidth; PKI / Grid |
| Climate Science | - | DOE sites; US Universities; International | - | 5 PB per year, 5 Gbps | Guaranteed bandwidth; PKI / Grid |
| High Energy Physics (LHC) | 99.95+% (less than 4 hrs/year) | US Tier1 (DOE); US Universities; International | 10 Gbps | 100 Gbps (30-40 Gbps per US Tier1) | Guaranteed bandwidth; Traffic isolation; PKI / Grid |

Current, Near- and Long-term Requirements

| Science Areas | Today End2End Throughput | 5 years End2End Throughput | 5-10 Years End2End Throughput | Remarks |
| High Energy and Nuclear Physics | 10 Gb/s | 100 Gb/s | 1000 Gb/s | high bulk throughput |
| Climate (Data & Computation) | 0.5 Gb/s | 160-200 Gb/s | N x 1000 Gb/s | high bulk throughput |
| SNS NanoScience | Not yet started | 1 Gb/s | 1000 Gb/s + QoS for control | remote control and time critical throughput |
| Fusion Energy | 0.066 Gb/s (500 MB/s burst) | 0.198 Gb/s (500MB/20 sec. burst) | N x 1000 Gb/s | time critical throughput |
| Astrophysics | 0.013 Gb/s (1 TB/week) | N*N multicast | 1000 Gb/s | computational steering and collaborations |
| Genomics (Data & Computation) | 0.091 Gb/s (1 TB/day) | 100s of users | 1000 Gb/s + QoS for control | high throughput and steering |
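The throughput figures in the requirements table pair a data volume per period with a sustained rate (for example 1 TB/day with 0.091 Gb/s). The conversion can be sketched as below. This is only an illustration: it assumes decimal units (1 TB = 10^12 bytes), and the function name `sustained_gbps` is made up for this example. With those assumptions the results land close to, but not exactly on, the slide's numbers, which presumably use slightly different rounding or unit conventions.

```python
# Convert "volume per period" into the average rate needed to sustain it.
# Assumption: decimal units (1 TB = 1e12 bytes, 1 Gb = 1e9 bits).

TB = 1e12
PB = 1e15
DAY = 86_400          # seconds
WEEK = 7 * DAY
YEAR = 365 * DAY


def sustained_gbps(volume_bytes: float, period_s: float) -> float:
    """Average rate in Gb/s needed to move volume_bytes once every period_s seconds."""
    return volume_bytes * 8 / period_s / 1e9


genomics = sustained_gbps(1 * TB, DAY)     # slide lists 0.091 Gb/s
astro = sustained_gbps(1 * TB, WEEK)       # slide lists 0.013 Gb/s
fusion = sustained_gbps(500e6, 20)         # slide lists 0.198 Gb/s
climate = sustained_gbps(5 * PB, YEAR)     # average rate behind "5 PB per year"

for name, rate in [("genomics", genomics), ("astrophysics", astro),
                   ("fusion", fusion), ("climate", climate)]:
    print(f"{name}: {rate:.3f} Gb/s")
```

Note that these are averages. The provisioned rates quoted on the slides (such as 5 Gbps for climate science) sit above the yearly average, which is consistent with traffic that arrives in bursts rather than uniformly.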

A Composable Data Transfer Framework
• Dynamic reconfiguration capabilities
  – to support different objectives such as burst, scheduled, and streaming delivery
• Automatic detection of scenarios and use of appropriate/available
  – transport media (e.g., circuit-based WDM, VLANs, SONET, etc.), and
  – protocols (e.g., TCP-variants, UDP-variants, InfiniBand, SCSI, etc.)
• Capability of one-to-many and many-to-many data transfers
  – via Application Level Multicast or a peer-to-peer approach

Federated Computing and Networking (FCN)
• An FCN system consists of computing facilities (e.g., clusters, data centers) interconnected with wide-area WDM networks
• An FCN service provider uses its own computing and WDM networking resources (or resources that belong to a third party for which it is a broker)
• FCN: the next generation of Cloud Computing
  – Interact directly with the WDM networks
  – Integrate a larger scale of computing and networking resources
  – Provide stronger Service Level Agreements (SLAs), including higher availability and robustness, than e.g. Amazon's EC2

VI Job & WF Job
• Two general types of distributed jobs / apps
• Virtual Infrastructure (VI)
  – specifies a set of computing resources (e.g., processing clusters), and their connectivity (in terms of topology, bandwidth, and delay) for a specific period of time
  – Typically represented using a general directed graph
• Workflow (WF)
  – involves large data sets to be distributed among many sites
  – Represented using a directed acyclic graph, or DAG, where directed edges imply precedence among the tasks
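The two job representations can be sketched in a few lines. This is a minimal illustration, not the authors' data model: the task names, cluster names, and link attributes are hypothetical, and `graphlib` (Python 3.9+ standard library) is used only to show that a WF job's precedence edges admit a valid execution order, while a VI job's general directed graph may contain cycles.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical WF job: task -> set of tasks that must finish before it starts.
workflow = {
    "stage_in": set(),
    "filter": {"stage_in"},
    "simulate": {"stage_in"},
    "merge": {"filter", "simulate"},
}

# Hypothetical VI job: directed links between computing resources, each with
# the connectivity requirements named on the slide (bandwidth and delay).
vi_links = {
    ("cluster_a", "cluster_b"): {"bandwidth_gbps": 10, "max_delay_ms": 20},
    ("cluster_b", "cluster_a"): {"bandwidth_gbps": 1, "max_delay_ms": 20},
}


def execution_order(precedence):
    """Return one task order that respects every precedence edge of a DAG."""
    return list(TopologicalSorter(precedence).static_order())


def is_dag(edges):
    """True if the directed edges (src, dst) contain no cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set())
        graph.setdefault(dst, set()).add(src)
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


order = execution_order(workflow)
wf_edges = [(pred, task) for task, preds in workflow.items() for pred in preds]
print(order)
print(is_dag(wf_edges), is_dag(vi_links))
```

The VI example is deliberately bidirectional: a virtual infrastructure describes connectivity rather than precedence, so a cycle there is not an error.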

Support VI/WF Jobs in FCNS: the ASAP Platform
• Provision Application-Specific, Agile, and Private (ASAP) platform
  – Given: a VI or WF job request,
  – Determine: the mapping of the tasks to computing facilities, and the routes as well as wavelengths to be used for connections over the WDM networks,
  – Objective: to satisfy the job's requirements with some optimization goals
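To make the "Given / Determine" statement concrete, here is a deliberately naive sketch of the three decisions involved: task-to-facility mapping, routing, and wavelength assignment. It is not the algorithm from the cited papers. It assumes a toy three-node topology, first-available placement, hop-count shortest paths, and first-fit wavelength assignment under the wavelength-continuity constraint. All names (`provision`, `shortest_path`, `assign_wavelength`) are hypothetical.

```python
from collections import deque

# Hypothetical WDM topology: undirected links, each carrying W wavelengths.
W = 2
links = [("A", "B"), ("B", "C"), ("A", "C")]
free = {frozenset(link): set(range(W)) for link in links}  # free wavelengths per link
capacity = {"A": 1, "B": 1, "C": 1}                        # task slots per facility


def shortest_path(src, dst):
    """Breadth-first search over the links; returns a node list or None."""
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for link in free:
            if node in link:
                (neighbor,) = link - {node}
                if neighbor not in prev:
                    prev[neighbor] = node
                    queue.append(neighbor)
    return None


def assign_wavelength(path):
    """First-fit: lowest wavelength that is free on every hop of the path."""
    hops = [frozenset(hop) for hop in zip(path, path[1:])]
    if not hops:
        return None  # co-located tasks need no lightpath
    common = set.intersection(*(free[hop] for hop in hops))
    if not common:
        return None  # request is blocked
    wavelength = min(common)
    for hop in hops:
        free[hop].remove(wavelength)
    return wavelength


def provision(tasks, demands):
    """Place each task on the first facility with a free slot, then set up lightpaths."""
    placement = {}
    for task in tasks:
        site = next((s for s, slots in capacity.items() if slots > 0), None)
        if site is None:
            raise RuntimeError("no computing capacity left")
        capacity[site] -= 1
        placement[task] = site
    lightpaths = {}
    for a, b in demands:
        path = shortest_path(placement[a], placement[b])
        wavelength = assign_wavelength(path) if path else None
        lightpaths[(a, b)] = (path, wavelength)
    return placement, lightpaths


placement, lightpaths = provision(["t1", "t2"], [("t1", "t2")])
print(placement, lightpaths)
```

A real provisioning algorithm would treat these decisions jointly and optimize against the job's requirements; the sequential greedy steps here only show what has to be decided.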

Illustration of FCNS

Example Research Issues
• Advanced Network Provisioning Technologies
  – enable dynamic, multi-layer, end-to-end, circuit-based services across federated networks
  – Extensions of existing control plane technologies (such as GMPLS, MPLS, etc.) to accommodate scheduling and reservation
  – unified control plane technologies, path computations, and traffic engineering for multi-layer and multi-domain networks offering hybrid best-effort IP, burst and switched circuit services

Example Research Issue II
• Resource co-scheduling to improve data transfer or data analysis job performance:
  – Offline/online provisioning of data transfer request(s)
    • Optimal co-scheduling of computing resources (e.g., storage/caching) and network resources
  – Offline/online provisioning of data analysis job(s)
    • Decide the execution host(s) for the job(s), and establish network paths to stage missing input files locally
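The execution-host decision for a data analysis job can be illustrated with a simple cost model: for each candidate host, add the time to stage the input files it does not already hold to its estimated compute time, then pick the minimum. This is a sketch under stated assumptions, not the method from the talk. The host names, file sizes, link rates, and the function `pick_host` are all hypothetical, and staging is assumed to be sequential over a single path of fixed rate.

```python
def pick_host(job_inputs, hosts, file_size_gb, link_gbps):
    """Return (host, estimated_seconds) minimizing staging time plus compute time.

    job_inputs   : iterable of input file names the job needs
    hosts        : host -> {"cached": set of files already local, "compute_s": seconds}
    file_size_gb : file -> size in gigabytes
    link_gbps    : host -> rate in Gb/s of the path used to stage files to that host
    """
    best = None
    for name, info in hosts.items():
        missing = [f for f in job_inputs if f not in info["cached"]]
        # gigabytes * 8 = gigabits; divide by Gb/s to get seconds
        staging_s = sum(file_size_gb[f] * 8 / link_gbps[name] for f in missing)
        total = staging_s + info["compute_s"]
        if best is None or total < best[1]:
            best = (name, total)
    return best


hosts = {
    "h1": {"cached": {"a"}, "compute_s": 100},   # slower CPU, already holds file a
    "h2": {"cached": set(), "compute_s": 60},    # faster CPU, holds nothing
}
file_size_gb = {"a": 100, "b": 50}
link_gbps = {"h1": 10, "h2": 10}

choice = pick_host(["a", "b"], hosts, file_size_gb, link_gbps)
print(choice)
```

In this example the slower host wins because it avoids moving the 100 GB file, which is the kind of interaction between storage/caching and network resources that co-scheduling is meant to capture.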

Example Research Issue III
• Fault Diagnosis and Tolerance
  – Dynamic performance monitoring over heterogeneous multi-domain networks
  – Fault location and diagnosis
  – Protection/Restoration approaches to survive various failure scenarios
  – Proactive replication to increase the availability of data
  – Network coding to reduce storage and bandwidth requirements
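The effect of proactive replication on data availability can be estimated with a back-of-the-envelope model. The sketch below assumes that each site holding a replica is independently available with the same probability, which is a strong simplification (correlated, multi-layer failures break it). The function names and the example numbers are hypothetical.

```python
def availability(p_site: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is reachable."""
    return 1 - (1 - p_site) ** replicas


def replicas_needed(p_site: float, target: float) -> int:
    """Smallest replica count whose availability meets the target."""
    if not (0 < p_site < 1 and 0 < target < 1):
        raise ValueError("p_site and target must be strictly between 0 and 1")
    n = 1
    while availability(p_site, n) < target:
        n += 1
    return n


# Illustrative target only: 99.95% data availability.
print(replicas_needed(0.99, 0.9995))  # sites that are each up 99% of the time
print(replicas_needed(0.90, 0.9995))  # sites that are each up 90% of the time
```

Each extra replica costs storage and transfer bandwidth, which is the tradeoff that the network coding item on the slide aims to reduce.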

Research Issue IV
• SLA-driven, cost-effective algorithms for provisioning ASAP platforms,
  – addressing the optimal joint task assignment & scheduling and lightpath establishment (as well as traffic grooming) problems,
  – subject to heterogeneous computing resources and limited optical networking resources
• Robust and resilient approaches to survivable ASAP platforms
  – considering tradeoffs involving SLA guarantee and resource usage, under various failure scenarios

Previous Results
• "Performance Comparison of Optical Circuit and Burst Switching for Distributed Computing Applications" - OFC 2008
• "Survivable Optical Grids" - OFC 2008
• "Task Scheduling and Lightpath Establishment in Optical Grids" - INFOCOM 2008 Mini-Conference

Recent Works
• "Maximizing the Revenues for Distributed Computing Applications over WDM Networks" - OFC 2009, OMG2
• "Survivable Logical Topology Design for Distributed Computing in WDM Networks" - OFC 2009, OMO3
• "Robust Application Specific and Agile Private (ASAP) Networks Withstanding Multi-layer Failures" - OFC 2009, OWY1 (Wed)
• "Online Job Provisioning for Large Scale Science Experiments over an Optical Grid Infrastructure" - HSN 2009, in conjunction with INFOCOM 2009
• "Application-Specific, Agile and Private (ASAP) Platforms for Federated Computing Services over WDM Networks" - INFOCOM 2009 Mini-Conference

Thank you!
