Architecting for the cloud: lessons learned from 100 CloudStack deployments Sheng Liang CTO, Cloud Platforms, Citrix
CloudStack History
• Sept 2008: VMOps Founded
• Nov 2009: CloudStack 1.0 GA
• May 2010: Cloud.com Launch & CloudStack 2.0 GA
• July 2011: Citrix Acquires Cloud.com
• April 2012: Apache CloudStack
The inventor of IaaS cloud – Amazon EC2
Stack, top to bottom:
• Amazon eCommerce Platform
• EC2 API
• Amazon Proprietary Orchestration Software
• Open Source Xen Hypervisor
• Commodity Networking, Storage, Servers
CloudStack is inspired by Amazon EC2
Layer by layer (Amazon → CloudStack):
• Amazon eCommerce Platform → CloudPortal
• EC2 API → Cloud APIs
• Amazon Proprietary Orchestration Software → CloudStack
• Open Source Xen Hypervisor → XenServer, ESX, Hyper-V, KVM, OVM
• Commodity Networking, Storage, Servers (both)
There will be 1000s of clouds
[Figure: clouds classified by owner/operator (SP vs. IT), horizontal vs. vertical, and general purpose vs. special purpose; labels include "Data center mgmt and automation" and "Desktop Cloud"]
Learning from 100s of CloudStack deployments
• Web 2.0
• Service Providers
• Enterprise
What is the biggest difference between traditional-style data center automation and Amazon-style cloud?
How to handle failures
Annual Failure Rate of servers: 8%
• Server failure comes from:
  - 70% - hard disk
  - 6% - RAID controller
  - 5% - memory
  - 18% - other factors
• Application can still fail for other reasons:
  - Network failure
  - Software bugs
  - Human admin error
Source: Kashi Venkatesh Vishwanath and Nachiappan Nagappan, Characterizing Cloud Computing Hardware Reliability, SoCC'10
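To put the 8% annual failure rate in perspective, here is a rough back-of-the-envelope sketch. It assumes server failures are independent and uses a hypothetical fleet size; neither assumption comes from the cited paper.

```python
# Back-of-the-envelope failure arithmetic.
# Assumptions (not from the slides): failures are independent,
# and the fleet sizes used below are hypothetical examples.

ANNUAL_FAILURE_RATE = 0.08  # 8% annual failure rate quoted on the slide


def expected_failures_per_year(num_servers: int, afr: float = ANNUAL_FAILURE_RATE) -> float:
    """Expected number of server failures in one year."""
    return num_servers * afr


def prob_at_least_one_failure(num_servers: int, afr: float = ANNUAL_FAILURE_RATE) -> float:
    """Probability that at least one of num_servers fails within a year."""
    return 1.0 - (1.0 - afr) ** num_servers


if __name__ == "__main__":
    fleet = 1000  # hypothetical fleet size
    print(f"Expected failures per year: {expected_failures_per_year(fleet):.0f}")
    print(f"P(at least one failure among 10 servers): {prob_at_least_one_failure(10):.2f}")
```

Under these assumptions a 1,000-server fleet should expect roughly 80 server failures a year, which is why failure handling has to be a design decision rather than an exception path.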
[Figure: data center network topology, top to bottom] Internet; Core Routers; Access Routers; Aggregation Switches and Load Balancers; Top of Rack Switches; Servers
Effectiveness of network redundancy in reducing failures: 40%
• Bugs in failover mechanism
• Incorrect configuration
• Protocol issues such as TCP back-off, timeouts, and spanning tree reconfiguration
Source: Phillipa Gill, Navendu Jain & Nachiappan Nagappan, Understanding Network Failures in Data Centers: Measurement, Analysis and Implications, SIGCOMM 2011
A. Promise users that VMs, storage, and networking will never fail -- no strategy to handle failures
B. Back up VMs for users and restore them when failure happens
C. Tell users to expect failure. Users back up their VMs and handle failure themselves
[Figure: four zones: zCloud West Zone, zCloud East Zone, AWS West Zone, AWS East Zone]
Design for Failure
[Figure: four zones: zCloud West Zone, zCloud East Zone, AWS West Zone, AWS East Zone]
Cloud workloads
Traditional-Style: Reliable hardware, backup entire cloud, and restore for users when failure happens
• Link aggregation
• Storage multi-pathing
• VM HA, fault tolerance
• VM live migration
• Strong consistency
Amazon-Style: Tell users to expect failure. Users to build apps that can withstand infrastructure failure
• VM backup/snapshots
• Ephemeral resources
• Chaos monkey
• Multi-site redundancy
• Eventual consistency
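One way to picture an Amazon-style app that withstands infrastructure failure is client-side failover across zones. The sketch below is only an illustration: the zone names, the `fake_request` stand-in, and the `AllZonesFailed` exception are hypothetical and not part of CloudStack or any AWS API.

```python
# Sketch of "design for failure": the application, not the infrastructure,
# handles a zone outage by failing over to another zone.
# Zone names and the request function are hypothetical.
from typing import Callable, List, Sequence


class AllZonesFailed(Exception):
    """Raised when no zone could serve the request."""


def call_with_failover(zones: Sequence[str], request: Callable[[str], str]) -> str:
    """Try each zone in order and return the first successful response."""
    errors: List[str] = []
    for zone in zones:
        try:
            return request(zone)
        except ConnectionError as exc:
            errors.append(f"{zone}: {exc}")  # record the failure, try the next zone
    raise AllZonesFailed("; ".join(errors))


def fake_request(zone: str) -> str:
    """Stand-in for a real service call; simulates an outage in 'west'."""
    if zone == "west":
        raise ConnectionError("zone unavailable")
    return f"served from {zone}"


if __name__ == "__main__":
    print(call_with_failover(["west", "east"], fake_request))
```

The point of the design is that the caller treats a zone outage as a normal code path, which is the opposite of relying on VM HA or live migration to hide the failure.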
Designing a zone for a traditional workload
Traditional-Style Availability Zone
• Hypervisor: vSphere or XenServer Enterprise
• Storage: SAN
• Networking: L2 VLANs
• Network Services: Load Balancing, VPN
• Multi-tier Apps: Ent App Mgmt
[Figure, top to bottom: vCenter/XenCenter; Enterprise Networking (e.g., VLAN); Hypervisor Clusters; Enterprise Storage (e.g., SAN)]
Designing a zone for an Amazon-style workload
Amazon-Style Availability Zone
• Hypervisor: XenServer or KVM
• Storage: Local, EBS
• Networking: Elastic IP, L3, SDN based L2
• Network Services: Security Groups, ELB, GSLB
• Multi-tier Apps: 3rd Party Tools (e.g., RightScale, enStratus)
[Figure, top to bottom: Software Defined Networks (e.g., Security Groups, EIP, ELB,...); Server Racks; Elastic Block Storage]
Object store is critical for Amazon-style cloud
[Figure: Users; NetScaler ELB/GSLB; Availability Zone 1 and Availability Zone 2, each behind a NetScaler; Storage Cloud marked "?"]
Same Cloud can Support Both Styles
[Figure: one Apache CloudStack Mgmt Server managing Traditional-Style Availability Zones and AWS-Style Availability Zones; Object Storage with Replication/DR]
Tests for a “true” cloud app
• Does it require SAN or VLAN?
• Does it run in multiple data centers?
• Does it involve a distributed object store?
• Is there a single point of failure?
Learning from 100s of CloudStack deployments
• Web 2.0: Mostly Amazon-style
• Service Providers: Mostly traditional style
• Enterprise: Traditional-style
[Figure] Components: CloudStack Admin; Internet; Router; Load Balancer; Primary CloudStack Mgmt Server Cluster; Primary MySQL; Backup MySQL; L3 Core Switch; Top of Rack Switches; Servers; Object Store; Availability Zone 1 (Pod 1, Pod 2, Pod 3, … Pod N); Availability Zone 2 (Standby CloudStack Mgmt Server Cluster)
Layer 3 cloud networking (security groups)
[Figure: Web VMs in a Web Security Group and DB VMs in a DB Security Group]
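A security group can be thought of as a default-deny ingress filter keyed by group membership rather than by VLAN. The following is a minimal sketch of that idea only; the rule format, the "internet" pseudo-group, and the port numbers are hypothetical and do not reflect CloudStack's actual data model.

```python
# Sketch of security-group style filtering at layer 3.
# Group names, ports, and the rule format are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    source_group: str  # group the traffic comes from
    port: int          # destination TCP port that is allowed


# Each group lists the ingress rules that apply to its member VMs.
SECURITY_GROUPS = {
    "web": [IngressRule("internet", 80)],
    "db": [IngressRule("web", 3306)],
}


def is_allowed(source_group: str, dest_group: str, port: int) -> bool:
    """Default deny: traffic passes only if an ingress rule matches."""
    rules = SECURITY_GROUPS.get(dest_group, [])
    return any(r.source_group == source_group and r.port == port for r in rules)
```

With these example rules, Web VMs can reach DB VMs on port 3306, while traffic from the internet to the DB group is dropped because no rule matches.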
Layer 2 VLAN networking
[Figure: VMs belonging to User 1 and User 2]
OVS networking
[Figure: User 1 and User 2 VMs attached to OVS instances; GRE Key 1 and GRE Key 2]
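The slide shows a separate GRE key for each user. A minimal sketch of how keys might be handed out follows; the `GreKeyAllocator` class and tenant names are hypothetical, and CloudStack's actual allocation logic is not shown in the slides.

```python
# Sketch: give each tenant its own GRE key so overlay traffic stays isolated.
# The allocator and tenant names are hypothetical; this is not CloudStack code.
from typing import Dict

MAX_GRE_KEY = 2**32 - 1  # the GRE key field is 32 bits wide


class GreKeyAllocator:
    def __init__(self) -> None:
        self._keys: Dict[str, int] = {}
        self._next_key = 1

    def key_for(self, tenant: str) -> int:
        """Return the tenant's key, allocating a new one on first use."""
        if tenant not in self._keys:
            if self._next_key > MAX_GRE_KEY:
                raise RuntimeError("GRE key space exhausted")
            self._keys[tenant] = self._next_key
            self._next_key += 1
        return self._keys[tenant]
```

Because the key field is 32 bits, an overlay of this kind is not limited to the roughly 4,000 networks that 12-bit VLAN IDs allow.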
Multi-tier virtual networking
Network Services
• IPAM
• DNS
• LB [intra]
• S-2-S VPN
• Static Routes
• ACLs
• NAT, PF
• FW [ingress & egress]
• BGP
[Figure: Internet and Customer Premises (IPSec VPN) connect over the Public VLAN to a Virtual Router and NetScaler VPX; MPLS VLAN; Web subnet (Web VM 1-4), App subnet (App VM 1-2), and DB subnet (DB VM 1), each tier on its own /24 (10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24) and its own GRE key (1, 2, 3)]
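As a small illustration of the per-tier subnets, the sketch below maps an IP address to its tier using Python's standard `ipaddress` module. The three CIDRs come from the slide; which tier owns which CIDR is an assumption, because the extracted slide text does not preserve the pairing.

```python
# Sketch: map an IP address to its tier in a multi-tier virtual network.
# The three CIDRs come from the slide; the tier-to-CIDR pairing is assumed.
import ipaddress
from typing import Optional

TIER_SUBNETS = {
    "web": ipaddress.ip_network("10.1.1.0/24"),
    "app": ipaddress.ip_network("10.1.2.0/24"),
    "db": ipaddress.ip_network("10.1.3.0/24"),
}


def tier_of(address: str) -> Optional[str]:
    """Return the tier whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for tier, subnet in TIER_SUBNETS.items():
        if ip in subnet:
            return tier
    return None
```

A lookup like this is the kind of check an ACL or firewall rule between tiers relies on: traffic is classified by the subnet it comes from, not by the individual VM.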
Network flexibility
Network Services
• L2 connectivity
• IPAM
• DNS
• Routing
• ACL
• Firewall
• NAT
• VPN
• LB
• IDS
• IPS
Network Isolation
• No isolation
• VLAN isolation
• SDN overlays
• L3 isolation
Service Providers
• Virtual appliances
• Hardware firewalls
• LB appliances
• SDN controllers
• IDS/IPS appliances
• VRF
• Hypervisor
“The Apache Way”
• Collaborative software development
• Commercial-friendly standard license
• Consistently high quality software
• Respectful, honest, technical-based interaction
• Faithful implementation of standards
• Security as a mandatory feature
Apache CloudStack Community
                                      Pre Apache Move (Jan 2012)   June Actuals
# of companies endorsing project      1                            68
# of companies participating          10                           140
# of developers working on project    40                           238
Apache CloudStack community projects
• SDN
  - Nicira
  - Midokura
  - Big Switch Networks
  - Stratosphere
• Backup/DR
  - Sungard
• Networking
  - Cisco
  - Brocade (ADX)
• Smart Storage
  - Hadoop + S3 API for object store
  - NetApp (FlexPod, object store)
  - Basho RIAK CS
  - Caringo object store
  - Cloudian S3
• PaaS
  - CloudFoundry implementation through IronFoundry and Stackato teams
  - Engine Yard
  - Cumulogic
  - GigaSpaces
• Workload requirements drive cloud architecture
• There is real demand for SDN in cloud infrastructure
• Open source developers drive cloud adoption
More info: http://cloudstack.org