Edge Resource Management Systems: From Today to Tomorrow (OpenStack Summit Berlin, November 2018)
Who are we?
● Sandro Mazziotta, Director Product Management, OpenStack NFV, Red Hat - smazziot@redhat.com
● Abdelhadi Chari, Cloud/NFV innovation Project Manager, Orange - abdelhadi.chari@orange.com
● Adrien Lebre, Professor (HdR) at IMT Atlantique, FEMDC SiG Co-chair (2016-2018), Discovery PI - http://beyondtheclouds.github.io
Multiple use cases are triggering Edge... Source: https://wiki.akraino.org/display/AK/Akraino+Edge+Stack
Edge from the infrastructure viewpoint?
A set of independent computing sites that should be seen as a single, global infrastructure.
Form factors:
○ National Core: multiple racks
○ Regional Core: 1 rack
○ Edge: less than 10 servers
○ Far Edge: 1 to 3 servers
New constraints on the resource management system
○ Latency / bandwidth / intermittent network: consumer ⇔ service, service ⇔ service, control plane ⇔ resources
○ Geo-distribution: need to deploy and perform the lifecycle management of distributed systems
○ Scale: from a few regional sites to a large number of remote resources
○ Resilience: make edge sites autonomous, minimize the failure domain to one site
○ Regulations: keep sensitive data on-site / within the regulatory region
And many others...
Orange: Edge through the NFV perspective
NFV hosting infrastructure needs: lower and lower distribution levels
Hosting tiers (distance from the end user, number of sites):
○ Local PoPs: 5 - 50 km, a few to tens of thousands
○ Central Offices: 100 - 250 km, a few hundreds
○ Regional PoPs: 300 - 500 km, a few tens
○ Country / International Data Centers: 1000+ km, a few per country
The workload focus shifts from data plane at the lower tiers to control plane at the central tiers.
Example VNFs across these tiers: Mobile Core (vEPC, vMME, vPCRF, vSBC, vIMS, GiLAN, MVNO, MVAS), vWiFi access control Gateway, vCDN / vCDN control, Backbone network, RAN (Virtual DU (MAC/RLC), 4G vBBU, 2G/3G vRNC), vCPE, vBOX, vBNG, vSSL/IPSec Gateway, vOLT, MEC, DSLAM / OLT, vEPC4Business
What do we expect from this distributed infrastructure?
● Simply speaking: we should be able to "play" with this distributed cloud infrastructure as if it were located in a single data center
● Not so easy!
○ Scalability
○ Lifecycle of control components
○ Networking: especially the interactions with the WAN
○ On-site operations for initial setup, hardware upgrade/troubleshooting
○ How to architect the control plane components for better resiliency/efficiency?
○ Can we really share the infrastructure among a large variety of NFV functions (mission critical and best effort)?
What do we expect from this distributed infrastructure?
● And additional requirements will appear, driven by the nature of the NFV functions themselves:
○ Performance and real-time constraints (e.g. for higher-PHY vRAN functions)
○ Mix of workloads to be supported (VMs + containers)
○ Location awareness for resource allocations/reconfigurations/interconnections
Orange, like other global telcos, is strongly interested in preparing different scenarios for how to use OpenStack to address these requirements, and in working on and supporting OpenStack evolutions for this purpose.
Can we operate such a topology with OpenStack?
Edge: Envisioned Topology
[Diagram: two topology views side by side. The Akraino view: central (C), regional (R), and edge (E) sites, ranging from large data centers to medium/micro DCs. Another possible view: central/regional sites, embedded DCs, mobile and constrained edge sites, and distributed DCs, with controllers and compute/storage nodes connected over wired and wireless WAN links.]
Source: https://wiki.akraino.org/display/AK/Akraino+Edge+Stack
Can we operate such a topology with OpenStack?
[Diagram: a central site data center, a regional site, an embedded DC, a micro DC, a distributed DC, and customer premises equipment, interconnected over public transport / WAN links (wired and wireless), with controllers and compute/storage nodes spread across the sites.]
● Two questions to address:
○ How does OpenStack behave in each of these deployment scenarios? (i.e., what are the challenges of each scenario?)
○ How can we make each OpenStack collaborate with the others?
Edge: Envisioned Topology
[Diagram: the same multi-site topology (central/regional sites, embedded DC, micro DC, mobile and constrained edge sites, distributed DCs, controllers and compute/storage nodes over wired/wireless WAN), annotated with the main challenge dimensions: scalability, synchronization, footprint, and network specifics.]
Edge: Envisioned Topology
A few challenges: scalability, latency, intermittent network connectivity, deployment, etc.
Inria, Orange, and Red Hat have been investigating the distributed DC scenario for the last two years.
[Same annotated topology diagram as the previous slide.]
Let’s focus on Distributed DC
Distributed DC => Distributed Compute Nodes
[Diagram: a central location hosting the OpenStack controllers and TripleO (lifecycle management), connected over the WAN to remote sites 1 to N that run compute nodes only.]
Features:
➢ One shared OpenStack cluster
➢ One central control plane and N remote sites with compute nodes
➢ Each remote site is an availability zone (AZ)
➢ Supported by Red Hat since Newton
A few performance studies:
➢ "Evaluating OpenStack WAN-wide" (see the FEMDC OpenStack wiki page)
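To make the "each remote site is an AZ" point concrete, here is a minimal sketch, assuming a hypothetical clouds.yaml entry and placeholder image/flavor/network/AZ names, of how a workload could be pinned to one remote compute site with the openstacksdk:

```python
# Sketch only: cloud name, image, flavor, network, and AZ are placeholders,
# not part of the reference architecture described above.
import openstack

conn = openstack.connect(cloud="dcn-central")  # assumed clouds.yaml entry

# Each remote site is exposed as a Nova availability zone, so landing a
# workload at a given edge site is just a matter of picking its AZ.
server = conn.create_server(
    name="edge-workload-1",
    image="rhel-7.6",                   # placeholder image name
    flavor="m1.small",                  # placeholder flavor
    network="edge-site-1-net",          # placeholder network at the site
    availability_zone="remote-site-1",  # AZ mapped to the remote compute site
    wait=True,
)
print(server.status)
```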
Lessons learnt
● The testing effort focused on clarifying limitations and expectations in Newton
○ Latency
■ At 50 ms round-trip latency, the testing started producing errors and timeouts.
■ Beyond this point is not generally supported, but the testing focused on the infrastructure itself.
■ Once our testing reached 300 ms round-trip latency, the errors increased to the point where service communication fails.
○ Size of images
■ Initial tests validated the size of the images and their impact on first-time deployment.
■ With images beyond 2 GB, deployments failed when deploying 1000 VMs over 10 compute nodes.
○ Bandwidth
■ Highly dependent on the environment (size of images, unique images, application needs).
■ Since images are cached, the most bandwidth is needed when sending a unique image for the first time.
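One straightforward way to reproduce such latency figures in a lab (a sketch only, not necessarily the tooling used for the tests above) is to inject artificial delay on the link towards the remote compute nodes with Linux tc/netem:

```python
# Sketch: emulate WAN latency on the interface facing a remote site.
# The interface name is a placeholder and the commands require root privileges.
import subprocess

IFACE = "eth0"  # interface towards the remote site (placeholder)

def set_one_way_delay(ms: int) -> None:
    """Add an artificial one-way delay; 25 ms per direction ~ 50 ms round trip."""
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", IFACE, "root", "netem",
         "delay", f"{ms}ms"],
        check=True,
    )

def clear_delay() -> None:
    """Remove the netem qdisc and restore normal latency."""
    subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=True)

if __name__ == "__main__":
    set_one_way_delay(25)   # ~50 ms RTT, where errors started to appear
    # ... run the OpenStack API / workload tests here ...
    clear_delay()
```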
Upstream agenda
● In Queens/Rocky
○ Compute node with ephemeral storage
○ Director deployment
○ Split stack => distributed deployment
○ (L3 support) / multi-subnet configurations
● In Stein
○ Compute node with local (persistent) storage
○ Distributed Ceph support: a single Ceph cluster across the central and remote sites
○ Glance images on multi-store
○ Enable HCI nodes (converged compute and storage)
○ Ceph cluster on remote sites (minimum of 3 servers)
● In Train
○ Advanced monitoring capabilities to collect data and distribute them to the central location
See the presentation "Using Prometheus Operator ..." on Thursday at 3 PM.
Are there other DCN challenges?
● Ongoing activities
○ New performance evaluations based on Queens (Red Hat)
○ Impact of remote failures / network disconnections (under investigation at Inria/Orange): http://beyondtheclouds.github.io/blog/
○ Qpid Dispatch Router as an alternative to RabbitMQ: a few studies have been performed since 2017 (see the videos online)
■ PTG in Dublin / Boston presentation: https://www.openstack.org/videos/vancouver-2018/openstack-internal-messaging-at-the-edge-in-depth-evaluation
■ Berlin presentation: https://www.openstack.org/summit/berlin-2018/rabbitmq-or-qpid-dispatch-router-pushing-openstack-to-the-edge
● Challenges
○ SDN solutions with DCN
■ Not all SDN solutions will work with DCN
■ In particular, the interface with the WAN needs to be taken into account
○ Lifecycle of remote resources
○ Impact throughout the whole infrastructure (control and data planes)
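As a rough illustration of why Qpid Dispatch Router is attractive as an alternative: oslo.messaging selects its driver from the transport URL scheme, so switching the RPC backend is, in principle, a configuration change. The hostnames and credentials below are placeholders, and the router mesh itself still has to be deployed across the sites.

```python
# Minimal sketch, not taken from the cited evaluations. oslo.messaging uses
# the RabbitMQ driver for "rabbit://" URLs and the AMQP 1.0 driver (usable
# with Qpid Dispatch Router) for "amqp://" URLs.
from oslo_config import cfg
import oslo_messaging as messaging

CONF = cfg.CONF

# Classic centralized broker (RabbitMQ).
rabbit_transport = messaging.get_rpc_transport(
    CONF, url="rabbit://guest:guest@central-rabbit.example.org:5672/")

# Routed alternative: an AMQP 1.0 mesh of qdrouterd instances spread over the
# edge sites (the AMQP 1.0 driver dependencies must be installed).
qdr_transport = messaging.get_rpc_transport(
    CONF, url="amqp://guest:guest@local-qdrouterd.example.org:5672/")

# Either transport can then back the usual RPC client/server objects.
target = messaging.Target(topic="compute", version="5.0")
client = messaging.RPCClient(qdr_transport, target)
```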
Let's consider several OpenStack control planes
We need several control planes
[Diagram: the same multi-site topology as before (central site data center, regional site, embedded DC, micro DC, distributed DC, and customer premises equipment over public transport / WAN links, wired and wireless), with controllers and compute/storage nodes distributed across the sites.]
Several control planes: academic investigations
[Diagram: several OpenStack instances connected to a shared message bus and a Galera database cluster.]
● Collaborations between/for all services [3, 9, 45]
● A major constraint/objective: do not modify the code
● Two alternative approaches:
○ 1. One ring to rule them all
■ A global AMQP bus and a global shared DB (Galera)
■ A few presentations have been given (see the FEMDC OpenStack wiki page): https://wiki.openstack.org/wiki/Fog_Edge_Massively_Distributed_Clouds#Achieved_Actions
■ Pro: almost straightforward (integration at the oslo level)
■ Cons: scalability / partitioning / versioning; collaboration is not only sharing state (a few services have to be extended)
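A minimal sketch of what the "one ring" option means in practice, assuming hypothetical endpoint names: every OpenStack instance, whichever site it runs on, is pointed at the same message bus and the same Galera-backed database through ordinary oslo-level configuration, which is why no service code has to change just to share state.

```python
# Sketch only: the endpoints are placeholders, and a real deployment would set
# these values in each service's configuration file rather than in Python.
from oslo_config import cfg
import oslo_messaging as messaging
from oslo_db.sqlalchemy import enginefacade

CONF = cfg.CONF

# Shared control bus: every site uses the same transport URL.
shared_transport = messaging.get_rpc_transport(
    CONF, url="rabbit://guest:guest@global-bus.example.org:5672/")

# Shared state: every site points at the same geo-replicated Galera cluster.
db_context = enginefacade.transaction_context()
db_context.configure(
    connection="mysql+pymysql://nova:secret@global-galera.example.org/nova")
```

The scalability, partitioning, and versioning drawbacks listed above stem precisely from this single shared bus and database.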