Ganeti, the New and Arcane: Ganeti's best kept secrets, and exciting new developments



  1. Ganeti, the New and Arcane: Ganeti's best kept secrets, and exciting new developments. Ganeti Eng Team - Google. LinuxCon Japan 2014 - 2 Feb 2014

  2. Introduction to Ganeti: a cluster virtualization manager, in one slide

  3. What is Ganeti?
     · Manage clusters of 1-200 physical machines, divided into nodegroups
     · Deploy Xen/KVM/LXC virtual machines on them
       - Live migration
       - Resiliency to failure (DRBD, Ceph, SAN/NAS, ...)
       - Cluster balancing
       - Ease of repairs and hardware swaps
     · Controlled via command line, REST, and web interfaces

  4. Newest features: development status

  5. 2.10: the very stable release
     · Improved upgrade procedure ("gnt-cluster upgrade")
     · CPU load in hail/hbal (GSoC project)
     · Hotplug support (KVM)
     · Direct RBD storage access (KVM)
     · Better Open vSwitch support (GSoC project)

  6. 2.11: the latest stable release
     · Faster instance moves
     · GlusterFS support
     · hsqueeze (achieve maximum cluster compaction)

  7. 2.12 and future: the next stable release(s)
     · Jobs as processes
     · New install model
     · More secure master candidates
     · Better container support (GSoC)
     · Resource reservation/extra parallelization
     · Generic conversion between disk templates (GSoC)

  8. Monitoring daemon: what's going on in your cluster?

  9. Monitoring a cluster: the old-school way
     [Diagram: Cluster Monitoring System and Other Systems; Master, Node, Instance, NICs, Storage]

  10. Monitoring a cluster: using the monitoring daemon
      [Diagram: Cluster Monitoring System and Other Systems; Monitoring Daemons]

  11. What is the monitoring daemon?
      Provides information:
      · about the cluster state/health
      · live
      · read-only
      Design doc: design-monitoring-agent.rst

  12. More details
      · HTTP daemon
        - Replying to REST-like queries
        - Actually, GET only
      · Providing JSON replies
        - Easy to parse in any language
        - Already used in all the rest of Ganeti
      · Running on every node, not only on master candidates or VM-enabled nodes
      · Additionally: mon-collector, a quick 'n' dirty CLI tool

  13. Data collectors
      · Provide data to the daemon
      · One collector, one report
      · One collector, one category: storage, hypervisor, daemon, instance
      · Two kinds: performance reporting, status reporting
      · New feature: stateful data collectors

  14. Data collectors: what data can be retrieved right now?
      Now:
      · Instance status (Xen only) (category: instance)
      · diskstats information (storage)
      · LVM logical volumes information (storage)
      · DRBD status information (storage)
      · Node OS CPU load average (no category, default)
      Soon(-ish):
      · Instance status for KVM (instance)
      · Ganeti daemons status (daemon)
      · Hypervisor resources (hypervisor)
      · Node OS resources report (default)

  15. The report format
      {
        "name" : "TheCollectorIdentifier",
        "version" : "1.2",
        "format_version" : 1,
        "timestamp" : 1351607182000000000,
        "category" : null,
        "kind" : 0,
        "data" : { "plugin_specific_data" : "go_here" }
      }
      · name: the name of the plugin. A unique string.
      · version: the version of the plugin. A string.
      · format_version: the version of the data format of the plugin. An incrementing integer.
      · timestamp: when the report was produced, in nanoseconds. Can be zero-padded.
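A client consuming these reports typically checks the top-level fields and turns the nanosecond timestamp into a usable date. The sketch below is a minimal illustration, not Ganeti's own code: `parse_report` and `report_time` are hypothetical helper names, and the only check beyond field presence is that `format_version` is an integer, as the slide describes. Splitting the timestamp with integer division keeps full precision and also copes with zero-padded (coarser) timestamps.

```python
import json
from datetime import datetime, timezone

# Top-level fields shown in the report format above.
REQUIRED_FIELDS = ("name", "version", "format_version", "timestamp",
                   "category", "kind", "data")


def parse_report(raw):
    """Parse one collector report (hypothetical helper) and check its fields."""
    report = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in report]
    if missing:
        raise ValueError("report is missing fields: %s" % ", ".join(missing))
    if not isinstance(report["format_version"], int):
        raise ValueError("format_version must be an integer")
    return report


def report_time(report):
    """Convert the nanosecond timestamp into a timezone-aware UTC datetime."""
    seconds, nanos = divmod(report["timestamp"], 10**9)
    # datetime only has microsecond resolution, so sub-microsecond digits are dropped.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=nanos // 1000)
```

Applied to the example report above, `report_time` yields 2012-10-30 14:26:22 UTC.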

  16. Status reporting collectors: report
      They introduce a mandatory part inside the data section:
      "data" : {
        ...
        "status" : {
          "code" : <value>,
          "message" : "some summary goes here"
        }
      }
      · <value>: by increasing criticality level
        - 0: working as intended
        - 1: temporarily wrong, being auto-repaired
        - 2: unknown, potentially dangerous state
        - 4: problems, external intervention required
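Because the codes are ordered by criticality, a simple consumer can summarise several status-reporting collectors by keeping the highest code. This is a minimal sketch under that assumption; `worst_status` is a hypothetical name, and treating any code outside {0, 1, 2, 4} as an error is a design choice of the sketch, not something the slide specifies.

```python
# Status codes from the slide, ordered by increasing criticality.
STATUS_MEANINGS = {
    0: "working as intended",
    1: "temporarily wrong, being auto-repaired",
    2: "unknown, potentially dangerous state",
    4: "problems, external intervention required",
}


def worst_status(reports):
    """Return (code, message) of the most critical status, or None if no reports.

    Hypothetical helper: each report must carry data.status.code/message.
    """
    statuses = [report["data"]["status"] for report in reports]
    for status in statuses:
        if status["code"] not in STATUS_MEANINGS:
            raise ValueError("unknown status code: %r" % (status["code"],))
    if not statuses:
        return None
    # max() returns the first status with the highest code.
    worst = max(statuses, key=lambda s: s["code"])
    return worst["code"], worst["message"]
```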

  17. How to use the daemon?
      · Accepts HTTP connections on node.example.com:1815
        - Not authenticated: read-only
        - Just firewall it, or bind on a local address only
      · GET requests to specific addresses
        - Each address returns different info according to the API:
          /  (returns the list of supported protocol versions)
          /1/list/collectors
          /1/report/all
          /1/report/[category]/[collector_name]
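Since the daemon speaks plain HTTP GET with JSON replies, the standard library is enough for a client. The sketch below only uses the port and paths listed above; the helper names (`mond_url`, `report_path`, `fetch_json`) are hypothetical, and the node name and the `storage`/`diskstats` category and collector pair used in the example are illustrative assumptions.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

MOND_PORT = 1815  # monitoring daemon port given on the slide


def mond_url(node, path):
    """Build the URL of a monitoring-daemon resource on one node."""
    return "http://%s:%d%s" % (node, MOND_PORT, path)


def report_path(category, collector):
    """Path of one collector's report: /1/report/[category]/[collector_name]."""
    return "/1/report/%s/%s" % (quote(category, safe=""),
                                quote(collector, safe=""))


def fetch_json(node, path, timeout=5.0):
    """GET a resource and decode the JSON reply (the daemon serves GET only)."""
    with urlopen(mond_url(node, path), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json("node1.example.com", "/1/list/collectors")` would list the collectors, and `fetch_json("node1.example.com", "/1/report/all")` would fetch every report. Because the daemon is unauthenticated, such a client should only reach it over a firewalled or local address.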

  18. Configuration daemon (confd): what is your cluster supposed to look like?

  19. Before confd
      · Configuration only available on master candidates
        - A few selected values replicated with ssconf
        - Small pieces of config in text files on all the nodes
        - Doesn't scale
      · Need for a way to access the config from other nodes
        - Scalable
        - No single point of failure (so, no RAPI)

  20. What does confd do?
      · Provides information from config.data
      · Read-only
      · Distributed
        - Multiple daemons running on master candidates
        - Accessible from all the nodes through the confd protocol
        - Resilient to failures
      · Optional

  21. What info does it provide?
      Replies to simple queries:
      · Ping
      · Master IP
      · Node role
      · Node primary IP
      · Master candidates primary IPs
      · Instance IPs
      · Node primary IP from instance primary IP
      · Node DRBD minors
      · Node instances

  22. confd protocol: general description
      · UDP (port 1814)
      · Keyed-hash message authentication code (HMAC) authentication
        - Pre-shared, cluster-wide key
        - Generated at cluster initialization
        - Readable by root only
      · Timestamp
        - Checked (± 2.5 mins) to prevent replay attacks
        - Used as HMAC salt
      · Queries made to any subset of master candidates
        - Timeout
        - Maximum number of expected replies
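The authentication scheme combines two checks: the HMAC over salt+msg must match, and the salt, which is a timestamp, must fall within ±2.5 minutes of the receiver's clock. The sketch below assumes SHA-1 as the HMAC digest (consistent with the 40-hex-digit `hmac` values in the examples that follow, but not stated on the slide); `sign` and `check_request` are hypothetical names, and the sketch leaves out the UDP transport entirely.

```python
import hashlib
import hmac
import time

MAX_CLOCK_SKEW = 150  # seconds: the ±2.5 minutes from the slide


def sign(key, salt, msg):
    """HMAC of salt+msg with the cluster key (SHA-1 is an assumption)."""
    return hmac.new(key, (salt + msg).encode("utf-8"), hashlib.sha1).hexdigest()


def check_request(key, salt, msg, signature, now=None):
    """Accept a request only if the HMAC matches and the timestamp is fresh."""
    # Constant-time comparison avoids leaking how many digits matched.
    if not hmac.compare_digest(sign(key, salt, msg), signature):
        return False
    try:
        sent_at = int(salt)  # the request salt doubles as its timestamp
    except ValueError:
        return False
    now = time.time() if now is None else now
    # Replayed packets older (or newer) than the window are rejected.
    return abs(now - sent_at) <= MAX_CLOCK_SKEW
```

Using the timestamp as the salt means a captured packet cannot simply be resent later: once it leaves the window, its otherwise valid signature no longer helps.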

  23. Confd protocol: request/reply
      [Diagram: the same request sent to several master candidates]

  24. Confd protocol: request/reply
      [Diagram: three replies (v: 57) arrive and are marked "enough replies"; a further reply (v: 56) and a timeout are also shown]
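The diagram suggests the client stops waiting once it has the expected number of replies and that replies carry a config version. One plausible way to decide between them, assumed here rather than taken from Ganeti's client, is to trust the reply with the highest serial. In this sketch an iterator stands in for replies arriving before the timeout; `Reply` and `pick_answer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Reply:
    node: str
    serial: int     # version of config.data on the answering node
    answer: object


def pick_answer(replies: Iterable[Reply], expected: int) -> Optional[Reply]:
    """Stop after `expected` replies, then keep the one with the newest config.

    If the iterator runs out first (the timeout case), decide with whatever
    arrived; return None if nothing arrived at all.
    """
    received = []
    for reply in replies:
        received.append(reply)
        if len(received) >= expected:
            break  # enough replies: later ones are never read
    if not received:
        return None
    return max(received, key=lambda r: r.serial)
```

With the replies from the diagram (three at v: 57, then one at v: 56) and `expected=3`, the client answers from a v: 57 reply and never reads the stale v: 56 one.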

  25. confd protocol: request
      plj0{
        "msg": "{\"type\": 1, \"rsalt\": \"9aa6ce92-8336-11de-af38-001d093e835f\", \"protocol\": 1, \"query\": \"node1.example.com\"}\n",
        "salt": "1249637704",
        "hmac": "4a4139b2c3c5921f7e439469a0a45ad200aead0f"
      }
      · plj0: fourcc detailing the message content (PLain Json 0)
      · hmac: HMAC signature of salt+msg with the cluster HMAC key

  26. confd protocol: request (continued)
      (Same request message as on the previous slide.)
      · msg: JSON-encoded query
      · protocol: confd protocol version (=1)
      · type: what to ask for (CONFD_REQ_* constants)
      · query: additional parameters
      · rsalt: response salt, a UUID identifying the request
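Putting the request fields together: the query is JSON-encoded into `msg`, the current timestamp becomes `salt`, and the HMAC of salt+msg is attached before the `plj0` prefix. This is a minimal sketch, not Ganeti's `lib/confd/client.py`; `build_request` is a hypothetical name, SHA-1 is assumed for the HMAC as before, the `type` value 1 is copied from the example, and `uuid1()` matches the style of the example rsalt but any unique string would serve.

```python
import hashlib
import hmac
import json
import time
import uuid

FOURCC = "plj0"  # "PLain Json 0", from the slide


def build_request(key, req_type, query, now=None):
    """Return (packet bytes, rsalt) for one confd query (hypothetical helper)."""
    rsalt = str(uuid.uuid1())  # identifies the request; echoed back as reply salt
    msg = json.dumps({"type": req_type, "rsalt": rsalt,
                      "protocol": 1, "query": query}) + "\n"
    salt = str(int(time.time() if now is None else now))  # timestamp as salt
    signature = hmac.new(key, (salt + msg).encode("utf-8"),
                         hashlib.sha1).hexdigest()
    envelope = json.dumps({"msg": msg, "salt": salt, "hmac": signature})
    return (FOURCC + envelope).encode("utf-8"), rsalt
```

The packet would then be sent over UDP to port 1814 on some subset of master candidates, keeping `rsalt` to match the replies.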

  27. confd protocol: reply
      plj0{
        "msg": "{\"status\": 0, \"answer\": 0, \"serial\": 42, \"protocol\": 1}\n",
        "salt": "9aa6ce92-8336-11de-af38-001d093e835f",
        "hmac": "aaeccc0dff9328fdf7967cb600b6a80a6a9332af"
      }
      · salt: the rsalt of the query
      · hmac: HMAC signature of salt+msg

  28. confd protocol: reply (continued)
      (Same reply message as on the previous slide.)
      · msg: JSON-encoded answer
      · protocol: protocol version (=1)
      · status: 0 = ok, 1 = error
      · answer: query-specific reply
      · serial: version of config.data
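On the receiving side, a client checks the fourcc, confirms that the reply's salt is the rsalt it sent, verifies the HMAC, and only then trusts `status`, `answer` and `serial`. A minimal sketch under the same SHA-1 assumption; `parse_reply` is a hypothetical name, and raising on `status` 1 is one possible way to surface errors.

```python
import hashlib
import hmac
import json

FOURCC = "plj0"


def parse_reply(key, packet, expected_rsalt):
    """Verify one confd reply and return (answer, serial) (hypothetical helper)."""
    text = packet.decode("utf-8")
    if not text.startswith(FOURCC):
        raise ValueError("unknown message format")
    envelope = json.loads(text[len(FOURCC):])
    msg, salt = envelope["msg"], envelope["salt"]
    # The reply salt must be the rsalt of our own request.
    if salt != expected_rsalt:
        raise ValueError("reply does not match our request")
    expected = hmac.new(key, (salt + msg).encode("utf-8"),
                        hashlib.sha1).hexdigest()
    if not hmac.compare_digest(expected, envelope["hmac"]):
        raise ValueError("bad HMAC signature")
    reply = json.loads(msg)
    if reply["status"] != 0:  # 0 = ok, 1 = error
        raise RuntimeError("confd returned an error: %r" % (reply.get("answer"),))
    return reply["answer"], reply["serial"]
```

Returning the serial alongside the answer lets the caller compare replies from different master candidates and prefer the one with the most recent config.data.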

  29. Ready-made clients: the protocol is simple, but clients are simpler
      · Ready-to-use confd clients
      · Python
        - lib/confd/client.py
      · Haskell
        - Since Ganeti 2.7
        - src/Ganeti/Confd/Client.hs
        - src/Ganeti/Confd/ClientFunctions.hs

  30. Expanding confd capabilities
      · Currently not many queries are supported
      · Easy to add new ones
        - Just add a new query type in the constants list
        - ...and extend the buildResponse function (src/Ganeti/Confd/Server.hs) to reply to it in the appropriate way

  31. Ganeti and networks: how do your instances talk to the world?
      · Some slides contributed by Dimitris Aragiorgis <dimara@grnet.gr>

  32. Current NICs: MAC + IP + link + mode
      NIC configuration
      · mode=bridged uses brctl addif
      · Hooks can deal with firewall rules, and more
      · External systems needed for DHCP, IPv6, etc.
      Management
      · Which VMs are on the same collision domain?
      · Which IP is free for a new VM to use?

  33. gnt-network overview
      · Manage collision domains for your instances
      · Easy way to assign IPs to instances
        - If resources are shared among multiple clusters, allocation must be done externally
      · Keep existing per-NIC flexibility
      · Hide underlying infrastructure
      · Better networking overview
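The question "which IP is free for a new VM to use?" boils down to tracking reservations inside one network and handing out the first unused host address. The toy pool below illustrates that idea with Python's `ipaddress` module; it is a hypothetical sketch, not how gnt-network stores or allocates addresses, and `AddressPool` with its methods are invented names.

```python
import ipaddress


class AddressPool:
    """Toy IP pool for a single network (hypothetical, not Ganeti's code)."""

    def __init__(self, cidr, gateway=None):
        self.network = ipaddress.ip_network(cidr)
        self.reserved = set()
        if gateway is not None:
            # The gateway must never be handed out to an instance.
            self.reserved.add(ipaddress.ip_address(gateway))

    def reserve(self, ip):
        """Mark a specific address as used, e.g. one chosen by the admin."""
        addr = ipaddress.ip_address(ip)
        if addr not in self.network:
            raise ValueError("%s is not in %s" % (addr, self.network))
        if addr in self.reserved:
            raise ValueError("%s is already in use" % addr)
        self.reserved.add(addr)
        return addr

    def allocate(self):
        """Hand out the first free host address (skips network/broadcast)."""
        for addr in self.network.hosts():
            if addr not in self.reserved:
                self.reserved.add(addr)
                return addr
        raise RuntimeError("network %s is full" % self.network)
```

As the slide notes, a per-cluster pool like this only works when the network is not shared: if several clusters draw from the same range, allocation has to be coordinated externally.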
