Magic Castle Terraforming the Cloud for HPC, Félix-Antoine Fortin


  1. Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20

  2. Why are there more wizards in Harry Potter than in Lord of the Rings?

  3. Context

  4. Canada Digital Research Infrastructure

  5. Education and Training in Compute Canada
     ● Over 150 workshops / year
     ● Most workshops use the HPC software environment
     ● HPC clusters require an account
     ● Account creation process can take a few days
     Could we replicate the HPC environment for training?

  6. So what is the difference between HP and LotR?

  7. So what is the difference between HP and LotR? Wizardry Schools

  8. Proposal

  9. HPC Wizard Tower by Simon Guilbault

  10. demo

  11. CC Wizard: Magic Castle Voice Assistant

  12. CC Wizard: Magic Castle Voice Assistant

  13. Magic Castle
      Open source project that instantiates a Compute Canada cluster replica in any major cloud with Terraform and Puppet
      ● Create instances
        ○ Management nodes
        ○ Login nodes
        ○ Compute nodes
      ● Create volumes, network, network ACLs
      ● Create certificates, DNS records, passwords
      ● Configuration done via input parameters
      https://github.com/computecanada/magic_castle

  14. Terraform and Puppet
      Terraform
      ● Tool for building, changing, and versioning infrastructure.
      ● Infrastructure is described using a high-level configuration syntax.
      ● Create resources that can then be set up by a config management tool.
      Puppet
      ● Config management tool used for deploying, configuring and managing servers.
      ● Define configurations for each host.
      ● Continuously check whether the required configuration is in place and is not altered.

  15. Overview of a Magic Castle Release cloud-init mgmt.yaml Magic Castle puppet.yaml infrastructure.tf provider* data.tf variables.tf main.tf output.tf provider.tf *could be any in [aws, azure, gcp, openstack, ovh]

  16. Infrastructure

  17. Overview of a Magic Castle Release cloud-init mgmt.yaml Magic Castle puppet.yaml infrastructure.tf provider* data.tf variables.tf main.tf output.tf provider.tf *could be any in [aws, azure, gcp, openstack, ovh]

  18. Architecture

  19. Architecture - login nodes

  20. Architecture - management nodes

  21. Architecture - compute nodes

  22. Main Interface

  23. Overview of a Magic Castle Release cloud-init mgmt.yaml Magic Castle puppet.yaml infrastructure.tf provider* data.tf variables.tf main.tf output.tf provider.tf *could be any in [aws, azure, gcp, openstack, ovh]

  24. Magic Castle Terraform Main Module
      4 sections
      1. Cloud provider selection
      2. Infrastructure customization
      3. Cloud provider specific inputs
      4. DNS configuration (optional)

  25. MC Module - 1. source

          source = "./provider"

  26. MC Module - 2.1 Infrastructure customization

          cluster_name = "fosdem"
          domain       = "computecanada.dev"
          image        = "CentOS-7-x64-2019-07"
          nb_users     = 100
          public_keys  = [file("~/.ssh/id.pub")]

  27. MC Module - 2.2 Instance definition

          instances = {
            mgmt  = { type = "p4-6gb", count = 1 },
            login = { type = "p2-3gb", count = 1 },
            node  = { type = "p2-3gb", count = 1 }
          }
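
A sketch of how a provider module might expand the `instances` map from slide 27 into individual hosts. This is not taken from the Magic Castle source: the variable declaration, the `hosts` local, and the output name are assumptions made for illustration, and the node count is raised to 2 so the expansion is visible. The `<role><index>` naming follows the host labels on slide 36 (node1, node2, ...).

```hcl
# Hypothetical sketch, not the Magic Castle implementation.
variable "instances" {
  description = "Instance type and count per role, shaped like the map on slide 27"
  type = map(object({
    type  = string
    count = number
  }))
  default = {
    mgmt  = { type = "p4-6gb", count = 1 }
    login = { type = "p2-3gb", count = 1 }
    node  = { type = "p2-3gb", count = 2 } # assumption: 2 nodes for illustration
  }
}

locals {
  # Build one small map per role ("<role><index>" => instance type),
  # then merge them into a single map with one entry per host.
  hosts = merge([
    for role, spec in var.instances : {
      for i in range(1, spec.count + 1) : "${role}${i}" => spec.type
    }
  ]...)
}

output "hosts" {
  value = local.hosts
}
```

A map keyed by hostname like this could then drive a resource through `for_each`, which is one way to keep a single instance definition for all roles instead of repeating a resource block per role.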

  28. MC Module - 2.3 Storage definition

          storage = {
            type         = "nfs"
            home_size    = 100
            project_size = 50
            scratch_size = 50
          }

  29. MC Module - 3. Cloud Provider Specific Inputs
      Examples:
      ● OpenStack list of floating IPs
      ● Google GPU attachment for compute nodes
      ● AWS / Azure / Google Cloud region

  30. MC Module - 4. DNS Configuration (optional)

          source           = "./dns/cloudflare"
          name             = module.provider.cluster_name
          domain           = module.provider.domain
          email            = "you@example.com"
          public_ip        = module.provider.ip
          rsa_public_key   = module.provider.rsa_public_key
          sudoer_username  = module.provider.sudoer_username
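
Put together, the fragments from slides 25 to 30 could form a `main.tf` along the following lines. This is a sketch under assumptions: the slides use the placeholder `provider` for the module name and folder, so `openstack` is substituted here as one of the listed providers, and the module label `dns` is also assumed. The provider-specific inputs of section 3 are omitted because the slides do not spell them out.

```hcl
# main.tf: sketch assembled from slides 25 to 30.
# Assumption: module named "openstack" in place of the slides' placeholder "provider".
module "openstack" {
  # 1. Cloud provider selection
  source = "./openstack"

  # 2.1 Infrastructure customization
  cluster_name = "fosdem"
  domain       = "computecanada.dev"
  image        = "CentOS-7-x64-2019-07"
  nb_users     = 100
  public_keys  = [file("~/.ssh/id.pub")]

  # 2.2 Instance definition
  instances = {
    mgmt  = { type = "p4-6gb", count = 1 }
    login = { type = "p2-3gb", count = 1 }
    node  = { type = "p2-3gb", count = 1 }
  }

  # 2.3 Storage definition
  storage = {
    type         = "nfs"
    home_size    = 100
    project_size = 50
    scratch_size = 50
  }

  # 3. Cloud provider specific inputs would go here (not shown on the slides).
}

# 4. DNS configuration (optional)
# Assumption: module label "dns"; inputs are wired to the provider module's outputs.
module "dns" {
  source          = "./dns/cloudflare"
  name            = module.openstack.cluster_name
  domain          = module.openstack.domain
  email           = "you@example.com"
  public_ip       = module.openstack.ip
  rsa_public_key  = module.openstack.rsa_public_key
  sudoer_username = module.openstack.sudoer_username
}
```

Keeping every user-facing setting in this one file is what lets the provider be swapped by changing the `source` line and the provider-specific inputs, while the rest of the configuration stays the same.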

  31. Apply Plan

          $ terraform apply
          Apply complete! Resources: 30 added, 0 changed, 0 destroyed.
          Outputs:
          admin_username = centos
          guest_passwd = **redacted**
          guest_usernames = user[01-10]
          hostnames = [pirate.calculquebec.cloud, pirate1.calculquebec.cloud]
          public_ip = [206.12.90.97]

  32. Challenges: Infrastructure as Code
      ● Designing a main user interface that limits references to a provider-specific implementation / API.
      ● Terraform configuration language tends to favor repetition over re-use of code.
      ● Regrouping every component that is common among providers.

  33. Provisioning

  34. Overview of a Magic Castle Release cloud-init mgmt.yaml Magic Castle puppet.yaml infrastructure.tf provider* data.tf variables.tf main.tf output.tf provider.tf *could be any in [aws, azure, gcp, openstack, ovh]

  35. Bootstrap Puppet
      1. Inject data from TF
      2. Upgrade CentOS
      3. Install Puppet RPMs
      4. Configure Puppet certificates
      5. Set up host configuration

  36. Provisioning with Puppet and Consul (diagram with hosts mgmt1, login1, and node1 to node6)

  37. Challenges: Provisioning
      ● Every step of the provisioning needs to work without human intervention.
      ● Once provisioned, the cluster needs to stay healthy on its own; users are not necessarily sysadmins.
      ● Provisioning both master and slave services without a proper syncing mechanism.

  38. Software

  39. Batteries Included
      ● FreeIPA
        ○ Kerberos
        ○ BIND
        ○ 389 DS LDAP
      ● NFS
      ● Slurm
      ● Globus Endpoint
      ● JupyterHub with BatchSpawner
      ● Compute Canada CVMFS
      ● LMOD

  40. Compute Canada Software Stack - CVMFS
      ● CernVM File System (CVMFS) provides a scalable, reliable and low-maintenance software distribution service
      ● Compute Canada CVMFS repo:
        ○ 600+ scientific applications
        ○ 4,000+ permutations of version/arch/toolchain
        ○ All compiled with EasyBuild
      ● Available from anywhere
      ● PEARC19 paper

  41. Key Takeaways
      1. Terraform can be used to build complex things and modules simplify that complexity.
      2. Magic Castle is a teaching and development meta-platform for HPC.

  42. Magic Castle Replicates a Compute Canada Cluster in 20 min.

  43. Questions ?
