Reliable network booting of cluster computers
Matthew Steggink, July 2nd, 2008 (PowerPoint presentation)


  1. Reliable network booting of cluster computers
      Matthew Steggink, July 2nd, 2008

  2. Outline
      ◮ Theory
      ◮ Research question
      ◮ Test methods
      ◮ Observations
      ◮ Alternative booting
      ◮ Conclusion and future work
      ◮ Questions

  3–7. Network booting
      ◮ Booting from the network instead of a local disk
      ◮ Easy deployment of new computers
      ◮ Centralized image management
      ◮ Diskless computers become possible
      ◮ Involves DHCP, ARP and TFTP
      ◮ Currently used for network booting: PXELINUX
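The first step of a PXE boot is a DHCP broadcast from the client. As a rough illustration of what that packet contains, the Python sketch below builds a minimal DHCPDISCOVER following the BOOTP/DHCP layout of RFC 2131. `build_pxe_discover` is a hypothetical helper, not part of PXELINUX or any library; real PXE boot ROMs send additional options, and the function only builds the bytes without sending them.

```python
import os
import struct
from typing import Optional

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"


def build_pxe_discover(mac: bytes, xid: Optional[int] = None) -> bytes:
    """Build a minimal DHCPDISCOVER resembling what a PXE client broadcasts.

    Simplified sketch: real PXE ROMs also send options 93, 94 and 97 and a
    longer option 60 string such as "PXEClient:Arch:00000:UNDI:002001".
    """
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    if xid is None:
        xid = int.from_bytes(os.urandom(4), "big")  # random transaction ID
    # BOOTP fixed header: op=1 (BOOTREQUEST), htype=1 (Ethernet), hlen=6,
    # hops=0, xid, secs=0, flags=0x8000 (broadcast reply: no IP yet),
    # then ciaddr, yiaddr, siaddr and giaddr, all zero.
    header = struct.pack("!BBBBIHH4s4s4s4s", 1, 1, 6, 0, xid, 0, 0x8000,
                         bytes(4), bytes(4), bytes(4), bytes(4))
    chaddr = mac.ljust(16, b"\x00")   # client hardware address, padded to 16 bytes
    sname_and_file = bytes(64 + 128)  # empty server-name and boot-file fields
    options = (
        bytes([53, 1, 1])                # option 53: message type = DISCOVER
        + bytes([60, 9]) + b"PXEClient"  # option 60: vendor class identifier
        + bytes([255])                   # end of options
    )
    return header + chaddr + sname_and_file + DHCP_MAGIC_COOKIE + options
```

A client would broadcast these bytes from UDP port 68 to port 67; the server's offer then carries the TFTP server address and the boot file name (for example `pxelinux.0`).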

  8. The setup

  9. Research question
      When booting a large number of clients, some do not complete the boot process.
      ◮ Analyze the points at which clients fail
      ◮ Determine the cause of the failures
      ◮ Search for a solution

  10. Testing

  11–13. Shape the traffic
      ◮ Limit the traffic to simulate network characteristics
      ◮ Two options to shape the traffic:
          1. VMware Teams
          2. Traffic Control (tc) in Linux: Token Bucket Filter
      ◮ Limit the traffic and lower the rates step by step to find the point where clients fail
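The Token Bucket Filter limits a flow by spending tokens that refill at a fixed rate. The sketch below is a simplified model of that idea, not the kernel implementation. `tbf_schedule` is a hypothetical helper: `rate` is in bytes per second, `burst` is the bucket size in bytes, and a packet is dropped when it would wait longer than `latency` seconds. The real TBF instead converts `latency` into a byte limit on its queue.

```python
from typing import List, Optional, Tuple


def tbf_schedule(packets: List[Tuple[float, int]], rate: float, burst: int,
                 latency: float) -> List[Optional[float]]:
    """Simplified token bucket shaper (a model, not the Linux qdisc).

    packets: (arrival_time_s, size_bytes) pairs sorted by arrival time.
    Returns each packet's departure time, or None if it was dropped.
    """
    tokens = float(burst)  # the bucket starts full
    clock = 0.0            # time at which `tokens` was last brought up to date
    departures: List[Optional[float]] = []
    for arrival, size in packets:
        if size > burst:                # can never collect enough tokens
            departures.append(None)
            continue
        start = max(arrival, clock)     # FIFO: wait for the previous packet to leave
        tokens = min(float(burst), tokens + (start - clock) * rate)
        clock = start
        wait = max(0.0, (size - tokens) / rate)  # time to earn the missing tokens
        if start + wait - arrival > latency:
            departures.append(None)     # queueing delay over budget: drop
            continue
        tokens += wait * rate - size    # refill during the wait, then spend
        clock = start + wait
        departures.append(clock)
    return departures
```

The three parameters mirror the `rate`, `burst` and `latency` arguments of a `tc qdisc add ... tbf` command. Lowering `rate` in such a model shows queueing delay growing until packets are dropped, which is the effect used here to find a failing point.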

  14. Observations - Traffic control
      ◮ VMware teaming does not shape traffic accurately
      ◮ tc shapes traffic more reliably
      Figure: VMware versus tc traffic control

  15. Observations - Fail point
      ◮ Clients fail when there is too much packet loss and not enough bandwidth
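A rough model helps explain why loss hurts TFTP in particular. TFTP is lock-step: the server sends one block (512 bytes by default, RFC 1350) and waits for its acknowledgment before sending the next. Assuming independent loss with probability p per packet, a round fails with probability q = 1 - (1 - p)^2. If a client gives up after R retransmissions of the same block (R is an assumption; clients differ), a file of N blocks arrives with probability (1 - q^(R+1))^N. The functions below are hypothetical helpers that evaluate this model; they are not measurements from the tests.

```python
def tftp_block_count(file_size: int, block_size: int = 512) -> int:
    """DATA packets needed: TFTP signals the end with a block shorter than
    block_size, so an exact multiple needs one extra empty block."""
    return file_size // block_size + 1


def round_failure_probability(loss: float) -> float:
    """A lock-step round fails if either the DATA packet or its ACK is lost
    (losses assumed independent, each with probability `loss`)."""
    return 1.0 - (1.0 - loss) ** 2


def transfer_success_probability(file_size: int, loss: float, retries: int,
                                 block_size: int = 512) -> float:
    """Probability that every block gets through within `retries` retransmissions."""
    q = round_failure_probability(loss)
    block_ok = 1.0 - q ** (retries + 1)
    return block_ok ** tftp_block_count(file_size, block_size)


def expected_transfer_time(file_size: int, loss: float, rtt: float,
                           timeout: float, block_size: int = 512) -> float:
    """Expected duration with unlimited retries (requires loss < 1): each
    failed round costs a full timeout, the successful round costs one RTT."""
    q = round_failure_probability(loss)
    failed_rounds = q / (1.0 - q)  # mean failures before a success (geometric)
    return tftp_block_count(file_size, block_size) * (rtt + failed_rounds * timeout)
```

Under this model a large image such as a kernel (many blocks) is much more likely to fail than a small boot loader at the same loss rate, and each lost packet costs a full timeout instead of one round trip.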

  16–19. Identified problems
      ◮ DHCP
          ◮ No DHCP offers, no boot file
      ◮ ARP
          ◮ ARP timeout
      ◮ TFTP
          ◮ TFTP timeout, read timeout, illegal operation, "server does not support tsize"
      ◮ During downloading (TFTP)
          ◮ "Loading vmlinuz... boot failed"
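Several of these errors relate to TFTP option negotiation. A client can append options such as `tsize` (RFC 2349) and `blksize` (RFC 2348) to its read request. A server that supports them answers with an OACK (RFC 2347); one that does not simply starts sending DATA block 1; one that refuses answers with an ERROR. The sketch below builds such a request and classifies the reply. `build_rrq` and `classify_reply` are hypothetical helpers. The block size 1468 is an illustrative choice: it fits a 1500-byte Ethernet MTU after the 20-byte IP, 8-byte UDP and 4-byte TFTP headers.

```python
import struct
from typing import Dict, Optional, Tuple

OP_RRQ, OP_DATA, OP_ERROR, OP_OACK = 1, 3, 5, 6


def build_rrq(filename: str, mode: str = "octet",
              options: Optional[Dict[str, str]] = None) -> bytes:
    """Read request (RFC 1350) with optional RFC 2347 options appended."""
    packet = struct.pack("!H", OP_RRQ)
    packet += filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"
    for name, value in (options or {}).items():
        packet += name.encode("ascii") + b"\0" + value.encode("ascii") + b"\0"
    return packet


def classify_reply(packet: bytes) -> Tuple[str, object]:
    """Tell an option acknowledgment, plain data and an error apart."""
    opcode = struct.unpack("!H", packet[:2])[0]
    if opcode == OP_OACK:
        fields = packet[2:].split(b"\0")[:-1]  # drop the empty field after the last NUL
        accepted = {fields[i].decode("ascii").lower(): fields[i + 1].decode("ascii")
                    for i in range(0, len(fields), 2)}
        return "oack", accepted
    if opcode == OP_DATA:
        # The server ignored the options (allowed by RFC 2347): no tsize known.
        return "data", struct.unpack("!H", packet[2:4])[0]
    if opcode == OP_ERROR:
        code = struct.unpack("!H", packet[2:4])[0]
        message = packet[4:].rstrip(b"\0").decode("ascii", "replace")
        return "error", (code, message)
    raise ValueError(f"unexpected TFTP opcode {opcode}")
```

For example, `build_rrq("pxelinux.0", options={"tsize": "0", "blksize": "1468"})` asks for the file size up front. A reply classified as `data` means the client receives no `tsize`; a "server does not support tsize" message is plausibly the client's reaction to that case.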

  20–21. Booting by TCP/HTTP using gPXE
      ◮ gPXE is an open source project
      ◮ TCP delivers reliably: lost segments are retransmitted until they are acknowledged
      ◮ Two deployment methods:
          1. gPXE flashed into the boot ROM
          2. gPXE used as a second-stage loader
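When gPXE is used as a second-stage loader, the DHCP server has to answer the same client twice: first the PXE ROM, which should be given the gPXE image over TFTP, then gPXE itself, which should be given an HTTP URL. gPXE identifies itself with DHCP option 77 (user class) set to `gPXE`, and ISC dhcpd configurations commonly match on that. The Python sketch below shows the decision. `parse_dhcp_options`, `choose_boot_file` and the file names in the test are hypothetical, and matching option 77 as a raw string is an assumption about what gPXE sends.

```python
from typing import Dict

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
OPT_PAD, OPT_USER_CLASS, OPT_END = 0, 77, 255


def parse_dhcp_options(packet: bytes) -> Dict[int, bytes]:
    """Return the options of a DHCP packet as {code: raw value}."""
    if packet[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP packet (magic cookie missing)")
    options: Dict[int, bytes] = {}
    i = 240  # options start right after the fixed header and the cookie
    while i < len(packet):
        code = packet[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            i += 1
            continue
        if i + 1 >= len(packet):  # truncated option: stop parsing
            break
        length = packet[i + 1]
        options[code] = packet[i + 2:i + 2 + length]
        i += 2 + length
    return options


def choose_boot_file(packet: bytes, http_script: str, chainload_image: str) -> str:
    """First DHCP round (PXE ROM): hand out the gPXE image for TFTP.
    Second round (gPXE, user class "gPXE"): hand out the HTTP URL."""
    user_class = parse_dhcp_options(packet).get(OPT_USER_CLASS)
    if user_class == b"gPXE":
        return http_script
    return chainload_image
```

Without such a check, gPXE would be handed its own image again and chainload in a loop; the check is also why this deployment method costs a second DHCP transaction.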

  22. gPXE results
      ◮ gPXE is easy to use: only a few extra lines of code
      ◮ No alterations to the clients are needed
      ◮ Compatible with mainstream boot ROMs (tested: Intel, Broadcom, Nvidia)
      ◮ Connections are more reliable; no connections were aborted during testing
      ◮ Disadvantage at this point:
          ◮ Introduces a second DHCP transaction

  23. Situations compared

  24. Conclusion
      ◮ gPXE is ready to deploy with only minor alterations
      ◮ The current setup should not use TFTP
      ◮ Connections are more reliable with gPXE and TCP/HTTP
      ◮ Results:
          ◮ DHCP is still the bottleneck
          ◮ The TFTP bottlenecks have been solved

  25. Future work
      ◮ Eliminate the second DHCP session
      ◮ Investigate whether a better-performing DHCP server exists

  26. Questions
      Matthew Steggink, matthew.steggink@os3.nl
