storage as a first class citizen in hpc environments

Storage as a First Class Citizen in HPC Environments. James S. - PowerPoint PPT Presentation

Storage as a First Class Citizen in HPC Environments. James S. Plank, University of Tennessee. CCGSC, September 9, 2010.


  1. Storage as a First Class Citizen in HPC Environments. James S. Plank University of Tennessee CCGSC September 9, 2010

  2. A Personal Historical Perspective Me – Erasure codes Y'all – HPC

  3. A Personal Historical Perspective Jim - 1987

  4. A Personal Historical Perspective. Jim - 1987. LINDA (Gelernter): Parallel computing with a “tuple space.” Diagram labels: data tuples, processing tuples, tuple space.

  5. A Personal Historical Perspective. Jim - 1987. LINDA (Gelernter): Parallel computing with a “tuple space.” “Linda processes aspire to know as little about each other as possible. They never interact directly with each other; they only deal with tuple space.”
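
The tuple-space idea quoted on slide 5 can be sketched in a few lines of Python. This is a toy, in-memory illustration, not Linda itself: the class name `TupleSpace`, the method name `take` (standing in for Linda's `in`, which is a reserved word in Python), and the use of `None` as a wildcard are all assumptions made for the example.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space (hypothetical API, loosely after Linda's out/rd/in)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Deposit a tuple and wake any process waiting for a match."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        # Assumption: None in a pattern is a wildcard for that field.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def _wait_for_match(self, pattern, timeout):
        found = self._cond.wait_for(
            lambda: self._find(pattern) is not None, timeout
        )
        if not found:
            raise TimeoutError("no matching tuple")
        return self._find(pattern)

    def rd(self, *pattern, timeout=None):
        """Block until a matching tuple exists; return it without removing it."""
        with self._cond:
            return self._wait_for_match(pattern, timeout)

    def take(self, *pattern, timeout=None):
        """Stand-in for Linda's `in`: block for a match, then remove and return it."""
        with self._cond:
            tup = self._wait_for_match(pattern, timeout)
            self._tuples.remove(tup)
            return tup


def worker(space):
    # The worker never learns who produced the task or who consumes the result.
    _, n = space.take("task", None)
    space.out("result", n, n * n)


def run_demo():
    space = TupleSpace()
    t = threading.Thread(target=worker, args=(space,))
    t.start()
    space.out("task", 7)
    result = space.take("result", 7, None, timeout=5)
    t.join()
    return result
```

In `run_demo`, the producer and the worker hold no reference to each other; the only shared object is the space, which is the property the quotation emphasizes.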

  6. A Personal Historical Perspective Jim - 1988

  7. A Personal Historical Perspective. Jim - 1988. SSLS: Shared Single Level Store (Naughton). Gigantic shared, persistent address space.

  8. A Personal Historical Perspective. Jim - 1988. SSLS: Shared Single Level Store (Naughton). Gigantic shared, persistent address space.

  9. A Personal Historical Perspective. Jim - 1989. SVM: Shared Virtual Memory (Li). Labels: “Really big”; “Gigantic shared, persistent address space.”

  10. A Personal Historical Perspective. Jim - 1989. SVM: Shared Virtual Memory (Li). Labels: “Really big”; “Gigantic shared, persistent address space.”

  11. A Personal Historical Perspective. Jim - 1990. HeNCE: Heterogeneous Network Computing Environment (Grand Fromage). Functional Dataflow DAG Processing System.

  12. A Personal Historical Perspective Jim - 1991-98

  13. A Personal Historical Perspective Mr. Checkpointing: Jim - 1991-98

  14. A Personal Historical Perspective. Jim - 1991-98. What did I learn: There are two major difficulties with checkpointing: 1. Fighting the OS / Getting it to work. 2. Mitigating the overhead of getting all those bytes to disk. Everything else (synchronization, consistency, Lamport time, etc, etc) is in the noise.

  15. A Personal Historical Perspective. Jim - 1991-98. Where's the research? Three boxes, with the labels 1, 2, 3: Getting it to work. Synchronization, consistency, Lamport time, etc, etc. Mitigating the overhead of getting all those bytes to disk.

  16. A Personal Historical Perspective. Jim - 1991-98. [Figure: Cost of Reliability (at 1 ... [Elnozahy/Plank 2004]. Y axis: Overhead, 0% to 100%. X axis: MTBF (in d ..., with ticks at 0.3, 0.6, 1.2, 2.4, 4.8, 9.6, 19.2. One series per Checkpoint Interval: Hourly, Every 2 hours, Every 6 hours, Daily.]

  17. A Personal Historical Perspective Jim – 1999

  18. A Personal Historical Perspective G-Commerce: Brief Foray into Grid Computing Jim – 1999

  19. A Personal Historical Perspective IBP: Internet Backplane Protocol (Logistical Networking) w/ Micah Beck Client malloc() Jim – 1999 - 2005

  20. A Personal Historical Perspective. Jim – 1999 - 2005. IBP: Internet Backplane Protocol (Logistical Networking) w/ Micah Beck. Client malloc(): - Best effort - Time limited - Location specific - Which supported third-party transfers.

  21. A Personal Historical Perspective IBP gave data a place to “live” on the network, perhaps moving from site to site. Jim – 1999 - 2005 eXnode
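
The four properties listed for IBP allocations (best effort, time limited, location specific, third-party transfers) can be made concrete with a small simulation. This is a sketch under stated assumptions, not the real IBP client library: `Depot`, `allocate`, `store`, `load`, `third_party_copy`, the string capability format, and the explicit `now` clock argument are all hypothetical names chosen for the example.

```python
import secrets


class Depot:
    """Toy storage depot. Hypothetical API; NOT the real IBP interface."""

    def __init__(self, location, capacity):
        self.location = location      # location specific: the client picks the depot
        self.capacity = capacity      # total bytes this depot is willing to lend
        self._allocs = {}             # capability -> [size, expires_at, data]

    def _reap(self, now):
        # Time limited: allocations disappear once their lease runs out.
        expired = [c for c, a in self._allocs.items() if a[1] <= now]
        for cap in expired:
            del self._allocs[cap]

    def _used(self):
        return sum(a[0] for a in self._allocs.values())

    def allocate(self, size, duration, now):
        """Best effort: return a capability string, or None if the depot declines."""
        self._reap(now)
        if self._used() + size > self.capacity:
            return None
        cap = f"{self.location}/{secrets.token_hex(8)}"
        self._allocs[cap] = [size, now + duration, b""]
        return cap

    def store(self, cap, data, now):
        """Write bytes into a live allocation; False if expired or too large."""
        self._reap(now)
        alloc = self._allocs.get(cap)
        if alloc is None or len(data) > alloc[0]:
            return False
        alloc[2] = bytes(data)
        return True

    def load(self, cap, now):
        """Read bytes back, or None if the allocation no longer exists."""
        self._reap(now)
        alloc = self._allocs.get(cap)
        return None if alloc is None else alloc[2]


def third_party_copy(src, src_cap, dst, duration, now):
    """Copy depot to depot; the requesting client never handles the bytes."""
    data = src.load(src_cap, now)
    if data is None:
        return None
    dst_cap = dst.allocate(len(data), duration, now)
    if dst_cap is None or not dst.store(dst_cap, data, now):
        return None
    return dst_cap
```

Passing `now` explicitly keeps the lease logic deterministic for the example; a real service would consult a clock. The caller must be prepared for `None` at every step, which is what "best effort" means in practice.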

  22. A Personal Historical Perspective Into the land of erasure coding. I won't bore you with it. Jim – 2005 - ???

  23. A Personal Historical Perspective But there's more... Jim – 2010

  24. A Personal Historical Perspective. Jim – 2010. 2010 Meeting on Staging for HPC. Diagram labels: The Big Iron, “Staging”, The Disks.

  25. A Personal Historical Perspective. Jim – 2010. Diagram labels between The Big Iron and The Disks: Code Coupling, Checkpointing, Caching, Alternative Representations, Post Processing, Pre Processing. Oh my.

  26. What do we make of all this?

  27. What do we make of all this? 1. Checkpointing Sucks. - Slow - Inelegant - Swamps disks and networks to store gigantic files that are almost never read. - Enables you to perform “bad fault-tolerance.” - Is a manifestation that something is wrong.

  28. What do we make of all this? 2. Band-Aids Are Only Temporary Solutions - Non-reusable - Cover the wounds but don't address the root cause - Are a manifestation that something is wrong.

  29. What do we make of all this? 3. Saving State Sure is Attractive - Lets you reason about programs - (In theory) lets you balance load - Allows fault tolerance to fall out naturally - However, it's really difficult to do. - This is why the MPI model throws it in the trash can.

  30. What do we make of all this? 4. I Still Think IBP is Pretty Cool & That There Are Lessons To Be Learned From It - Why do we constrain our view of storage as either the file or the memory segment? - Why is storage either permanent or limited by program lifetime? - Why do we jettison best-effort storage resources? - Why don't we manage the location of storage?

  31. What do we make of all this? Why are storage and processing not equal first-class citizens in HPC?

  32. When I Close My Eyes and Dream ...... The Big Iron looks like this. And program state is represented explicitly in here! And these guys: are promoted to first class citizens.

  33. When I Close My Eyes and Dream ...... And these guys compose seamlessly. Over extremely wide areas .... And program state is represented explicitly in here!

  34. When I Close My Eyes and Dream ...... And the Eagles win the Super Bowl... Every Year... And I retire to that mansion in Capri...

  35. And then I wake up and go back to studying erasure codes. Me – Erasure codes Y'all – HPC

  36. Storage as a First Class Citizen in HPC Environments. James S. Plank University of Tennessee CCGSC September 9, 2010
