Swarm “Transparently distributed computation in the cloud” Ian Clarke ian@uprizer.com Sunday, September 13, 2009
About me
• Degree in AI and Comp Sci from Edinburgh University, Scotland (1995-1999)
• Designer and co-ordinator of Freenet, the first decentralized P2P architecture (1999-present)
• Designed P2P video streaming system that later became part of “Joost” (2003-2004)
• Founder and Chief Scientist of Revver (2004-2006)
• CEO of Uprizer Labs (2007-present)
The Problem
Building a web-app? You want your development process to be:
• Cheap and fast to implement
• Scalable in the event of success
Pick one!
Examples
• 22-hour outage after IPO in 1999
• Estimated cost: over $2M
• Periodic outages since it started, most recently August ’09
• Forced a fundamental rearchitecture
• Aside: started with Ruby on Rails, now using Scala
How is this solved today?
Database Architecture [diagram: one MySQL database, three Cache nodes, three WebNode nodes]
Replicate databases [diagram: three MySQL databases, three Cache nodes, three WebNode nodes]
Map Reduce
• Certain problems may be broken into “map” and “reduce” operations
• Interesting because the data stays still, the computation moves
• Good at things like distributed sort, distributed grep, etc.
• Not general-purpose
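A minimal sketch of the map/reduce split, written in Python for brevity and run in a single process. The shard layout, node names, and function names are assumptions made for illustration; a real framework would run the map step on the machine that holds each shard.

```python
from collections import defaultdict

# Hypothetical data set: each "node" holds its own shard of lines.
SHARDS = {
    "node1": ["swarm moves code", "data stays put"],
    "node2": ["code moves to data"],
}

def map_phase(lines):
    """Runs where the shard lives: emit (word, 1) for every word."""
    return [(word, 1) for line in lines for word in line.split()]

def reduce_phase(pairs):
    """Combine the emitted pairs into one count per word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def word_count(shards):
    # Only the small (word, 1) pairs leave a shard, never the raw lines.
    mapped = [pair for lines in shards.values() for pair in map_phase(lines)]
    return reduce_phase(mapped)
```

Calling `word_count(SHARDS)` returns one count per word. The shape of the code also shows the limitation noted above: the problem has to fit the two-phase mould.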
Our Proposal: Swarm
But first... Some background
Scala
• Compiles to Java bytecode
  • so it's fast and widely supported
• Supports closures and type inference
  • so it solves most of Java’s problems
• The upcoming Scala 2.8 supports “portable continuations”
Continuations
What do continuations do?
• Store the state of a computer program
• Like saving your position in a video game
• Resume execution at some point in the future
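The "save point" idea can be illustrated with a Python generator. This is only an analogy: a generator is a restricted, one-shot relative of the continuations discussed here, and the function and level names are made up for the example.

```python
def quest():
    """A computation that can pause between steps and be resumed later."""
    score = 0
    for level in ("forest", "cave", "castle"):
        score += 10
        # 'yield' hands control back to the caller; the local variables
        # (score, position in the loop) are kept until we are resumed.
        yield f"finished {level}, score={score}"

game = quest()
first = next(game)    # run until the first save point
# ... the caller is free to do unrelated work here ...
second = next(game)   # resume exactly where the computation left off
```

After these two calls, `first` is `"finished forest, score=10"` and `second` is `"finished cave, score=20"`: the state survived the pause.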
Scala 2.8’s continuations support
• “Delimited”
• Portable
• Implemented through a code transformation
• Complicated!
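To give a feel for what "implemented through a code transformation" means, here is a hand-written sketch of continuation-passing style (CPS), in Python rather than Scala. It is not the output of the Scala plugin; `STORE`, `fetch`, and the other names are hypothetical. The point is that after the rewrite, "the rest of the computation" becomes an ordinary function value `k`.

```python
STORE = {"a": 1, "b": 2}   # hypothetical data

# Direct style: what happens after each fetch() is implicit in the code.
def fetch(key):
    return STORE[key]

def total_direct():
    a = fetch("a")
    b = fetch("b")
    return a + b

# CPS: every step receives the rest of the computation as a function k.
def fetch_cps(key, k):
    # Because k is a plain value, a framework could store it or hand it
    # to another machine instead of calling it right away.
    return k(STORE[key])

def total_cps(k):
    def after_a(a):
        def after_b(b):
            return k(a + b)
        return fetch_cps("b", after_b)
    return fetch_cps("a", after_a)
```

Both `total_direct()` and `total_cps(lambda x: x)` compute the same sum. The function passed in as `k` marks where the captured computation ends, which is roughly what "delimited" refers to.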
The Solution
What if we could distribute data and computation across multiple computers such that the programmer need not think about it?
But how?
• Move the computation, not the data
• Handle this transparently within the framework
• Arrange the data to minimize movement of the computation
How does it work?
[diagram: data items a, b and c]
Program:
1. print a
2. print b
3. print c
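The three-line program above can be simulated in a few lines of Python. This is a single-process sketch under stated assumptions, not Swarm's implementation: the node names, the placement of a, b and c, and the use of `pickle` as the wire format are all invented for illustration.

```python
import pickle

# Hypothetical placement: which node stores which variable.
NODES = {
    "node1": {"a": "apple"},
    "node2": {"b": "banana", "c": "cherry"},
}
PROGRAM = ["a", "b", "c"]   # 1. print a  2. print b  3. print c

def owner(name):
    """Find the node that holds the named variable."""
    return next(node for node, data in NODES.items() if name in data)

def run(start_node):
    # The "continuation" here is just the state needed to resume:
    # the index of the next instruction and the output so far.
    state = {"pc": 0, "out": []}
    here, migrations = start_node, 0
    while state["pc"] < len(PROGRAM):
        name = PROGRAM[state["pc"]]
        if name not in NODES[here]:
            # Data is remote: serialize the continuation and resume it
            # on the node that holds the data (simulated in-process).
            wire = pickle.dumps(state)
            here, migrations = owner(name), migrations + 1
            state = pickle.loads(wire)
        state["out"].append(NODES[here][name])
        state["pc"] += 1
    return state["out"], migrations
```

Starting on `node1`, the program prints all three values with a single migration, because b and c live together. Starting on `node2` costs two migrations for the same output.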
Arranging data with graph clustering
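The sketch below does not implement a clustering algorithm. It only shows, in Python, the cost such an algorithm would try to reduce: how often a computation has to hop between nodes for a given placement. The item names, node names, and access trace are hypothetical.

```python
def count_migrations(placement, accesses):
    """Count how many times the computation must hop between nodes.

    placement: dict mapping item -> node (a hypothetical layout)
    accesses:  items in the order the program touches them
    """
    hops = 0
    current = placement[accesses[0]]   # start where the first item lives
    for item in accesses[1:]:
        if placement[item] != current:
            hops += 1
            current = placement[item]
    return hops

accesses = ["user", "cart", "user", "cart", "catalog"]
scattered = {"user": "n1", "cart": "n2", "catalog": "n3"}
clustered = {"user": "n1", "cart": "n1", "catalog": "n2"}
```

For this trace, `count_migrations(scattered, accesses)` gives 4 hops and `count_migrations(clustered, accesses)` gives 1. Items that are accessed together are cheaper to use when they are stored together.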
Forcing Swarm to migrate the continuation
Forced remote variable
What next?
• Just a simple prototype
• Many interesting sub-problems
• Open source
• Need your help!
Storage
• How do we arrange the data for optimal efficiency?
• What about concurrency?
• Software transactional memory
• Replication and redundancy
• Garbage collection
A “universal” codebase
• Swarm requires that every node has the same binary
• We could use the JVM’s classloader mechanism to retrieve binaries as needed from a global namespace
• Will need to address issues of versioning and security
“Swarm aware” libraries
• Need “Swarm aware” collections classes like Map, List, and Set
• Develop a storage system with capabilities similar to a relational database
• The creation of a web framework around Swarm (similar to “Rails” or “LiftWeb”)
Swarm tools
• Continuations plugin imposes restrictions on the code that can be migrated
  • “foreach”
  • Serializable
• A Scala compiler plugin that understood these limitations would be very useful
Interested in helping?
http://code.google.com/p/swarm-dpl/
ian@uprizer.com