radical simplicity
Smart Data and Wicked Problems
Paul Borrill
Most Computer Scientists don't understand Time & Causality
"Computer Scientists imagine that causation is one of the fundamental axioms or postulates of physics, yet, oddly enough, in real scientific disciplines such as special and general relativity, and quantum mechanics, the word 'cause' never occurs. To me it seems that computer science ought not to assume such legislative functions, and that the reason why physics has ceased to look for causes is that in fact there are no such things. The law of causality, I believe, like much that passes muster among computer scientists, is a relic of a bygone age, surviving, like a belief in God, only because it is erroneously supposed to do no harm."
~Paul Borrill (with apologies to Bertrand Russell)
Dumb Data?
• Our lives are becoming progressively more digital
• Our ability to manage our data — in our enterprises, our businesses, our communities, and even our own homes — is becoming intolerably complex
• This complexity threatens to become the single most pervasive destroyer of productivity in our post-industrialized society, taking back all the gains in productivity that our information technology was intended to provide
“Men have become tools of their tools” Henry David Thoreau
Dumb Data is Intolerably Complex
• We need a cure, not an endless overlay of band-aids that mask failed architectural theories
• The Curse of the God's Eye View (GEV):
  • Identity & Individuality
  • Persistence & Change
  • Time & Causality
• These problems are not adequately appreciated in the computer science literature
• GEV designers don't relieve us of complexity - they cause it!
• Do GEV designers have the God gene? (Dean Hamer: The God Gene)
“The ultimate goal of machine production – from which, it is true, we are as yet far removed – is a system in which everything uninteresting is done by machines and human beings are reserved for work involving variety and initiative” Bertrand Russell
Why Smart Data?
• Why we want to make data smart is clear: so that, as far as possible, we can find and freely use our data without constantly tending to its needs
• Our systems should quietly manage themselves and become our slaves, instead of us becoming slaves to them
“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it” Herbert Simon
Three Laws of Smart Data
• Smart Data shall not consume the attention of a human being, or through inaction allow a human being's attention to be consumed, without that human being's freely given consent that the cause is just and fair
• Smart Data shall obey and faithfully execute all requests of a human being, except where such requests would conflict with the first law
• Smart Data shall protect its own existence as long as such protection does not conflict with the first or second law
Knowledge Warriors
• We have to "fight" our systems to get work done
• Knowledge? Bits, Bytes, Data, Information, Knowledge, Understanding, Wisdom
• We just want to get our job done
• Systems get in the way
• Yak Shaving
A 100 Petabyte Data Repository Feasibility Study
100PB Data Repository
(Figure: rack, sled, and panel layout)
• 12 disks per vertical sled
• 8-10 sleds per panel
• 6 panels per rack (double-sided!)
• 500TB per rack
• 40 racks × 0.5PB = 20PB per data center
• 100PB in 5 data centers
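The layout above can be sanity-checked with simple arithmetic. A minimal sketch, assuming the upper bound of 10 sleds per panel (the slide says 8-10); the implied per-disk capacity is derived, not stated in the slides:

```python
# Back-of-envelope check of the 100PB repository layout.
# Assumption: 10 sleds per panel (slide gives a range of 8-10).

DISKS_PER_SLED = 12
SLEDS_PER_PANEL = 10
PANELS_PER_RACK = 6        # double-sided rack
TB_PER_RACK = 500

disks_per_rack = DISKS_PER_SLED * SLEDS_PER_PANEL * PANELS_PER_RACK
tb_per_disk = TB_PER_RACK / disks_per_rack   # implied per-disk size

racks_per_dc = 40
pb_per_dc = racks_per_dc * TB_PER_RACK / 1000   # 40 x 0.5PB = 20PB
data_centers = 5
total_pb = pb_per_dc * data_centers             # 100PB overall

print(disks_per_rack, round(tb_per_disk, 2), pb_per_dc, total_pb)
```

The numbers check out: 720 disks per rack implies roughly 0.7TB disks (plausible for the era), 20PB per data center, and 100PB across five data centers.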
8 Racks per 20-Foot Container = 5PB
http://www.sun.com/emrkt/blackbox/index.jsp
100PB RAW Storage
10 × 40-foot containers, or 20 × 20-foot containers
100PB Data Repository - Problems
• Existing solutions do not scale - we end up with many "islands" or "silos" of storage
• Large SANs break faster than you can fix them
• Disks constantly fail (~130K disks)
• Months or years to design and deploy
• Coordination of 10-20 companies, 60+ products
• An army of administrators
• Cost is 100-200× the cost of the disks
• Power dissipation
• Something must change
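The "disks constantly fail" point is easy to quantify. A minimal sketch, assuming a 3% annual failure rate (a hypothetical figure for illustration; the slides give only the disk count):

```python
# Rough estimate of the disk-replacement load for ~130,000 disks.
# The 3% annual failure rate (AFR) is an assumed illustrative value,
# not a figure from the slides.

TOTAL_DISKS = 130_000
ANNUAL_FAILURE_RATE = 0.03   # hypothetical AFR

failures_per_year = TOTAL_DISKS * ANNUAL_FAILURE_RATE
failures_per_day = failures_per_year / 365

print(failures_per_year, round(failures_per_day, 1))
```

Under that assumption the repository loses roughly ten disks every single day, which is why "break faster than you can fix them" stops being a figure of speech at this scale.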
Identity & Individuality
• Principle of Identity of Indiscernibles (PII)
• Space Time Identity (STI)
• Transcendental Identity (TI)
"Those great principles of sufficient reason and of the identity of indiscernibles change the state of metaphysics. That science becomes real and demonstrative by means of these principles, whereas before it did generally consist in empty words" Gottfried Leibniz
Individuality of Digital & Material Objects
• Are digital individuals like rocks, tables, umbrellas, and people, or like drops of water, or money in a bank account?
• Individuality appears to depend on distinguishability, and vice versa
• But some entities, like sub-atomic particles, are indistinguishable
• Can two entities be exactly the same, in both their internal and relational properties (including their position in spacetime)?
• Not according to the Impenetrability Argument
Persistence & Change
• Perdurance Theory
• Endurance Theory
• Stage Theory
Time & Causality
• Simultaneity is a Myth
• Time is not Continuous
• Time does not flow
• Time has no direction
• Causality is a flawed concept
What is Time?
"A Measure of Change" Aristotle
"A stubbornly persistent illusion" Einstein
Do Computer Scientists Understand Time?
• A relationship with time is intrinsic to everything we do in creating, modifying, and moving data
• The understanding of the concept of time among computer scientists appears far behind that of physicists and philosophers
• If fundamental flaws exist in the time assumptions underlying the algorithms that govern access to and evolution of our data, then our systems will fail in unpredictable ways, and any number of undesirable characteristics may follow
Simultaneity is a Myth
• In 1905 Einstein showed us that the concept of "now" is meaningless except for events occurring "here"
• In 1978, Leslie Lamport published "Time, Clocks, and the Ordering of Events in a Distributed System", in which he defined the happened before relation
• Unfortunately, happened before is meaningless unless intimately associated with happened where. Lamport understood this, but many who read his paper don't
• In 2008, most computer scientists and programmers implicitly base their algorithms on absolute (Newtonian) time, or use Lamport's timestamps as a crutch to sweep their issues with time under the rug
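The happened-before relation is only a partial order: events with no causal path between them are simply unordered. A minimal sketch, using hypothetical event names and one message edge for illustration:

```python
# Lamport's happened-before relation as a partial order.
# Direct precedence = same-process program order plus message edges.
# Event names (a1, a2 on process A; b1, b2 on process B) are
# hypothetical illustrations.

edges = {
    ("a1", "a2"),   # program order on process A
    ("b1", "b2"),   # program order on process B
    ("a1", "b2"),   # a message sent at a1 and received at b2
}

def happened_before(x, y, edges):
    """True iff there is a causal path from x to y (transitive closure)."""
    frontier, seen = {x}, set()
    while frontier:
        e = frontier.pop()
        seen.add(e)
        for (u, v) in edges:
            if u == e and v not in seen:
                if v == y:
                    return True
                frontier.add(v)
    return False

print(happened_before("a1", "b2", edges))  # True: via the message
print(happened_before("a2", "b1", edges))  # False: causally unrelated
print(happened_before("b1", "a2", edges))  # False: concurrent events
```

Note that a2 and b1 are ordered in neither direction: the relation leaves them concurrent, which is exactly what "simultaneity is a myth" permits — and what naive timestamp comparisons paper over.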
Breakdown in Simultaneity (Figures 1-3)
Courtesy Kevin Brown, http://www.mathpages.com/rr/s4-08/4-08.htm
But wait - can't we assume an "inertial system"?
• Our computers reside:
  • On the surface of a rotating sphere
  • In a gravitational field
  • Orbiting a star
• Our computers are connected not with light signals in a vacuum, but by a network with a stochastic latency distribution
• Equivalence of acceleration and variability of transmission delay in the propagation of packets
• Creating coherent time sources is "problematic"
Other difficulties with "time"
• Time is not continuous
  • Time is change. Events are unique in spacetime. There is no such thing as an indivisible instant. Are instants events?
• Time does not flow
  • There is no more evidence for the existence of anything real between one event and another than there is for an aether to support the propagation of electromagnetic waves through empty space
• Time has no direction
  • Time is intrinsically symmetric. We experience irreversible processes that capture "change", like a probability ratchet that prevents a wheel going backwards
Leslie Lamport 1978
• Defined the "happened before" relation: a partial order
• Defined "logical timestamps", which force an arbitrary total order, restricting the available concurrency of a system (i.e. the algorithm can proceed no faster than it would on a single processor)
• This "concurrency efficiency loss" gets worse as:
  • We add more nodes to a distributed system
  • These nodes become more spatially separated
  • Our processors and networks get faster
  • Our processors are comprised of more cores
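The logical-timestamp mechanism itself is tiny. A minimal sketch of Lamport clocks, assuming the standard rules from the 1978 paper (tick on every local event and send; on receive, take the max of local and message time, plus one); the two processes and message are hypothetical:

```python
# Lamport logical clocks: each process keeps a counter that ticks on
# local events and sends, and jumps forward on receives so that every
# message's send is stamped earlier than its receipt.

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time           # timestamp carried on the message

    def receive(self, msg_time):
        self.time = max(self.time, msg_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
p.local_event()        # p: 1
t = p.send()           # p: 2, message stamped 2
q.local_event()        # q: 1
q.receive(t)           # q: max(1, 2) + 1 = 3

print(p.time, q.time)  # 2 3
```

Breaking timestamp ties by process ID yields the arbitrary total order the slide refers to: every pair of events gets ordered, including causally unrelated ones, which is precisely the source of the concurrency efficiency loss.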
The Computer Industry 2008
• The storage industry: in a Complexity Crisis
  • Although we can physically build larger systems, we "have to" scale out, because "scale-up" systems are impossible to make sufficiently resilient
  • No-one has thought about the software
• The processor industry: in a Concurrency Crisis
  • Gets worse with each generation of processor (the number of cores doubles each generation instead of the performance of each core)
  • No-one has thought about the software
• What are the wicked problems getting in our way?