Taking Time Seriously
Bryan O’Sullivan
Twitter: @bos31337
Monday, June 18, 12
Let’s talk about mental effort
Our attention is a fragile thing. Humans are easily distracted. Let’s illustrate. Try this, while I continue to talk:
Doing math in your head
For most of us, multiplying multiple-digit numbers is an effortful task. Chances are:
walking together,
That bit of mental arithmetic
In case you couldn’t get there (due to my deliberate distractions):
By the way...
If you did the arithmetic exercise in your head... can you tell me what kind of creature was on the screen two slides ago? When you’re focused on one thing, it’s easy to miss something else that would normally be obvious.
The alternative to attention
We know from experience that sustained concentration is draining. Your brain greatly prefers to operate in a less demanding mode, making rapid intuitive judgments. As designers and programmers, we need to keep this intuitive mode of thinking in mind.
Imperative programming
I often hear traditional imperative programming described as “intuitive”. Let’s have the horizontal line represent its difficulty.
Functional programming
I’m just as likely to hear experienced programmers call functional programming “non-intuitive”. Again, the length of the horizontal line marks its difficulty.
Which horizontal line is longer?
They’re the same length!
Many of you will already know the famous optical illusion on the previous screen. It’s called the Müller-Lyer illusion. This illusion sheds a little light on how easily our intuitive judgment can go wrong.
For what it’s worth...
I divide my professional life between imperative programming (C, C++, Python) and functional programming (Haskell, Lisp). I’d be hard pressed to call either one inherently intuitive. (Or, for that matter, one more intuitive than the other.)
What has this to do with shipping code?
As programmers, we of course work with our brains all the time. We’re often aware of when we switch between cognitive
The perils of programming
Intuition mode usually serves us well, which is why it’s our default.
that we don’t notice. Attention mode is brought to bear when the going gets tough.
Intuitive-mode programming in MySQL
    typedef char my_bool;

    my_bool check_scramble(const char *scramble_arg, const char *message,
                           const uint8 *hash_stage2)
    {
      uint8 hash_stage2_reassured[SHA1_HASH_SIZE];
      /* ... uninteresting stuff ... */
      return memcmp(hash_stage2, hash_stage2_reassured, SHA1_HASH_SIZE);
    }
That code looks... perfectly reasonable?
And yet the line containing a call to memcmp is wrong. An attacker who can talk to a vulnerable MySQL server can authenticate as any user in ~256 attempts. Without knowing any passwords. How did this happen?
How did this come about?
The FreeBSD / Mac OS X man page for memcmp:
The memcmp() function returns zero if the two strings are identical, otherwise returns the difference between the first two differing bytes (treated as unsigned char values, so that '\200' is greater than '\0', for example).
Clear as mud, right? (Linux, POSIX, and ISO word it better.)
My guess
The author made an intuitive-mode assumption that the implicit cast from int to char would be safe. In all likelihood, they did not even know that they were making this assumption. Ever made this kind of programming mistake? I make them all the time, after 20 years of experience.
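To see why the narrowing matters, here is a minimal sketch in Haskell that mimics what converting memcmp's int result to an 8-bit my_bool does on common platforms. The names toMyBool and looksLikeMatch are hypothetical and are not taken from the MySQL source; the sketch assumes the conversion simply keeps the low 8 bits.

```haskell
import Data.Int (Int8)

-- Mimics narrowing a C int to a signed 8-bit char on common platforms:
-- only the low 8 bits of the value survive.
toMyBool :: Int -> Int8
toMyBool = fromIntegral

-- Hypothetical stand-in for the caller of the buggy check:
-- a result of 0 is read as "the hashes match".
looksLikeMatch :: Int -> Bool
looksLikeMatch memcmpResult = toMyBool memcmpResult == 0
```

In GHCi, looksLikeMatch 1 is False as expected, but looksLikeMatch 256 and looksLikeMatch (-512) are both True: any non-zero result whose low byte happens to be zero is mistaken for a match. If a memcmp implementation returns values spread over a wide range, roughly one mismatch in 256 slips through.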
Air France 447
On May 31, 2009, this Airbus A330 was lost without trace.
Not until 23 months later was the cockpit voice recorder recovered. The reconstruction of events on board the flight is haunting for what it tells us about decisions made under pressure.
Modern flight control systems
The A330 is a heavily automated fly-by-wire aircraft. At the time its troubles began, 447 was on autopilot, flying into a tropical squall that most aircraft in the area were carefully avoiding. An autopilot depends on a device called a pitot tube to report the plane’s air speed. Pitot tubes are vulnerable to moisture and ice build-up, so passenger planes tend to fly with three, for redundancy.
The ice problem
All three of 447’s pitot tubes iced over at more or less the same time. The autopilot disengaged due to loss of airspeed data. Minutes beforehand, the captain had left the flight deck for a nap, leaving his two second officers in charge. One of the second officers responded to the autopilot alarm by increasing thrust and pulling the plane’s nose up.
The last three minutes
With the nose up at high altitude, the plane quickly lost airspeed, stalled, and entered a rapid descent, losing 10,000 feet per minute. One second officer seemed to be confused by the plane’s behaviour and kept the nose up, even though this was precisely the wrong thing to do. He failed to tell his colleagues what he was doing, and they did not ask.
What do these episodes have in common?
Each was an elementary misjudgment, by an experienced individual, with severe consequences. We routinely make quick decisions in complex situations. Being under pressure magnifies the likelihood of snap misjudgments. Luckily, we often get to walk away from our mistakes more
The startup environment
As a startup founder, you’re routinely under pressure from investors and customers. You’re probably creating entirely new software that is quite unfamiliar. Your product has to ship yesterday, and impress right away. You can afford mistakes, but they’d better be few in number and cheap to rectify.
I built my startup’s software in Haskell
This seemingly unconventional decision was driven by one major consideration. I wanted to ship reliable code quickly. At the time, this did feel like a slightly sketchy decision.
In retrospect, it was a lot of work, but still a great idea.
The importance of a good type system
For me, the most important “I need to ship stuff” aspect of Haskell is its type system. The type checker tells me about many (usually simple) logical errors in my code before I can even execute it.
Type safety and password checking
As a trivial example, that MySQL password bug can’t happen in Haskell. Why not? Ordered comparisons return results of type Ordering.
There’s no automatic promotion (casting) of values.
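A small sketch of the point above, using a list of bytes as a stand-in for a hash. The function name hashesMatch is hypothetical, and this is an illustration of the typing rule only, not a hardened password check.

```haskell
import Data.Word (Word8)

-- compare yields an Ordering (LT, EQ or GT), never a number,
-- so the result has to be inspected explicitly to obtain a Bool.
hashesMatch :: [Word8] -> [Word8] -> Bool
hashesMatch expected actual = compare expected actual == EQ

-- The following variant is rejected by the type checker, because an
-- Ordering is not a Bool and no conversion is inserted automatically:
--
--   broken :: [Word8] -> [Word8] -> Bool
--   broken expected actual = compare expected actual
```

There is no step at which a "difference between bytes" can be silently squeezed into a smaller type, which is the step that went wrong in the C version.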
A more realistic example of types at work
My company’s product consisted of client (C#) and server (Haskell) components. They needed to be able to communicate, so I wanted to go down the typical REST/JSON road. One problem with the Haskell ecosystem at the time was that I didn’t much like the existing JSON libraries. So I wrote one.
The anatomy of a typical JSON library
The jansson library is a clean JSON library, written in C. It has a classic OO-in-C architecture. Its core public type is abstract:
    typedef struct {
        json_type type;
        size_t refcount;
    } json_t;
What’s json_type?
The oft-seen “represent the type via an enum in a tag” pattern.
    typedef enum {
        JSON_OBJECT,
        JSON_ARRAY,
        JSON_STRING,
        JSON_INTEGER,
        JSON_REAL,
        JSON_TRUE,
        JSON_FALSE,
        JSON_NULL
    } json_type;
Inspection
How does a user tell what kind of value inhabits a JSON structure?
    #define json_is_object(json) \
        (json && \
         json_typeof(json) == JSON_OBJECT)
(Incidentally, can you spot a problem with this cpp macro?)
Construction
What if I need to build a value?
    json_t *json_object(void);
    int json_object_set_new(json_t *object, const char *key, json_t *value);
    int json_object_del(json_t *object, const char *key);
    int json_object_clear(json_t *object);
    int json_object_update(json_t *object, json_t *other);
This is only about a quarter of the API for just the JSON_OBJECT type.
Using jansson
If I use this library, I’m responsible for a lot of stuff.
macros, and which are functions.
Par for the course for a C library, in other words.
Enter Aeson
Aeson is the Haskell JSON library I wrote. It has a simple, clean API, and it’s fast. For its speed, aeson depends on a parsing library I wrote, attoparsec.
(attoparsec also has a sweet API, but under the hood it’s a scary
How aeson manages types
    type JsonObject = HashMap Text Value
    type JsonArray = Vector Value

    data Value = Object JsonObject
               | Array JsonArray
               | String Text
               | Number JsonNumber
               | Bool Bool
               | Null
                 deriving (Eq, Show, Typeable)
What’s the difference?
In both Haskell and C, we must associate two concepts.
In C, type and payload are tied together by convention.
In Haskell, the type system ties the two together.
compile.
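To make the contrast concrete, here is a simplified sketch of a JSON value type built only from types in base. It is deliberately not aeson's actual definition (String, Double and association lists stand in for Text, JsonNumber and HashMap), and describe is a hypothetical function.

```haskell
-- Each constructor is the tag, and it carries exactly the payload that
-- belongs with that tag. There is no way to read a string out of an Array.
data Value
  = Object [(String, Value)]
  | Array [Value]
  | String String
  | Number Double
  | Bool Bool
  | Null
  deriving (Eq, Show)

-- Pattern matching inspects the tag and binds the payload in one step.
describe :: Value -> String
describe (Object fields) = "object with " ++ show (length fields) ++ " field(s)"
describe (Array items)   = "array of " ++ show (length items) ++ " item(s)"
describe (String s)      = "string " ++ show s
describe (Number n)      = "number " ++ show n
describe (Bool b)        = "boolean " ++ show b
describe Null            = "null"
```

With GHC's incomplete-pattern warnings enabled, deleting one of the describe clauses gets flagged at compile time, whereas a forgotten case in a C switch over json_type may go unnoticed until runtime.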
Continuations
Internally, attoparsec is continuation-driven. When a function f has computed a result, normally it will return it. In continuation-land, f instead takes as a parameter a function g and calls that, passing its result along. This is a simple explanation, but continuations are notoriously tricky to work with.
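As a minimal illustration of the idea (this is not attoparsec code, and all names here are made up), compare a function in direct style with its continuation-passing counterpart:

```haskell
-- Direct style: compute a result and return it.
square :: Int -> Int
square x = x * x

-- Continuation-passing style: take the "rest of the program" as g,
-- and call it with the result instead of returning.
squareK :: Int -> (Int -> r) -> r
squareK x g = g (x * x)

addK :: Int -> Int -> (Int -> r) -> r
addK x y g = g (x + y)

-- x^2 + y^2, written by chaining continuations.
sumOfSquaresK :: Int -> Int -> (Int -> r) -> r
sumOfSquaresK x y g =
  squareK x $ \x2 ->
  squareK y $ \y2 ->
  addK x2 y2 g
```

Passing id as the final continuation recovers the plain value: sumOfSquaresK 3 4 id evaluates to 25. Even in this tiny example, every intermediate result needs a name, which hints at why larger continuation-driven code gets hard to follow.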
Incremental input
One of the nicer features of attoparsec is that you can run a parser on incomplete input.
read a block from disk, and don’t know if you’ve received a complete message. If attoparsec can’t complete a parse for lack of input, it returns a function. Feed that function more input, and it will resume the parse.
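The "return a function when input runs out" idea can be sketched with a toy result type. This is loosely modelled on the behaviour described above and is an assumption-laden simplification, not attoparsec's real API; Result, lineParser and feed are hypothetical names, and String stands in for ByteString.

```haskell
-- Either the parse finished, or it hands back a function that
-- resumes the parse once more input is supplied.
data Result a
  = Done a String                 -- parsed value and leftover input
  | Partial (String -> Result a)  -- out of input; feed more to resume

-- Parse everything up to (but not including) the first newline.
lineParser :: String -> Result String
lineParser = go ""
  where
    go acc input = case break (== '\n') input of
      (before, '\n' : rest) -> Done (acc ++ before) rest
      (before, _)           -> Partial (go (acc ++ before))

-- Supply more input to a suspended parse; a finished parse is left alone.
feed :: Result a -> String -> Result a
feed (Partial k) more = k more
feed done        _    = done
```

For example, lineParser "hel" yields a Partial, and feeding it "lo\nrest" produces Done "hello" "rest". The toy version has no way to signal end of input or failure, both of which a real parser needs.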
Bookkeeping
To support incremental input, attoparsec needs to track three pieces of data.
it couldn’t continue.
the end of all available input.
Parsing with continuations, the scary version
(If this makes your brain hurt, sorry!)
    plus x y = Parser $ \i0 a0 m0 kf ks ->
      let kf' i1 a1 m1 _ _ = addS i0 a0 m0 i1 a1 m1
                               (\ i2 a2 m2 -> runParser y i2 a2 m2 kf ks)
      in  noAdds i0 a0 m0
            (\ i2 a2 m2 -> runParser x i2 a2 m2 kf' ks)
Don’t sweat the details: just count the number of names starting with a, i, and m.
Death by a thousand details
On that previous slide, the function plus is a combinator for two parsers. It executes the first, and if that fails, executes the second. Simple, but there’s a fiendish amount of stuff in flight. This is where having a static type system becomes invaluable.
Don’t cross the streams!
The names i and a have the same type (user input).
To reduce this risk, we tell the compiler to treat them as distinct, incompatible types.
    newtype Input = WrapInput { unwrapInput :: ByteString }
    newtype Added = WrapAdded { unwrapAdded :: ByteString }
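A short sketch of what the wrappers buy. To keep it free of package dependencies it wraps String rather than ByteString, and absorb is a hypothetical helper, not a function from attoparsec.

```haskell
-- Same underlying representation, but distinct types to the compiler.
newtype Input = WrapInput { unwrapInput :: String }
newtype Added = WrapAdded { unwrapAdded :: String }

-- Hypothetical helper: fold newly added input into the unconsumed input.
absorb :: Input -> Added -> Input
absorb (WrapInput i) (WrapAdded a) = WrapInput (i ++ a)

-- Swapping the arguments is a compile-time error rather than a silent bug:
--
--   absorb (WrapAdded "cd") (WrapInput "ab")
--
-- because an Added is not an Input, even though both wrap a String.
```

Since newtype wrappers are erased during compilation, the distinction adds no runtime overhead.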
Parsing with continuations
Internally, every attoparsec parser uses two continuations. A function to call if the current step fails.
type Failure r = Input -> Added -> More
A function to call if the current step succeeds.
type Success a r = Input -> Added -> More -> a
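The two-continuation shape can be shown with a toy parser that ignores incremental input entirely. This is a sketch under that simplifying assumption: MiniParser, OnFailure, OnSuccess, char, orElse and parse are all hypothetical names and much simpler than attoparsec's internals.

```haskell
type OnFailure r   = String -> r       -- receives an error message
type OnSuccess a r = a -> String -> r  -- receives the result and remaining input

-- A parser never returns directly: it calls one of its two continuations.
newtype MiniParser r a = MiniParser
  { runMini :: String -> OnFailure r -> OnSuccess a r -> r }

-- Match one specific character.
char :: Char -> MiniParser r Char
char c = MiniParser $ \input kf ks -> case input of
  (x : rest) | x == c -> ks x rest
  _                   -> kf ("expected " ++ show c)

-- Try p; if it fails, run q on the original input instead.
orElse :: MiniParser r a -> MiniParser r a -> MiniParser r a
orElse p q = MiniParser $ \input kf ks ->
  runMini p input (\_ -> runMini q input kf ks) ks

-- Run a parser, turning the two continuations into an Either.
parse :: MiniParser (Either String a) a -> String -> Either String a
parse p input = runMini p input Left (\a _ -> Right a)
```

Here parse (char 'a' `orElse` char 'b') "bcd" gives Right 'b'. The orElse combinator plays the role that plus plays in the talk: the failure continuation handed to the first parser is what runs the second.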
The cost of this complexity ... is near zero
All of these details are hidden from users of the library.
The benefits are largely visible only to me.
A personal observation
I’ve worked a lot with event-driven (aka continuation-passing) code bases, both large and small. I find it very easy to make mistakes with hand-written event-driven code, even in a fairly small code base. These errors are particularly tricky for me to debug. In general, I avoid heavily event-driven code (e.g. non-blocking completion-based APIs).
Types and debate I
This response is surprisingly common: “But I tried Java/C++, and hated it!” (I’ve rarely found it useful to argue points like this. They’re more like marks of tribal affiliation than actual arguments.)
Types and debate II
Another common rejoinder: “With strong static types, there exist programs I cannot even express!” This is absolutely true.
even be correct.
How to respond?
It’s much more fun to debate a point that’s undeniably true! Type safety involves complex tradeoffs, not a smooth continuum of choices. I’m happy to trade some expressiveness for some automated assurance. And I’m usually happy with the compromises that Haskell makes.
Where’s this risk-related focus come from?
It would be easy to interpret this talk of risk uncharitably. “So you want to take the fun out of programming, Bryan?” Not exactly. Let me talk a little about my motivations.
One of my favourite things to do
Photo credit: Knut Pohl
And another thing
I love climbing and off-piste skiing. Each is exciting, sometimes scary, and undoubtedly risky. My plan is to grow old doing them as safely as I can, trying to minimise the number and severity
Photo credit: Drew Brayshaw
Unit testing
Another aspect of jansson that I like is its test suite.
    static void test_insert(void)
    {
        json_t *array, *five;

        array = json_array();
        five = json_integer(5);

        if(!array)
            fail("unable to create array");
        if(!five)
            fail("unable to create integer");

        if(!json_array_insert(array, 1, five))
            fail("able to insert value out of bounds");

        /* ...and much more... */
    }
The limits of unit testing
Unit testing is well known to have limitations. To my mind, the most important is the finite attention it can demand of a test suite’s author.
fatigue sets in?” Importantly, we often can’t tell when we’re running out of steam, or when another author did.
Multiplying the effect of our efforts
In the jansson test suite, test_insert checks for the ability to insert three different integers into an array. It’s fair to have some confidence that this is “good enough”, but can we do better? If we generated other JSON values, and our test inserted those, we’d have a little more assurance that array insertion worked for values of all types, not just integers.
Text processing: the old days
For many years, Haskell was dogged by (true!) accusations that its string handling was inefficient. A string is a generic singly linked list, with each element pointing to a separately allocated character, and you can see where the problem with this method might lie. On the other hand, using a singly linked list lets us process infinitely long input in constant space, which is very attractive.
Text processing elsewhere
More or less every other programming language manages strings as a packed array. For strings that fit into memory, this approach is vastly more efficient to work with. Nevertheless, we Haskell programmers like to be able to stream our data.
Modern text processing in Haskell
The imaginatively named text library is now the preferred way to process strings in Haskell. For inputs that fit in memory, it uses packed arrays of characters. For streaming, it uses singly linked lists ... of packed arrays of characters.
Thinking about invariants
Elements of the singly linked list can be any length except empty. What does this restriction buy us?
Photo credit: Trey Ratcliff
Boundary conditions
If a chunk can be empty, we have three cases when nearing a boundary:
By requiring non-empty chunks, that last constraint vanishes, and most functions become simpler.
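One way to maintain a non-empty-chunk invariant is to funnel all construction through a function that drops empty chunks. The sketch below uses String chunks and hypothetical names (Chunked, cons, fromChunks, isEmpty, firstChar); it illustrates the idea only and is not the text library's implementation.

```haskell
-- A streamed string: a list of chunks, each intended to be non-empty.
newtype Chunked = Chunked [String]
  deriving (Eq, Show)

-- Smart constructor: the only way a chunk is added, so empty ones never get in.
cons :: String -> Chunked -> Chunked
cons "" rest         = rest
cons c  (Chunked cs) = Chunked (c : cs)

fromChunks :: [String] -> Chunked
fromChunks = foldr cons (Chunked [])

-- With the invariant, emptiness is decided by looking at the list alone;
-- there is no need to scan past empty chunks.
isEmpty :: Chunked -> Bool
isEmpty (Chunked []) = True
isEmpty _            = False

-- The first chunk, if any, is known to hold at least one character.
firstChar :: Chunked -> Maybe Char
firstChar (Chunked ((c : _) : _)) = Just c
firstChar _                       = Nothing
```

For instance, fromChunks ["", "ab", "", "c"] equals Chunked ["ab", "c"]. Without the invariant, both isEmpty and firstChar would need a loop to skip leading empty chunks.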
Trouble in paradise
The problem is, having a boundary condition at all makes the streaming text code vastly harder to reason about. In this kind of situation, unit tests tend to fail hard.
edge cases; worse, this activity tires us quickly.
The result? Misplaced confidence that our code is solid.
Enter QuickCheck
This is a wonderful testing library, originally developed by John Hughes and Koen Claessen. Its key insight is to separate the propositions we’re testing from how we create the data we’re testing them on. We hand over responsibility for generating the data to:
What does text testing look like?
Generate a random character. Pack a series of random characters into an array of random length. Test a packed-string function on this input. Break the array into non-empty chunks of random length. Test that the streaming-string function gives the same result as the packed-string function.
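The proposition in that recipe can be written as an ordinary Haskell function. The sketch below stays within base, so it takes the chunk sizes and the text as explicit arguments; with the QuickCheck package installed, passing prop_chunkedMatchesPacked to quickCheck would supply those arguments at random. Word counting is an arbitrary example chosen because words can straddle chunk boundaries, and all names here are hypothetical rather than taken from the text library's test suite.

```haskell
import Data.Char (isSpace)
import Data.List (foldl')

-- Break a string into non-empty chunks; sizes below 1 are treated as 1.
splitChunks :: [Int] -> String -> [String]
splitChunks _ "" = []
splitChunks [] s = [s]
splitChunks (n : ns) s = chunk : splitChunks ns rest
  where (chunk, rest) = splitAt (max 1 n) s

-- Reference behaviour on a packed string.
countWordsPacked :: String -> Int
countWordsPacked = length . words

-- Streaming version: carries "are we inside a word?" across chunk boundaries.
countWordsChunked :: [String] -> Int
countWordsChunked = snd . foldl' (foldl' step) (False, 0)
  where
    step (inWord, n) ch
      | isSpace ch = (False, n)
      | inWord     = (True, n)
      | otherwise  = (True, n + 1)

-- The proposition: chunking must never change the answer.
prop_chunkedMatchesPacked :: [Int] -> String -> Bool
prop_chunkedMatchesPacked sizes s =
  countWordsChunked (splitChunks sizes s) == countWordsPacked s
```

The property says nothing about which inputs to try; that separation is what lets a generator explore boundary placements nobody would think to write down by hand.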
What’s so special about QuickCheck?
We’ve automated the generation of test cases that we would be unlikely to ever come up with. These tests are typically more compact, and less work to write, than comparable unit tests. And they tend to find more bugs, and nastier ones. Pretty sweet, right?
Some QuickCheck-revealed bugs
A quick perusal of the text library’s history reveals:
Model-based random testing
The QuickCheck approach is not limited to in-memory data structures. Take something more complex and messy, like a database
In parallel, build a simple in-memory model of its behaviour. Use QuickCheck to perform random valid actions on both the real code and the model, and check that the two agree.
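A compact sketch of the model-based idea, with a sorted key-value list standing in for the "real" system and a naive association list as the model. Everything here is hypothetical and kept within base; with the QuickCheck package, prop_agrees would be driven by randomly generated action lists rather than hand-written ones.

```haskell
data Action = Put Int String | Delete Int | Get Int
  deriving (Eq, Show)

-- Model: an unordered association list, the simplest thing that could work.
type Model = [(Int, String)]

stepModel :: Action -> Model -> (Maybe String, Model)
stepModel (Put k v)  m = (Nothing, (k, v) : filter ((/= k) . fst) m)
stepModel (Delete k) m = (Nothing, filter ((/= k) . fst) m)
stepModel (Get k)    m = (lookup k m, m)

-- "Real" implementation: entries kept sorted by key, with no duplicates.
newtype Store = Store [(Int, String)]

emptyStore :: Store
emptyStore = Store []

stepStore :: Action -> Store -> (Maybe String, Store)
stepStore (Put k v) (Store kvs) = (Nothing, Store (insertSorted kvs))
  where
    insertSorted [] = [(k, v)]
    insertSorted (e@(k', _) : rest)
      | k < k'    = (k, v) : e : rest
      | k == k'   = (k, v) : rest
      | otherwise = e : insertSorted rest
stepStore (Delete k) (Store kvs) = (Nothing, Store (filter ((/= k) . fst) kvs))
stepStore (Get k) s@(Store kvs)  = (lookup k kvs, s)

-- Run a sequence of actions and collect what an outside observer would see.
observe :: (Action -> s -> (Maybe String, s)) -> s -> [Action] -> [Maybe String]
observe step = go
  where
    go _ []              = []
    go s (a : remaining) = let (out, s') = step a s in out : go s' remaining

-- The proposition: real code and model are indistinguishable from outside.
prop_agrees :: [Action] -> Bool
prop_agrees actions =
  observe stepModel [] actions == observe stepStore emptyStore actions
```

Only observable results are compared, never internal state, so the model is free to be as naive as possible while the real implementation optimises.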
Is there anything unique to Haskell about this?
Sort of... This approach has much in common with protocol fuzzers, which have been around for a long time. Haskell’s type system and syntax make testing in this style particularly succinct and pleasant. Nevertheless, QuickCheck has been ported to many languages, and even has commercial support.
What else helps?
New GHC feature: defer type errors until runtime.
Code coverage down to the expression level.
Run test suites in parallel (thanks to purity!).
Factors I haven’t touched on
Immutable data structures. Algebraic data types and pattern matching. A lively open source community. Excellent benchmarking and performance monitoring. Freely available books.
Getting started
Want to give it a try?
Free online books:
Thanks!