Taking Time Seriously Bryan OSullivan Twitter: @bos31337 Monday, - - PowerPoint PPT Presentation

SLIDE 1

Taking Time Seriously

Bryan O’Sullivan

Twitter: @bos31337

Monday, June 18, 12
SLIDE 2

Let’s talk about mental effort

Our attention is a fragile thing. Humans are easily distracted. Let’s illustrate. Try this, while I continue to talk:

  • Mentally compute the product 76 × 12
SLIDE 3

Doing math in your head

For most of us, multiplying multiple-digit numbers is an effortful task. Chances are:

  • If I asked you to perform that calculation while we were walking together,
  • you’d need to stop for a few moments to complete it.
SLIDE 4

That bit of mental arithmetic

In case you couldn’t get there (due to my deliberate distractions):

76 × 12 = 912
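For anyone who wants to check it, here is one way to split the product into steps that are easier to hold in your head. Any other split works just as well:

```latex
76 \times 12 = 76 \times 10 + 76 \times 2 = 760 + 152 = 912
```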

SLIDE 5

By the way...

If you did the arithmetic exercise in your head... ...can you tell me what kind of creature was on the screen two slides ago? When you’re focused on one thing, it’s easy to miss something else that would normally be obvious.

SLIDE 6

The alternative to attention

We know from experience that sustained concentration is draining. Your brain greatly prefers to operate in a less demanding mode, making rapid intuitive judgments. As designers and programmers, we need to keep this intuitive mode of thinking in mind.

SLIDE 7

Imperative programming

I often hear traditional imperative programming described as “intuitive”. Let’s have the horizontal line represent its difficulty.

SLIDE 8

Functional programming

I’m just as likely to hear experienced programmers call functional programming “non-intuitive”. Again, the length of the horizontal line marks its difficulty.

SLIDE 9

Which horizontal line is longer?

SLIDE 10

They’re the same length!

Many of you will already know the famous optical illusion on the previous screen. It’s called the Müller-Lyer illusion. This illusion sheds a little light on how easily our intuitive judgment can go wrong.

SLIDE 11

For what it’s worth...

I divide my professional life between imperative programming (C, C++, Python) and functional programming (Haskell, Lisp). I’d be hard pressed to call either one inherently intuitive. (Or, for that matter, one more intuitive than the other.)

SLIDE 12

What has this to do with shipping code?

As programmers, we of course work with our brains all the time. We’re often aware of when we switch between cognitive modes. Some examples of each mode:

  • Intuition mode: the code just comes flowing out
  • Attention mode: fixing a bug, designing a new algorithm
SLIDE 13

The perils of programming

Intuition mode usually serves us well, which is why it’s our default.

  • As we’ve seen, it can lead us to efficiently make mistakes that we don’t notice.

Attention mode is brought to bear when the going gets tough.

  • It’s slow going, often limited in power, and easily fatigued.
SLIDE 14

Intuitive-mode programming in MySQL

    typedef char my_bool;

    my_bool
    check_scramble(const char *scramble_arg, const char *message,
                   const uint8 *hash_stage2)
    {
      uint8 hash_stage2_reassured[SHA1_HASH_SIZE];
      /* ... uninteresting stuff ... */
      return memcmp(hash_stage2, hash_stage2_reassured, SHA1_HASH_SIZE);
    }

SLIDE 15

That code looks... perfectly reasonable?

And yet the line containing a call to memcmp is wrong. An attacker who can talk to a vulnerable MySQL server can authenticate as any user in ~256 attempts. Without knowing any passwords. How did this happen?

SLIDE 16

How did this come about?

The FreeBSD / Mac OS X man page for memcmp:

The memcmp() function returns zero if the two strings are identical, otherwise returns the difference between the first two differing bytes (treated as unsigned char values, so that '\200' is greater than '\0', for example).

Clear as mud, right? (Linux, POSIX, and ISO word it better.)

SLIDE 17

My guess

The author made an intuitive-mode assumption that the implicit cast from int to char would be safe. In all likelihood, they did not even know that they were making this assumption. Ever made this kind of programming mistake? I make them all the time, after 20 years of experience.
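Here is a minimal C sketch of the trap, under stated assumptions. The C standard promises only that memcmp returns a negative, zero, or positive int. It does not promise that the value fits in a char. The helpers below are hypothetical, not MySQL's code. The first narrows the raw result into a char-sized my_bool, as the slide's code does. The second reduces the result to 0 or 1 before narrowing. The value 0x100 stands in for a memcmp implementation whose nonzero result happens to have a zero low byte. Converting that to a signed char is implementation-defined, but common two's-complement compilers such as GCC and Clang keep the low byte and produce 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef char my_bool;

/* Narrowing an arbitrary int to my_bool keeps only the low byte on
   common compilers, so a nonzero result such as 0x100 becomes 0. */
static my_bool narrow_to_my_bool(int cmp_result)
{
    return (my_bool) cmp_result;
}

/* Pattern from the slide: memcmp's int result is narrowed to my_bool. */
static my_bool hashes_differ_buggy(const uint8_t *a, const uint8_t *b, size_t n)
{
    return narrow_to_my_bool(memcmp(a, b, n));
}

/* Fixed pattern: reduce to 0 or 1 *before* narrowing to my_bool. */
static my_bool hashes_differ(const uint8_t *a, const uint8_t *b, size_t n)
{
    return memcmp(a, b, n) != 0;
}
```

The fix is a single comparison. `memcmp(...) != 0` is always 0 or 1, so narrowing it cannot lose information.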

SLIDE 18

Air France 447

On May 31, 2009, this Airbus A330 was lost without trace or emergency call over the mid-Atlantic.

Not until 23 months later was the cockpit voice recorder recovered. The reconstruction of events on board the flight is haunting for what it tells us about decisions made under pressure.

SLIDE 19

Modern flight control systems

The A330 is a heavily automated fly-by-wire aircraft. At the time its troubles began, 447 was on autopilot, flying into a tropical squall that most aircraft in the area were carefully avoiding. An autopilot depends on a device called a pitot tube to report the plane’s air speed. Pitot tubes are vulnerable to moisture and ice build-up, so passenger planes tend to fly with three, for redundancy.

SLIDE 20

The ice problem

All three of 447’s pitot tubes iced over at more or less the same time. The autopilot disengaged due to loss of airspeed data. Minutes beforehand, the captain had left the flight deck for a nap, leaving his two second officers in charge. One of the second officers responded to the autopilot alarm by increasing thrust and pulling the plane’s nose up.

SLIDE 21

The last three minutes

With the nose up at high altitude, the plane quickly lost airspeed, stalled, and entered a rapid descent, losing 10,000 feet per minute. One second officer seemed to be confused by the plane’s behaviour and kept the nose up, even though this was precisely the wrong thing to do. He failed to tell his colleagues what he was doing, and they did not ask.

SLIDE 22

What do these episodes have in common?

Each was an elementary misjudgment, by an experienced individual, with severe consequences. We routinely make quick decisions in complex situations. Being under pressure magnifies the likelihood of snap misjudgments. Luckily, we often get to walk away from our mistakes more or less unscathed.
SLIDE 23

The startup environment

As a startup founder, you’re routinely under pressure from investors and customers. You’re probably creating entirely new software that is quite unfamiliar. Your product has to ship yesterday, and impress right away. You can afford mistakes, but they’d better be few in number and cheap to rectify.

SLIDE 24

I built my startup’s software in Haskell

This seemingly unconventional decision was driven by one major consideration. I wanted to ship reliable code quickly. At the time, this did feel like a slightly sketchy decision.

  • The Haskell ecosystem was exciting, but not so mature.

In retrospect, it was a lot of work, but still a great idea.

SLIDE 25

The importance of a good type system

For me, the most important “I need to ship stuff” aspect of Haskell is its type system. The type checker tells me about many (usually simple) logical errors in my code before I can even execute it.

SLIDE 26

Type safety and password checking

As a trivial example, that MySQL password bug can’t happen in Haskell. Why not? Ordered comparisons return results of type Ordering.

  • This is distinct from numeric types, unlike my_bool.

There’s no automatic promotion (casting) of values.

  • We can’t implicitly convert from Ordering to Bool.
SLIDE 27

A more realistic example of types at work

My company’s product consisted of client (C#) and server (Haskell) components. They needed to be able to communicate, so I wanted to go down the typical REST/JSON road. One problem with the Haskell ecosystem at the time was that I didn’t much like the existing JSON libraries. So I wrote one.

SLIDE 28

The anatomy of a typical JSON library

The jansson library is a clean JSON library, written in C. It has a classic OO-in-C architecture. Its core public type is abstract:

    typedef struct {
        json_type type;
        size_t refcount;
    } json_t;

SLIDE 29

What’s json_type?

The oft-seen “represent the type via an enum in a tag” pattern.

    typedef enum {
        JSON_OBJECT,
        JSON_ARRAY,
        JSON_STRING,
        JSON_INTEGER,
        JSON_REAL,
        JSON_TRUE,
        JSON_FALSE,
        JSON_NULL
    } json_type;

SLIDE 30

Inspection

How does a user tell what kind of value inhabits a JSON structure?

    #define json_is_object(json) \
        (json && \
         json_typeof(json) == JSON_OBJECT)

(Incidentally, can you spot a problem with this cpp macro?)
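Here is one answer, as a hedged C sketch. The macro expands its argument twice, so an argument with side effects runs twice. It also leaves the first use of json unparenthesised, so an argument like `c ? a : b` would parse as `c ? a : (b && ...)`. The types below are minimal hypothetical stand-ins for jansson's, not its real headers. next_value is an invented helper that counts how often it is called.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-ins for jansson's types (not its real headers). */
typedef enum { JSON_OBJECT, JSON_ARRAY } json_type;
typedef struct { json_type type; size_t refcount; } json_t;

#define json_typeof(json) ((json)->type)

/* The macro as it appears on the slide: `json` is expanded twice. */
#define json_is_object(json) \
    (json && \
     json_typeof(json) == JSON_OBJECT)

static int calls = 0;   /* how many times next_value() has run */

/* Hypothetical accessor with a side effect: it bumps a counter. */
static json_t *next_value(json_t *v)
{
    calls++;
    return v;
}

/* A function evaluates its argument exactly once, and type-checks it. */
static int json_is_object_fn(const json_t *json)
{
    return json != NULL && json->type == JSON_OBJECT;
}
```

With the macro, `json_is_object(next_value(&obj))` calls next_value twice. With the function, it is called once.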

SLIDE 31

Construction

What if I need to build a value?

    json_t *json_object(void);
    int json_object_set_new(json_t *object, const char *key, json_t *value);
    int json_object_del(json_t *object, const char *key);
    int json_object_clear(json_t *object);
    int json_object_update(json_t *object, json_t *other);

This is only about a quarter of the API for just the JSON_OBJECT type.

SLIDE 32

Using jansson

If I use this library, I’m responsible for a lot of stuff.

  • Be careful about which API entry points are CPP macros, and which are functions.

  • Manually check types of inputs, and results of functions.
  • Correctly follow the refcount rules.

Par for the course for a C library, in other words.

SLIDE 33

Enter Aeson

Aeson is the Haskell JSON library I wrote. It has a simple, clean API, and it’s fast. For its speed, aeson depends on a parsing library I wrote, attoparsec.

(attoparsec also has a sweet API, but under the hood it’s a scary beast. We’ll get back to that!)
SLIDE 34

How aeson manages types

    type JsonObject = HashMap Text Value
    type JsonArray  = Vector Value

    data Value = Object JsonObject
               | Array JsonArray
               | String Text
               | Number JsonNumber
               | Bool Bool
               | Null
                 deriving (Eq, Show, Typeable)

SLIDE 35

What’s the difference?

In both Haskell and C, we must associate two concepts.

  • Represent “what type of JSON data is this?”
  • Store “what’s the payload associated with this?”
SLIDE 36

In C, type and payload are tied together by convention.

  • We have to use the API correctly to get things right.

In Haskell, the type system ties the two together.

  • If we get the associations wrong, our program won’t compile.

  • We’ve made invalid associations unrepresentable.
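To make the C side of the contrast concrete, here is a hypothetical enum-tagged value in the same style. It is a sketch, not jansson's actual layout. The compiler lets any code read v.as.integer, whatever v.tag says. The association between tag and payload lives only in get_integer, and only protects callers who remember to use it.

```c
#include <stddef.h>

/* Hypothetical enum-tagged value in the classic C style. */
typedef enum { TAG_STRING, TAG_INTEGER } value_tag;

typedef struct {
    value_tag tag;              /* says which union member is meaningful */
    union {
        const char *string;
        long long integer;
    } as;
} value_t;

static value_t make_integer(long long n)
{
    value_t v;
    v.tag = TAG_INTEGER;
    v.as.integer = n;
    return v;
}

static value_t make_string(const char *s)
{
    value_t v;
    v.tag = TAG_STRING;
    v.as.string = s;
    return v;
}

/* Tag and payload are tied together only here, by convention:
   nothing stops a caller from reading v.as.integer directly. */
static int get_integer(const value_t *v, long long *out)
{
    if (v == NULL || v->tag != TAG_INTEGER)
        return -1;              /* wrong kind of value: leave *out alone */
    *out = v->as.integer;
    return 0;
}
```

In the aeson Value type, by contrast, the payload is reached by pattern matching on its constructor. There is no unchecked path like v.as.integer.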
SLIDE 37

Continuations

Internally, attoparsec is continuation-driven. When a function f has computed a result, normally it will return it. In continuation-land, f instead takes as a parameter a function g and calls that, passing its result along. This is a simple explanation, but continuations are notoriously tricky to work with.
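Here is a tiny illustration of the idea in C. It is not attoparsec code. square returns its result directly, while square_k hands its result to a continuation k instead. C has no closures, so each continuation takes an explicit env pointer for the state it needs. That extra plumbing is a small taste of why continuation-heavy code is easy to get wrong.

```c
/* Direct style: compute a result and return it. */
static int square(int x)
{
    return x * x;
}

/* A continuation: receives a result plus whatever state it needs. */
typedef void (*int_k)(int result, void *env);

/* Continuation-passing style: instead of returning, call k with the result. */
static void square_k(int x, int_k k, void *env)
{
    k(x * x, env);
}

/* A continuation that adds 1 and passes the result on to the next one. */
struct add1_env { int_k next; void *next_env; };

static void add1_then(int result, void *env)
{
    struct add1_env *e = env;
    e->next(result + 1, e->next_env);
}

/* A final continuation that stores the result where env points. */
static void store_result(int result, void *env)
{
    *(int *) env = result;
}
```

Calling `square_k(7, add1_then, &e)`, with e pointing at store_result and an int out, squares 7, adds 1, and leaves 50 in out. None of the continuation-passing functions returns a value.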

SLIDE 38

Incremental input

One of the nicer features of attoparsec is that you can run a parser on incomplete input.

  • Very useful if you’ve received a packet off the wire or read a block from disk, and don’t know if you’ve received a complete message.

If attoparsec can’t complete a parse for lack of input, it returns a function. Feed that function more input, and it will resume the parse.
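Here is a rough C analogue of resumable parsing, under assumptions. C cannot return a closure, so this hypothetical parser for a newline-terminated decimal number keeps the rest of the parse in a small state struct. The caller hands that struct back along with the next chunk. The sketch ignores overflow and shows only the idea, not how attoparsec is implemented.

```c
#include <stddef.h>

typedef enum { PARSE_DONE, PARSE_NEED_MORE, PARSE_ERROR } parse_status;

/* The "rest of the parse": everything needed to resume later. */
typedef struct {
    unsigned long value;    /* digits accumulated so far (overflow ignored) */
    size_t digits;          /* how many digits have been seen */
} number_parser;

static void number_parser_init(number_parser *p)
{
    p->value = 0;
    p->digits = 0;
}

/* Feed one chunk. *consumed receives the number of bytes used; on
   PARSE_DONE that count includes the terminating newline. */
static parse_status number_parser_feed(number_parser *p, const char *buf,
                                       size_t len, size_t *consumed)
{
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        if (c >= '0' && c <= '9') {
            p->value = p->value * 10 + (unsigned long) (c - '0');
            p->digits++;
        } else if (c == '\n' && p->digits > 0) {
            *consumed = i + 1;
            return PARSE_DONE;
        } else {
            *consumed = i;
            return PARSE_ERROR;
        }
    }
    *consumed = len;
    return PARSE_NEED_MORE;     /* out of input: call again with more */
}
```

Feeding "12" reports PARSE_NEED_MORE. Feeding "34\nrest" to the same state then completes with 1234 and reports that 3 bytes were used.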

SLIDE 39

Book keeping

To support incremental input, attoparsec needs to track three pieces of data.

  • i — the input currently remaining to be consumed.
  • a — any additional input it was fed upon reporting that it couldn’t continue.
  • m — a marker that records whether it was told it had hit the end of all available input.

SLIDE 40

Parsing with continuations, the scary version

(If this makes your brain hurt, sorry!)

    plus x y = Parser $ \i0 a0 m0 kf ks ->
      let kf' i1 a1 m1 _ _ = addS i0 a0 m0 i1 a1 m1
                               (\ i2 a2 m2 -> runParser y i2 a2 m2 kf ks)
      in noAdds i0 a0 m0 (\ i2 a2 m2 -> runParser x i2 a2 m2 kf' ks)

Don’t sweat the details: just count the number of names starting with a, i, and m.

SLIDE 41

Death by a thousand details

On that previous slide, the function plus is a combinator for two parsers. It executes the first, and if that fails, executes the second. Simple, but there’s a fiendish amount of stuff in flight. This is where having a static type system becomes invaluable.

  • Even in a small code base.
SLIDE 42

Don’t cross the streams!

The names i and a have the same type (user input).

  • It’s easy to see how we could accidentally mix them up.
  • That could cause us to yield nonsensical results.

To reduce this risk, we tell the compiler to treat them as distinct, incompatible types.

    newtype Input = WrapInput { unwrapInput :: ByteString }
    newtype Added = WrapAdded { unwrapAdded :: ByteString }
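C has no newtype, but the same trick can be sketched by wrapping one representation in two distinct single-member structs. The names below (byte_string, input_t, added_t, remaining) are hypothetical and chosen to mirror the slide.

```c
#include <stddef.h>

/* The shared underlying representation. */
typedef struct { const char *bytes; size_t len; } byte_string;

/* Two distinct types around the same representation. */
typedef struct { byte_string unwrap; } input_t;   /* input still to consume */
typedef struct { byte_string unwrap; } added_t;   /* input fed in later */

static input_t wrap_input(byte_string s) { input_t i = { s }; return i; }
static added_t wrap_added(byte_string s) { added_t a = { s }; return a; }

/* Accepts only input_t: passing an added_t is a compile-time error. */
static size_t remaining(input_t i)
{
    return i.unwrap.len;
}
```

Calling `remaining(a)` with an added_t is rejected by the compiler as an incompatible argument type. That is exactly the mix-up the newtypes on the slide rule out.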

SLIDE 43

Parsing with continuations

Internally, every attoparsec parser uses two continuations.

A function to call if the current step fails:

    type Failure r = Input -> Added -> More
                  -> [String] -> String  -- error info
                  -> Result r

A function to call if the current step succeeds:

    type Success a r = Input -> Added -> More -> a
                    -> Result r
SLIDE 44

The cost of this complexity ... is near zero

All of these details are hidden from users of the library.

  • They get a simple DSL that’s easy to learn.

The benefits are largely visible only to me.

  • Using continuations lets me write a library that’s fast.
  • Fine-grained types let me use continuations safely.
SLIDE 45

A personal observation

I’ve worked a lot with event-driven (aka continuation-passing) code bases, both large and small. I find it very easy to make mistakes with hand-written event-driven code, even in a fairly small code base. These errors are particularly tricky for me to debug. In general, I avoid heavily event-driven code (e.g. non-blocking completion-based APIs).

SLIDE 46

Types and debate I

This response is surprisingly common: “But I tried Java/C++, and hated it!” (I’ve rarely found it useful to argue points like this. They’re more like marks of tribal affiliation than actual arguments.)

SLIDE 47

Types and debate II

Another common rejoinder: “With strong static types, there exist programs I cannot even express!” This is absolutely true.

  • Also, some of these inexpressible programs are interesting.
  • Not only that, some interesting, inexpressible programs will even be correct.

SLIDE 48

How to respond?

It’s much more fun to debate a point that’s undeniably true! Type safety involves complex tradeoffs, not a smooth continuum of choices. I’m happy to trade some expressiveness for some automated assurance. And I’m usually happy with the compromises that Haskell makes.

SLIDE 49

Where’s this risk-related focus come from?

It would be easy to interpret this talk of risk uncharitably. “So you want to take the fun out of programming, Bryan?” Not exactly. Let me talk a little about my motivations.

SLIDE 50

One of my favourite things to do

Photo credit: Knut Pohl

SLIDE 51

And another thing

I love climbing and off-piste skiing. Each is exciting, sometimes scary, and undoubtedly risky. My plan is to grow old doing them as safely as I can, trying to minimise the number and severity of my mistakes.

Photo credit: Drew Brayshaw

SLIDE 52

Unit testing

Another aspect of jansson that I like is its test suite.

    static void test_insert(void)
    {
        json_t *array, *five;

        array = json_array();
        five = json_integer(5);
        if(!array)
            fail("unable to create array");
        if(!five)
            fail("unable to create integer");

        if(!json_array_insert(array, 1, five))
            fail("able to insert value out of bounds");

        /* ...and much more... */
    }

SLIDE 53

The limits of unit testing

Unit testing is well known to have limitations. To my mind, the most important is that a test suite can only be as thorough as its author’s finite attention allows.

  • “How many possible problems can I cover before fatigue sets in?”

Importantly, we often can’t tell when we’re running out of steam, or when another author did.

SLIDE 54

Multiplying the effect of our efforts

In the jansson test suite, test_insert checks for the ability to insert three different integers into an array. It’s fair to have some confidence that this is “good enough”, but can we do better? If we generated other JSON values, and our test inserted those, we’d have a little more assurance that array insertion worked for values of all types, not just integers.
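Here is a hedged sketch of that idea in C. It uses a hypothetical miniature value type and growable array rather than jansson itself. Instead of three hand-picked integers, the property inserts generated values of every kind at generated positions and checks the result each time. The generator is a simple linear congruential generator, so a failing seed can be replayed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature JSON-like value: a kind tag plus payload fields. */
typedef enum { V_NULL, V_BOOL, V_INT, V_REAL, KIND_COUNT } kind;
typedef struct { kind k; long long i; double r; } value;

/* Hypothetical growable array of values. */
typedef struct { value *items; size_t len, cap; } array;

/* Insert v at idx (0 <= idx <= len). Returns 0 on success, -1 on error. */
static int array_insert(array *a, size_t idx, value v)
{
    if (idx > a->len)
        return -1;                      /* out of bounds */
    if (a->len == a->cap) {
        size_t cap = a->cap ? a->cap * 2 : 4;
        value *p = realloc(a->items, cap * sizeof *p);
        if (p == NULL)
            return -1;
        a->items = p;
        a->cap = cap;
    }
    memmove(&a->items[idx + 1], &a->items[idx],
            (a->len - idx) * sizeof *a->items);
    a->items[idx] = v;
    a->len++;
    return 0;
}

/* Deterministic 64-bit LCG, so a failing seed can be replayed. */
static unsigned long long next_rand(unsigned long long *seed)
{
    *seed = *seed * 6364136223846793005ULL + 1442695040888963407ULL;
    return *seed >> 33;
}

/* Generate a value of a random kind with a random payload. */
static value random_value(unsigned long long *seed)
{
    value v = { (kind) (next_rand(seed) % KIND_COUNT), 0, 0.0 };
    if (v.k == V_BOOL) v.i = (long long) (next_rand(seed) % 2);
    if (v.k == V_INT)  v.i = (long long) next_rand(seed) - (1LL << 30);
    if (v.k == V_REAL) v.r = (double) next_rand(seed) / 1024.0;
    return v;
}

static int same_value(value a, value b)
{
    return a.k == b.k && a.i == b.i && a.r == b.r;
}

/* Property: inserting a random value at a random valid index stores it
   there and grows the array by one; an out-of-bounds insert fails. */
static int prop_insert(unsigned long long seed, int trials)
{
    for (int t = 0; t < trials; t++) {
        array a = { NULL, 0, 0 };
        size_t n = next_rand(&seed) % 8;
        int ok = 1;
        for (size_t j = 0; j < n && ok; j++)
            ok = array_insert(&a, a.len, random_value(&seed)) == 0;
        value v = random_value(&seed);
        size_t idx = next_rand(&seed) % (a.len + 1);
        ok = ok && array_insert(&a, idx, v) == 0
                && a.len == n + 1
                && same_value(a.items[idx], v)
                && array_insert(&a, a.len + 1, v) == -1;
        free(a.items);
        if (!ok)
            return 0;
    }
    return 1;
}
```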

SLIDE 55

Text processing: the old days

For many years, Haskell was dogged by (true!) accusations that its string handling was inefficient. A string is a generic singly linked list, with each element pointing to a separately allocated character, and you can see where the problem with this method might lie. On the other hand, using a singly linked list lets us process infinitely long input in constant space, which is very attractive.

SLIDE 56

Text processing elsewhere

More or less every other programming language manages strings as a packed array. For strings that fit into memory, this approach is vastly more efficient to work with. Nevertheless, we Haskell programmers like to be able to stream our data.

SLIDE 57

Modern text processing in Haskell

The imaginatively named text library is now the preferred way to process strings in Haskell. For inputs that fit in memory, it uses packed arrays of characters. For streaming, it uses singly linked lists ... of packed arrays of characters.

SLIDE 58

Thinking about invariants

Elements of the singly linked list can be any length except empty. What does this restriction buy us?

Photo credit: Trey Ratcliff

SLIDE 59

Boundary conditions

If a chunk can be empty, we have three cases when nearing a boundary:

  • End of stream
  • Next chunk has data
  • Next chunk is empty

By requiring non-empty chunks, that last case vanishes, and most functions become simpler.
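Here is a minimal C sketch of the invariant. The names are hypothetical, and this is not the text library's code, which is Haskell. chunk_cons refuses to link in an empty piece, so any code walking the list can assume that a non-NULL chunk has at least one byte.

```c
#include <stddef.h>

/* A stream is a singly linked list of chunks; invariant: len > 0. */
typedef struct chunk {
    const char *data;
    size_t len;
    const struct chunk *next;
} chunk;

/* Smart constructor: an empty piece is dropped instead of linked in,
   so every list built through this function keeps the invariant. */
static const chunk *chunk_cons(chunk *c, const char *data, size_t len,
                               const chunk *rest)
{
    if (len == 0)
        return rest;
    c->data = data;
    c->len = len;
    c->next = rest;
    return c;
}

/* First byte of the stream, or -1 at end of stream. Because chunks are
   never empty, only two cases exist: end of stream, or data at index 0. */
static int stream_head(const chunk *c)
{
    return c == NULL ? -1 : (unsigned char) c->data[0];
}
```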

SLIDE 60

Trouble in paradise

The problem is, having a boundary condition at all makes the streaming text code vastly harder to reason about. In this kind of situation, unit tests tend to fail hard.

  • Humans generally aren’t good at thinking up devious edge cases; worse, this activity tires us quickly.

  • However, we tend to think we’re doing a good job.

The result? Misplaced confidence that our code is solid.

SLIDE 61

Enter QuickCheck

This is a wonderful testing library, originally developed by John Hughes and Koen Claessen. Its key insight is to separate the propositions we’re testing from how we create the data we’re testing them on. We hand over responsibility for generating the data to:

  • The type system
  • A random number generator
SLIDE 62

What does text testing look like?

  • Generate a random character.
  • Pack a series of random characters into an array of random length.
  • Test a packed-string function on this input.
  • Break the array into non-empty chunks of random length.
  • Test that the streaming-string function gives the same result as the packed-string function.
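The same recipe can be sketched in C without QuickCheck. It uses a hand-rolled generator and an assumed byte-oriented function under test: counting "\r\n" pairs, chosen because a pair can straddle a chunk boundary. There is a packed version and a streaming version over non-empty chunks, and the property is that the two agree on every generated input. The small alphabet is a deliberate bias so that interesting cases turn up often.

```c
#include <stddef.h>

/* Hypothetical function under test, packed version: count "\r\n" pairs. */
static size_t count_crlf_packed(const char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 1; i < n; i++)
        if (s[i - 1] == '\r' && s[i] == '\n')
            count++;
    return count;
}

/* A chunk of a streamed string; invariant: len > 0. */
typedef struct { const char *data; size_t len; } piece;

/* Streaming version: must carry state across chunk boundaries. */
static size_t count_crlf_chunked(const piece *ps, size_t npieces)
{
    size_t count = 0;
    int prev_cr = 0;                    /* was the last byte seen a '\r'? */
    for (size_t k = 0; k < npieces; k++)
        for (size_t i = 0; i < ps[k].len; i++) {
            if (prev_cr && ps[k].data[i] == '\n')
                count++;
            prev_cr = ps[k].data[i] == '\r';
        }
    return count;
}

/* Deterministic 64-bit LCG, so a failing seed can be replayed. */
static unsigned long long next_rand(unsigned long long *seed)
{
    *seed = *seed * 6364136223846793005ULL + 1442695040888963407ULL;
    return *seed >> 33;
}

/* Property: random input split into random non-empty chunks gives the
   same streaming result as packed result. Returns 1 if all trials pass. */
static int prop_chunked_matches_packed(unsigned long long seed, int trials)
{
    static const char alphabet[] = { '\r', '\n', 'a' };  /* biased on purpose */
    char buf[64];
    piece ps[64];
    for (int t = 0; t < trials; t++) {
        size_t n = next_rand(&seed) % (sizeof buf + 1);  /* 0..64 bytes */
        for (size_t i = 0; i < n; i++)
            buf[i] = alphabet[next_rand(&seed) % sizeof alphabet];
        size_t npieces = 0, off = 0;
        while (off < n) {
            size_t len = 1 + next_rand(&seed) % (n - off);  /* never empty */
            ps[npieces].data = buf + off;
            ps[npieces].len = len;
            npieces++;
            off += len;
        }
        if (count_crlf_chunked(ps, npieces) != count_crlf_packed(buf, n))
            return 0;
    }
    return 1;
}
```

A streaming version that forgot to carry prev_cr across chunks would pass most hand-written unit tests. The random chunking finds it quickly.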

SLIDE 63

What’s so special about QuickCheck?

We’ve automated the generation of test cases that we would be unlikely to ever come up with. These tests are typically more compact, and less work to write, than comparable unit tests. And they tend to find more bugs, and nastier ones. Pretty sweet, right?

SLIDE 64

Some QuickCheck-revealed bugs

A quick perusal of the text library’s history reveals:

  • Streaming data out of a buffer too late when reading lines
  • Corruption on write-then-read of a file
  • Breakage in lazy (streaming) string splitting
  • Buffer mis-sized when converting from String to Text
  • Uses of min instead of max (!)
  • Incorrect handling of end-of-input when splitting lazy string
  • Missed a small subrange of random characters to generate (yes, tests have bugs, too!)
  • Use of wrong variable when checking for end-of-string
SLIDE 65

Model based random testing

The QuickCheck approach is not limited to in-memory data structures. Take something more complex and messy, like a database or cluster election protocol.

In parallel, build a simple in-memory model of its behaviour. Use QuickCheck to perform random valid actions on both the real code and the model, and check that the two agree.
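Here is a small model-based test in the same spirit, sketched in C with hypothetical names. The "real" code is a fixed-capacity ring buffer queue, and the model is the most obvious array-and-shift queue. Random pushes and pops are applied to both. Any disagreement in return values, popped elements, or length fails the property.

```c
#include <stddef.h>
#include <string.h>

#define CAP 8

/* "Real" implementation under test: a fixed-capacity ring buffer queue. */
typedef struct { int items[CAP]; size_t head, len; } ring;

static int ring_push(ring *r, int x)
{
    if (r->len == CAP)
        return -1;                              /* full */
    r->items[(r->head + r->len) % CAP] = x;
    r->len++;
    return 0;
}

static int ring_pop(ring *r, int *out)
{
    if (r->len == 0)
        return -1;                              /* empty */
    *out = r->items[r->head];
    r->head = (r->head + 1) % CAP;
    r->len--;
    return 0;
}

/* Model: the most obvious queue, an array shifted down on every pop. */
typedef struct { int items[CAP]; size_t len; } model;

static int model_push(model *m, int x)
{
    if (m->len == CAP)
        return -1;
    m->items[m->len++] = x;
    return 0;
}

static int model_pop(model *m, int *out)
{
    if (m->len == 0)
        return -1;
    *out = m->items[0];
    memmove(m->items, m->items + 1, (m->len - 1) * sizeof m->items[0]);
    m->len--;
    return 0;
}

/* Deterministic 64-bit LCG, so a failing seed can be replayed. */
static unsigned long long next_rand(unsigned long long *seed)
{
    *seed = *seed * 6364136223846793005ULL + 1442695040888963407ULL;
    return *seed >> 33;
}

/* Apply the same random pushes and pops to both; they must always agree. */
static int prop_ring_matches_model(unsigned long long seed, int steps)
{
    ring r = { {0}, 0, 0 };
    model m = { {0}, 0 };
    for (int s = 0; s < steps; s++) {
        if (next_rand(&seed) % 2) {
            int x = (int) (next_rand(&seed) % 1000);
            if (ring_push(&r, x) != model_push(&m, x))
                return 0;
        } else {
            int a = 0, b = 0;
            int ra = ring_pop(&r, &a), rb = model_pop(&m, &b);
            if (ra != rb || (ra == 0 && a != b))
                return 0;
        }
        if (r.len != m.len)
            return 0;
    }
    return 1;
}
```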

SLIDE 66

Is there anything unique to Haskell about this?

Sort of... This approach has much in common with protocol fuzzers, which have been around for a long time. Haskell’s type system and syntax make testing in this style particularly succinct and pleasant. Nevertheless, QuickCheck has been ported to many languages, and even has commercial support.

SLIDE 67

What else helps?

New GHC feature: defer type errors until runtime.

  • Lets us do some sanity checking part-way through writing or refactoring. Gives a feel more like writing Python.

Code coverage down to the expression level.

  • Tell which code isn’t tested (and hence risky).

Run test suites in parallel (thanks to purity!).

  • Save time, or dial the number of random tests higher.
SLIDE 68

Factors I haven’t touched on

Immutable data structures. Algebraic data types and pattern matching. A lively open source community. Excellent benchmarking and performance monitoring. Freely available books.

SLIDE 69

Getting started

Want to give it a try?

  • Haskell Platform: “batteries included” binary distribution

Free online books:

  • realworldhaskell.org
  • learnyouahaskell.com
SLIDE 70

Thanks!
