Why Modern Programming Languages Matter
Mark P Jones, Portland State University
Winter 2017

A short history of the automobile
[Figure: timeline from 1900 to 2020 showing the Ford Model T, Ford Model T Pickup (1922), Ford Model A Deluxe (1931), Volkswagen Type 2 (1949), Ford Thunderbird (1955), Cadillac Eldorado Seville (1959), Morris Mini (1959), Ford Mustang Coupe (1965), Dodge D200 Camper (1974), DeLorean DMC-12 (1981), Ferrari 348 (1989), Toyota Prius (1997), Volkswagen Beetle (2002), and Tesla Model S (2012), annotated with the labels Luxury, Hybrid, Comfort, Recreation, Fins, Personality, Time Travel, Compact, Power, Electric, Utility, Speed, and Capacity. Images via Wikipedia, subject to Creative Commons and Public Domain licenses.]
• Modern cars are:
  • More efficient
  • Faster
  • More reliable
  • Safer
  • More capable
  • More comfortable
  • …
• Unsurprisingly, most drivers today drive modern cars

A short history of programming languages
[Figure: timeline from 1955 to 2015 showing Simula, Lisp, Smalltalk, BASIC, C, Fortran, COBOL, and Pascal. C is annotated: "An early systems programming language, sometimes described as 'portable assembler'".]
A short history of programming languages
[Figure: the same timeline, 1955 to 2015, now showing Simula, Java, Lisp, Clojure, Smalltalk, Scala, BASIC, Haskell, Rust, C, Fortran, JavaScript, C++, Swift, COBOL, Python, C#, Pascal, Go, Ada, F#, and PHP. C is annotated: "Still the most widely used systems programming language, 45 years later!" and "It's as if everyone is still driving a Ford Model T!"]
• Modern programming languages are:
  • Higher-level
  • Less error prone
  • Feature rich
  • Well-designed
  • Type safe
  • Well-defined
  • Memory safe
  • …
• Surprisingly, most systems programmers today are still using C …

C is great … what more could you want?
• Programming in C gives systems developers:
  • Good (usually predictable) performance characteristics
  • Low-level access to hardware when needed
  • A familiar and well-established notation for writing imperative programs that will get the job done
• What can you do in modern languages that you can't already do with C?
• Do you really need the fancy features of newer object-oriented or functional languages?
• Are there any downsides to programming in C?

Could a different language make it impossible to write programs with errors like these?
The Habit programming language
• "a dialect of Haskell that is designed to meet the needs of high assurance systems programming"
    Habit = Haskell + bits
• Habit, like Haskell, is a functional programming language
• For people trained in using C, the syntax and features of Habit may be unfamiliar
• I won't assume familiarity with functional programming here
• We'll focus on how Habit uses types to detect and prevent common types of programming errors

Division
• You can divide an integer by an integer to get an integer result
• In Habit (read "::" as "has type"; the three types are the 1st argument, the 2nd argument, and the result):
    div :: Int ⟶ Int ⟶ Int
• This is a lie!
• Correction: You can divide an integer by a non-zero integer to get an integer result
• In Habit:
    div :: Int ⟶ NonZero Int ⟶ Int
• But where do NonZero Int values come from?

Where do NonZero values come from?
• Option 1: Integer literals - numbers like 1, 7, 63, and 128 are clearly all NonZero integers
• Option 2: By checking at runtime
    nonzero :: Int ⟶ Maybe (NonZero Int)
  Values of type Maybe t are either:
  • Nothing
  • Just x for some x of type t
• These are the only two ways to get a NonZero Int!
• NonZero is an abstract datatype

Examples using NonZero values
• Calculating the average of two values:
    ave :: Int ⟶ Int ⟶ Int
    ave n m = (n + m) `div` 2                      -- 2 is a non-zero literal
• Calculating the average of a list of integers:
    average :: List Int ⟶ Maybe Int
    average nums = case nonzero (length nums) of
                     Just d  ⟶ Just (sum nums `div` d)   -- d has been checked!
                     Nothing ⟶ Nothing
• Key point: If you forget the check, your code will not compile!
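The NonZero idea is not specific to Habit. As a minimal sketch in Rust (not Habit, and not taken from the talk), the module below keeps the wrapper's field private, so the runtime check is the only way to obtain a value of the wrapper type. The names `checked`, `NonZero`, `nonzero`, `div`, and `average` echo the slides, but the Rust definitions are assumptions made for illustration. Only Option 2 (the runtime check) is modeled; this sketch has no counterpart to treating literals as non-zero.

```rust
mod checked {
    /// An integer known not to be zero. The field is private, so code outside
    /// this module cannot build a `NonZero` except by calling `nonzero`.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct NonZero(i32);

    /// Runtime check: `Some` for non-zero inputs, `None` for zero.
    pub fn nonzero(n: i32) -> Option<NonZero> {
        if n != 0 { Some(NonZero(n)) } else { None }
    }

    /// Division whose divisor type rules out division by zero.
    /// `wrapping_div` makes `i32::MIN / -1` wrap around instead of panicking.
    pub fn div(n: i32, d: NonZero) -> i32 {
        n.wrapping_div(d.0)
    }
}

use checked::{div, nonzero};

/// Average of a slice, or `None` if the slice is empty.
fn average(nums: &[i32]) -> Option<i32> {
    // Skipping this check is a type error: `div` does not accept a plain `i32`.
    let d = nonzero(nums.len() as i32)?;
    let total: i32 = nums.iter().sum();
    Some(div(total, d))
}

fn main() {
    println!("{:?}", average(&[2, 4, 9])); // Some(5)
    println!("{:?}", average(&[]));        // None
}
```

Replacing `div(total, d)` with `div(total, nums.len() as i32)` is rejected by the compiler, which is the behavior the "Key point" bullet describes.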
Null pointer dereferences
• In C, a value of type T* is a pointer to an object of type T
• But this may be a lie!
• A null pointer has type T*, but does NOT point to an object of type T
• Attempting to read or write the value pointed to by a null pointer is called a "null pointer dereference" and often results in system crashes, vulnerabilities, or memory corruption
• Described by Tony Hoare (who introduced null pointers in the ALGOL W language in 1965) as his "billion dollar mistake"

Pointers and reference types
• Lesson learned: we should distinguish between
  • References (of type Ref a): guaranteed to point to values of type a
  • Pointers (of type Ptr a): either a reference or a null
• These types are not the same: Ptr a = Maybe (Ref a)
• You can only read or write values via a reference
• Code that tries to read from a pointer will fail to compile!
• Goodbye null pointer dereferences!

Arrays and out of bounds indexes
• Arrays are collections of values stored in contiguous locations in memory
[Diagram: a pointer to the start of array a, with element i at offset i]
• Address of a[i] = start address of a + i*(size of element)
• Simple, fast, … and dangerous!
• If i is not a valid index (an "out of bounds index"), then an attempt to access a[i] could lead to a system crash, memory corruption, buffer overflows, …
• A common path to "arbitrary code execution"

Array bounds checking
• The designers of C knew that this was a potential problem … but chose not to address it in the language design:
  • We would need to store a length field in every array
  • We would need to check for valid indexes at runtime
• The designers of Java knew that this was a potential problem … and chose to address it in the language design:
  • Store a length field in every array
  • Check for valid indexes at runtime
• Performance OR Safety … pick one!
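The address formula and the stored-length remedy can be made concrete. The following is a minimal sketch in Rust, not code from the talk: `element_address` and `checked_read` are hypothetical names chosen for illustration, and a Rust slice stands in for an array that carries its length.

```rust
use std::mem::size_of;

/// Address of element `i`, computed as on the slide:
/// start address + i * (size of element). Nothing checks that `i` is valid.
fn element_address<T>(start: usize, i: usize) -> usize {
    start + i * size_of::<T>()
}

/// Checked access: the slice stores its length, and each access compares
/// the index against that length at runtime.
fn checked_read(a: &[i32], i: usize) -> Option<i32> {
    if i < a.len() { Some(a[i]) } else { None }
}

fn main() {
    let a = [10, 20, 30];
    // The unchecked formula yields an address even for index 3,
    // which is one past the end of a three-element array.
    println!("{:#x}", element_address::<i32>(0x1000, 3)); // 0x100c
    println!("{:?}", checked_read(&a, 1)); // Some(20)
    println!("{:?}", checked_read(&a, 3)); // None
}
```

The comparison in `checked_read` and the length stored alongside the slice are the two costs listed on the slide.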
Arrays in Habit
• Key idea: make array size part of the array type, do not allow arbitrary indexes:
    (@) :: Ref (Array n t) ⟶ Ix n ⟶ Ref t
  • Ref (Array n t) is the start address; the array length n is part of the type
  • Ix n is the index, guaranteed to be ≥ 0 and < n
  • Ref t is the element address
  • a[i] is written as a@i in Habit
• Fast, no need for a runtime check, no need for a stored length
• Ix n is another abstract type:
    maybeIx :: Int ⟶ Maybe (Ix n)
    modIx :: Int ⟶ Ix n
    incIx :: Ix n ⟶ Maybe (Ix n)

Bit twiddling
• Given two 32 bit input values:
  • base
  • limit
[Diagram: base and limit drawn as rows of boxes, one nibble (4 bits) per box, least significant bits on the right; three nibbles of limit are shown as 0]
• Calculate a 64 bit descriptor, made up of two 32 bit words, high and low
[Diagram: the descriptor layout; high contains the fixed nibbles 5, 3, and 2]
• Needed for the calculation of "Global Descriptor Table (GDT) entries" on the x86

In assembly
    movl base, %eax
    movl limit, %ebx
    mov  %eax, %edx
    shl  $16, %eax
    mov  %bx, %ax
    movl %eax, low
    shr  $16, %edx
    mov  %edx, %ecx
    andl $0xff, %ecx
    xorl %ecx, %edx
    shl  $16, %edx
    orl  %ecx, %edx
    andl $0xf0000, %ebx
    orl  %ebx, %edx
    orl  $0x503200, %edx
    movl %edx, high
[Diagram: register-by-register data flow for this listing]

In C
    low  = (base << 16)               // purple
         | (limit & 0xffff);          // blue
    high = (base & 0xff000000)        // pink
         | (limit & 0xf0000)          // green
         | ((base >> 16) & 0xff)      // yellow
         | 0x503200;                  // white
[Diagram: the descriptor fields, colored to match the comments above]
• Examples like this show why we use high-level languages instead of assembly!
• But let's hope we don't get those offsets and masks wrong …
• And there is no safety net if we get the types wrong …
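One way to check the offsets and masks is to run them on a concrete input. The sketch below restates the C expressions as a Rust function; the name `descriptor` and the sample values for base and limit are assumptions chosen for illustration, while the shifts, masks, and the constant 0x503200 are copied from the slide.

```rust
/// Build the two 32-bit halves of the descriptor from `base` and `limit`,
/// using the same shifts and masks as the C version. Returns `(high, low)`.
fn descriptor(base: u32, limit: u32) -> (u32, u32) {
    let low = (base << 16)          // bits 0..15 of base, moved to the top half
        | (limit & 0xffff);         // bits 0..15 of limit
    let high = (base & 0xff00_0000) // bits 24..31 of base, left in place
        | (limit & 0x000f_0000)     // bits 16..19 of limit, left in place
        | ((base >> 16) & 0xff)     // bits 16..23 of base, moved to the bottom
        | 0x0050_3200;              // fixed bits, as given on the slide
    (high, low)
}

fn main() {
    // Sample inputs (hypothetical): each nibble is distinct, so a misplaced
    // field is easy to spot in the output.
    let (high, low) = descriptor(0x1234_5678, 0x000a_bcde);
    println!("high = {:#010x}, low = {:#010x}", high, low);
    // high = 0x125a3234, low = 0x5678bcde
}
```

Reading the result nibble by nibble: `12` and `34` in high are the top two bytes of base, `a` is the top nibble of the 20 bit limit, and `5`, `3`, `2` are the fixed nibbles; low holds `5678` from base followed by `bcde` from limit.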