Delimiters in various languages
C & Scheme: Spaces, tabs, newlines, carriage returns, and form feeds are examples of whitespace.
Java: Literal newline characters may not occur inside a literal string (you must use \n). Otherwise, similar to C & Scheme.
Python: Leading tabs are not whitespace, because they have a clear syntactic function: they denote the nesting level.
Compiler Construction 26 / 177 Mayer Goldberg \ Ben-Gurion University

Concrete vs Abstract syntax
Artifacts of the Concrete Syntax
▶ Delimiters & whitespace
▶ Parentheses, brackets, braces, and other grouping, nesting, and structuring mechanisms (e.g., begin...end)
☞ Re-examine the concrete and abstract syntax for the factorial function, and notice what’s gone!

Concrete vs Abstract syntax (continued)
The concrete syntax
    (define fact
      (lambda (n)
        (if (zero? n)
            1
            (* n (fact (- n 1))))))
The abstract syntax
[Figure: abstract syntax tree for the definition of fact]

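The abstract syntax of the factorial definition can be sketched as an OCaml value. This is only an illustration: the type expr and its constructor names are assumptions made for this example, not necessarily the AST used in the course project. Notice that none of the parentheses or whitespace of the concrete syntax survive.

```ocaml
(* Hypothetical AST type; the constructor names are assumptions. *)
type expr =
  | Const of int
  | Var of string
  | If of expr * expr * expr
  | Lambda of string list * expr
  | Applic of expr * expr list
  | Define of string * expr

(* The AST for the definition of fact shown on the slide. *)
let fact_ast =
  Define ("fact",
    Lambda (["n"],
      If (Applic (Var "zero?", [Var "n"]),
          Const 1,
          Applic (Var "*",
            [Var "n";
             Applic (Var "fact",
               [Applic (Var "-", [Var "n"; Const 1])])]))))

(* Count the vertices of an AST. *)
let rec size = function
  | Const _ | Var _ -> 1
  | If (test, dit, dif) -> 1 + size test + size dit + size dif
  | Lambda (_, body) -> 1 + size body
  | Applic (proc, args) ->
      List.fold_left (fun n arg -> n + size arg) (1 + size proc) args
  | Define (_, value) -> 1 + size value
```

Under these assumptions the tree has 16 vertices, whereas the concrete syntax also contains 18 parentheses that carry no meaning beyond grouping.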
The pipeline of the compiler (continued)
Basic concepts
🗹 Concrete syntax
🗹 Abstract syntax
🗹 Abstract Syntax-Tree (AST)
🗹 Token
🗹 Delimiter
🗹 Whitespace

The pipeline of the compiler (continued)
Question
Which of the following statements is correct?
👏 Every token becomes a vertex in the AST
👏 Every AST is a binary tree
👏 ASTs can contain cycles
👏 Comments are a part of the abstract syntax
👎 ASTs contain type tags

More on parsing
To parse computer programs in a given language, we rely on:
▶ Grammars with which to express the syntax of the language
  ▶ There are different kinds of grammars (CFG, CSG, two-level, etc)
  ▶ There are different languages for expressing the grammar (e.g., BNF, EBNF, etc.)
▶ Algorithms for parsing programs as per kind of grammar
▶ Techniques (e.g., parsing combinators, DCGs)
Parser generator: Takes a description of the grammar for a language L, and generates a parser for L. For example, yacc, bison, nearley, etc.

The pipeline of the compiler (continued)
Scanning
▶ Going from characters to tokens
▶ Identifying & grouping characters into tokens for words, numbers, strings, etc.
▶ Parsing over tokens is more efficient than parsing over characters
☞ As the parser examines various ways to parse the code, the parser can avoid re-identifying and re-building complex tokens such as numbers, strings, etc
[Figure: chars → Scanner → tokens → Reader → sexprs → Tag-Parser → ASTs → Semantic Analyser → ASTs → Code Generator → asm / mach lang; the Reader and the Tag-Parser together form the Parser]

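Going from characters to tokens can be sketched in a few lines of OCaml. This is a minimal illustration under simplifying assumptions: the token type, the names is_delim, word_end, and scan, and the restriction to parentheses, integers, and symbols are all choices made for this example (no strings, comments, or characters).

```ocaml
(* Hypothetical token type for a tiny Scheme-like language. *)
type token = LParen | RParen | Num of int | Sym of string

(* Characters that end a word: parentheses and whitespace. *)
let is_delim c =
  c = '(' || c = ')' || c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* Index just past the word that starts at position i. *)
let rec word_end s i =
  if i < String.length s && not (is_delim s.[i])
  then word_end s (i + 1)
  else i

(* Group the characters of s into a list of tokens. *)
let scan (s : string) : token list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else match s.[i] with
      | ' ' | '\t' | '\n' | '\r' -> go (i + 1) acc   (* skip whitespace *)
      | '(' -> go (i + 1) (LParen :: acc)
      | ')' -> go (i + 1) (RParen :: acc)
      | _ ->
          let j = word_end s i in
          let word = String.sub s i (j - i) in
          let tok =
            match int_of_string_opt word with
            | Some n -> Num n
            | None -> Sym word
          in
          go j (tok :: acc)
  in
  go 0 []
```

Each number or symbol is built once, here; a parser working over the resulting token list never has to look at individual characters again.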
The pipeline of the compiler (continued)
Reading
▶ In LISP/Prolog, the parser is split into two components:
  ▶ The reader, or the parser for the data language
  ▶ The tag-parser, or the parser for the source code
▶ In LISP/Scheme/Racket/Clojure/etc, the abstract syntax for the data is the concrete syntax for the code
▶ In Prolog, the abstract syntax for the data is the abstract syntax for the code
▶ Prolog is the programming language with the most powerful capabilities of reflection, i.e., code examining and working with itself.

The pipeline of the compiler (continued)
Reading: Summary
▶ In programming languages in which the syntax of code is not a part of the syntax of data, concrete syntax is given as a stream of characters
▶ In programming languages in which the syntax of code is part of the syntax of data, things are a bit more complex:
  ▶ The concrete syntax of data is a stream of characters
  ▶ The concrete syntax of code is the abstract syntax of the data
▶ In Scheme, the language of data is called S-expressions (sexprs, more on this, later)

The pipeline of the compiler (continued)
Tag-Parsing
▶ The tag-parser takes sexprs and returns [ASTs for] exprs
▶ Languages outside the LISP & Prolog families do not split parsing into a reader & tag-parser
▶ In such languages, parsing goes directly from tokens to [ASTs for] expressions
☞ Every valid program “used to be” [i.e., before tag-parsing] a valid sexpr
☞ Not every valid sexpr is a valid program!

The pipeline of the compiler (continued)
Question
A parser should:
👏 Perform optimizations
👏 Evaluate expressions
👏 Raise type-mismatch errors
👏 Find potential runtime errors (null-pointer dereferences, array-index errors, etc.)
👎 Validate the structure of input programs against a syntactic specification

The pipeline of the compiler (continued)
Question
Using an AST, it is impossible to:
👏 Perform code reformatting/beautification/style-checking
👏 Perform optimizations
👏 Output a new program which is semantically equivalent to the input program (code generation)
👏 Refactor the input program
👎 Generate a list of all the comments in the code

The pipeline of the compiler (continued)
Semantic Analysis
▶ Annotate the ASTs
▶ Compute addresses
▶ Annotate tail-calls
▶ Type-check code
▶ Perform optimizations

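As one example of annotating an AST, here is a hedged sketch of marking applications that occur in tail position. The expr type, the ApplicTP constructor, and the function annotate are assumptions made for this illustration; the course's own representation may differ.

```ocaml
(* Hypothetical AST; ApplicTP marks an application in tail position. *)
type expr =
  | Const of int
  | Var of string
  | If of expr * expr * expr
  | Lambda of string list * expr
  | Applic of expr * expr list
  | ApplicTP of expr * expr list

(* in_tail tells whether the expression is in tail position. *)
let rec annotate in_tail e =
  match e with
  | Const _ | Var _ | ApplicTP _ -> e
  | If (test, dit, dif) ->
      (* The test is never in tail position; the branches inherit it. *)
      If (annotate false test, annotate in_tail dit, annotate in_tail dif)
  | Lambda (params, body) ->
      (* The body of a lambda is always in tail position. *)
      Lambda (params, annotate true body)
  | Applic (proc, args) ->
      let proc' = annotate false proc in
      let args' = List.map (annotate false) args in
      if in_tail then ApplicTP (proc', args') else Applic (proc', args')
```

Applied to the body of fact, this sketch marks the multiplication as a tail call, while the recursive call to fact, being an argument, stays an ordinary application.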
The pipeline of the compiler (continued)
Code Generation
▶ Generate a stream of instructions in
  ▶ assembly language
  ▶ machine language
    ▶ Build executable
  ▶ some other target language…
▶ Perform low-level optimizations

The compiler for the course
Our compiler project
▶ Written in ocaml
▶ Supports a subset of Scheme + extensions
▶ Supports two simple optimizations
▶ Compiles to x86/64
▶ Runs on linux
What our project shall lack
▶ Support for the full language of Scheme
▶ Support for garbage collection
▶ The ability to compile itself

S-expressions
▶ We’re going to learn about syntax by studying the syntax of Scheme
  ▶ After all, we’re writing a Scheme compiler…
  ▶ It’s relatively simple, compared to the syntax of C, Java, Python, and many other languages
  ▶ It comes with some interesting twists
▶ Scheme comes with two languages:
  ▶ A language for code
  ▶ A language for data
  and there’s a tricky relationship between the two.
▶ The key to understanding the syntax of Scheme is to think about data

The Language of Data
What is a language of data? A language in which to
▶ Describe arbitrarily-complex data
  ▶ Possibly multi-dimensional, deeply nested
  ▶ Polymorphic
  ▶ Possibly circular
▶ Access components easily and efficiently

The Language of Data (continued)
Today many languages of data are known:
▶ S-expressions (the first: 1959)
▶ Functors (1972)
▶ Datalog (1977)
▶ SGML (1986)
▶ MS DDE (1987)
▶ CORBA (1991)
▶ MS COM (1993)
▶ MS DCOM (1996)
▶ XML (1996)
▶ JSON (2001)

The Language of Data (continued)
What makes S-expressions and Functors unique?
▶ They’re the first… 😊
▶ They’re supported natively, as part of specific programming languages
  ▶ S-expressions are supported by LISP-based languages, including Scheme & Racket
  ▶ Functors are supported by Prolog-based languages
☞ The language of programming is a [strict] subset of the language of data

The Language of Data (continued)
Think for a moment about the language of XML: <something>...</something>, etc
▶ It’s not supported natively by any programming language
▶ Most modern languages (Java, Python, etc) support it via libraries
▶ No programming language has XML for its concrete syntax:
    <package name="Foo">
      <class name="Foo">
        <method name="goo">
          ...
        </method>
      </class>
    </package>
This would be cumbersome, and weird!

The Language of Data (continued)
However, if some programming language both
▶ Supported XML as its data language
▶ Were itself written in XML
Then a parser for XML could also read programs written in that language:
▶ Writing interpreters, compilers, and other language-tools would have been much simpler!
▶ Reflection (code examining code) would be simple

The Language of Data (continued)
This is the case with S-expressions:
▶ They are the data language for LISP-based languages, including Scheme
▶ LISP-based languages are written using S-expressions
▶ Writing interpreters and compilers in LISP-based languages is much simpler than in other languages
▶ Computational reflection was invented in LISP!
▶ This is the real reason behind all these parentheses in Scheme:
  ▶ A very simple language
  ▶ Supports core types: pairs, vectors, symbols, strings, numbers, booleans, the empty list, etc.
  ▶ A syntactic compromise that is great for expressing both code and data

S-expressions (continued)
Back to S-expressions
▶ S-expressions were invented along with LISP, in 1959
▶ S-expressions stand for Symbolic Expressions
▶ The term is intended to distinguish them from numerical expressions
▶ Before LISP (and long after it was invented), most computation concerned itself with numbers
▶ Computer languages were great at “crunching numbers”, but working with non-numeric data types was difficult
  ▶ String libraries were non-standard and uncommon
  ▶ Polymorphic data was unheard of
  ▶ Nested data structures needed to be implemented from scratch, usually with arrays of characters and/or integers…

S-expressions (continued)
Back to S-expressions
Then S-expressions were invented as part of a very dynamic programming language (LISP):
▶ Working with data structures became considerably simpler
  ▶ Trivially allocated (no pointer-arithmetic)
  ▶ Polymorphic (lists of lists of numbers and strings and vectors of booleans and…)
  ▶ Easy to access sub-structures (no pointer arithmetic)
  ▶ Easy to modify (in an easy-going, functional style)
  ▶ Easy to examine (they’re just made up of primitive types)
  ▶ Easy to redefine
  ▶ Automatically deallocated (garbage collection)
▶ Treating code as data became considerably simpler

S-expressions (continued)
Several fields were invented using LISP and its tools:
▶ Symbolic Mathematics (Macsyma, a precursor to Wolfram Mathematica)
▶ Artificial Intelligence
▶ Computer adventure-game generation-languages (MDL, ZIL)

S-expressions (continued)
Definition: S-expressions
The language is made up of
▶ The empty list: ()
▶ Booleans: #f, #t
▶ Characters: #\a, #\Z, #\space, #\return, #\x05d0, etc
▶ Strings: "abc", "Hello\nWorld\t\x05d0;hi!", etc
▶ Numbers: -23, #x41, 2/3, 2-3i, 2.34, -2.34+3.5i
▶ Symbols: abc, lambda, define, fact, list->string
▶ Pairs: (a . b), (a b c), (a (2 . #f) "moshe")
▶ Vectors: #(), #(a b ((1 . 2) #f) "moshe")
Traditionally, non-pairs are known as atoms.

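Since the compiler project is written in ocaml, the definition above maps naturally onto a disjoint type. The following is a sketch only: the constructor names are assumptions, and numbers are simplified to integers rather than Scheme's full numeric tower.

```ocaml
(* Hypothetical representation of S-expressions. *)
type sexpr =
  | Nil                        (* the empty list *)
  | Bool of bool
  | Char of char
  | String of string
  | Number of int              (* simplification: integers only *)
  | Symbol of string
  | Pair of sexpr * sexpr
  | Vector of sexpr list

(* Traditionally, non-pairs are known as atoms. *)
let is_atom = function
  | Pair _ -> false
  | _ -> true

(* The pair written as (a (2 . #f) followed by the string moshe). *)
let example =
  Pair (Symbol "a",
    Pair (Pair (Number 2, Bool false),
      Pair (String "moshe", Nil)))
```

With this encoding, is_atom Nil is true and is_atom example is false.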
S-expressions (continued)
Proper & improper lists
▶ The name LISP comes from LISt Processing.
▶ In fact, LISP has no direct support for lists.
▶ LISP has ordered pairs
  ▶ Ordered pairs are created using cons
  ▶ The first and second projections over ordered pairs are car and cdr. For all x, y:
    ▶ (car (cons x y)) ≡ x
    ▶ (cdr (cons x y)) ≡ y
▶ The ordered pair of x and y can be written as (x . y)

S-expressions (continued)
The dot rules
Two rules govern how ordered pairs are printed:
▶ Rule 1: For any E, the ordered pair (E . ()) is printed as (E), which looks like a list of 1 element.
▶ Rule 2: For any E1, E2, …, the ordered pair (E1 . (E2 ...)) is printed as (E1 E2 ...)
▶ These rules only affect how pairs are printed
▶ These rules give us a canonical representation for pairs

S-expressions (continued)
Example
▶ The pair (a . (b . c)) is printed as (a b . c)
    PAIR
      CAR: SYMBOL a
      CDR: PAIR
        CAR: SYMBOL b
        CDR: SYMBOL c

S-expressions (continued)
Example
▶ The pair ((a . (b . ())) . ((c . (d . ())))) is printed as ((a b) (c d))
    PAIR
      CAR: PAIR
        CAR: SYMBOL a
        CDR: PAIR
          CAR: SYMBOL b
          CDR: NIL
      CDR: PAIR
        CAR: PAIR
          CAR: SYMBOL c
          CDR: PAIR
            CAR: SYMBOL d
            CDR: NIL
        CDR: NIL

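The two dot rules can be sketched as a small printer in ocaml. The type and the function names to_string and tail are assumptions made for this illustration, and only symbols, pairs, and the empty list are covered.

```ocaml
(* Minimal S-expressions, enough to demonstrate the dot rules. *)
type sexpr = Nil | Symbol of string | Pair of sexpr * sexpr

let rec to_string = function
  | Nil -> "()"
  | Symbol name -> name
  | Pair (car, cdr) -> "(" ^ to_string car ^ tail cdr ^ ")"

(* tail prints the cdr of a pair whose car has already been printed. *)
and tail = function
  | Nil -> ""                                          (* Rule 1: drop the dot and the empty list *)
  | Pair (car, cdr) -> " " ^ to_string car ^ tail cdr  (* Rule 2: drop the dot and the parentheses *)
  | atom -> " . " ^ to_string atom                     (* neither rule applies: keep the dot *)
```

With this sketch, the pair (a . (b . c)) prints as (a b . c), and the nested pairs of the second example print as ((a b) (c d)), matching the two slides above.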
S-expressions (continued)
▶ Lists in Scheme can come in two forms, proper lists and improper lists.
▶ When we just speak of lists, we usually mean proper lists.
▶ Most of the list processing functions (length, map, etc) take only proper lists:
    > (length '(a b . c))
    Exception in length: (a b . c) is not a proper list
    Type (debug) to enter the debugger.

S-expressions (continued)
Proper lists
▶ Proper lists are nested ordered pairs the rightmost cdr of which is the empty list (aka nil)
▶ Testing for pairs is cheap, and is done by means of the builtin predicate pair?
▶ Testing for lists is expensive, since it traverses nested, ordered pairs, until it reaches their rightmost cdr. This is done by means of the builtin predicate list?

S-expressions (continued)
Proper lists
Here’s a definition for list?:
    (define list?
      (lambda (e)
        (or (null? e)
            (and (pair? e)
                 (list? (cdr e))))))

S-expressions (continued)
Improper lists
▶ Pairs that are not proper lists are improper lists.
▶ Improper lists end with a rightmost cdr that is not nil
▶ List-processing procedures such as length, map, etc., do not work over improper lists
▶ There is no builtin procedure for testing improper lists, but it could be written as follows:
    (define improper-list?
      (lambda (e)
        (and (pair? e)
             (not (list? (cdr e))))))

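The same three predicates can be sketched over an ocaml representation of S-expressions. The type and the names is_pair, is_list, and is_improper_list are assumptions for this example. Note how is_pair inspects only the outermost constructor, whereas is_list must walk the whole chain of cdrs.

```ocaml
(* Minimal S-expressions for demonstrating the list predicates. *)
type sexpr = Nil | Symbol of string | Pair of sexpr * sexpr

(* Cheap: looks only at the outermost constructor. *)
let is_pair = function
  | Pair _ -> true
  | _ -> false

(* Expensive: follows the cdrs until the rightmost one is reached. *)
let rec is_list = function
  | Nil -> true
  | Pair (_, cdr) -> is_list cdr
  | _ -> false

(* A pair that is not a proper list. *)
let is_improper_list e = is_pair e && not (is_list e)
```

For a pair, testing the whole expression with is_list gives the same answer as testing its cdr, which is what the Scheme definition on the slide does.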
S-expressions (continued)
Self-evaluating forms
Booleans, numbers, characters, strings are self-evaluating forms. You can evaluate them directly at the prompt:
    > 123
    123
    > "abc"
    "abc"
    > #t
    #t
    > #\m
    #\m

S-expressions (continued)
Other forms
The empty list, pairs, and vectors cannot be evaluated directly at the prompt:
▶ Entering an empty list or a vector or an improper list at the prompt generates a run-time error.
▶ Entering a symbol at the prompt causes Scheme to attempt to evaluate a variable by the same name
▶ Entering a proper list, that is not the empty list, at the prompt causes Scheme to attempt to evaluate an application:
    > (a b c)
    Exception: variable b is not bound
    Type (debug) to enter the debugger.

S-expressions: quote & friends
To evaluate S-expressions that are not self-evaluating, we must use the form quote:
▶ The special form quote can be written in two ways:
  ▶ '<sexpr>
  ▶ (quote <sexpr>)
  Both forms are equivalent, but Scheme will convert the first into the second
▶ When you type abc at the Scheme prompt, you’re evaluating the variable abc
▶ When you type 'abc at the Scheme prompt, you’re evaluating the literal symbol abc
▶ The value of the literal symbol abc is just itself, which is why when you type 'abc at the Scheme prompt, you get back abc

S-expressions: quote & friends
▶ When you type () at the Scheme prompt, you’re evaluating an application with no function and no arguments! This is a syntax-error!
▶ When you type '() at the Scheme prompt, you’re evaluating a literal empty list
▶ The value of the literal empty list is just itself, which is why when you type '() at the Scheme prompt, you get back ()

S-expressions: quote & friends
▶ When you type (a b c) at the Scheme prompt, you’re evaluating the application of the procedure a to the arguments b and c, which are variables
▶ When you type '(a b c) at the Scheme prompt, you’re evaluating the literal list (a b c)
▶ The value of the literal list (a b c) is just (a b c), which is why when you type '(a b c) at the Scheme prompt, you get back (a b c).
▶ Quoting a self-evaluating S-expression is possible, and redundant:
    > '2
    2
    > (+ '2 '3)
    5

S-expressions: quote & friends
So what does quote do?
▶ The quote form does nothing
▶ It is not a procedure
▶ It doesn’t take an argument
▶ It delimits a constant, literal S-expression
▶ The syntactic function of quote in Scheme is the same as the syntactic function of braces { ... } in C in defining literal data:
    const int A[] = {4, 9, 6, 3, 5, 1};

S-expressions: quote & friends
Meet quasiquote
▶ Similarly to quote, the form quasiquote can be written in two ways:
  ▶ `<sexpr>
  ▶ (quasiquote <sexpr>)
  Both forms are equivalent, but Scheme will convert the first into the second
▶ quasiquote is also used to define data:
  ▶ `abc is the same as 'abc
  ▶ `(a b c) is the same as '(a b c)
▶ But quasiquote has two neat tricks!

S-expressions: quote & friends
Meet quasiquote
▶ The following two forms may occur within a quasiquote-expression:
  ▶ The unquote form:
    ▶ ,<sexpr>
    ▶ (unquote <sexpr>)
    Both forms are equivalent, but Scheme will convert the first into the second
  ▶ The unquote-splicing form:
    ▶ ,@<sexpr>
    ▶ (unquote-splicing <sexpr>)
    Both forms are equivalent, but Scheme will convert the first into the second
▶ Both unquote & unquote-splicing are used within quasiquote-expressions, to mix in dynamic and static data

S-expressions: quote & friends
Meet quasiquote
    > '(a (+ 1 2 3) b)
    (a (+ 1 2 3) b)
    > '(a ,(+ 1 2 3) b)
    (a ,(+ 1 2 3) b)
    > `(a (+ 1 2 3) b)
    (a (+ 1 2 3) b)
    > `(a ,(+ 1 2 3) b)
    (a 6 b)
    > `(a ,(append '(x y) '(z w)) b)
    (a (x y z w) b)
    > `(a ,@(append '(x y) '(z w)) b)
    (a x y z w b)

S-expressions: quote & friends
Meet quasiquote
▶ The expression `(a ,(append '(x y) '(z w)) b) is equivalent to
    (cons 'a (cons (append '(x y) '(z w)) '(b)))
▶ The expression `(a ,@(append '(x y) '(z w)) b) is equivalent to
    (cons 'a (append (append '(x y) '(z w)) '(b)))
▶ The difference between unquote & unquote-splicing is that
  ▶ unquote mixes in an expression using cons
  ▶ unquote-splicing mixes in an expression using append

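The cons/append reading of unquote and unquote-splicing can be sketched as a function that rewrites the body of a quasiquote-expression into calls to cons and append. This is a simplified illustration and not necessarily how any Scheme system expands quasiquote: the type, the helper list_of, and the function expand are assumptions, vectors and nested quasiquotes are ignored, and constant tails are not collapsed, so the sketch produces (cons 'b '()) where the slide writes '(b).

```ocaml
(* Minimal S-expressions for the expansion sketch. *)
type sexpr = Nil | Number of int | Symbol of string | Pair of sexpr * sexpr

(* Build a proper list from an ocaml list. *)
let list_of items = List.fold_right (fun car cdr -> Pair (car, cdr)) items Nil

(* Rewrite the body of a quasiquote-expression. *)
let rec expand = function
  | Pair (Symbol "unquote", Pair (e, Nil)) ->
      e                                            (* dynamic data, used as is *)
  | Pair (Pair (Symbol "unquote-splicing", Pair (e, Nil)), rest) ->
      list_of [Symbol "append"; e; expand rest]    (* splice in using append *)
  | Pair (car, cdr) ->
      list_of [Symbol "cons"; expand car; expand cdr]
  | Number n ->
      Number n                                     (* self-evaluating *)
  | atom ->
      list_of [Symbol "quote"; atom]               (* static data: quote it *)
```

For instance, the body (a ,x) expands to (cons 'a (cons x '())), while (a ,@x) expands to (cons 'a (append x '())).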
S-expressions: quote & friends
Meet quasiquote
▶ Together, quasiquote, unquote, & unquote-splicing are known as the quasiquote mechanism or the backquote mechanism
▶ The quasiquote mechanism allows us to create data by template, that is, by specifying the shape of the data
▶ In Scheme, convenient ways to create data translate immediately into convenient ways to create code
▶ Therefore we expect the quasiquote mechanism to have useful applications within programming languages
▶ We can turn code that computes something into code that shows us a computation…

S-expressions: quote & friends
Consider the familiar factorial function:
    (define fact
      (lambda (n)
        (if (zero? n)
            1
            (* n (fact (- n 1))))))

S-expressions: quote & friends
We use the quasiquote mechanism to convert the application (* n (fact (- n 1))) into code that describes what factorial does:
    (define fact
      (lambda (n)
        (if (zero? n)
            1
            `(* ,n ,(fact (- n 1))))))
Running (fact 5) now gives:
    > (fact 5)
    (* 5 (* 4 (* 3 (* 2 (* 1 1)))))
As you can see, factorial now returns a trace of the computation.

S-expressions: quote & friends
We are now going to use the quasiquote mechanism to get Scheme to teach us about the structure of S-expressions. Consider the following code:
    (define foo
      (lambda (e)
        (cond ((pair? e)
               (cons (foo (car e))
                     (foo (cdr e))))
              ((or (null? e) (symbol? e)) e)
              (else e))))
What does this program do?

S-expressions: quote & friends
Let’s call foo with some arguments:
    > (foo 'a)
    a
    > (foo 123)
    123
    > (foo '())
    ()
    > (foo '(a b c))
    (a b c)

S-expressions: quote & friends
    (define foo
      (lambda (e)
        (cond ((pair? e)
               (cons (foo (car e))
                     (foo (cdr e))))
              ((or (null? e) (symbol? e)) e)
              (else e))))
Looking over the code again we notice that:
▶ The 2nd and 3rd ribs of the cond overlap [we could have removed the 2nd]
▶ All atoms are left unchanged
▶ All pairs are duplicated, while recursing over the car and cdr of the pair
So foo does nothing, though it does it recursively! ☺

S-expressions: quote & friends
We now use the quasiquote mechanism to cause foo to generate a trace:
    (define foo
      (lambda (e)
        (cond ((pair? e)
               `(cons ,(foo (car e))
                      ,(foo (cdr e))))
              ((or (null? e) (symbol? e)) `',e)
              (else e))))

S-expressions: quote & friends
Running foo now gives us some interesting data:
    > (foo 'a)
    'a
    > (foo 123)
    123
    > (foo '(a b c))
    (cons 'a (cons 'b (cons 'c '())))
    > (foo '(a 1 b 2))
    (cons 'a (cons 1 (cons 'b (cons 2 '()))))
    > (foo '((a b) (c d)))
    (cons
      (cons 'a (cons 'b '()))
      (cons (cons 'c (cons 'd '())) '()))

S-expressions: quote & friends
▶ Using the quasiquote mechanism, we got foo to describe how S-expressions are created using the most basic API
▶ We should really add support for proper lists and vectors!
▶ In fact, the name describe is far more appropriate than foo…
Let’s rewrite foo…

S-expressions: quote & friends
    (define describe
      (lambda (e)
        (cond ((list? e)
               `(list ,@(map describe e)))
              ((pair? e)
               `(cons ,(describe (car e))
                      ,(describe (cdr e))))
              ((vector? e)
               `(vector
                  ,@(map describe
                         (vector->list e))))
              ((or (null? e) (symbol? e)) `',e)
              (else e))))

S-expressions: quote & friends
Running describe on various S-expressions is very instructive:
    > (describe '(a b c))
    (list 'a 'b 'c)
    > (describe '#(a b c))
    (vector 'a 'b 'c)
    > (describe '(a b . c))
    (cons 'a (cons 'b 'c))
    > (describe ''a)
    (list 'quote 'a)
Wait! What’s with the last example?!

S-expressions: quote & friends
Recall what we said about quote, quasiquote, unquote, & unquote-splicing:
▶ '<sexpr> ≡ (quote <sexpr>)
▶ `<sexpr> ≡ (quasiquote <sexpr>)
▶ ,<sexpr> ≡ (unquote <sexpr>)
▶ ,@<sexpr> ≡ (unquote-splicing <sexpr>)
Now we get to see this happen…

S-expressions: quote & friends
Now we get to see this happen:
    > (describe ''<sexpr>)
    (list 'quote '<sexpr>)
    > (describe '`<sexpr>)
    (list 'quasiquote '<sexpr>)
    > (describe ',<sexpr>)
    (list 'unquote '<sexpr>)
    > (describe ',@<sexpr>)
    (list 'unquote-splicing '<sexpr>)
Rule: Every Scheme expression used to be an S-expression when it was little! 👷

S-expressions: quote & friends
Question
What is (length '''''''''''''''''moshe)?
👏 17
👏 16
👏 Generates an error message!
👏 1
👎 2

S-expressions: quote & friends
Explanation
(length '''''''''''''''''moshe) is the same as (length '(quote <something>)), where <something> is '''''''''''''''moshe, but that really doesn’t matter! We are still computing the length of a list of size 2:
▶ The first element of the list is the symbol quote
▶ The second element of the list is '''''''''''''''moshe

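The explanation can be checked with a short ocaml sketch that builds the nested quote forms explicitly. The type and the helpers quote, nest, and length are assumptions made for this illustration.

```ocaml
(* Minimal S-expressions for the nested-quote example. *)
type sexpr = Nil | Symbol of string | Pair of sexpr * sexpr

(* The reader turns a quote character followed by e into the list (quote e). *)
let quote e = Pair (Symbol "quote", Pair (e, Nil))

(* Wrap e in n nested quote forms. *)
let rec nest n e = if n = 0 then e else quote (nest (n - 1) e)

(* The length of a proper list. *)
let rec length = function
  | Nil -> 0
  | Pair (_, cdr) -> 1 + length cdr
  | _ -> invalid_arg "length: not a proper list"
```

The outermost of the 17 quote characters is consumed by evaluation, so length receives nest 16 (Symbol "moshe"), whose length is 2. The same holds for any positive nesting depth.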
S-expressions: quote & friends (continued)
Question
The structure of the S-expression ''a in Scheme is:
👏 Just the symbol a
👏 The proper list (quote . (a . ()))
👏 The proper list (quote . (quote . (a . ())))
👏 An invalid S-expression
👎 The nested proper list (quote . ((quote . (a . ())) . ()))

Tag-Parsing (continued)
▶ In a previous slide, we made the claims that in all descendants of LISP (including Scheme):
  ☞ Every valid program “used to be” [i.e., before tag-parsing] a valid sexpr
  ☞ Not every valid sexpr is a valid program!
▶ We can now show you some examples
As data (S-expressions)
▶ (if if if if) is a list of size 4
▶ (if (zero? n) 'zero 'non-zero) is also a list of size 4
As code
▶ (if if if if) is not a valid if-expression
▶ (if (zero? n) 'zero 'non-zero) is a valid if-expression

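The difference between a valid sexpr and a valid program can be sketched with a tiny tag-parser. Everything here is an assumption made for illustration: the two types, the exception Syntax_error, the list of reserved words, and the rule that a reserved word may not be used as a variable (which is what makes (if if if if) invalid as code in this sketch).

```ocaml
(* Data: minimal S-expressions. *)
type sexpr = Nil | Number of int | Symbol of string | Pair of sexpr * sexpr

(* Code: a hypothetical AST for a tiny subset of Scheme. *)
type expr =
  | Const of sexpr
  | Var of string
  | If of expr * expr * expr
  | Applic of expr * expr list

exception Syntax_error of string

(* Assumption: these words cannot be used as variable names. *)
let reserved = ["if"; "quote"; "lambda"; "define"]

let rec tag_parse = function
  | Number n -> Const (Number n)
  | Symbol s when List.mem s reserved ->
      raise (Syntax_error ("reserved word: " ^ s))
  | Symbol s -> Var s
  | Pair (Symbol "quote", Pair (e, Nil)) -> Const e
  | Pair (Symbol "if", Pair (test, Pair (dit, Pair (dif, Nil)))) ->
      If (tag_parse test, tag_parse dit, tag_parse dif)
  | Pair (proc, args) ->
      Applic (tag_parse proc, List.map tag_parse (to_list args))
  | Nil -> raise (Syntax_error "empty application")

(* Convert a proper list of arguments into an ocaml list. *)
and to_list = function
  | Nil -> []
  | Pair (car, cdr) -> car :: to_list cdr
  | _ -> raise (Syntax_error "improper argument list")
```

Both examples from the slide are lists of size 4 as data, yet only the second one gets through tag_parse; the first raises Syntax_error.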
Further reading
🕯 The Dragon Book (2nd edition): Chapter 1.2 - The structure of a compiler, pages 4–11
🔘 Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I (by John McCarthy, 1960)

Chapter 2
Goals
🗹 The pipeline of the compiler
🗹 Introduction to syntactic analysis
☞ Further steps in ocaml
Agenda
☞ Ocaml
  ▶ Types
  ▶ References
  ▶ Modules & signatures
  ▶ Functional programming in ocaml

Introduction to ocaml (2)
Still need to cover
To program in ocaml effectively in this course, we still need to learn some additional topics:
▶ Defining new data types
▶ Assignments, side-effects,
What we shan’t cover
Object Orientation: Once you’re comfortable with ocaml, you might like to pick up the object-oriented layer. As object-orientation goes, you should find it to be sophisticated and expressive.

Types
New types are defined using the type statement:
    type fraction = {numerator : int; denominator : int};;
The above statement defines a new type fraction as a record consisting of two fields: numerator & denominator, both of type int.

Types (continued)
Once fraction has been defined, the underlying system recognizes it for all records with these fields & types:
    # {numerator = 2; denominator = 3};;
    - : fraction = {numerator = 2; denominator = 3}
    # {denominator = 3; numerator = 2};;
    - : fraction = {numerator = 2; denominator = 3}
Notice that the order of the fields in a record is immaterial, because the fields are accessed through their names, which are converted consistently into offsets.

Types (continued)
The type-inference engine in ocaml will correctly infer newly-defined types:
    let add_fractions f1 f2 =
      match f1, f2 with
      | {numerator = n1; denominator = d1},
        {numerator = n2; denominator = d2} ->
         {numerator = n1 * d2 + n2 * d1;
          denominator = d1 * d2};;
And of course:
    # add_fractions
        {numerator = 2; denominator = 3}
        {numerator = 4; denominator = 5};;
    - : fraction = {numerator = 22; denominator = 15}

Types (continued)
We can define disjoint types as follows:
    type number =
      | Int of int
      | Frac of fraction
      | Float of float;;
Think of the | as disjunction. The initial | is optional in ocaml.

Types (continued)
We can now define a list of numbers as follows:
    # [Int 3; Frac {numerator = 3; denominator = 4};
       Float (4.0 *. atan(1.0))];;
    - : number list =
    [Int 3; Frac {numerator = 3; denominator = 4};
     Float 3.14159265358979312]
Notice that ocaml had no trouble identifying each of the three elements of the list as belonging to type number.

Types (continued)
Working with disjoint types
Use match to dispatch over the corresponding type constructor, and make sure you handle each and every possibility!
    let number_to_string x =
      match x with
      | Int n -> Format.sprintf "%d" n
      | Frac {numerator = num; denominator = den} ->
         Format.sprintf "%d/%d" num den
      | Float x -> Format.sprintf "%f" x;;

Types (continued)
Working with disjoint types (continued)
And here’s how it looks:
    # number_to_string (Int 234);;
    - : string = "234"
    # number_to_string (Frac {numerator = 2; denominator = 5});;
    - : string = "2/5"
    # number_to_string (Float 234.234);;
    - : string = "234.234000"

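The same dispatch style works for any function over number. As a further sketch, the functions number_to_float and sum below are hypothetical additions for illustration and do not appear in the course material; the two type definitions are repeated so that the example is self-contained.

```ocaml
type fraction = {numerator : int; denominator : int}

type number =
  | Int of int
  | Frac of fraction
  | Float of float

(* One rib per type constructor, so every possibility is handled. *)
let number_to_float x =
  match x with
  | Int n -> float_of_int n
  | Frac {numerator = num; denominator = den} ->
      float_of_int num /. float_of_int den
  | Float f -> f

(* Add up a list of numbers of mixed kinds. *)
let sum numbers =
  List.fold_left (fun acc x -> acc +. number_to_float x) 0.0 numbers
```

If one of the ribs were omitted, the ocaml compiler would warn that the pattern matching is not exhaustive, which is a good reason to dispatch with match rather than with nested conditionals.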
References
Let us take another look at the record-type. Recall the definition of fraction:
    # type fraction = {numerator : int; denominator : int};;
    type fraction = { numerator : int; denominator : int; }
In the function add_fractions we used pattern-matching to access the record-fields.

References (continued)
Ocaml lets you access fields directly, using the dot-notation that is familiar from object-oriented programming:
    # {numerator = 3; denominator = 5}.numerator;;
    - : int = 3
    # {numerator = 3; denominator = 5}.denominator;;
    - : int = 5

References (continued)
Ocaml offers a special record-type known as a reference.
▶ References are derived types. For any type α, we can have a type α ref.
▶ References are records with a single field contents
    # {contents = 1234};;
    - : int ref = {contents = 1234}
    # {contents = 1234}.contents;;
    - : int = 1234
▶ References have a special syntax ! to dereference the field:
    # ! {contents = 1234};;
    - : int = 1234

References (continued)
▶ References have a special syntax := for assignment
▶ This is how assignments are managed in ocaml
    # let x = ref 1234;;
    val x : int ref = {contents = 1234}
    # x;;
    - : int ref = {contents = 1234}
    # !x;;
    - : int = 1234
    # x := 4567;;
    - : unit = ()
    # x;;
    - : int ref = {contents = 4567}
    # !x;;
    - : int = 4567

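A common use of references is to keep private, mutable state inside a closure. The function make_counter below is a hypothetical example for illustration; it uses only ref, ! and :=, as introduced above.

```ocaml
(* Each call to make_counter creates a fresh reference, visible only
   to the function that is returned. *)
let make_counter () =
  let count = ref 0 in
  fun () ->
    count := !count + 1;   (* assignment through the reference *)
    !count                 (* dereference to return the new value *)
```

Calling make_counter twice gives two independent counters: advancing one has no effect on the other, because each closure holds its own reference.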