
1DL321: Kompilatorteknik I (Compiler Design 1)



  1. 1DL321: Kompilatorteknik I (Compiler Design 1)
     Introduction to Programming Language Design and to Compilation

     Administrivia
     • Lecturer:
       – Kostis Sagonas (kostis@it.uu.se)
     • Course home page: http://user.it.uu.se/~kostis/Teaching/KT1-12/
     • Assistants:
       – Stavros Aronis (stavros.aronis@it.uu.se)
       – Andreas Löscher (andreas.loscher@it.uu.se)
       – responsible for the lessons and the assignments

     Course Structure
     • Course has theoretical and practical aspects
     • Need both in programming languages!
     • Written examination = theory (4 points)
       – first exam scheduled for 11th January 2013
     • Three assignments = practice (1 point)
       – Electronic hand-in to the assistants before the corresponding deadline
       – You can submit one late assignment if you need to, but it cannot be later than the deadline of the next assignment (for 1 and 2) or the exam (for 3)

     Course Literature

  2. Academic Honesty
     • For assignments you are allowed to work in pairs (but no threesomes/foursomes/...)
     • Don’t use work from uncited sources
       – Including old assignments
     PLAGIARISM

     The Compiler Project
     • A follow-up course
       – that will be taught in period 3
       – and will allow you to see the material you have learned in KT1 in practice
       – by building a complete compiler
       – for a small (toy?) language

     How are Languages Implemented?
     • Two major strategies:
       – Interpreters (older, less studied)
       – Compilers (newer, much more studied)
     • Interpreters run programs “as is”
       – Little or no preprocessing
     • Compilers do extensive preprocessing

     Language Implementations
     • Batch compilation systems dominate
       – gcc
     • Some languages are primarily interpreted
       – Java bytecode
       – Postscript
     • Some environments (e.g. Lisp) provide both
       – Interpreter for development
       – Compiler for production
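To make the contrast between the two strategies concrete, here is a minimal sketch in Python. The tuple-based expression trees and the names `interpret` and `compile_expr` are assumptions made for this illustration and do not come from the slides. The "compiler" here only translates to Python closures, whereas a real compiler would emit assembly or machine code, but the division of labour is the same: the interpreter inspects the program every time it runs, while the compiler inspects it once up front.

```python
# Expression trees are nested tuples (a representation assumed for this sketch):
#   ("num", 3), ("var", "x"), ("add", a, b), ("mul", a, b)

def interpret(expr, env):
    """Interpreter: walk the tree "as is" every time the program runs."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    left, right = interpret(expr[1], env), interpret(expr[2], env)
    return left + right if kind == "add" else left * right

def compile_expr(expr):
    """Compiler: analyze the tree once, return a function that no longer inspects it."""
    kind = expr[0]
    if kind == "num":
        value = expr[1]
        return lambda env: value
    if kind == "var":
        name = expr[1]
        return lambda env: env[name]
    left, right = compile_expr(expr[1]), compile_expr(expr[2])
    if kind == "add":
        return lambda env: left(env) + right(env)
    return lambda env: left(env) * right(env)

program = ("add", ("mul", ("var", "x"), ("num", 2)), ("num", 1))  # x * 2 + 1
compiled = compile_expr(program)  # preprocessing happens here, once
```

Both `interpret(program, {"x": 5})` and `compiled({"x": 5})` compute the same value; only the moment at which the tree is examined differs.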

  3. (Short) History of High-Level Languages
     • 1953 IBM develops the 701
     • Till then, all programming done in assembly
     • Problem: Software costs exceeded hardware costs!
     • John Backus: “Speedcoding”
       – An interpreter
       – Ran 10-20 times slower than hand-written assembly

     FORTRAN I
     • 1954 IBM develops the 704
     • John Backus
       – Idea: translate high-level code to assembly
       – Many thought this impossible
         • Had already failed in other projects
     • 1954-7 FORTRAN I project
     • By 1958, >50% of all software is in FORTRAN
     • Cut development time dramatically
       – (2 weeks → 2 hours)

     FORTRAN I
     • The first compiler
       – Produced code almost as good as hand-written
       – Huge impact on computer science
     • Led to an enormous body of theoretical work
     • Modern compilers preserve the outlines of the FORTRAN I compiler

     The Structure of a Compiler
     1. Lexical Analysis
     2. Syntax Analysis
     3. Semantic Analysis
     4. IR Optimization
     5. Code Generation
     6. Low-level Optimization
     The first 3 phases can be understood by analogy to how humans comprehend natural languages (e.g. Swedish or English).

  4. Lexical Analysis
     • First step: recognize words.
       – Smallest unit above letters
         This is a sentence.
     • Note the
       – Capital “T” (start of sentence symbol)
       – Blank “ ” (word separator)
       – Period “.” (end of sentence symbol)

     More Lexical Analysis
     • Lexical analysis is not trivial. Consider:
         ist his ase nte nce
     • Plus, programming languages are typically more cryptic than English:
         * p->f ++ = -.12345e-5

     And More Lexical Analysis
     • Lexical analyzer divides program text into “words” or “tokens”
         if (x == y) then z = 1; else z = 2;
     • Units:
         if, (, x, ==, y, ), then, z, =, 1, ;, else, z, =, 2, ;

     Parsing
     • Once words are understood, the next step is to understand the sentence structure
     • Parsing = Diagramming Sentences
       – The diagram is a tree
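The division into units shown on the “And More Lexical Analysis” slide can be sketched with a small regular-expression lexer. The token class names (`NUMBER`, `IDENT`, `OP`, `KEYWORD`) and the function name `tokenize` are assumptions made for this illustration; the slides only list the units themselves.

```python
import re

# Token classes are an assumption for this sketch; the slides only list the units.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"==|=|;|\(|\)"),   # "==" is listed before "=" so the longer operator wins
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"if", "then", "else"}

def tokenize(text):
    """Split program text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # blanks only separate words; they are not tokens themselves
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens

units = [lexeme for _, lexeme in tokenize("if (x == y) then z = 1; else z = 2;")]
```

Here `units` holds the same sixteen lexemes, in the same order, as the “Units” line of the slide.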

  5. Diagramming a Sentence (1)
         Words:       This     line  is    a        longer     sentence
         Word class:  article  noun  verb  article  adjective  noun
         Grouping:    subject, object
         Root:        sentence

     Diagramming a Sentence (2)
         Words:       This     line  is    a        longer     sentence
         Word class:  article  noun  verb  article  adjective  noun
         Grouping:    noun phrase, noun phrase, verb phrase
         Root:        sentence

     Parsing Programs
     • Parsing program expressions is the same
     • Consider:
         If (x == y) then z = 1; else z = 2;
     • Diagrammed:
         x == y      z = 1       z = 2
         relation    assignment  assignment
         predicate   then-stmt   else-stmt
                   if-then-else

     Semantic Analysis
     • Once the sentence structure is understood, we can try to understand its “meaning”
       – But meaning is too hard for compilers
     • Most compilers perform limited analysis to catch inconsistencies
     • Some optimizing compilers do more analysis to improve the performance of the program
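The diagram on the “Parsing Programs” slide can be reproduced by a small recursive-descent parser. This is only a sketch under stated assumptions: the grammar (one if-then-else whose branches are single assignments), the class name `Parser`, and the tuple representation of the tree are all chosen for this illustration and are not taken from the slides.

```python
import re

def tokenize(text):
    """Tiny lexer for this sketch; note that findall silently skips unknown characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|==|[=;()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def expect(self, wanted=None):
        """Consume the next token, optionally checking that it equals `wanted`."""
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        token = self.tokens[self.pos]
        if wanted is not None and token != wanted:
            raise SyntaxError(f"expected {wanted!r}, found {token!r}")
        self.pos += 1
        return token

    def relation(self):
        # relation := operand "==" operand   (operands are not validated in this sketch)
        left = self.expect()
        self.expect("==")
        right = self.expect()
        return ("relation", left, right)

    def assignment(self):
        # assignment := target "=" value ";"
        target = self.expect()
        self.expect("=")
        value = self.expect()
        self.expect(";")
        return ("assignment", target, value)

    def if_then_else(self):
        # if-then-else := "if" "(" relation ")" "then" assignment "else" assignment
        self.expect("if")
        self.expect("(")
        predicate = self.relation()
        self.expect(")")
        self.expect("then")
        then_stmt = self.assignment()
        self.expect("else")
        else_stmt = self.assignment()
        return ("if-then-else", predicate, then_stmt, else_stmt)

tree = Parser(tokenize("if (x == y) then z = 1; else z = 2;")).if_then_else()
```

The resulting `tree` has `if-then-else` at the root with three children, the predicate (a relation), the then-statement, and the else-statement (both assignments), which mirrors the diagram.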

  6. Semantic Analysis in English
     • Example:
         Jack said Jerry left his assignment at home.
       What does “his” refer to? Jack or Jerry?
     • Even worse:
         Jack said Jack left his assignment at home?
       How many Jacks are there? Which one left the assignment?

     Semantic Analysis in Programming Languages
     • Programming languages define strict rules to avoid such ambiguities
     • This C++ code prints “4”; the inner definition is used
         {
           int Jack = 3;
           {
             int Jack = 4;
             cout << Jack;
           }
         }

     More Semantic Analysis
     • Compilers perform many semantic checks besides variable bindings
     • Example:
         Arnold left her homework at home.
     • A “type mismatch” between her and Arnold; we know they are different people
       – Presumably Arnold is male

     Optimization
     • No strong counterpart in English, but akin to editing
     • Automatically modify programs so that they
       – Run faster
       – Use less memory/cache/power
       – In general, conserve some resource more economically
     • The compilers project has no optimization component
       – for those interested, there is also the “Advanced Compiler Design (KT2)” course!
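The rule that the inner definition of `Jack` is the one used can be modelled with a stack of scopes, which is one common way for a compiler to resolve names during semantic analysis. The class `SymbolTable` and its method names are assumptions made for this sketch; the slides do not prescribe a data structure.

```python
class SymbolTable:
    """A stack of scopes; lookups search from the innermost scope outwards."""

    def __init__(self):
        self.scopes = [{}]  # start with a single global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, value):
        if name in self.scopes[-1]:
            raise NameError(f"{name!r} is already declared in this scope")
        self.scopes[-1][name] = value

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")

# Replaying the block structure of the C++ fragment from the slide:
table = SymbolTable()
table.enter_scope()            # {
table.declare("Jack", 3)       #   int Jack = 3;
table.enter_scope()            #   {
table.declare("Jack", 4)       #     int Jack = 4;
inner = table.lookup("Jack")   #     cout << Jack;
table.exit_scope()             #   }
outer = table.lookup("Jack")   #   the outer Jack is visible again here
table.exit_scope()             # }
```

Inside the inner block the lookup finds the declaration with value 4, and once that block is closed the same name resolves to the outer declaration again.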

  7. Optimization Example
         X = Y * 0 is the same as X = 0
     NO! Valid for integers, but not for floating point numbers

     Code Generation
     • Produces assembly code (usually)
     • A translation into another language
       – Analogous to human translation

     Intermediate Languages
     • Many compilers perform translations between successive intermediate forms
       – All but first and last are intermediate languages internal to the compiler
       – Typically there is one IL
     • IL’s generally ordered in descending level of abstraction
       – Highest is source
       – Lowest is assembly

     Intermediate Languages (Cont.)
     • IL’s are useful because lower levels expose features hidden by higher levels
       – registers
       – memory/frame layout
       – etc.
     • But lower levels obscure high-level meaning
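The claim on the “Optimization Example” slide, that `X = Y * 0` may be rewritten to `X = 0` for integers but not for floating point numbers, can be checked directly. The sketch below assumes IEEE 754 floating-point behaviour, which is what Python's `float` provides on common platforms; the helper name `same_as_zero` is made up for this illustration.

```python
import math

def same_as_zero(y):
    """Check whether y * 0 yields exactly what the rewritten code `X = 0` would store."""
    product = y * 0
    if isinstance(y, int):
        return product == 0
    # For floats the rewrite stores +0.0; a NaN or a -0.0 result is an observable difference.
    return product == 0.0 and math.copysign(1.0, product) == 1.0

integer_cases = all(same_as_zero(y) for y in (-7, 0, 42))

# Floating-point values for which the rewrite changes the result (under IEEE 754):
counterexamples = [y for y in (2.5, -2.5, float("inf"), float("nan")) if not same_as_zero(y)]
```

Under these assumptions every integer passes, while infinity and NaN produce NaN when multiplied by zero and a negative operand produces negative zero, so a compiler may only apply this rewrite when it knows the operand type.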

  8. Issues
     • Compiling is almost this simple, but there are many pitfalls
     • Example: How are erroneous programs handled?
     • Language design has big impact on compiler
       – Determines what is easy and hard to compile
       – Course theme: many trade-offs in language design

     Compilers Today
     • The overall structure of almost every compiler adheres to our outline
     • The proportions have changed since FORTRAN
       – Early:
         • lexical analysis, parsing most complex, expensive
       – Today:
         • semantic analysis and optimization dominate all other phases; lexing and parsing are well-understood and cheap

     Current Trends in Compilation
     • Compilation for speed is less interesting. However, there are exceptions:
       – scientific programs
       – advanced processors (Digital Signal Processors, advanced speculative architectures, GPUs)
     • Ideas from compilation used for improving code reliability:
       – memory safety
       – detecting data races
       – security properties
       – ...

     Programming Language Economics
     • Programming languages are designed to fill a void
       – enable a previously difficult/impossible application
       – orthogonal to language design quality (almost)
     • Programming training is the dominant cost
       – Languages with a big user base are replaced rarely
       – Popular languages become ossified
       – but it is easy to start in a new niche...

  9. Why so many Programming Languages?
     • Application domains have distinctive (and sometimes conflicting) needs
     • Examples:
       – Scientific computing: High performance
       – Business: report generation
       – Artificial intelligence: symbolic computation
       – Systems programming: efficient low-level access
       – Other special purpose languages...

     Topic: Language Design
     • No universally accepted metrics for design
     • “A good language is one people use”
     • NO!
       – Is COBOL the best language?
     • Good language design is hard

     Language Evaluation Criteria
         Characteristic   Readability   Writeability   Reliability
         Simplicity       YES           YES            YES
         Data types       YES           YES            YES
         Syntax design    YES           YES            YES
         Abstraction                    YES            YES
         Expressivity                   YES            YES
         Type checking                                 YES
         Exceptions                                    YES

     History of Ideas: Abstraction
     • Abstraction = detached from concrete details
     • Necessary for building software systems
     • Modes of abstraction:
       – Via languages/compilers
         • higher-level code; few machine dependencies
       – Via subroutines
         • abstract interface to behavior
       – Via modules
         • export interfaces which hide implementation
       – Via abstract data types
         • bundle data with its operations
