8/27/2012

CS 1622: Introduction to Compiler Design
Jonathan Misurda
jmisurda@cs.pitt.edu

What is a Compiler?
A compiler translates a source specification into a target specification. Traditionally, we consider compilers that take a source language and produce target (machine) code. However, there can be many different types of targets.

Source Language      Target Language
C/C++            →   Machine code
Java             →   Java Bytecode
Perl             →   Perl Bytecode
Java Bytecode    →   Machine code

Compilers vs. Interpreters
Compilation: to translate a source program in one language into an executable program in another language and produce results while executing the new program
• Examples: C, C++, FORTRAN
Interpretation: to read a source program and produce the results while understanding that program
• Examples: BASIC, LISP
Hybrid: try to use both (such as in Java)
1. Translate source code to bytecode
2. Execute by interpretation on a JVM, or
2. Execute by compilation using a JIT

C Compiler
gcc:
C source files (.c) → cpp (Preprocessor) → Preprocessed source → cc1 (Compiler) → Object files (.o) → ld (Linker) → Executable

Java Compiler
Java source (.java) → javac → Class files (.class)
Class files (.class) → JVM (Virtual Machine)

Compilation
Source → Compiler → Executable
Data → Executable → Output
Pros:
• Fast execution
• Can exploit machine architecture features
Cons:
• Complexity
• Must be done before execution
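The hybrid approach from the Compilers vs. Interpreters slide (translate source to bytecode, then execute the bytecode on a virtual machine) can be observed in miniature with Python's own built-ins. This is only an analogy, not the Java toolchain from the slides: it assumes CPython, and the variable names `source`, `code`, and `env` are made up for illustration.

```python
import dis

# Step 1: translate source code to bytecode (comparable in spirit to javac).
source = "y = x + 1"
code = compile(source, "<example>", "exec")

# The bytecode is a sequence of virtual-machine instructions.
# The exact opcode names differ between Python versions.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# Step 2: execute the bytecode by interpretation on the virtual machine.
env = {"x": 3}
exec(code, env)
result = env["y"]  # the value computed by the interpreted bytecode
```

Calling `dis.dis(code)` prints the instruction listing, which makes the intermediate bytecode form visible before it is executed.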
Interpreter
Source Code → Interpreter → Output
Data → Interpreter
Pros:
• Machine independent
• Easy to debug
• Flexible to modify
Cons:
• Time overhead
• Space overhead

Phases of Compilation
Source
→ Lexical Analyzer → Token Sequence
→ Syntax Analyzer → Syntax Tree
→ Semantic Analyzer → Intermediate Representation
→ Code Optimizer → Optimized IR
→ Code Generator → Assembly/Machine Code

Phases
Lexical Analysis
• Recognize token: smallest stand-alone unit of meaningful information
• Analyze input (strings of characters) from source
• Scan from left to right
• Report errors
Syntax Analysis
• Group tokens into hierarchical groups
• Differentiate if-statement, while-statement, ...
• Report errors
Semantic Analysis
• Determine the meaning using the structure
• Checks are performed to ensure components fit together meaningfully
• Limited analysis to catch inconsistencies, e.g., type checking
• Put semantically meaningful items in the structure
• Produce IR (easier to generate optimized machine code from IRs)

Phases
Code optimization
• Modify program representation so that program:
  • Runs faster
  • Uses less memory
  • Uses less power
  • In general, reduce the consumed resources
Code generation
• Produce target code
• Instruction selection
• Memory allocation
• Resource allocation: registers, processors, etc.

Lexing
Input: Source program
Output: Sequence of tokens
Example:
    if(x > 3) {
        y++;
    }
becomes
    IF ( ID('x') > NUM('3') ) { ID('y') INCREMENT ; }

Parsing
Input: Sequence of tokens
Output: Abstract Syntax Tree
Example:
    IF ( ID('x') > NUM('3') ) { ID('y') INCREMENT ; }
becomes
    if-statement
        cond_expr
            >
                x
                3
        stmt_list
            post-inc
                y
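The Lexing and Parsing examples above can be sketched in a few lines of Python. This is a minimal illustration that handles only the one statement shape used on the slides; the names `TOKEN_SPEC`, `tokenize`, and `parse_if_statement`, and the choice of nested tuples for the tree, are assumptions made for the sketch, not part of the course material.

```python
import re

# Token kinds in priority order: the keyword must be tried before ID.
TOKEN_SPEC = [
    ("IF", r"\bif\b"),
    ("INCREMENT", r"\+\+"),
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[(){};>]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan left to right and return a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # Lexical error: no token starts at this character.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind != "SKIP":
            # Punctuation uses its own text as the token kind.
            tokens.append((text if kind == "PUNCT" else kind, text))
    return tokens


def parse_if_statement(tokens):
    """Group the tokens of `if (ID > NUM) { ID++; }` into a nested-tuple tree."""
    pos = 0

    def take(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][1] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found!r}")
        text = tokens[pos][1]
        pos += 1
        return text

    take("IF")
    take("(")
    left = take("ID")
    take(">")
    right = int(take("NUM"))
    take(")")
    take("{")
    target = take("ID")
    take("INCREMENT")
    take(";")
    take("}")
    return ("if-statement",
            ("cond_expr", (">", left, right)),
            ("stmt_list", [("post-inc", target)]))


tokens = tokenize("if(x > 3) { y++; }")
tree = parse_if_statement(tokens)
```

A generated lexer and parser (for example from Flex and Bison) would replace both hand-written functions; the sketch only shows what each phase consumes and produces.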
Code Generation
Input: Intermediate representation
Output: Target code
Example:
    if-statement
        cond_expr
            >
                x
                3
        stmt_list
            post-inc
                y
becomes (with x in $s0 and y in $s1)
    addi $t0, $zero, 3      # $t0 = 3
    slt  $t1, $t0, $s0      # $t1 = 1 if 3 < x, else 0
    beq  $t1, $zero, L1     # skip the body when the condition is false
    addi $s1, $s1, 1        # y++
L1:

Data Structures for Compilation
Abstract Syntax Tree
• Stores the information from the parse and lexing phases
• Walk the tree to produce IR or target code
Symbol Table
• Collect and maintain information about identifiers
  • Attributes: type, address, scope, size
• Used by most compiler passes and phases
  • Some phases add information: lexing, parsing, semantic analysis
  • Some phases use information: semantic analysis, code optimization, code generation
• Debuggers also can make use of a symbol table
  • gcc -g keeps a version of the symbol table in the object code

Three-pass Compiler
Source Code → Front End → IR → Middle → IR → Back End → Machine Code
Error: reported by the stages

Compiler Construction
Automatic Generators:
• Lexical Analysis: Lex, Flex, JLex, JFlex
• Syntax Analysis: Yacc, Bison, JavaCUP, JavaCC
• Semantic Analysis
• Code Optimization
• Code Generation
Passes: number of times through a program representation
• 1-pass, 2-pass, multi-pass compilation
• Language becomes more complex → more passes
Phases: conceptual and sometimes physical stages
• Symbol table coordinates information between phases
• Phases are not completely separate
  • Semantic phase may do things that syntax phase should do
  • Interaction is possible
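To tie the Code Generation and Symbol Table slides together, here is a small sketch of a tree walk that consults a symbol table while emitting MIPS-style assembly text. Everything in it is hypothetical: the nested-tuple tree shape, the `symbols` dictionary layout, the `generate` function, and the fixed label `L1` are illustration choices, and only the `>` condition and post-increment statement from the slide example are supported.

```python
# Hypothetical symbol table: one entry per identifier, carrying the
# attributes listed on the slide (type, address, scope, size).
symbols = {
    "x": {"type": "int", "address": "$s0", "scope": "global", "size": 4},
    "y": {"type": "int", "address": "$s1", "scope": "global", "size": 4},
}

# Hypothetical AST for: if (x > 3) { y++; }
ast = ("if-statement",
       ("cond_expr", (">", "x", 3)),
       ("stmt_list", [("post-inc", "y")]))


def generate(tree, table):
    """Walk an if-statement tree and return a list of assembly lines."""
    kind, cond, body = tree
    if kind != "if-statement":
        raise NotImplementedError(f"unsupported node: {kind}")

    _, (op, left, right) = cond
    if op != ">":
        raise NotImplementedError(f"unsupported operator: {op}")

    lines = [
        f"addi $t0, $zero, {right}",                 # load the constant
        f"slt $t1, $t0, {table[left]['address']}",   # $t1 = (constant < variable)
        "beq $t1, $zero, L1",                        # condition false: skip body
    ]
    for stmt_kind, target in body[1]:
        if stmt_kind != "post-inc":
            raise NotImplementedError(f"unsupported statement: {stmt_kind}")
        register = table[target]["address"]          # looked up in the symbol table
        lines.append(f"addi {register}, {register}, 1")
    lines.append("L1:")
    return lines


assembly = generate(ast, symbols)
```

Joining the result with `"\n".join(assembly)` gives the listing as text. A real back end would also allocate registers and generate fresh labels instead of relying on a prefilled table and a fixed `L1`.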