Computer Science 332: Compiler Construction
Chapter 1: Introduction to Compiling

What is a Compiler?
• A compiler is a translator:
– Source language S
– Target language T
– Implementation language I
– Error messages as side-effect

[Diagram: S → Compiler → T, with Error Messages as a side output]
What is a Compiler?
• S is typically higher-level (more abstract) than T.
• But not always: e.g., an infix → postfix translator
• For this course:
– S = ML
– T = MIPS
– I = Java
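The infix → postfix translator mentioned above can be sketched in a few lines. This is a minimal illustration using the shunting-yard method, assuming space-separated tokens and left-associative binary operators; the name `infix_to_postfix` and the precedence table are assumptions made for this example, not part of the course materials.

```python
# Hypothetical precedence table: higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def infix_to_postfix(tokens):
    """Translate a list of infix tokens to postfix (shunting-yard method)."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly as tok.
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # operand: goes straight to the output
    while stack:
        output.append(stack.pop())
    return output

print(" ".join(infix_to_postfix("b + c * 5".split())))  # b c 5 * +
```

Source and target here are at the same level of abstraction; only the notation changes.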
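Compiler versus Interpreter
[Diagram: S → Compiler → T, with Error Messages as a side output; T then runs in a Run-Time Environment* to produce the Answer]
[Diagram: S → Interpreter → Answer, with Error Messages as a side output]
*e.g.: JRE, shell, SPIM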
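The difference can be sketched on a toy expression language. In the sketch below, `interpret` produces the answer directly from the source tree, while `compile_expr` produces target code that a separate `run` function (standing in for the run-time environment) must execute. The nested-tuple tree encoding and the stack-machine instruction names are assumptions invented for this illustration.

```python
# Trees are nested tuples ("+", left, right) or ("*", left, right),
# an int literal, or a variable name (str). This encoding is an assumption.

def interpret(tree, env):
    """Interpreter: walk the source tree and produce the answer directly."""
    if isinstance(tree, int):
        return tree
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    a, b = interpret(left, env), interpret(right, env)
    return a + b if op == "+" else a * b

def compile_expr(tree):
    """Compiler: translate the tree into code for a hypothetical stack machine."""
    if isinstance(tree, int):
        return [("push", tree)]
    if isinstance(tree, str):
        return [("load", tree)]
    op, left, right = tree
    opcode = "add" if op == "+" else "mul"
    return compile_expr(left) + compile_expr(right) + [(opcode,)]

def run(code, env):
    """Stand-in run-time environment: execute the compiled target code."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        elif instr[0] == "load":
            stack.append(env[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "add" else a * b)
    return stack.pop()

tree = ("+", "b", ("*", "c", 5))
env = {"b": 2, "c": 3}
print(interpret(tree, env), run(compile_expr(tree), env))  # 17 17
```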
The Analysis-Synthesis Model
• Analysis: convert source code into discrete, manageable "chunks"
– Program String → Tokens (words)
– Tokens → Hierarchical Data Structures (trees)
• Synthesis: convert each chunk into a piece of target code
– Trees → Intermediate Code
– Intermediate Code → Target Code

Analysis: Lexical Analysis
"let val a = b + c * 5 in a end;"
→ lexical analyzer →
let, val, id1, =, id2, +, id3, *, 5, in, id1, end, ;
• Break string into tokens based on whitespace separation (punctuation such as ; forms its own token)
• Replace identifier names (a, b, c) with indices (id1, id2, id3) using a symbol table:
1 a
2 b
3 c
...
• Table entries eventually contain info on type, value, etc., of identifiers
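The lexical analysis step above can be sketched as follows. This is a minimal illustration, not the course's scanner: the function name `tokenize`, the keyword set, and the regular expression are assumptions chosen to reproduce the token stream shown on the slide.

```python
import re

# Assumed keyword set for the ML fragment used in the example.
KEYWORDS = {"let", "val", "in", "end"}

# A word (identifier or keyword), or an integer literal / single symbol.
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z_]\w*)|(?P<other>\d+|\S)")

def tokenize(source):
    """Return (tokens, symbol_table) for a small ML-like fragment."""
    symbols = {}  # identifier name -> index, in order of first appearance
    tokens = []
    for match in TOKEN_RE.finditer(source):
        lexeme = match.group()
        if match.lastgroup == "word" and lexeme not in KEYWORDS:
            # First sighting allocates the next index; later ones reuse it.
            index = symbols.setdefault(lexeme, len(symbols) + 1)
            tokens.append(f"id{index}")
        else:
            tokens.append(lexeme)
    return tokens, symbols

tokens, symbols = tokenize("let val a = b + c * 5 in a end;")
print(", ".join(tokens))
print(symbols)
```

In this sketch the symbol table stores only the index; a fuller table would also carry the type and value information mentioned above.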
Analysis: Syntactic Analysis
let, val, id1, =, id2, +, id3, *, 5, in, id1, end, ;
→ syntax analyzer →
[Tree: let has children id1, +, and id1; + has children id2 and *; * has children id3 and 5]
• Tree is abstract: discards redundant keywords like val, in
• Note similarity to Scheme: (let ((id1 (+ id2 (* id3 5)))) id1)
• Trees allow us to generate code in a more general way.

Analysis: Semantic Analysis
[Tree: let has children id1, +, and id1; + has children id2 and *; * has children id3 and 5]
→ semantic analyzer →
[Annotated tree: let:int has children id1:int, +:int*int → int, and id1:int; + has children id2:int and *:int*int → int; * has children id3:int and 5:int]
• Here we have done type inference, but other kinds of semantic analysis are possible:
– Detect reference to undeclared identifier
– Detect real number used as array index
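Both analysis steps can be sketched together on the running example. The code below is an illustration under stated assumptions: `parse` is a hand-written recursive-descent parser for just this one let form, the tree is encoded as nested tuples (an invented representation), and `infer` is told the types of id2 and id3 up front because the fragment never declares them.

```python
def parse(tokens):
    """Parse 'let val ID = expr in expr end ;' into a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def atom():
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            pos += 1
            return int(tok)
        if tok is not None and tok.startswith("id"):
            pos += 1
            return tok
        raise SyntaxError(f"expected identifier or number, found {tok!r}")

    def term():  # term -> atom ('*' atom)*
        node = atom()
        while peek() == "*":
            expect("*")
            node = ("*", node, atom())
        return node

    def expr():  # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    expect("let")
    expect("val")
    name = atom()
    expect("=")
    bound = expr()
    expect("in")
    body = expr()
    expect("end")
    expect(";")
    return ("let", name, bound, body)  # keywords val/in/end are discarded

def infer(tree, env):
    """Return the type of tree; env maps declared identifiers to types."""
    if isinstance(tree, int):
        return "int"
    if isinstance(tree, str):
        if tree not in env:
            raise NameError(f"undeclared identifier {tree}")
        return env[tree]
    if tree[0] == "let":
        _, name, bound, body = tree
        return infer(body, {**env, name: infer(bound, env)})
    op, left, right = tree  # '+' and '*' both have type int*int -> int
    if infer(left, env) != "int" or infer(right, env) != "int":
        raise TypeError(f"operands of {op} must be int")
    return "int"

tree = parse("let val id1 = id2 + id3 * 5 in id1 end ;".split())
print(tree)
print(infer(tree, {"id2": "int", "id3": "int"}))  # assumed types for b and c
```

Calling `infer(tree, {})` instead raises `NameError`, which corresponds to the "reference to undeclared identifier" check.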
Synthesis: Intermediate Code Generation
[Annotated tree: let:int has children id1:int, +:int*int → int, and id1:int; + has children id2:int and *:int*int → int; * has children id3:int and 5:int]
→ intermediate code generator →
temp1 := 5
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
• Three-Address Code: "Portable" assembly-like language
– Every memory location can act like a register
– At most three operands per instruction
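A tree walk that emits three-address code might look like the sketch below. It assumes the tree is a nested tuple of the form ("let", name, bound, body), an encoding invented for this example, and it gives every literal and every operator result a fresh temporary, which is what yields the unoptimized output shown above.

```python
import itertools

def gen_three_address(tree):
    """Emit three-address code for a ("let", name, bound, body) tree."""
    code = []
    counter = itertools.count(1)

    def new_temp():
        return f"temp{next(counter)}"

    def emit_expr(node):
        """Emit code for node and return the name that holds its value."""
        if isinstance(node, int):
            temp = new_temp()
            code.append(f"{temp} := {node}")
            return temp
        if isinstance(node, str):
            return node  # identifiers are already named locations
        op, left, right = node
        a = emit_expr(left)
        b = emit_expr(right)
        temp = new_temp()
        code.append(f"{temp} := {a} {op} {b}")
        return temp

    # The body of this particular let is just the bound name, so the
    # sketch only generates code for the binding.
    _, name, bound, _body = tree
    result = emit_expr(bound)
    code.append(f"{name} := {result}")
    return code

tree = ("let", "id1", ("+", "id2", ("*", "id3", 5)), "id1")
print("\n".join(gen_three_address(tree)))
```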
Synthesis: Code Generation
temp1 := id3 * 5
id1 := id2 + temp1
→ code generator →
lw $t1, 36($sp) # id3
li $t2, 5
mul $t3, $t1, $t2
sw $t3, 40($sp) # temp1
lw $t1, 32($sp) # id2
lw $t2, 40($sp) # temp1
addu $t3, $t1, $t2
sw $t3, 28($sp) # id1
• Presents further opportunities for optimization
– e.g., peephole optimization: "peep" at more than one intermediate instruction at a time, to produce fewer assembly instructions
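The translation on this slide can be reproduced with a naive, one-instruction-at-a-time generator. The sketch below is an illustration only: the stack offsets are copied from the slide's example rather than computed, the register choices are fixed, and it handles only lines of the form `dst := a op b`.

```python
# Stack offsets copied from the slide; a real compiler would assign them.
OFFSETS = {"id1": 28, "id2": 32, "id3": 36, "temp1": 40}
OPCODES = {"+": "addu", "*": "mul"}

def load(reg, operand):
    """Load a variable from its stack slot, or an integer literal, into reg."""
    if operand.isdigit():
        return f"li {reg}, {operand}"
    return f"lw {reg}, {OFFSETS[operand]}($sp)  # {operand}"

def gen_mips(tac):
    """Naively translate 'dst := a op b' lines into MIPS, one at a time."""
    asm = []
    for line in tac:
        dst, rhs = line.split(" := ")
        a, op, b = rhs.split()
        asm += [
            load("$t1", a),
            load("$t2", b),
            f"{OPCODES[op]} $t3, $t1, $t2",
            f"sw $t3, {OFFSETS[dst]}($sp)  # {dst}",
        ]
    return asm

print("\n".join(gen_mips(["temp1 := id3 * 5", "id1 := id2 + temp1"])))
```

Because each line is translated in isolation, temp1 is stored to 40($sp) and then loaded straight back; that is the kind of redundancy a peephole pass would look for.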
Synthesis: Code Optimization
temp1 := 5
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
→ code optimizer →
temp1 := id3 * 5
id1 := id2 + temp1
• Can often eliminate variables used only for transmitting values between successive lines (temp2)
• Lots of related techniques (Ch. 10): common subexpression elimination, copy propagation, dead-code elimination, constant folding
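The transformation on this slide can be reproduced with two very small passes. The sketch below is a simplification, not a general optimizer: it assumes straight-line code in which every temp is assigned exactly once and a copied temp is never used again. It propagates temps that hold a literal, folds a copy `x := t` into the instruction that just computed `t`, and then renumbers the surviving temps.

```python
import re

def optimize(tac):
    """Tiny optimizer for straight-line code where each temp is assigned once."""
    consts, out = {}, []
    for line in tac:
        dst, rhs = line.split(" := ")
        # Constant propagation: substitute temps known to hold a literal.
        operands = [consts.get(tok, tok) for tok in rhs.split()]
        if dst.startswith("temp") and len(operands) == 1 and operands[0].isdigit():
            consts[dst] = operands[0]  # remember the literal, emit nothing
            continue
        # Copy elimination: "x := t" right after "t := ..." becomes "x := ...".
        if (len(operands) == 1 and operands[0].startswith("temp")
                and out and out[-1][0] == operands[0]):
            out[-1] = (dst, out[-1][1])
            continue
        out.append((dst, " ".join(operands)))

    # Renumber the surviving temps from 1, in order of first appearance.
    names = {}

    def rename(match):
        return names.setdefault(match.group(), f"temp{len(names) + 1}")

    return [re.sub(r"temp\d+", rename, f"{dst} := {rhs}") for dst, rhs in out]

before = ["temp1 := 5", "temp2 := id3 * temp1", "temp3 := id2 + temp2", "id1 := temp3"]
print("\n".join(optimize(before)))
```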
Compiler Intro: Grouping of Phases
• Front and Back Ends
– Front End: Phases dependent on source language (roughly, analysis phases)
– Back End: Phases dependent on target language (roughly, synthesis phases)
• Passes
– One pass = read input file, write output file
– Each pass may consist of several phases
– Fewer passes → more memory required
Compiler Intro: Tools
• We don't want to have to write a new scanner (lexical analyzer) and parser (syntactic analyzer) for every new language.
• Instead, we'd like to specify the rules for building tokens and expressions, and have the scanner and parser code generated automatically.
• lex/yacc: traditional C-based tools for scanner/parser generation
• We'll use jlex/jcup, for Java.
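The idea of specifying token rules and deriving the scanner from them can be sketched without any generator tool. The example below is only an analogy to what lex-style tools do: the rule names, the patterns, and `make_scanner` are invented for this illustration and do not follow lex or jlex specification syntax.

```python
import re

# Token rules as (name, regular expression) pairs, tried in this order.
# Names and patterns are illustrative assumptions.
RULES = [
    ("KEYWORD", r"\b(?:let|val|in|end)\b"),
    ("INT",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("OP",      r"[=+*;]"),
    ("SKIP",    r"\s+"),
]

def make_scanner(rules):
    """Build a scanner function from a list of token rules."""
    master = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in rules))

    def scan(text):
        pos, tokens = 0, []
        while pos < len(text):
            match = master.match(text, pos)
            if match is None:
                raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
            if match.lastgroup != "SKIP":  # whitespace produces no token
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        return tokens

    return scan

scan = make_scanner(RULES)
print(scan("let val a = 5 in a end;"))
```

Supporting a new language then means changing `RULES`, not rewriting `scan`, which is the point the slide makes about generated scanners.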