Compiler Construction Chapter 1: Introduction

35 downloads 8173 Views 251KB Size Report
Compiler Stages. January, 2010. Chapter 1: Introduction. 3. Scanner. Parser. Semantic. Analyzer. Source Code. Optimizer. Code. Generator. Target Code.
Compiler Construction Chapter 1: Introduction Slides modified from Louden Book and Dr. Scherger

Terminology       

  

Compiler Interpreter Translator Assembler Linker Loader Preprocessor Editor Debugger Profiler 2



    

  



Source Language Target Language Target Platform Relocatable Macro substitution IDE Cross Compiler Dissambler Front End Back End

Chapter 1: Introduction

January, 2010

Source Code

Analysys

Compiler Stages

Scanner Tokens Parser Syntax Tree Semantic Analyzer Annotated Tree Source Code Optimizer

Synthesis

Literal Table Symbol Table Error Handler

Intermediate Code Code Generator Target Code Target Code Optimizer

3

Target Code

Chapter 1: Introduction

January, 2010

Files Used by Compilers 

A source code text file (.c, .cpp, .java, etc. file extensions).



Intermediate code files: transformations of source code during compilation, usually kept in temporary files rarely seen by the user.



An assembly code text file containing symbolic machine code, often produced as the output of a compiler (.asm, .s file extensions).

4

Chapter 1: Introduction

January, 2010

Files Used by Compilers (cont.) 

One or more binary object code files: machine instructions, not yet linked or executable (.obj, .o file extensions)



A binary executable file: linked, independently executable (well, not always…) code (.exe, .out extensions, or no extension).

5

Chapter 1: Introduction

January, 2010

Compiler Execution 

What is O() of a compiler?

Extended Example 

Source code: a[index] = 4 + 2





Tokens: ID Lbracket ID Rbracket AssignOp Num AddOp Num





Parse tree (syntax tree with all steps of the parser in gory detail):

7

Chapter 1: Introduction

January, 2010

Parse Tree expression assign-expression expression

=

subscript-expression expression identifier a

8

[

expression identifier index

expression additive-expression

]

expression

+

number 4

Chapter 1: Introduction

expression number 2

January, 2010

Syntax Tree

a "trimmed" version of the parse tree with only essential information: assign-expression

subscript-expression

identifier a

9

identifier index

additive-expression

number 4

Chapter 1: Introduction

number 2

January, 2010

Annotated Syntax Tree (with attributes)

assign-expression integer subscript-expression integer

identifier a array of integer

10

identifier index integer

additive-expression integer

number 4 integer

Chapter 1: Introduction

number 2 integer

January, 2010

Intermediate Code    

Syntax tree very abstract Machine code too specific Something in between may make optimization much easier One such representation is three-address code 

Has only up to three different variables (addresses)

t=4+2 a[index] = t

Target Code Source code: a[index] = 4 + 2 Tokens: ID Lbracket ID Rbracket AssignOp Num AddOp Num

(edited & modified for this presentation): mov eax, 6 mov ecx, DWORD PTR _index$[ebp] mov DWORD PTR _a$[ebp+ecx*4], eax

(Note source level constant folding optimization.) 12

Chapter 1: Introduction

January, 2010

Source Code

Scanner Tokens

a[index] = 4 + 2

The Big Picture

ID Lbracket ID Rbracket AssignOp Num AddOp N

Parser

assign-expression

Syntax Tree Semantic Analyzer Annotated Tree Source Code Optimizer

additive-expression

subscript-expression

Literal Table Symbol

identifier a

identifier index

number 2

assign-expression

Table Error

number 4

integer additive-expression integer

subscript-expression integer

Handler Intermediate Code Code Generator Target Code Target Code Optimizer

Target Code

13

t=4+2 a[index] = t

identifier a array of integer

identifier index integer

number 4 integer

number 2 integer

mov eax, 6 mov ecx, DWORD PTR _index$[ebp] mov DWORD PTR _a$[ebp+ecx*4], eax

Chapter 1: Introduction

January, 2010

Algorithmic Tools 

Tokens: defined using regular expressions. (Chapter 2)



Scanner: 

an implementation of a finite state machine (deterministic automaton) that recognizes the token regular expressions (Chapter 2).

14

Chapter 1: Introduction

January, 2010

Algorithmic Tools (cont.) 

Parser 



A push-down automaton (i.e. uses a stack), based on grammar rules in a standard format (BNF – Backus-Naur Form). (Chapters 3, 4, 5)

Semantic Analyzer and Code Generator: 

Recursive evaluators based on semantic rules for attributes (properties of language constructs). (Chapters 6, 7, 8)

15

Chapter 1: Introduction

January, 2010

Other Phase Features 

Parser and scanner together typically operate as a unit (parser calls scanner repeatedly to generate tokens).



Front end: 



Parser, scanner, semantic analyzer and source code optimizer depend primarily on source language.

Back end: 

code generator and target code optimizer depend primarily on target language (machine architecture).

16

Chapter 1: Introduction

January, 2010

Other Classifications 

Logical unit: phase



Physical unit: separately compiled code file (see later)



Temporal unit: pass 

Passes: trips through the source code (or intermediate code). These are not phases (but they could be).

17

Chapter 1: Introduction

January, 2010

Data Structure Tools 

Syntax tree: 



Literal table:   



see previous pictures.

"Hello, world!", 3.141592653589793, etc. If a literal is used more than once (as they often are in a program), we still want to store it only once. So we use a table (almost always a hash table or table of hash tables).

Symbol table:  

all names (variables, functions, classes, typedefs, constants, namespaces). Again, a hash table or set of hash tables is the most likely data structure.

18

Chapter 1: Introduction

January, 2010

Error Handler     

One of the more difficult parts of a compiler to design. Must handle a wide range of errors Must handle multiple errors. Must not get stuck. Must not get into an infinite loop (typical simple-minded strategy:count errors, stop if count gets too high).

19

Chapter 1: Introduction

January, 2010

Kinds of Errors 

Syntax: 



Semantic: 



iff (x == 0) y + = z + r; }

int x = "Hello, world!";

Runtime: 

 

20

int x = 2; ... double y = 3.14159 / (x - 2);

Chapter 1: Introduction

January, 2010

Errors (cont.) 

A compiler must handle syntax and semantic errors, but not runtime errors (whether a runtime error will occur is an undecidable question).



Sometimes a compiler is required to generate code to catch runtime errors and handle them in some graceful way (either with or without exception handling). 

This, too, is often difficult.

21

Chapter 1: Introduction

January, 2010

Sample Compilers in This Class ("Toys") 

TINY: a 4-pass compiler for the TINY language, based on Pascal (see text, pages 22-26)



C-Minus: A project language given in the text(see text, pages 26-27 and Appendix A). Based on C.



SIL: Simple Island Language:

22

Chapter 1: Introduction

January, 2010

TINY Example read x; if x > 0 then fact := 1; repeat fact := fact * x; x := x - 1 until x = 0; write fact end

23

Chapter 1: Introduction

January, 2010

C-Minus Example int fact( int x ) { if (x > 1) return x * fact(x-1); else return 1; } void main( void ) { int x; x = read(); if (x > 0) write( fact(x) ); } 24

Chapter 1: Introduction

January, 2010

Structure of the TINY Compiler globals.h util.h scan.h parse.h symtab.h analyze.h code.h cgen.h

25

main.c util.c scan.c parse.c symtab.c analyze.c code.c cgen.c

Chapter 1: Introduction

January, 2010

Conditional Compilation Options 

NO_PARSE: 



NO_ANALYZE: 



Builds a scanner-only compiler.

Builds a compiler that parses and scans only.

NO_CODE: 

Builds a compiler that performs semantic analysis, but generates no code.

26

Chapter 1: Introduction

January, 2010

Listing Options (built in - not flags) 

EchoSource: 



TraceScan: 



Displays the syntax tree in a linearlized format.

TraceAnalyze: 



Displays information on each token as the scanner recognizes it.

TraceParse: 



Echoes the TINY source program to the listing, together with line numbers.

Displays summary information on the symbol table and type checking.

TraceCode: 

Prints code generation-tracing comments to the code file. 27

Chapter 1: Introduction

January, 2010

Terminology Review       

  

Compiler Interpreter Translator Assembler Linker Loader Preprocessor Editor Debugger Profiler 28



    

  



Source Language Target Language Target Platform Relocatable Macro substitution IDE Cross Compiler Dissambler Front End Back End

Chapter 1: Introduction

January, 2010