1 .329 Introduction to Compiler Construction
May - July, 1999
Dr. M. Maheswaran
Dept. of Computer Science, University of Manitoba
[email protected] www.cs.umanitoba.ca/~maheswar/cs329
Outline
◆ 1. Introduction
  ➨ Motivation & Terminology
  ➨ Compilation vs. Interpretation
  ➨ Separate Compilation & Linking
◆ 2. Scanning
  ➨ Ad-hoc techniques
  ➨ FSM based scanning
  ➨ An automated tool: Lex
◆ 3. Parsing
  ➨ Grammars, Parsing & Symbol Tables
  ➨ Recursive Descent Parsing
  ➨ LR parsing & Operator Precedence parsing
  ➨ An automated tool: YACC
◆ 4. Compilation Basics (back-end)
  ➨ Semantic Analysis
  ➨ Code Generation
  ➨ Introduction to Optimization
◆ 5. Interpretation of source code
  ➨ Handling Declarations
  ➨ Handling Control Structures
  ➨ Handling Expressions
  ➨ Handling I/O
◆ 6. Linkage Editing
  ➨ Address Constants
  ➨ External References and the Linking Process
  ➨ Object Libraries: static and dynamic linking
◆ 7. Recent Developments (time permitting)
  ➨ Byte codes
  ➨ Just In Time compilation
  ➨ Interaction with non-ISA components
Introduction
◆ Compilers, and their near cousins interpreters, are ubiquitous in program development
  ➨ everyone uses them or their by-product (compiled code)
◆ Compilers are widely available and largely ignored
  ➨ at least until something goes wrong
◆ So why worry about compilers?
Introduction: Motivation
◆ What would life be like without compilers?
  ➨ Consider programming complex applications in assembly language (or worse, machine code)
  ➨ “Visual Assembler” anyone?
◆ Compilers relieve programmers of the tedium and complexity of low-level programming
  ➨ and hence make programming considerably easier and less error-prone
◆ But, you ask, “Aren’t compilers well understood and simple to produce?”
◆ The answer to this is “yes” and “no”
◆ Certain general compilation techniques are very well understood
◆ Unfortunately, computer designs continue to change, as do languages
  ➨ this impacts compilers at both ends
◆ But, you ask, “Isn’t compiler construction an esoteric topic of little use to most programmers?”
◆ Again, “yes” and “no”
◆ While developing a compiler is a skill that will be used by only a few of our graduates, the techniques used in such development are more widely applicable
◆ Additionally, much of what you will learn in this course will provide you with valuable knowledge you can employ when using compilers and their associated software development tools
  ➨ e.g. people who understand compilers often debug more easily because the obscure error messages make more sense
  ➨ also, understanding the linking process is critical to large-scale software development
[Figure: the path from algorithm to running program. Labels, in order: Algorithm, Source (text), Editor, Source Program (text), “Compiler”, Assembly Language (text), Assembler, Object Modules (code), Linker, Machine Code (code), Machine]
What is Compilation?
◆ Compilation is the process of taking a program specified in a High Level Language (HLL) such as C/C++ and translating it into a low-level language (typically assembly language, but possibly machine code itself)
  ➨ Compilation typically translates a single HLL statement into many assembly/machine-code statements
◆ Notice that this is a 1:many mapping/translation
  ➨ an assembler performs a 1:1 mapping
◆ This translation process is surprisingly difficult to do well and is affected by both the source language (HLL) and the target machine architecture (machine code)
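To make the 1:many mapping concrete, here is a minimal Python sketch that translates one HLL assignment of the form `target = left + right` into several instructions. The accumulator-style instruction names (`LOAD`, `ADD`, `STORE`) and the function `translate_assignment` are illustrative assumptions, not part of the course material or of any real machine.

```python
def translate_assignment(statement):
    """Translate 'target = left + right' into pseudo-assembly (hypothetical ISA)."""
    target, expression = [part.strip() for part in statement.split("=", 1)]
    left, right = [part.strip() for part in expression.split("+", 1)]
    return [
        f"LOAD {left}",     # accumulator <- left
        f"ADD {right}",     # accumulator <- accumulator + right
        f"STORE {target}",  # target <- accumulator
    ]


assembly = translate_assignment("a = b + c")
for line in assembly:
    print(line)
```

One source statement becomes three instructions here. An assembler would then turn each of those three lines into exactly one machine instruction.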
Example Compilations
Compiling some HLL statements
GOTO statement
◆ GOTO is a control structure most closely resembling an assembly language instruction
◆ HLL source (left) and the code generated for it (right):
        GOTO label;                    JUMP label
        statement1                     code for statement1
    label: statement2              label: code for statement2
IF statement
◆ If the condition is true, the THEN statement is executed. Otherwise, the statement is skipped.
        IF expression THEN statement;
  compiles to:
        code for expression
        TEST
        JUMPF done
        code for statement
    done:
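The two translation patterns above can be sketched as small code-generating functions. This is only an illustration in Python: the helper names (`new_label`, `compile_goto`, `compile_if`), the label naming scheme, and the instructions used in the usage example (`LOAD`, `CMPGT`, `STORE`) are assumptions. Only `JUMP`, `TEST` and `JUMPF` come from the slides.

```python
import itertools

_label_counter = itertools.count(1)


def new_label(prefix="done"):
    """Return a fresh label such as 'done1' (the naming scheme is an assumption)."""
    return f"{prefix}{next(_label_counter)}"


def compile_goto(label):
    """GOTO maps onto a single unconditional jump."""
    return [f"JUMP {label}"]


def compile_if(expression_code, statement_code):
    """Emit code for 'IF expression THEN statement'."""
    done = new_label()
    return (
        list(expression_code)    # code for expression
        + ["TEST"]               # test the result of the expression
        + [f"JUMPF {done}"]      # jump past the statement if the test is false
        + list(statement_code)   # code for statement
        + [f"{done}:"]           # jump target: first instruction after the IF
    )


code = compile_if(["LOAD x", "CMPGT y"], ["LOAD x", "STORE max"])
print("\n".join(code))
```

Generating a fresh label for every IF keeps the jump targets of separate or nested IF statements from colliding.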
How is a compiler structured?
◆ A compiler is typically structured as a number of phases
◆ Each phase is executed in turn to gather information about the program being compiled and/or to transform the program in some way
◆ Collectively, execution of all phases transforms the source code into a machine code object module
[Figure: phases of a compiler, grouped in the original diagram into a Front End (earlier phases) and a Back End (later phases)]
Source Code
→ Lexical Analysis/Scanning
→ Syntax Analysis/Parsing
→ Semantic Analysis
→ Machine Independent Optimization
→ Code Generation
→ Machine Dependent Optimization
→ Machine Code
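As a rough illustration of phases run in turn, the following Python sketch chains three of them (scanning, parsing, code generation) for arithmetic expressions built from integers, `+`, `*` and parentheses. Semantic analysis and both optimization phases are omitted for brevity. The token format, the nested-tuple tree, and the stack-machine instructions (`PUSH`, `ADD`, `MUL`) are all assumptions made for this example.

```python
import re


def scan(source):
    """Lexical analysis: turn characters into (kind, value) tokens."""
    tokens = []
    for lexeme in re.findall(r"\d+|\S", source):
        if lexeme.isdecimal():
            tokens.append(("NUM", int(lexeme)))
        elif lexeme in "+*()":
            tokens.append((lexeme, lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    return tokens


def parse(tokens):
    """Syntax analysis: build a nested-tuple tree in which '*' binds tighter than '+'."""
    position = 0

    def peek():
        return tokens[position][0] if position < len(tokens) else None

    def advance():
        nonlocal position
        token = tokens[position]
        position += 1
        return token

    def factor():
        if peek() == "NUM":
            return ("num", advance()[1])
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError("expected a number or '('")

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return tree


def generate(tree):
    """Code generation: emit instructions for a hypothetical stack machine."""
    if tree[0] == "num":
        return [f"PUSH {tree[1]}"]
    operator, left, right = tree
    opcode = {"+": "ADD", "*": "MUL"}[operator]
    return generate(left) + generate(right) + [opcode]


def compile_expression(source):
    """Run the phases in turn: scan, then parse, then generate."""
    return generate(parse(scan(source)))


print(compile_expression("2 + 3 * 4"))
```

Each phase consumes only the output of the previous one, which is what lets a real compiler pair one front end with different back ends.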
What is an Interpreter?
◆ An interpreter is a little different from a compiler
◆ Rather than translating an HLL program to machine code, an interpreter examines the code and performs the specified computations immediately (without producing code)
  ➨ it interprets the meaning of the code
How is an interpreter structured?
Source Code
→ Lexical Analysis/Scanning
→ Syntax Analysis/Parsing
→ Semantic Analysis
→ “Execution”
◆ The interpreter produces results directly, rather than code that must then be run to produce the results
  ➨ sounds great, but in practice, interpretation is at least an order of magnitude slower than running compiled code
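The "Execution" step can be pictured as a direct walk over the parsed program. Below is a minimal Python sketch under the assumption that parsing has already produced a nested-tuple expression tree. The tree shape, the node kinds (`num`, `var`, `+`, `*`) and the function name `interpret` are hypothetical choices for this example.

```python
def interpret(tree, variables):
    """Evaluate a nested-tuple expression tree directly; no code is produced."""
    kind = tree[0]
    if kind == "num":
        return tree[1]
    if kind == "var":
        return variables[tree[1]]
    left = interpret(tree[1], variables)
    right = interpret(tree[2], variables)
    if kind == "+":
        return left + right
    if kind == "*":
        return left * right
    raise ValueError(f"unknown node kind {kind!r}")


# Tree for the expression x + 3 * 4, as a parser might have produced it.
tree = ("+", ("var", "x"), ("*", ("num", 3), ("num", 4)))
print(interpret(tree, {"x": 2}))  # 14
```

The tree is re-examined every time the expression is evaluated, which is one source of the slowdown relative to compiled code.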
Source Code to Execution
◆ In a compiled (rather than interpreted) environment, a programmer goes through a typical sequence to get results from a program
  ➨ compile - produce machine code from HLL
  ➨ link - combine with library code
  ➨ go/run/execute - execute the resulting executable file
◆ This process is fine for small, simple programs
  ➨ In fact, it is commonly hidden behind a “Make” or “Run” menu item in most modern development environments
◆ What about large, complex software systems, though?
  ➨ Think on the scale of thousands, or even millions, of lines of code
Separate Compilation
◆ In large systems, it is impractical to try to write and maintain the code in a single module (i.e. unit) of source code
  ➨ too large to understand
  ➨ not logically cohesive
  ➨ impossible to maintain
  ➨ too costly to re-compile
    ● consider fixing a 1 character syntax error in a 50 million line program
◆ To address this problem, most languages and compilers support separate compilation
◆ With separate compilation, a programmer divides up a large program into a number of relatively independent pieces, each of which may be compiled individually/separately to produce an object module
◆ The resulting object modules are then combined by a linker (a.k.a. linkage editor)
◆ Consider the following:
    HLL → compiler → Object Code
    HLL → compiler → Object Code
    HLL → compiler → Object Code
    Object Code (×3) + Library Object Code → linker → Executable
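The linker's job in this picture can be sketched as two passes over the object modules: lay the modules out one after another, then resolve each external reference to an address. The Python below is a toy model only. The dictionary layout of an "object module" (`name`, `size`, `defines`, `references`) and the module names are assumptions for illustration and do not correspond to any real object file format.

```python
def link(modules):
    """Combine object modules: lay them out, then resolve external references."""
    symbol_table = {}
    base = 0
    # Pass 1: give each module a base address and record where its symbols land.
    for module in modules:
        for symbol, offset in module["defines"].items():
            if symbol in symbol_table:
                raise ValueError(f"duplicate symbol: {symbol}")
            symbol_table[symbol] = base + offset
        base += module["size"]
    # Pass 2: every external reference must now map to an absolute address.
    resolved = {}
    for module in modules:
        for symbol in module["references"]:
            if symbol not in symbol_table:
                raise ValueError(f"undefined symbol: {symbol}")
            resolved[(module["name"], symbol)] = symbol_table[symbol]
    return symbol_table, resolved


main_obj = {"name": "main.o", "size": 40, "defines": {"main": 0},
            "references": ["area", "print"]}
shape_obj = {"name": "shape.o", "size": 24, "defines": {"area": 8},
             "references": []}
library = {"name": "libio", "size": 100, "defines": {"print": 16},
           "references": []}

symbols, fixups = link([main_obj, shape_obj, library])
print(symbols)
print(fixups)
```

Because each module records only symbol names and offsets, any one of them can be recompiled on its own and the link step simply repeated.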
Incremental Compilation
◆ Another way of speeding up the re-compilation process is incremental compilation
  ➨ Here, the compiler maintains machine code for each unit, and a change involves only modifying the affected part of the existing machine code rather than complete re-generation
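A coarse approximation of this idea, at the granularity of whole units rather than patches to existing machine code, is to cache the output for each unit and redo only the units whose source text has changed. The Python sketch below assumes a hypothetical `IncrementalBuilder` class and uses `str.upper` as a stand-in for a real compiler. It is not how any particular incremental compiler works.

```python
import hashlib


class IncrementalBuilder:
    """Keep compiled output per unit and redo only units whose source changed."""

    def __init__(self, compile_unit):
        self.compile_unit = compile_unit  # function: source text -> "object code"
        self.cache = {}                   # unit name -> (source digest, output)

    def build(self, units):
        """Return (outputs, names_recompiled) for a dict of name -> source text."""
        recompiled = []
        outputs = {}
        for name, source in units.items():
            digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
            cached = self.cache.get(name)
            if cached is None or cached[0] != digest:
                self.cache[name] = (digest, self.compile_unit(source))
                recompiled.append(name)
            outputs[name] = self.cache[name][1]
        return outputs, recompiled


builder = IncrementalBuilder(compile_unit=str.upper)  # stand-in "compiler"
units = {"a": "x = 1", "b": "y = 2"}
_, first = builder.build(units)    # both units are compiled
units["b"] = "y = 3"
_, second = builder.build(units)   # only the changed unit is recompiled
print(first, second)
```

Comparing content digests rather than timestamps means a unit that is saved without changes is not recompiled.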
Scope of the Course
◆ This course will cover basic techniques for front end processing, including a basic discussion of automated tools
◆ It will quickly survey back end processing
  ➨ see 74.429 for more on this topic
◆ It then discusses interpreter implementation
◆ It will finish by talking about the basics of the linking process