Open64 Introduction
Yulei Sui
February 3, 2010

Abstract
This document describes how to use, develop and do research on Open64. We do not try to cover all parts of Open64; we mainly focus on its IPA part. We draw on material from many sources, including the Open64 user guide [4], a presentation given at the University of Delaware by Fred Chow, Director of Compiler Engineering at PathScale [3], the Overview of the Open64 Compiler Infrastructure from the University of Houston [1], the WHIRL IR document from the Open64 group [7], the WHIRL Symbol Table Specification [11], the SGI WHIRL Intermediate Language Specification [10], the Open64 Developer Guide from AMD [8], the ORC PACT02 tutorial [9], the NVIDIA Open64 tutorial [6], the Smart Memories Group presentation from Stanford University [5], and the University of Alberta Compiler Design and Optimization lecture notes [2].
1 About Open64
1.1 History
• Derived from the SGI MIPSpro compiler suite
• SGI open-sourced the MIPSpro compilers in summer 2000 as the Pro64 compiler suite
• The University of Delaware took over maintenance of the Pro64 compiler suite and renamed it "Open64"
• Intel enhanced the Open64 code generator with advanced optimizations for Itanium; the Itanium-enhanced Open64 compiler suite was released as the Open Research Compiler (ORC), initiated by Intel Microprocessor Research Labs (MRL)
• PathScale retargeted the Open64 compilers to x86-64; the Osprey Project leverages PathScale's effort heavily
  – GNU-compatible, since it uses the gcc and g++ front-ends
  – incorporates back-end support for Itanium
1.2 Open64 Active Contributors
• HP (initiated the Osprey Project in Nov. 2005)
  – to productize Open64 for Itanium/Linux
  – to address the need for a high-performance, production-quality open source compiler
  – merges various branches (x86, ORC), covers Itanium and x86, follows gcc evolution
• QLogic (a.k.a. PathScale)
  – retargeted for the x86 family, updated the GCC/G++ front end, polished the production quality for x86/Linux
• Google
  – initiated the Go64 project in 2007, to develop a performance advisory feature
  – static analysis
• University of Delaware
  – Cyclops
  – hosts www.open64.net for the source tree repository, etc.
  – active contributor to the HP Osprey Project
• SNU
  – EPS, a global scheduler
• University of Houston
  – OpenMP & tools
• UC Berkeley
  – UPC
• UMN
  – Speculative Parallel Threading
• Tsinghua University and Chinese Academy of Sciences
  – participate in the HP Osprey Project
  – initiated the Retargetability Project, to enable a new architecture port in a week
2 Overview of Open64 Compiler Architecture
Components and Features of Open64: The Open64 compiler is modular: its components interact through a common IR, WHIRL. Open64 compilation has several notable features:
• a single back-end for multiple front-ends;
• one IR with multiple levels of representation;
• a compilation process that continuously lowers the representation.
Open64 also provides several optimisation components:
• LNO: loop-oriented, based on data dependence;
• WOPT: global scalar optimisation based on SSA;
• IPA: inter-procedural, requiring whole-program analysis; it comprises IPL and the main IPA phase;
• CG: target-dependent.
The IR levels and the architecture are shown in Figure 1.
Compilation Model of Open64: A driver controls the execution of Open64 and decides which modules to load and which compilation plan to follow. The driver (driver/main.c) is responsible for invoking the front-ends, the stand-alone procedure inliner, IPA and the back-end.
Figure 1: Open64 IR Levels and Compiler Architecture
Figure 2: Open64 Compilation Model (the main driver invokes the front-end driver (gfec/gfecc/mfef90, *.B), IPA (IPL producing a fake *.o; IPA LINK producing *.I and *.G) and the back-end driver (PreOpt, LNO *.N, WOPT *.O, CG *.s), followed by AS + linker producing .o and the executable)
3 WHIRL IR
3.1 WHIRL Tree and Node
WHIRL Representation
• WHIRL nodes are defined in common/com/wn_core.h
• Symbol table: see common/com/symtab*.h
• Each function body is represented by one big tree; the smallest WHIRL node is 24 bytes
• The WHIRL symbol table holds declarations, with different tables for different declaration constructs
• WHIRL files use the ELF (Executable and Linkable Format) file format, with a unique file suffix for each phase:
  – Front-end cpp: .i (-E option)
  – Front-end gfec: .B (-keep option), Very High level WHIRL
  – IPA IPL: .o (-c option), a "fake" object file
  – IPA IPA_LINK: .I; the symbol tables are merged into *.G
  – LNO: .N
  – WOPT: .O
  – CG: .s (-S option)
  – as: .o
  – ld: executable file
• Suffixes of the IR files passed between components: see Figure 3
Figure 3: Open64 Compilation Process (GNU C/C++ front-end, output *.B; IPA/IPO, output *.I and *.G; Loop Nest Opt, output *.N; WOPT, output *.O; BE, output *.s; GNU IPF AS/LD, output *.o)
• Function of the back-end components: see Table 1
Table 1: Function of Backend Components

Phase: IPL
  Location: be/be/driver, ipa/local/ipl_summary (ipl.so)
  Output: *.o file (WHIRL + summary)
  Functionality:
    1. Preparation for IPA (be's link file, built into ipl.so)
    2. Invoked by preopt through the be driver
    3. Calls VHO followed by preopt
    4. Summarises the pre-optimised WHIRL information of each PU

Phase: IPA
  Location: ipa/common/ipa_main (ipa.so)
  Functionality:
    1. Linked with GNU ld; calls IPA_LINK, which is built into ipa.so
    2. Builds the combined global symbol and type table
    3. Builds the call graph; alias analysis; constant propagation, DCE
    4. Inlining analysis, cloning analysis, structure field reordering

Phase: IPO
  Location: ipa/common/ipo_main (ipa.so)
  Output: *.I file (merged symtab)
  Functionality:
    1. Reads in and modifies WHIRL; built into ipa.so
    2. Reads each PU's WHIRL; common block padding and splitting
    3. Global constant propagation; indirect call to direct call conversion
    4. Inlining, PU reordering, alias class analysis

Phase: LNO
  Location: be/driver, lno/lnodriver (lno.so)
  Output: *.O file
  Functionality:
    1. Loop peeling (lno/fusion.h)
    2. Loop tiling (lno/tile.h)
    3. Loop fission (lno/fission.h)
    4. Loop fusion (lno/fusion.h)
    5. Vector data prefetching, loop unroll and jam (lno/model.h)
    6. Loop interchange (permute.h)

Phase: WOPT (PREOPT and MAINOPT)
  Location: be/opt/opt_main (wopt.so)
  Output: *.O file
  Functionality:
    1. PREOPT_PHASE: used for the PREOPT phase
    2. PREOPT_LNO_PHASE: used for the LNO phase
    3. PREOPT_DUONLY_PHASE: called by LNO, optimisation disabled
    4. PREOPT_IPA0_PHASE: called by IPL
    5. PREOPT_IPA1_PHASE: called by main IPA
    6. MAINOPT_PHASE: used when the optimisation level is >= O2

Phase: CG
  Location: cg/cgdriver, cg/cg (cg.so)
  Output: *.s file
  Functionality:
    1. CG-expand, CGIR
    2. Control flow optimisation
    3. Software pipelining, loop unrolling
    4. Register allocation (global and local)
    5. Prolog and epilog
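The per-phase file suffixes listed in Section 3.1 can be captured in a small lookup table, which is handy when scripting around -keep output. The following is only an illustrative sketch: the struct, the array and the function name suffix_for_phase are assumptions made for this example and do not appear in the Open64 sources; only the phase/suffix pairs come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: one (phase, suffix) pair from Section 3.1. */
struct phase_suffix {
    const char *phase;
    const char *suffix;
};

static const struct phase_suffix PHASE_SUFFIXES[] = {
    { "cpp",      ".i" },  /* preprocessed source (-E) */
    { "gfec",     ".B" },  /* Very High WHIRL (-keep) */
    { "ipl",      ".o" },  /* "fake" object: WHIRL plus summary (-c) */
    { "ipa_link", ".I" },  /* symbol tables are merged into a .G file */
    { "lno",      ".N" },
    { "wopt",     ".O" },
    { "cg",       ".s" },  /* assembly (-S) */
};

/* Return the output suffix for a phase name, or NULL if it is unknown. */
const char *suffix_for_phase(const char *phase)
{
    size_t n = sizeof PHASE_SUFFIXES / sizeof PHASE_SUFFIXES[0];
    assert(phase != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(PHASE_SUFFIXES[i].phase, phase) == 0)
            return PHASE_SUFFIXES[i].suffix;
    }
    return NULL;
}
```

For example, suffix_for_phase("lno") yields ".N", while an unlisted phase name yields NULL.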
Design of WHIRL
• Very High WHIRL
  – Preserves the abstractions present in the source language
  – Can be translated back to C/F90
  – Constructs allowed only in VH WHIRL:
    * Comma operator
    * Nested function calls (e.g. f(nestedf());)
    * C select operator (? and :)
    * For F90: triplet, arrayexp, arrsection, where
  – The inliner can work at this level
• High WHIRL
  – Constructs that support loop-level optimizations
  – Fixed (though not explicit) control flow
  – Key constructs:
    * ARRAY
    * DO loops (unified for and while)
    * IF statements
    * FORTRAN I/O statements
  – IPA, PREOPT and LNO work at this level
  – Can be translated back to the source language
• Mid WHIRL
  – One-to-one mapping to RISC instructions
  – Control flow is explicit via jumps
  – WOPT works at this level
• Low WHIRL
  – Final form of WHIRL, to facilitate translation to machine instructions in CG
  – Linkage convention is exposed
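To make the Very High level constructs concrete, the sketch below writes the same computation twice in C: once using the comma operator, a nested call and the select operator, and once with those constructs removed by hand. This is only an illustration of the idea of lowering; the function names are invented for the example and the second form is not the output of Open64's actual lowering pass.

```c
#include <assert.h>

/* Hypothetical helper functions used only by this example. */
static int twice(int v) { return 2 * v; }
static int inc(int v)   { return v + 1; }

/* Source-level form: one expression combining the comma operator,
   a nested function call and the ?: select operator. */
int vh_form(int a, int b)
{
    int t;
    return (t = inc(twice(a)), t > b ? t : b);
}

/* Hand-lowered form of the same computation. */
int lowered_form(int a, int b)
{
    int t, tmp, result;
    tmp = twice(a);   /* nested call hoisted into its own statement */
    t = inc(tmp);     /* comma operator split into sequential statements */
    if (t > b)        /* ?: rewritten as an IF statement */
        result = t;
    else
        result = b;
    assert(result >= b);  /* the result is max(t, b), so never below b */
    return result;
}
```

Both functions return the larger of 2*a + 1 and b; the second form contains only constructs that remain available below the Very High level.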
3.2 A WHIRL Tree Example
We use a motivating C program to illustrate the WHIRL tree and its nodes. The source code of the program is shown in Listing 1. The WHIRL file generated by the command ir_b2a *.B is presented in Listing 2. We then draw the dump as a WHIRL tree, connecting the nodes to one another, as shown in Figure 4. Since the program contains two functions (PUs, Program Units), two WHIRL trees are dumped: one rooted at the FUNC_ENTRY of main, the other rooted at the FUNC_ENTRY of foo.
Listing 1: A Motivation C Program

    #include <stdio.h>
    int obj, t;
    void foo(int **, int **);
    main() {
        int **x, **y;
        int *a, *b, *c, *d, *e;
        x = &a; y = &b;
        foo(x, y);
        *b = 5;
        if (t) { x = &c; y = &e; }
        else { x = &d; y = &d; }
        c = &t;
        foo(x, y);
        *e = 10;
    }

    void foo(int **p, int **q) {
        *p = *q;
        *q = &obj;
    }

Listing 2: The WHIRL File (.B) after Transferring the Motivation C Program
    ==================== main Function WHIRL Tree ====================
     LOC 0 0 source files: 1 "/opt/open64/bin/test/lev.c"
     LOC 1 2 void foo(int **, int **);
     LOC 1 3 main() {
    FUNC_ENTRY
    BODY
     BLOCK
     END_BLOCK
     BLOCK
     END_BLOCK
     BLOCK
     PRAGMA 0 120 0 (0x0) # PREAMBLE_END
     LOC 1 4 int **x, **y;
     LOC 1 5 int *a, *b, *c, *d, *e;
     LOC 1 6 x=&a; y=&b;
      U4LDA 0 T
     U4STID 0 T
      U4LDA 0 T
     U4STID 0 T
     LOC 1 7 foo(x, y);
       U4U4LDID 0 T
      U4PARM 2 T # by_value
       U4U4LDID 0 T
      U4PARM 2 T # by_value
     VCALL 126 # flags 0x7e
     LOC 1 8 *b = 5;
      I4INTCONST 5 (0x5)
      U4U4LDID 0 T
     I4ISTORE 0 T
     LOC 1 9 if (t) { x=&c; y=&e; }
     IF
       I4I4LDID 0 T<4,.predef_I4,4>
       I4INTCONST 0 (0x0)
      I4I4NE
     THEN
      BLOCK
       U4LDA 0 T
      U4STID 0 T
       U4LDA 0 T
      U4STID 0 T
      END_BLOCK
     ELSE
      BLOCK
      LOC 1 10 else { x=&d; y=&d; }
       U4LDA 0 T
      U4STID 0 T
       U4LDA 0 T
      U4STID 0 T
      END_BLOCK
     END_IF
     LOC 1 11 c = &t;
      U4LDA 0 T
     U4STID 0 T
     LOC 1 12 foo(x, y);
       U4U4LDID 0 T
      U4PARM 2 T # by_value
       U4U4LDID 0 T
      U4PARM 2 T # by_value
     VCALL 126 # flags 0x7e
     LOC 1 13 *e = 10;
      I4INTCONST 10 (0xa)
      U4U4LDID 0 T
     I4ISTORE 0 T
     RETURN
     END_BLOCK
     LOC 1 14
     LOC 1 15 }
     LOC 1 16
    ==================== foo Function WHIRL Tree ====================
     LOC 1 17 void foo(int **p, int **q) {
    FUNC_ENTRY
     IDNAME 0
     IDNAME 0
    BODY
     BLOCK
     END_BLOCK
     BLOCK
     END_BLOCK
     BLOCK
     PRAGMA 0 120 0 (0x0) # PREAMBLE_END
     LOC 1 18 *p = *q;
       U4U4LDID 0 T
      U4U4ILOAD 0 T T
      U4U4LDID 0 T
     U4ISTORE 0 T
     LOC 1 19 *q = &obj;
      U4LDA 0 T
      U4U4LDID 0 T
     U4ISTORE 0 T
     RETURN
     END_BLOCK
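The dump above lists each statement's operands before the operator that consumes them. The sketch below models this with a deliberately tiny tree type and a post-order walk, using the statement *b = 5 (INTCONST and LDID feeding an ISTORE) as the example. The struct, enum and function names are hypothetical and chosen only for illustration; the real node type is the WN structure declared in wn_core.h, which is considerably richer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature operator set and node type (not the real WN). */
enum op { OP_INTCONST, OP_LDID, OP_ISTORE };

struct node {
    enum op op;
    int value;            /* constant for OP_INTCONST, unused otherwise */
    const char *sym;      /* symbol name for OP_LDID, NULL otherwise */
    struct node *kid[2];  /* operands; NULL when absent */
};

static const char *op_name(enum op op)
{
    switch (op) {
    case OP_INTCONST: return "INTCONST";
    case OP_LDID:     return "LDID";
    case OP_ISTORE:   return "ISTORE";
    }
    return "?";
}

/* Append operator names to buf in post-order: operands first, then the
   operator itself. buf must already hold a NUL-terminated string. */
void dump_postorder(const struct node *n, char *buf, size_t cap)
{
    if (n == NULL)
        return;
    dump_postorder(n->kid[0], buf, cap);
    dump_postorder(n->kid[1], buf, cap);
    size_t len = strlen(buf);
    assert(len < cap);
    snprintf(buf + len, cap - len, "%s%s", len ? " " : "", op_name(n->op));
}

/* Build the tree for "*b = 5" and write its post-order dump into buf. */
void dump_store_example(char *buf, size_t cap)
{
    struct node five  = { OP_INTCONST, 5, NULL, { NULL, NULL } };
    struct node ld_b  = { OP_LDID, 0, "b", { NULL, NULL } };
    struct node store = { OP_ISTORE, 0, NULL, { &five, &ld_b } };
    assert(cap > 0);
    buf[0] = '\0';
    dump_postorder(&store, buf, cap);
}
```

Calling dump_store_example on a 64-byte buffer produces the operator order INTCONST, LDID, ISTORE, which matches the three lines printed for *b = 5 in Listing 2.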
4 Symbol Table
5 WN management
6 How to Debug Open64
See the file named HOW-TO-DEBUG-OPEN64.
6.1 Open64 Options
Optimization Flags vs. Phases Invoked
• -O0 (the default under -g): front-end and code generator, all optimisations disabled
• -O1: front-end and code generator, local optimisations only
• -O2 (the default): adds WOPT and the rest of CG's optimisations
• -O3: adds LNO
• -IPA (can be used at any optimisation level)
• -Ofast (same as -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math)
• -OPT:Ofast (same as -OPT:ro=2:Olimit=0:div_split=ON:alias=typed)
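The relationship between optimisation level and the phases invoked can be written down as a small function. This is a sketch of the list above only: the PH_* flag names and phases_for are invented for the example and are not driver code, and the driver's real logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical phase flags, one bit per item named in the list above. */
enum {
    PH_FRONT_END = 1 << 0,
    PH_CG        = 1 << 1,  /* code generator, no optimisation */
    PH_CG_LOCAL  = 1 << 2,  /* CG local optimisations */
    PH_CG_FULL   = 1 << 3,  /* the rest of CG's optimisations */
    PH_WOPT      = 1 << 4,
    PH_LNO       = 1 << 5,
    PH_IPA       = 1 << 6
};

/* Return the set of phases for -O<opt_level>, optionally with -IPA. */
unsigned phases_for(int opt_level, bool ipa)
{
    assert(opt_level >= 0 && opt_level <= 3);
    unsigned p = PH_FRONT_END | PH_CG;      /* -O0 */
    if (opt_level >= 1)
        p |= PH_CG_LOCAL;                   /* -O1 */
    if (opt_level >= 2)
        p |= PH_WOPT | PH_CG_FULL;          /* -O2 */
    if (opt_level >= 3)
        p |= PH_LNO;                        /* -O3 */
    if (ipa)
        p |= PH_IPA;                        /* -IPA, at any level */
    return p;
}
```

With this encoding, phases_for(2, false) includes PH_WOPT but not PH_LNO, and passing ipa = true adds PH_IPA regardless of the level.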
Figure 4: WHIRL Tree of the Motivation Example: (a) main Function WHIRL tree; (b) foo Function WHIRL tree
Option Groups
• -LIST: user listing
• -LANG: language features
• -TARG: target machine
• -TENV: target environment
• -INLINE: inlining and optimisation
• -IPA: inter-procedural analysis and optimisation
• -LNO: loop nest optimisation
• -WOPT: global scalar optimisation
• -CG: code generation
• -OPT: general optimisations

6.2 Debugging IPA
IPA in Open64
• IPA runs before LNO, WOPT and CG
• IPA may trigger bugs downstream due to
  – changes in the IR
  – changes in symbol table attributes
Options and Phases' Compilation Order
• opencc -O3: cpp -> gfec -> inline -> be(lno,wopt) -> ld
• opencc -OPT:Ofast: cpp -> gfec -> inline -> be(lno,wopt) -> ld
• opencc -Ofast: cpp -> gfec -> ipl -> ipa -> wopt, lno
• opencc -IPA: cpp -> gfec -> ipl -> ipa -> wopt, lno
• opencc -OPT: cpp -> gfec -> inline -> be(wopt,lno) -> ld
• opencc -inline: cpp -> gfec -> inline -> be(wopt,lno) -> ld
• opencc -WOPT:aggstr=N: cpp -> gfec -> inline -> be(wopt,lno) -> ld
IPA Options
• opencc -O3 -IPA file1.c file2.c -o test -keep
  – follow the steps below if this compilation fails or the test fails at runtime
• Try -O3 (without IPA)
  – if the test still fails, the problem is NOT in IPA
• Try -O0 -IPA
  – if the test passes, the problem is likely in later phases
    * with the -keep option, all intermediate files are saved
    * 1.I, 2.I, ..., n.I (IR files)
    * symtab.G (merged symbol table file)
    * linkopt.cmd, makefile.ipaxxxx (helper files to recompile and generate object and executable files)
  – if the test fails, the problem is almost certainly in IPA
    * pinpoint the phase: IPL or IPA_LINK (linker, IPA analysis, IPA optimisation)
    * optimisations can be turned off one at a time
    * the options are defined in config_ipa.{cxx,h}
    * pass options into ipl with -Wj
    * pass options into ipa with -Wi
• IPA Debug using GDB
  – Because the components are loaded with dlopen, gdb needs a breakpoint placed after all dlopen calls have completed; only then are symbols from the other .so files visible to gdb
  – ipl (a.k.a. be) and ipl.so must both be built in debug mode (make BUILD_OPTIMIZE=DEBUG)
  – ipa_link (a.k.a. new-ld) and ipa.so must both be built in debug mode (make BUILD_OPTIMIZE=DEBUG)
Figure 5: Debugging IPA: (a) debugging IPL (IPL linked via ln -s to be; dlopen of ipl.so, be.so, cg.so, wopt.so, lno.so); (b) debugging IPA LINK (IPA LINK linked via ln -s to new-ld; dlopen of ipa.so)
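The -O0 -IPA step of the triage described under IPA Options reduces to a two-input decision, sketched below. The enum and function names are assumptions made for this example; the function only encodes the rule "passes at -O0 -IPA: suspect later phases; fails at -O0 -IPA: suspect IPA" for a test that already fails at -O3 -IPA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for the triage. */
enum suspect { SUSPECT_NONE, SUSPECT_LATER_PHASES, SUSPECT_IPA };

/* o3_ipa_ok: did the test pass when built with -O3 -IPA?
   o0_ipa_ok: did the test pass when built with -O0 -IPA? */
enum suspect triage_ipa(bool o3_ipa_ok, bool o0_ipa_ok)
{
    if (o3_ipa_ok)
        return SUSPECT_NONE;          /* nothing to debug */
    if (o0_ipa_ok)
        return SUSPECT_LATER_PHASES;  /* IPA alone is fine: look at LNO, WOPT, CG */
    return SUSPECT_IPA;               /* fails even without later optimisation */
}

/* Human-readable label for a triage result. */
const char *suspect_name(enum suspect s)
{
    switch (s) {
    case SUSPECT_NONE:         return "none";
    case SUSPECT_LATER_PHASES: return "later phases";
    case SUSPECT_IPA:          return "IPA";
    }
    assert(!"unknown suspect value");
    return NULL;
}
```

When the result is SUSPECT_IPA, the next step is the one listed above: narrow the failure down to IPL or IPA_LINK and turn optimisations off one at a time through -Wj and -Wi.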
References
[1] Overview of the Open64 compiler infrastructure. Technical report, University of Houston Computer Science Department, High Performance Computing Tools Group, http://www2.cs.uh.edu/~dragon/Documents/open64-doc.pdf, November 2002.
[2] José Nelson Amaral. Compiler design and optimization. Technical report, Department of Computing Science, University of Alberta, http://webdocs.cs.ualberta.ca/~amaral/courses/680/, 2006.
[3] Fred Chow. The Open64 compiler - architecture and implementation approach. Technical report, PathScale, 2008.
[4] Advanced Micro Devices Inc. and Open64 Developer Community. Using the x86 Open64 compiler suite. Technical report, http://developer.amd.com/Assets/x86_open64_user_guide.pdf, 2009.
[5] Varun Malhotra. Open64 compiler. Technical report, Smart Memories Group Meeting, http://www-vlsi.stanford.edu/smart_memories/protected/meetings/summer2003/Open64Compiler.pdf, July 8, 2003.
[6] Mike Murphy. Tutorial on NVIDIA's Open64 sources. Technical report, nvopencc tutorial, http://wiki.open64.net/images/1/10/Nvopencc-tutorial.pdf, November 2006.
[7] Open64 group. Open64 compiler WHIRL intermediate representation. Technical report, Open64, http://www.mcs.anl.gov/OpenAD/open64A.pdf, August 2007.
[8] Ramshankar Ramanarayanan (Member of AMD Technical Staff). Open64 compiler developer guide, December 2009.
[9] Roy Ju, Sun Chan, Fred Chow, Xiaobing Feng, and William Chen. Open Research Compiler (ORC) beyond version 1.0. Technical report, ORC, http://ipf-orc.sourceforge.net/ORC-PACT02tutorial.pdf, September 2002.
[10] SGI. WHIRL intermediate language specification. Technical report, SGI, http://prdownloads.sourceforge.net/open64/whirl.pdf?use_mirror=aleron, 2000.
[11] SGI. WHIRL symbol table specification. Technical report, http://nchc.dl.sourceforge.net/project/open64/open64/Documentation/symtab.pdf, 2000.