Extending Sim286 to the Intel386 Architecture with 32-bit Processing and ELF Binary Input

Michael L. Haungs and Brian A. Malloy
Dept. of Computer Science
Clemson, SC 29634

Abstract

The trend in processor development is that each new processor is soon replaced by a newer and more powerful one. To facilitate processor development, the design and implementation of a processor is typically paralleled by the design and implementation of a simulator that can be used to avoid errors in the development process. Ideally, a family of processors should be accompanied by a family of simulators, where each successive simulator can be derived from its predecessor by an incremental change in both design and implementation. In this paper, we report the addition of Sim386 to Simx86, a family of simulators for the Intel 80x86 family of processors. The construction of Sim386 involved two important extensions over its predecessor, Sim286. First, Sim386 performs both 16-bit and 32-bit processing, whereas Sim286 performed only 16-bit processing. Second, Sim386 accepts both COM and ELF binary files as input, whereas Sim286 accepted only COM files. The second extension makes Sim386 a more viable tool, since ELF binaries are more widely accessible than COM files.

1 Introduction

The trend in processor development is that each new processor is soon replaced by a newer and more powerful one. The Intel 80x86 family illustrates this trend: most recently, the Pentium has been replaced by the Pentium Pro, and the Pentium Pro has been replaced by the Pentium II. Soon, the Pentium II will be replaced by an even newer and more powerful processor. To facilitate processor development, the design and implementation of a processor is typically paralleled by the design and implementation of a simulator that can be used to avoid errors in the development process. Ideally, a family of processors should be accompanied by a family of simulators, where each successive simulator can be derived from its predecessor by an incremental change in both design and implementation. All too often, this ideal is not achieved.

In this paper, we report on the development of Simx86, a family of simulators for the Intel 80x86 family of processors. We begin by reviewing the previous simulators in the family, Sim8088[7] and Sim286[4]. We then describe the effort involved in building Sim386, the successor to Sim286. The construction of Sim386 involved two important extensions over its predecessor. First, Sim386 performs both 16-bit and 32-bit processing, whereas Sim286 performed only 16-bit processing. Second, Sim386 accepts both COM and ELF binary files as input, whereas Sim286 accepted only COM files.

The first extension exposed an important drawback in Sim286: the extension of Sim8088 to Sim286 involved an extension of the implementation but not a corresponding extension of the design. Thus, the inclusion of 32-bit processing in Sim386 required a major effort. The second extension makes Sim386 a more viable tool, since ELF binaries are more widely accessible than COM files. A third contribution of this work is the compendium of information that we provide about the design and layout of ELF binary files; this information can facilitate the work of other developers who use ELF binaries. Finally, we describe approaches for improving the design of Sim386 to make it easier to modify and to extend to future processor simulators, and we describe an approach to implementing dynamic linking of ELF binaries. To summarize, the contributions of this work include:

- The construction of Sim386, a partial simulator of the Intel 80386 processor; like the 80386, Sim386 performs both 16-bit and 32-bit processing.
- The incorporation into Sim386 of functionality to accept ELF binary input.
- A description of the design and layout of ELF binaries, including examples of how we incorporated ELF binary input into Sim386.
- A description of an approach to redesigning Sim386 to make it easier to modify and extend, and of an approach to implementing dynamic linking of ELF binaries.

In the next section we provide background about simulators, including Simx86, the family of simulators for the Intel 80x86 architecture, together with background about ELF binaries. In Section 3, we describe how we implemented 32-bit processing and ELF binary input in Sim386. In Section 4 we report the results of experiments comparing executions of the test suite described in reference [4] using COM and ELF binaries as input. In Section 5 we describe an approach to the redesign of Sim386 and to incorporating dynamic linking of ELF binaries. Finally, in Section 6 we draw some conclusions.

2 Background

In this section, we provide background on the members of the Intel family of processors that relate to this work; in particular, we survey the processors from the 8086 to the 80386. We also provide background about previous versions of the Simx86 simulators, including Sim8088[7] and Sim286[4].

[Figure 1: diagram of the 8088/8086, 80286, and 80386 processors and their features: 8-bit/16-bit bus, 24-bit bus, 16-bit and 32-bit processing, real/protected mode, virtual addressing, protection mode, segmentation, task management, interrupts, prefetching, co-processing, and paging.]

Figure 1: Overview. This figure illustrates an overview of the Intel family of architectures. The solid-underlined features are implemented in Sim286; the dash-underlined feature is implemented in Sim386. Sim8088 and Sim286 accept COM files as input; Sim386 accepts both COM and ELF input files.

2.1 The 80x86 Processor Family

The 80x86 family of processors originated in 1978 with the introduction of the 8086 processor. Shortly thereafter, the 8088 processor was added to the family. Both processors have 16-bit registers and use 20-bit addresses, which permit addressing one megabyte of memory; both store multi-byte values in little-endian order. Instructions provide for three types of operands: memory, register, and immediate. Instructions can combine these operand types in any manner, except that two memory operands cannot appear in the same instruction. The important distinction between the 8086 and the 8088 is that the 8086 has a 16-bit external data bus and a 16-bit internal data bus, whereas the 8088 has an 8-bit external data bus and a 16-bit internal data bus.
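To make the 20-bit addressing concrete: on the 8086/8088, a physical address is formed from a 16-bit segment and a 16-bit offset. The sketch below is ours, not code from Sim8088, and the names physicalAddress8086 and readWordLE are hypothetical. It shifts the segment left by four bits, adds the offset, keeps the low 20 bits, and reads a word in the processors' little-endian byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 8086/8088 address formation: physical = segment * 16 + offset,
// truncated to 20 bits, so addresses wrap around at one megabyte.
std::uint32_t physicalAddress8086(std::uint16_t segment, std::uint16_t offset) {
    return ((static_cast<std::uint32_t>(segment) << 4) + offset) & 0xFFFFFu;
}

// Words are stored little-endian: the low byte sits at the lower address.
std::uint16_t readWordLE(const std::vector<std::uint8_t>& mem, std::uint32_t addr) {
    assert(addr + 1 < mem.size());
    return static_cast<std::uint16_t>(mem[addr] | (mem[addr + 1] << 8));
}
```

Because the sum is truncated to 20 bits, on an 8086 segment 0xFFFF with offset 0x0010 wraps around to physical address 0.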

Like the 8086 and 8088, the 80186 and 80188 are nearly identical to each other: the only difference between them is the width of their data buses. The internal registers of the 80186/80188 are identical to those of the 8086/8088; the principal difference is that the 80186/80188 provides additional reserved interrupt vectors and several powerful built-in I/O features.

The 80286 introduced several new features compared to the 8086. The 80286 has two operating modes: real address mode and protected virtual address mode. Real address mode permits backward compatibility with the 8086 and 80186 processors; in real mode the 80286, like the 8086, uses 20-bit addresses and can address one megabyte of memory. In protected mode the 80286 uses 24-bit physical addresses to reach up to 16 megabytes of physical memory, and it provides each task with a virtual address space of up to one gigabyte. The advanced architectural features and full capabilities of the 80286 are realized in its native protected mode. Among these features are sophisticated mechanisms to support data protection, system integrity, task concurrency, and memory management, including virtual storage.

The 80386 added memory paging and introduced 32-bit processing; its data registers were widened to 32 bits. No substantial changes to memory addressing and registers have been made in the processors that followed the 80386. Rather, subsequent 80x86 processors have concentrated on fine-tuning the microarchitecture to increase performance; for example, the 80486 uses aggressive pipelining and integrates the CPU and the FPU (floating-point unit) on one chip.

2.2 Approaches to Processor Simulation

Traditionally, simulators of computer architectures have been trace-driven; trace-driven simulation has been especially useful for evaluating cache performance. In trace-driven simulation, a trace of the instructions executed by the processor is recorded in a file and later interpreted by the simulator. For meaningful programs, trace files are large and difficult to obtain. Execution-driven simulation provides an alternative: instead of interpreting the instructions in a trace file, the simulator interprets the instructions of the executable program itself. Since cross-compilers are often readily available, executables are much easier to obtain and store than traces.


[Figure 2: class diagram relating ProcessorSimulator, CPU, BIU, Bus, Memory, Register, Address, Byte, Word, Operand, and Instruction. The constructors CPU(BIU*), BIU(Bus*), and Bus(Memory*) link each component to the next; Instruction declares virtual void execute(CPU&), and Operand declares read, write, and getEffectiveAddress.]

Figure 2: Simx86. The original model for the x86 architecture.

2.3 Simulating the x86 Architecture

In this section we survey the simulators that preceded Sim386. We begin with Simx86, the framework of classes that describes the basic architecture common to most modern processors. We then discuss Sim8088, a simulator for the Intel 8088 processor, and Sim286, a simulator for the Intel 80286 processor.

2.3.1 Simx86

Figure 2 shows the basic model for the Simx86 simulator. The essential entities of a processor are represented by classes in the object model, and their relations are shown by the lines joining them. These entities are part of each processor in the x86 family; processors improve their performance by adding to the functionality of these basic entities. Using this basic model, we intend to evolve our simulators as the processors of the Intel x86 family evolved, by applying object-oriented techniques such as inheritance, genericity, and polymorphism.
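The layering in Figure 2, where each constructor receives the next component (CPU(BIU*), BIU(Bus*), Bus(Memory*)), can be illustrated with a minimal sketch. Only the class names and the fetchDataByte, storeDataByte, readDataByte, and writeDataByte names come from the figure; the method bodies, the size-based Memory constructor (the figure's Memory(char*) loads a program file instead), and the CPU helpers load and store are our assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Byte = std::uint8_t;
using Address = std::uint32_t;

// Memory owns the simulated address space (sized here, not loaded from a file).
class Memory {
public:
    explicit Memory(std::size_t size) : cells_(size, 0) {}
    Byte readDataByte(Address a) const { return cells_.at(a); }
    void writeDataByte(Address a, Byte b) { cells_.at(a) = b; }
private:
    std::vector<Byte> cells_;
};

// Bus forwards requests to Memory; a subclass could model bus width or timing.
class Bus {
public:
    explicit Bus(Memory* m) : mem_(m) {}
    virtual ~Bus() {}
    virtual Byte fetchDataByte(Address a) { return mem_->readDataByte(a); }
    virtual void storeDataByte(Address a, Byte b) { mem_->writeDataByte(a, b); }
private:
    Memory* mem_;
};

// The bus interface unit is the CPU's only path to the bus.
class BIU {
public:
    explicit BIU(Bus* bus) : bus_(bus) {}
    virtual ~BIU() {}
    virtual Byte fetchDataByte(Address a) { return bus_->fetchDataByte(a); }
    virtual void storeDataByte(Address a, Byte b) { bus_->storeDataByte(a, b); }
private:
    Bus* bus_;
};

// The CPU never touches Memory directly; every access goes through its BIU.
class CPU {
public:
    explicit CPU(BIU* biu) : biu_(biu) {}
    Byte load(Address a) { return biu_->fetchDataByte(a); }   // hypothetical helper
    void store(Address a, Byte b) { biu_->storeDataByte(a, b); }  // hypothetical helper
private:
    BIU* biu_;
};
```

Because the Bus and BIU methods are virtual, a derived class (for example, one modeling an 8-bit external bus) can change how a request reaches memory without any change to CPU.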

[Figure 3: class diagram extending the Simx86 classes with Proc8088Simulator, BIU8Bit, Bus8Bit, DataStore, and CodeStore.]

Figure 3: Sim8088. This figure illustrates the class diagram for Sim8088, the simulator for the Intel 8088 processor.

Figure 3 illustrates the extension of Simx86 to simulate the 8088 processor. Sim8088 simulated the execution of a COM file. The runSim method of the Proc8088Simulator class fetches one instruction and executes it before fetching the next instruction; thus, all phases of instruction simulation are sequential.
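A minimal sketch of such a strictly sequential cycle follows. Only Instruction, execute, getLength, and Halt echo names from Figure 2; the reduced CPU struct, the two-instruction decoder (INC AX, opcode 0x40, and HLT, opcode 0xF4), and runSequential are hypothetical stand-ins, not Sim8088's code.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

struct Halt {};  // thrown to end the simulation, as in Figure 2's execute signature

// Hypothetical, heavily reduced CPU state: one register and a code buffer.
struct CPU {
    std::uint16_t ip = 0;             // instruction pointer
    std::uint16_t ax = 0;
    std::vector<std::uint8_t> code;   // program bytes, addressed by ip
};

class Instruction {
public:
    explicit Instruction(std::uint8_t length) : length_(length) {}
    virtual ~Instruction() {}
    virtual void execute(CPU& cpu) = 0;
    std::uint8_t getLength() const { return length_; }
private:
    std::uint8_t length_;
};

class IncAX : public Instruction {    // opcode 0x40: INC AX
public:
    IncAX() : Instruction(1) {}
    void execute(CPU& cpu) override { ++cpu.ax; }
};

class Hlt : public Instruction {      // opcode 0xF4: HLT
public:
    Hlt() : Instruction(1) {}
    void execute(CPU&) override { throw Halt{}; }
};

std::unique_ptr<Instruction> decode(const CPU& cpu) {
    switch (cpu.code.at(cpu.ip)) {
        case 0x40: return std::unique_ptr<Instruction>(new IncAX());
        case 0xF4: return std::unique_ptr<Instruction>(new Hlt());
        default:   throw std::runtime_error("opcode not simulated");
    }
}

// Strictly sequential cycle: fetch/decode, advance ip, execute, repeat.
// Returns the number of instructions completed before HLT.
int runSequential(CPU& cpu) {
    int completed = 0;
    try {
        for (;;) {
            std::unique_ptr<Instruction> insn = decode(cpu);
            cpu.ip = static_cast<std::uint16_t>(cpu.ip + insn->getLength());
            insn->execute(cpu);
            ++completed;
        }
    } catch (const Halt&) {
    }
    return completed;
}
```

Because nothing overlaps in this loop, it cannot model instruction prefetching; Sim286 handles that overlap with a separate, event-driven framework.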

2.3.2 Sim286

The simulator for the Intel 80286, Sim286, can be partitioned into two class frameworks: the first is an extension of the framework for Sim8088, and the second incorporates an event list to simulate the parallel execution of events. We adopt the naming convention of reference [4] and refer to these two frameworks as the Architecture Framework, captured in Figure 4, and the Simulation Framework. The extensions to Sim286 that we report in this paper focus on the Architecture Framework.

The interested reader may consult reference [4] for a description of the Simulation Framework. The CPU class for Sim286 is extended to include a global descriptor table register, a local descriptor table register, an interrupt descriptor table register, and segment registers. The CPU class is shown on the left side of Figure 4, with classes DescriptorTableReg, DescriptorTable, and SegRegister drawn beneath it. These classes, together with class Register, are the components of our representation of the Intel processor; thus, they form an aggregation relationship with class CPU, as illustrated by the diamond connector in the figure. The figure also shows that class Register is templated, a feature that facilitated the extension of Sim286 from 16-bit to 32-bit processing.

The Sim286 simulator extended Sim8088 to include both real mode and protected mode. This extension is illustrated in Figure 4 by classes RealModeBIU and VirtualModeBIU, derived from class BIU; the arrow connector in the figure represents the inheritance relationship[6]. The figure also illustrates that class BIU is related to class PrefetchQueue through aggregation. An important feature of Sim286 is the ability to simulate the prefetch and decode of instructions in parallel with other CPU operations.

[Figure 4: class diagram with ProcessorSimulator, Proc286Simulator, CPU, BIU, RealModeBIU, VirtualModeBIU, Bus, Memory, Register, PrefetchQueue, DataStore, CodeStore, DescriptorTableReg, DescriptorTable, SegRegister, SegDescriptor, Physical Address, and VirtualAddress.]

Figure 4: Architecture framework for Sim286. This figure illustrates the class diagram that describes the architecture of Sim286, the simulator for the Intel 80286 processor. We refer to this framework of classes as the architecture framework.
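The benefit of a templated Register can be seen in a small sketch. This is a hypothetical class, not Sim286's actual Register: the register width becomes a type parameter, so 16-bit and 32-bit registers share one implementation, and a helper exposes the low 16 bits the way AX overlays EAX on the 80386. The names read and write loosely follow Figure 2 (which spells the setter Write); low16 and writeLow16 are our additions.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical templated register: one implementation for every width.
template <typename Content>
class Register {
    static_assert(std::is_unsigned<Content>::value, "register contents are unsigned");
public:
    Content read() const { return value_; }
    void write(Content v) { value_ = v; }
    // Low 16 bits, e.g. AX inside EAX.
    std::uint16_t low16() const { return static_cast<std::uint16_t>(value_ & 0xFFFFu); }
    // Replace the low 16 bits and preserve the rest, as a write to AX does to EAX.
    void writeLow16(std::uint16_t v) {
        value_ = static_cast<Content>((value_ & ~static_cast<Content>(0xFFFFu)) | v);
    }
private:
    Content value_ = 0;
};

using Register16 = Register<std::uint16_t>;  // 80286-style general register
using Register32 = Register<std::uint32_t>;  // 80386 extended register (EAX, ...)
```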

2.4 The Executable and Linking Format (ELF)

The Executable and Linking Format (ELF) was originally developed by Unix System Laboratories and is rapidly becoming the standard object file format[8]. ELF is growing in popularity because it offers greater power and flexibility than the a.out and COFF binary formats[3]; it now appears as the default binary format on operating systems such as Linux, Solaris 2.x, and SVR4. Among the capabilities of ELF are dynamic linking, dynamic loading, imposing runtime control on a program, and an improved method for creating shared libraries[3]. A further improvement over previous binary formats is that the ELF representation of control data in an object file is platform independent: ELF object files can be identified, parsed, and interpreted in the same way on multiple platforms and on architectures of different word sizes.

The three main types of ELF files are executable, relocatable, and shared object files. These files hold the code, the data, and the information about the program that the operating system and/or link editor need to perform the appropriate actions on them. The three types are summarized as follows:

- An executable file supplies the information necessary for the operating system to create a process image suitable for executing the code and accessing the data contained within the file.
- A relocatable file describes how it should be linked with other object files to create an executable file or shared library.
- A shared object file contains the information needed in both static and dynamic linking.

In the next section we survey the ELF file format, including a detailed description of each of the five components that an ELF file might include: (1) the ELF header, (2) the program header table, (3) the section header table, (4) the ELF sections, and (5) the ELF segments. In Section 2.4.2, we describe the representation of data in an ELF file. The interested reader may consult reference [8] for additional information about the ELF format.

2.4.1 The ELF File Format

There are two views of each of the three file types described in the previous section; these views support the linking and the execution of a program, respectively. The two views are summarized in Figure 5, where the view on the left is the linking view and the view on the right is the execution view. The linking view of an ELF object file is partitioned into sections, and the execution view is partitioned into segments. Thus, a programmer interested in section information about program items, such as symbol tables, relocation information, specific executable code, or dynamic linking information, will use the linking view; a programmer interested in segment information, such as the location of the text segment or data segment, will use the execution view. The ELF access library, libelf, provides tools to extract and manipulate the contents of an ELF object file in either view.

    Linking View                       Execution View
    ------------------------------     -------------------------------
    ELF header                         ELF header
    Program header table (optional)    Program header table
    Section 1                          Segment 1
    ...                                Segment 2
    Section n                          ...
    ...                                ...
    Section header table               Section header table (optional)

Figure 5: Linking and Execution Views. This figure illustrates the format of an ELF object file.

The ELF header describes the layout of the rest of the object file; it provides information on where and how to access the other parts. The section header table gives the location and description of the sections and is mostly used in linking. The program header table gives the location and description of the segments and is mostly used in creating a program's process image. Sections and segments hold the bulk of the data in an object file, including instructions, data, the symbol table, relocation information, and dynamic linking information.

The ELF Header

The ELF header is the only part of the object file that has a fixed position: it is always at the beginning of the file. The other parts are not guaranteed to appear in any particular order, or even to be present. The ELF header describes the type of the object file (relocatable, executable, shared object, or core), its target architecture, and the version of ELF it uses. It also gives the locations of the program header table and the section header table, the number and size of the entries in each table, and the index of the section that holds the section-name string table. Lastly, the ELF header contains the virtual address of the first executable instruction. The specific fields in the ELF header, together with their sizes, are shown in Figure 6.

    #define EI_NIDENT 16

    typedef struct {
        unsigned char e_ident[EI_NIDENT]; // file identification, interpretation
        Elf32_Half    e_type;             // object file type
        Elf32_Half    e_machine;          // target architecture
        Elf32_Word    e_version;          // ELF version
        Elf32_Addr    e_entry;            // virtual address of first executable code
        Elf32_Off     e_phoff;            // file offset to program header table
        Elf32_Off     e_shoff;            // file offset to section header table
        Elf32_Word    e_flags;            // processor-specific flags
        Elf32_Half    e_ehsize;           // the ELF header's size
        Elf32_Half    e_phentsize;        // size of one entry in program header table
        Elf32_Half    e_phnum;            // number of entries in program header table
        Elf32_Half    e_shentsize;        // size of one entry in section header table
        Elf32_Half    e_shnum;            // number of entries in section header table
        Elf32_Half    e_shstrndx;         // section header index for string section
    } Elf32_Ehdr;

Figure 6: The ELF Header

    typedef struct {
        Elf32_Word p_type;    // type of the segment
        Elf32_Off  p_offset;  // file offset to segment
        Elf32_Addr p_vaddr;   // virtual address of first byte of segment
        Elf32_Addr p_paddr;   // segment's physical address, if relevant
        Elf32_Word p_filesz;  // size of file image of segment
        Elf32_Word p_memsz;   // size of memory image of segment
        Elf32_Word p_flags;   // segment-specific flags
        Elf32_Word p_align;   // alignment requirements
    } Elf32_Phdr;

Figure 7: The Program Header
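As an illustration of how the fields in Figures 6 and 7 are used, the following sketch validates an ELF header before loading. The function parseElfHeader and the struct ElfHeaderInfo are hypothetical; the byte offsets follow the Elf32_Ehdr layout, and the constants (ELFCLASS32 = 1, ELFDATA2LSB = 1, EM_386 = 3, ET_EXEC = 2) are the values defined by the ELF specification. Fields are decoded byte by byte, so the code does not depend on the host's byte order or structure padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Little-endian readers (ELFDATA2LSB, the encoding of Intel386 ELF files).
std::uint16_t le16(const Bytes& b, std::size_t off) {
    return static_cast<std::uint16_t>(b.at(off) | (b.at(off + 1) << 8));
}
std::uint32_t le32(const Bytes& b, std::size_t off) {
    return static_cast<std::uint32_t>(le16(b, off)) |
           (static_cast<std::uint32_t>(le16(b, off + 2)) << 16);
}

// Hypothetical summary of the header fields a loader needs.
struct ElfHeaderInfo {
    std::uint16_t type = 0;       // e_type (offset 16)
    std::uint32_t entry = 0;      // e_entry (offset 24)
    std::uint32_t phoff = 0;      // e_phoff (offset 28)
    std::uint16_t phentsize = 0;  // e_phentsize (offset 42)
    std::uint16_t phnum = 0;      // e_phnum (offset 44)
};

// Accepts only 32-bit, little-endian, Intel386 executables.
bool parseElfHeader(const Bytes& b, ElfHeaderInfo& out) {
    if (b.size() < 52) return false;                        // sizeof(Elf32_Ehdr)
    if (b[0] != 0x7F || b[1] != 'E' || b[2] != 'L' || b[3] != 'F') return false;
    if (b[4] != 1) return false;                            // EI_CLASS: ELFCLASS32
    if (b[5] != 1) return false;                            // EI_DATA: ELFDATA2LSB
    if (le16(b, 18) != 3) return false;                     // e_machine: EM_386
    if (le16(b, 16) != 2) return false;                     // e_type: ET_EXEC
    out.type = le16(b, 16);
    out.entry = le32(b, 24);
    out.phoff = le32(b, 28);
    out.phentsize = le16(b, 42);
    out.phnum = le16(b, 44);
    return true;
}
```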

The Program Header Table

Program headers are meaningful only in executable and shared object files. The program header table is an array of entries, each a structure describing a segment in the object file or other information needed to create an executable process image. The size of an entry and the number of entries in the table are given in the ELF header (see Figure 6). Each entry in the program header table (see Figure 7) contains the type, file offset, virtual address, physical address, file size, memory image size, flags, and alignment of a segment of the program. The program header is crucial to creating a process image for the object file: the operating system copies each loadable segment (that is, each segment whose p_type is PT_LOAD) into memory according to its location and size information. The p_type field is the first item in the struct shown in Figure 7.

    typedef struct {
        Elf32_Word sh_name;       // name of section (index into string table)
        Elf32_Word sh_type;       // type of the section
        Elf32_Word sh_flags;      // section-specific attributes
        Elf32_Addr sh_addr;       // memory location of section
        Elf32_Off  sh_offset;     // file offset to section
        Elf32_Word sh_size;       // size of section
        Elf32_Word sh_link;       // link to another section (section type dependent)
        Elf32_Word sh_info;       // extra information (section type dependent)
        Elf32_Word sh_addralign;  // address alignment constraints
        Elf32_Word sh_entsize;    // size of an entry in section
    } Elf32_Shdr;

Figure 8: The Section Header

The Section Header Table

All sections in an object file can be located using the section header table. The section header table, like the program header table, is an array of structures, with one entry per section in the file. Each entry provides the section's name, type, starting address in the memory image (if the section is loaded), file offset, size in bytes, alignment, and information on how the section's contents should be interpreted. Figure 8 details the specific fields of the structure. The name field is actually an index into a string table (itself a section of the object file) that holds the name of the section as a string. Sections are discussed further below.
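Resolving a section's name from sh_name amounts to reading a NUL-terminated string at that byte offset in the section-name string table (the section whose index is given by e_shstrndx in the ELF header). In the sketch below, sectionName is a hypothetical helper, and the string table is assumed to be already read into memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// sh_name is a byte offset into the section-name string table, which holds
// NUL-terminated strings. Byte 0 of a string table is NUL, so offset 0
// yields the empty name; an offset may also point into the middle of a string.
std::string sectionName(const std::vector<char>& shstrtab, std::uint32_t sh_name) {
    if (sh_name >= shstrtab.size())
        throw std::out_of_range("sh_name lies outside the string table");
    std::string name;
    for (std::size_t i = sh_name; i < shstrtab.size() && shstrtab[i] != '\0'; ++i)
        name += shstrtab[i];
    return name;
}
```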

ELF Sections

Entries in the section header table describe a number of types of sections. Sections can hold executable code, data, dynamic linking information, debugging data, symbol tables, relocation information, comments, string tables, and notes. Some sections are loaded into the process image, some provide information needed to build a process image, and others are used only in linking object files. Figure 9 lists the special sections along with a brief description of each.

ELF Segments

Segments group related sections. For example, the text segment groups executable code, the data segment groups the program data, and the dynamic segment groups information relevant to dynamic loading. Each segment consists of one or more sections. A process image is created by loading and interpreting segments: the operating system logically copies a file's segment to a virtual memory segment according to the information provided in the program header table. The operating system can also use segments to create a shared memory resource. Figure 9 summarizes the sections that might be included in a segment.

    Section                       Description
    --------------------------    ------------------------------------------------------------
    .bss                          Uninitialized data present in process image
    .comment                      Version control information
    .data and .data1              Initialized data present in process image
    .debug                        Information for symbolic debugging
    .dynamic                      Dynamic linking information
    .dynstr                       Strings needed for dynamic linking
    .dynsym                       Dynamic linking symbol table
    .fini                         Process termination code
    .got                          Global offset table
    .hash                         Symbol hash table
    .init                         Process initialization code
    .interp                       Path name for a program interpreter
    .line                         Line number information for symbolic debugging
    .note                         File notes
    .plt                          Procedure linkage table
    .rel<name> and .rela<name>    Relocation information
    .rodata and .rodata1          Read-only data
    .shstrtab                     Section names
    .strtab                       Strings, usually names associated with symbol table entries
    .symtab                       Symbol table
    .text                         Executable instructions

Figure 9: Special Sections. A brief description of sections that can appear in an ELF object file.
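The loading rule described under The Program Header Table, copying each PT_LOAD segment to its virtual address, can be sketched as follows. This is an illustration under stated assumptions, not the code of Sim386's ELFMemory: the function loadSegments is hypothetical, memory is modeled as a sparse std::map from address to byte, and only p_type, p_offset, p_vaddr, p_filesz, and p_memsz are used (p_flags and p_align are ignored). Bytes between p_filesz and p_memsz, such as those of .bss, are zero-filled, as the ELF specification requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using SparseMemory = std::map<std::uint32_t, std::uint8_t>;  // address -> byte

std::uint32_t le32(const Bytes& b, std::size_t off) {
    return static_cast<std::uint32_t>(b.at(off)) |
           (static_cast<std::uint32_t>(b.at(off + 1)) << 8) |
           (static_cast<std::uint32_t>(b.at(off + 2)) << 16) |
           (static_cast<std::uint32_t>(b.at(off + 3)) << 24);
}

const std::uint32_t kPtLoad = 1;  // PT_LOAD

// Copies every loadable segment into memory: p_filesz bytes come from the
// file and the remaining p_memsz - p_filesz bytes are zeroed.
// Returns the number of segments loaded.
int loadSegments(const Bytes& file, std::uint32_t phoff, std::uint16_t phentsize,
                 std::uint16_t phnum, SparseMemory& mem) {
    int loaded = 0;
    for (std::uint16_t i = 0; i < phnum; ++i) {
        std::size_t ph = phoff + static_cast<std::size_t>(i) * phentsize;
        std::uint32_t type   = le32(file, ph + 0);   // p_type
        std::uint32_t offset = le32(file, ph + 4);   // p_offset
        std::uint32_t vaddr  = le32(file, ph + 8);   // p_vaddr
        std::uint32_t filesz = le32(file, ph + 16);  // p_filesz
        std::uint32_t memsz  = le32(file, ph + 20);  // p_memsz
        if (type != kPtLoad) continue;
        if (filesz > memsz || offset + static_cast<std::size_t>(filesz) > file.size())
            throw std::runtime_error("malformed program header");
        for (std::uint32_t k = 0; k < memsz; ++k)
            mem[vaddr + k] = (k < filesz) ? file[offset + k] : std::uint8_t{0};
        ++loaded;
    }
    return loaded;
}
```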

2.4.2 ELF Data Representation

ELF control data is represented in a machine-independent format so that it can be accessed and interpreted in the same way on any machine. Figure 10 lists the definitions of the storage classes for ELF control data. The remaining data in the object file, other than the control data, use the encoding of the target processor, regardless of the machine on which the file was created. All data structures that the object file format defines follow the size and alignment guidelines for the relevant storage class[8]. If necessary, data structures contain explicit padding, for example to ensure 4-byte alignment for 4-byte objects and to force structure sizes to be a multiple of 4[8]. Sections and segments also carry alignment information (sh_addralign and p_align), so that they can be properly aligned when placed in memory. To maintain a high level of portability, ELF does not use bit fields, since bit manipulation can be machine dependent; every field occupies a whole number of bytes. The cost of this portability is some wasted space.

    Name            Size  Alignment  Purpose
    -------------   ----  ---------  ------------------------
    Elf32_Addr         4          4  Unsigned program address
    Elf32_Half         2          2  Unsigned medium integer
    Elf32_Off          4          4  Unsigned file offset
    Elf32_Sword        4          4  Signed large integer
    Elf32_Word         4          4  Unsigned large integer
    unsigned char      1          1  Unsigned small integer

Figure 10: Data representation. This figure illustrates the representation of ELF data. These data descriptions are machine independent, so a data type designated as an Elf32_Half is the same size on all machines. An Elf32_Half might be represented by an unsigned short on some machines and by an unsigned char on others. The association between language data types and ELF data types is made in the file <sys/elf_types.h>.
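Because the storage classes in Figure 10 have fixed sizes, a C++ simulator can mirror them with <cstdint> fixed-width types and let the compiler check the resulting layouts. The namespace sim386elf is a hypothetical name used to avoid clashing with a system <elf.h>, and the size checks assume a host ABI on which std::uint16_t and std::uint32_t are 2- and 4-byte aligned, as on common platforms. Reading these structures directly from a file is only safe when the host byte order matches the file's EI_DATA encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sim386elf {  // hypothetical namespace, avoids clashing with <elf.h>

// Fixed-width stand-ins for the Figure 10 storage classes.
typedef std::uint32_t Elf32_Addr;
typedef std::uint16_t Elf32_Half;
typedef std::uint32_t Elf32_Off;
typedef std::int32_t  Elf32_Sword;
typedef std::uint32_t Elf32_Word;

struct Elf32_Ehdr {
    unsigned char e_ident[16];
    Elf32_Half e_type, e_machine;
    Elf32_Word e_version;
    Elf32_Addr e_entry;
    Elf32_Off  e_phoff, e_shoff;
    Elf32_Word e_flags;
    Elf32_Half e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};

struct Elf32_Phdr {
    Elf32_Word p_type;
    Elf32_Off  p_offset;
    Elf32_Addr p_vaddr, p_paddr;
    Elf32_Word p_filesz, p_memsz, p_flags, p_align;
};

struct Elf32_Shdr {
    Elf32_Word sh_name, sh_type, sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off  sh_offset;
    Elf32_Word sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
};

// Every field is naturally aligned, so no compiler padding is inserted.
static_assert(sizeof(Elf32_Half) == 2 && alignof(Elf32_Half) == 2, "Elf32_Half");
static_assert(sizeof(Elf32_Word) == 4 && alignof(Elf32_Word) == 4, "Elf32_Word");
static_assert(sizeof(Elf32_Ehdr) == 52, "ELF header is 52 bytes");
static_assert(sizeof(Elf32_Phdr) == 32, "program header entry is 32 bytes");
static_assert(sizeof(Elf32_Shdr) == 40, "section header entry is 40 bytes");
static_assert(offsetof(Elf32_Ehdr, e_entry) == 24, "e_entry offset");

}  // namespace sim386elf
```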

3 The Implementation of Sim386

In this section, we present our design and implementation of the Sim386 simulator. Our design extends the design of the Sim286 simulator. Through subclassing, Sim386 improves extensibility over Sim286 and facilitates the extension of Sim386 to simulators for the Intel486 and Pentium processors. Sim386 simulates a subset of the features of the 80386 processor: those inherited from Sim286 (virtual memory addressing, the protection mechanism, segmentation, instruction prefetching, and the real and protected modes of operation) and those added in the current work (32-bit processing and ELF binary input). We also modified the user interface of Sim286 to accept additional command-line parameters, so that the simulator no longer needs to be recompiled for different modes of operation. First, in Section 3.1, we describe the modifications to the class framework of Sim286 shown in Figure 4 that were necessary for the features added in Sim386. Next, in Section 3.2, we describe the modifications and additions that were necessary to convert Sim286 to 32-bit processing. We then explain, in Section 3.3, how Sim286 was extended to accept ELF binaries as input.

3.1 Extending the design of Sim286

To incorporate ELF binary input, we refactored the Memory class and the ProcessorSimulator class. Both of these classes were designed to accept input in COM file format and to simulate execution in an MS-DOS type environment. We redesigned the Memory and ProcessorSimulator classes so that they are easier to extend to alternate input file formats and operating environments. The Memory class is now a base class that embodies a common interface to different types of input files. Two new classes, both derived from the Memory class, have been added to the framework. The first, COMMemory, manages COM input files exactly as Sim286 did; the second, ELFMemory, manages ELF input files.

We refactored the ProcessorSimulator class in a similar manner. The ProcessorSimulator class is now a base class that performs the simulator initializations common to all platforms; initializations particular to specific platforms are incorporated into classes derived from ProcessorSimulator. Two such derived classes, DOSProcessorSimulator and LinuxProcessorSimulator, perform platform-dependent simulator initialization; Section 3.2.3 discusses simulator initialization further. The extensions to the Sim286 class framework are illustrated in Figure 11.

    Memory               base of COMMemory, ELFMemory, and COFFMemory (possible future extension)
    ProcessorSimulator   base of DOSProcessorSimulator, LinuxProcessorSimulator, and
                         SolarisProcessorSimulator (possible future extension)

Figure 11: The Extensions to the Sim286 framework. The classes shown in bold are new classes added in Sim386. The classes that are dashed show possible future extensions.

This refactoring provides a number of distinct advantages:

1. Sim386 is now extensible to alternate input object file formats such as COFF or a.out.
2. Sim386 is also now extensible to simulating other operating environments such as VAX, Linux, and Windows NT.
3. The class framework that we have incorporated into Sim386 enabled a change to the command-line interface of the simulator that obviated recompilation for different modes of operation.
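A minimal sketch of the refactored Memory hierarchy follows. The class names Memory, COMMemory, and ELFMemory come from Figure 11; their member functions and the makeMemory factory, which picks a subclass from the ELF magic number (0x7F followed by "ELF"), are our assumptions rather than Sim386's interface. A COM file has no header, so the sketch reports the conventional MS-DOS entry offset 0x100; an ELF file supplies its entry point in e_entry.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Common interface to an input image, independent of its file format.
class Memory {
public:
    virtual ~Memory() {}
    virtual std::string format() const = 0;
    virtual std::uint32_t entryPoint() const = 0;  // where execution starts
};

// A COM image has no header; execution conventionally begins at offset 0x100.
class COMMemory : public Memory {
public:
    explicit COMMemory(const Bytes& image) : image_(image) {}
    std::string format() const override { return "COM"; }
    std::uint32_t entryPoint() const override { return 0x100; }
private:
    Bytes image_;
};

// An ELF file names its entry point in the header (e_entry at offset 24).
class ELFMemory : public Memory {
public:
    explicit ELFMemory(const Bytes& file) : file_(file) {}
    std::string format() const override { return "ELF"; }
    std::uint32_t entryPoint() const override {
        std::uint32_t e = 0;
        for (int i = 3; i >= 0; --i)       // 4 little-endian bytes, high byte first
            e = (e << 8) | file_.at(24 + i);
        return e;
    }
private:
    Bytes file_;
};

// Hypothetical factory: choose the subclass from the file's magic number.
std::unique_ptr<Memory> makeMemory(const Bytes& file) {
    bool isElf = file.size() >= 4 && file[0] == 0x7F && file[1] == 'E' &&
                 file[2] == 'L' && file[3] == 'F';
    if (isElf) return std::unique_ptr<Memory>(new ELFMemory(file));
    return std::unique_ptr<Memory>(new COMMemory(file));
}
```

With this arrangement, supporting another format such as COFF means adding one subclass and one test in the factory; the rest of the simulator sees only the Memory interface.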

3.2 Adding 32-bit processing

Converting from 16-bit to 32-bit processing required the following additions to Sim286: (1) 64 new addressing forms, (2) the SIB byte in the instruction format, (3) additional instructions, (4) modifications to existing instructions so that they perform 32-bit calculations, (5) modifications to segmentation, and (6) wider buses and registers. These modifications are described in the sections that follow.
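Item (4) can be illustrated with a hypothetical sketch. One way to let a single instruction implementation serve both 16-bit and 32-bit operands is to make the operand width a template parameter, much as class Register is templated; the function addWithFlags and the Flags struct are assumptions, not Sim386 code. The flag rules are the standard x86 ones for ADD, restricted to CF, ZF, SF, and OF (AF and PF are omitted).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

struct Flags { bool cf, zf, sf, of; };  // hypothetical subset of EFLAGS

// One ADD for both operand sizes: T is std::uint16_t for 16-bit operands
// and std::uint32_t for 32-bit operands.
template <typename T>
T addWithFlags(T a, T b, Flags& f) {
    static_assert(std::is_unsigned<T>::value, "operands are raw unsigned bit patterns");
    const T signBit = static_cast<T>(T(1) << (std::numeric_limits<T>::digits - 1));
    T r = static_cast<T>(a + b);       // wraps modulo 2^width
    f.cf = r < a;                      // carry out of the most significant bit
    f.zf = r == 0;
    f.sf = (r & signBit) != 0;
    // Signed overflow: both inputs share a sign that differs from the result's.
    f.of = ((~(a ^ b) & (a ^ r)) & signBit) != 0;
    return r;
}
```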

    Field                  Size
    ---------------------  -------------------
    Instruction prefix     0 or 1 byte
    Address-size prefix    0 or 1 byte
    Operand-size prefix    0 or 1 byte
    Segment override       0 or 1 byte
    Opcode                 1 or 2 bytes
    ModR/M                 0 or 1 byte
    SIB                    0 or 1 byte
    Displacement           0, 1, 2, or 4 bytes
    Immediate              0, 1, 2, or 4 bytes

Figure 12: General Instruction Format. This format consists of one or two opcode bytes, an optional ModR/M byte, an optional SIB byte, an optional address displacement, and optional immediate data. Optional prefix bytes can precede the instruction to override the default segment, operand size, and address size.

    SIB (Scale Index Base) byte
    bits 7-6   bits 5-3   bits 2-0
    SS         INDEX      BASE

Figure 13: SIB byte. The SIB byte consists of a 2-bit scale field, a 3-bit index field, and a 3-bit base field. It specifies the based indexed and scaled indexed forms of 32-bit addressing.

3.2.1 Adding additional addressing modes and the SIB byte

The instruction format for the Intel386 is shown in Figure 12 [1]. The major change from the format used by Sim286 is that a new byte, the SIB byte, is now part of the instruction format. The SIB byte encodes an additional 32 forms of addressing available to each instruction; Figure 13 shows its layout [1]. In addition to the 32 addressing forms added by the SIB byte, the ModR/M byte can now specify 32 new 32-bit addressing forms alongside its original 32 16-bit addressing forms. Thus, a total of 96 addressing forms can be used in Sim386. We incorporated the 64 additional addressing forms into the existing design. In particular, we modified the MemOp class to decode both the 16-bit and 32-bit addressing forms given by the ModR/M byte, and we added decoding and interpretation of the SIB byte to obtain the based indexed and scaled indexed forms of addressing. Figure 14 shows the algorithm used to compute the address specified by the SIB byte.
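To make the 32-bit ModR/M forms and the role of the SIB byte concrete, here is a sketch, written independently of Sim386's MemOp class, of how the ModR/M fields are interpreted when the address size is 32 bits. The struct and function names are hypothetical; the encodings used (r/m = 100 with mod other than 11 introduces a SIB byte, and mod = 00 with r/m = 101 means a bare disp32) are those of the Intel386 32-bit addressing forms.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a ModR/M byte: mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0).
struct ModRM {
  uint8_t mod, reg, rm;
};

ModRM splitModRM(uint8_t byte) {
  return ModRM{static_cast<uint8_t>(byte >> 6),
               static_cast<uint8_t>((byte >> 3) & 0x07),
               static_cast<uint8_t>(byte & 0x07)};
}

// With a 32-bit address size, r/m == 4 (and mod != 3) means a SIB byte follows.
bool needsSIB32(const ModRM& m) { return m.mod != 3 && m.rm == 4; }

// Number of displacement bytes after the ModR/M (and SIB) byte for a 32-bit
// address size. sibBase is the SIB base field and is only used when a SIB byte is present.
int dispBytes32(const ModRM& m, uint8_t sibBase) {
  if (m.mod == 1) return 1;                                // disp8
  if (m.mod == 2) return 4;                                // disp32
  if (m.mod == 0 && m.rm == 5) return 4;                   // disp32 only, no base
  if (m.mod == 0 && m.rm == 4 && sibBase == 5) return 4;   // SIB without base: disp32
  return 0;                                                // register or plain [reg]
}
```

For example, the byte 0x44 splits into mod = 01, reg = 000, r/m = 100, so a SIB byte and an 8-bit displacement follow it.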

```cpp
#include <cstdint>

// Address of a 32-bit memory operand that uses a SIB byte.
// Register[] holds the eight general registers in encoding order
// (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI); disp8 is sign-extended.
uint32_t sibAddress(uint8_t SIB, uint8_t MOD, const uint32_t Register[8],
                    int8_t disp8, uint32_t disp32) {
  const unsigned ebp = 5;
  unsigned BASE  = SIB & 0x07;
  unsigned INDEX = (SIB & 0x38) >> 3;
  unsigned SCALE = (SIB & 0xC0) >> 6;             // scale factor is 1 << SCALE
  uint32_t index = (INDEX == 4) ? 0 : Register[INDEX] << SCALE;  // 4: no index
  if (BASE != 5) {                                // base register + displacement
    uint32_t address = index + Register[BASE];
    if (MOD == 1) address += disp8;
    else if (MOD == 2) address += disp32;
    return address;
  }
  // BASE == 5: no base register when MOD == 0, otherwise EBP + displacement
  if (MOD == 0) return index + disp32;
  if (MOD == 1) return index + disp8 + Register[ebp];
  return index + disp32 + Register[ebp];          // MOD == 2
}
```

Figure 14: Decoding the SIB byte. This code computes the address specified by the field values of the SIB byte.

3.2.2 Modifying and extending the instruction set

A number of opcodes can be associated with an individual instruction in the Intel instruction set. The opcodes specify different configurations of operands, operand sizes (8-bit, 16-bit, or 32-bit), and addressing modes for the instruction. We use XOR to illustrate these configurations, as shown in Figure 15. The introduction of 32-bit processing creates five additional forms of XOR. We extended the existing instructions in Sim286 to handle these additional forms for all simulated instructions. Sim8088 simulates the execution of 62 instructions, and Sim286 simulates 3 additional instructions. In Sim386, we added the LEAVE, HLT, and extended IMUL instructions. The LEAVE and HLT instructions are part of the process termination code. The HLT instruction stops instruction execution and places the Intel386 processor in a HALT state. The LEAVE instruction releases the stack space used by a procedure for its local variables. These instructions are needed to allow the program input to the simulator, an ELF binary, to exit gracefully. The extended IMUL instruction is a form of IMUL that many input programs use and that was not among the instructions implemented by Sim286.
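The effect of the two termination-related instructions can be sketched on a deliberately tiny CPU model. MiniCpu is hypothetical and far simpler than Sim386's classes; it shows LEAVE with a 32-bit operand size (ESP receives EBP, then EBP is popped from the stack) and HLT as a flag that a fetch-execute loop would test before fetching the next instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, minimal CPU state used only to illustrate LEAVE and HLT.
struct MiniCpu {
  uint32_t esp = 0, ebp = 0;
  bool halted = false;
  std::vector<uint8_t> mem = std::vector<uint8_t>(1 << 16);  // flat 64 KB memory

  // Pop a little-endian 32-bit value from the stack.
  uint32_t pop32() {
    uint32_t v = 0;
    for (int i = 3; i >= 0; --i) v = (v << 8) | mem[esp + i];
    esp += 4;
    return v;
  }

  // LEAVE: discard the procedure's locals, then restore the caller's frame pointer.
  void leave() {
    esp = ebp;
    ebp = pop32();
  }

  // HLT: stop instruction execution; the simulator's loop checks `halted`.
  void hlt() { halted = true; }
};
```

Because LEAVE only touches ESP, EBP, and the stack, it needs no operand decoding, which keeps its simulation cheap.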

Opcode | Instruction | Description
34 ib | XOR AL,imm8 | Exclusive-OR immediate byte to AL
35 iw | XOR AX,imm16 | Exclusive-OR immediate word to AX
35 id | XOR EAX,imm32 | Exclusive-OR immediate dword to EAX
80 /6 ib | XOR r/m8,imm8 | Exclusive-OR immediate byte to r/m byte
81 /6 iw | XOR r/m16,imm16 | Exclusive-OR immediate word to r/m word
81 /6 id | XOR r/m32,imm32 | Exclusive-OR immediate dword to r/m dword
83 /6 ib | XOR r/m16,imm8 | XOR sign-extended immediate byte with r/m word
83 /6 ib | XOR r/m32,imm8 | XOR sign-extended immediate byte with r/m dword
30 /r | XOR r/m8,r8 | Exclusive-OR byte register to r/m byte
31 /r | XOR r/m16,r16 | Exclusive-OR word register to r/m word
31 /r | XOR r/m32,r32 | Exclusive-OR dword register to r/m dword
32 /r | XOR r8,r/m8 | Exclusive-OR r/m byte to byte register
33 /r | XOR r16,r/m16 | Exclusive-OR r/m word to word register
33 /r | XOR r32,r/m32 | Exclusive-OR r/m dword to dword register

Figure 15: The XOR instruction. Each instruction in the Intel386 instruction set has multiple forms derived from different combinations of operands and addressing modes.

3.2.3 Modifying the registers, segments, and descriptors

An important difference between the 80386 and the 80286 is the increase in register size from 16 bits to 32 bits, the addition of two more segment registers, and three more control registers. In extending Sim286 to Sim386, we increased the sizes of the general and segment registers and added the two additional segment registers, but did not add the new control registers. The three new control registers are mainly used by the paging mechanism, a feature not implemented in Sim386.

In Section 3.1 of this paper, we note that different initialization code is needed to simulate different platforms. Much of the difference in this initialization code stems from the different initial settings for the stack pointer, instruction pointer, control register, segment registers, and segment descriptors. For example, Sim286 sets the IP register to 0x100, sets SP to 0xFFFE, sets the D-bit in the code segment to 0, and maps all segments to an area of memory whose addresses range from 0x100 to 0xFFFF. These settings are all appropriate for a COM file assumed to be running on MS-DOS, a flat segmentation model. To simulate an ELF object file that was compiled on a machine running Linux, a protected flat model, the simulator must set EIP to agree with the information it extracts from the ELF object file. Also, ESP must be set so that the stack does not interfere with any program data, the D-bit in the code segment must be set to 1, and all segments must be mapped to the area of memory whose addresses range between 0x00000000 and 0xC0000000. Both MS-DOS and Solaris use a flat model of segmentation, whereas Linux uses a protected flat model of segmentation.
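The platform-dependent settings above could be split across the ProcessorSimulator hierarchy of Section 3.1 roughly as follows. This is a sketch: only the three class names come from the paper, while the InitialState struct, the member functions, and the Linux stack-top value are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Settings that differ between the simulated platforms (hypothetical layout).
struct InitialState {
  uint32_t eip = 0, esp = 0;
  bool codeSegmentD = false;          // D-bit of the code segment descriptor
  uint32_t segLow = 0, segHigh = 0;   // address range all segments map to
};

class ProcessorSimulator {
 public:
  virtual ~ProcessorSimulator() = default;
  virtual InitialState initialState() const = 0;
};

class DOSProcessorSimulator : public ProcessorSimulator {
 public:
  InitialState initialState() const override {
    InitialState s;
    s.eip = 0x100;                    // COM images start at offset 0x100
    s.esp = 0xFFFE;
    s.codeSegmentD = false;           // 16-bit default operand/address size
    s.segLow = 0x100;
    s.segHigh = 0xFFFF;
    return s;
  }
};

class LinuxProcessorSimulator : public ProcessorSimulator {
 public:
  explicit LinuxProcessorSimulator(uint32_t entry) : entry_(entry) {}
  InitialState initialState() const override {
    InitialState s;
    s.eip = entry_;                   // address extracted from the ELF file
    s.esp = kStackTop;                // assumption: stack placed below 0xC0000000
    s.codeSegmentD = true;            // 32-bit default operand/address size
    s.segLow = 0x00000000;
    s.segHigh = 0xC0000000;
    return s;
  }

 private:
  static constexpr uint32_t kStackTop = 0xBFFFFFF0;  // hypothetical value
  uint32_t entry_;
};
```

Keeping each platform's constants in its own subclass means a new operating environment can be added without editing the existing ones.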

3.3 Adding ELF binary input

One of the most important contributions of this work is the extension that permits Sim386 to accept ELF object files as input. The ability to accept ELF files also allows additional functionality, for example paging and multiprogramming, to be incorporated into Sim386. The addresses used in a COM file undergo a logical-to-linear transformation, and these linear addresses are then assumed to be the physical addresses. However, to simulate paging, linear addresses must be translated into physical addresses through page table lookup. Thus, COM files cannot adequately exercise paging or multiprogramming capabilities in, for instance, a Solaris 2.x or Linux operating environment. Further, compilers and assemblers that can generate object files in the COM format are not readily available, so it is difficult to generate test programs for the simulator. Since most executables today are in ELF format, existing test suites can be used as input to the simulator. In the next section we show, through an example, the technique for parsing an ELF object file to extract the text segment. In Section 3.3.2 we describe how to obtain the virtual address of the first instruction in the main procedure of an ELF object file and how to simulate calls to the printf function in a shared library.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdlib>
#include <libelf.h>

// failure() is Sim386's libelf error handler (not shown).
ELFMemory::ELFMemory(char* filename) {
  int fd;             // file descriptor
  Elf* elf;           // pointer to an ELF file
  Elf32_Ehdr* ehdr;   // ELF header pointer
  Elf32_Phdr* phdr;   // program header table pointer

  // Get file descriptor
  if ((fd = open(filename, O_RDONLY)) == -1) exit(1);
  // Sync ELF versions (with the shared library)
  if (elf_version(EV_CURRENT) == EV_NONE) failure();
  // Get ELF descriptor
  if ((elf = elf_begin(fd, ELF_C_READ, NULL)) == NULL) failure();
  // Get ELF header
  if ((ehdr = elf32_getehdr(elf)) == NULL) failure();
  // Get program header table
  if ((phdr = elf32_getphdr(elf)) == NULL) failure();
  // Search each program header table entry for the text segment:
  // a loadable segment (PT_LOAD) that is readable and executable (PF_R | PF_X)
  for (int i = 0; i < ehdr->e_phnum; i++) {
    if ((phdr + i)->p_type == PT_LOAD && (phdr + i)->p_flags == (PF_R | PF_X)) {
      // *** LOAD SEGMENT HERE ***
    }
  }
  elf_end(elf);
  close(fd);
}
```

Figure 16: ELF Parsing Code. This C++ code finds the text segment in an ELF executable. The portion of the file containing the text segment can be loaded using the information in the program header table entry and conventional file reading methods.
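The page table lookup that COM input cannot exercise is, on the Intel386, a two-level walk with 4 KB pages: the top 10 bits of the linear address index the page directory (located by CR3), the next 10 bits index a page table, and the low 12 bits are the offset within the page. The sketch below is purely illustrative, since Sim386 does not implement paging: PhysMem and linearToPhysical are hypothetical names, and only the present bit of each entry is checked (protection and accessed/dirty bits are ignored).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Simulated physical memory (hypothetical).
struct PhysMem {
  std::vector<uint8_t> bytes;
  uint32_t read32(uint32_t addr) const {   // little-endian 32-bit read
    uint32_t v = 0;
    for (int i = 3; i >= 0; --i) v = (v << 8) | bytes.at(addr + i);
    return v;
  }
};

// Two-level translation of a linear address with 4 KB pages. cr3 holds the
// physical base of the page directory. Returns nothing if the page-directory
// or page-table entry is not present (bit 0 clear).
std::optional<uint32_t> linearToPhysical(const PhysMem& mem, uint32_t cr3,
                                         uint32_t linear) {
  uint32_t dir    = linear >> 22;              // bits 31-22: directory index
  uint32_t table  = (linear >> 12) & 0x3FF;    // bits 21-12: page-table index
  uint32_t offset = linear & 0xFFF;            // bits 11-0: offset in page
  uint32_t pde = mem.read32((cr3 & 0xFFFFF000) + dir * 4);
  if (!(pde & 1)) return std::nullopt;
  uint32_t pte = mem.read32((pde & 0xFFFFF000) + table * 4);
  if (!(pte & 1)) return std::nullopt;
  return (pte & 0xFFFFF000) | offset;
}
```

With the page directory at physical 0x0000, one page table at 0x1000, and that table mapping linear page 0x08048000 to frame 0x3000, the linear address 0x08048123 translates to 0x3123.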

3.3.1 Parsing an ELF executable

The code to load an ELF binary into Sim386 is incorporated into the class ELFMemory, which is discussed in Section 3.1 and shown in Figure 11. We parsed the ELF executable using the routines provided in the ELF access library, libelf; the remainder of this section presents an example. Figure 16 contains sample code to find the text segment in an ELF executable file. The ELF access library provides a function, elf_version(), to determine whether the ELF versions of the access library and the ELF object file are compatible. Once this is established, we obtain an ELF descriptor, which provides a unique handle to the ELF object file. We obtain a pointer to the ELF header using the elf32_getehdr() routine. The ELF header provides a map of the rest of the file; with it, we can obtain pointers to the section header table and the program header table. In this example, we retrieve a pointer to the program header table using elf32_getphdr(). As discussed in Section 2.4.1, the program header table is an array of structures, each of which describes a segment. Thus, we iterate through these entries until we find the structure whose p_type and p_flags indicate that the segment is the text segment. The p_type and p_flags fields are shown in Figure 7 as the first and seventh fields of the structure. Once the entry describing the text segment has been found, the text segment can be read from the file using conventional methods, with the file offset and size of the text segment taken from the program header table entry.
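The "conventional methods" step can also be written without libelf. The following sketch uses only the structure and constant definitions of the system <elf.h> header (Linux/glibc), assumes the whole file has already been read into a byte vector in the host's byte order (true for i386 ELF files on an x86 host), and applies the same test as Figure 16 to select the segment. TextSegment and extractText are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <vector>

// The text segment and the virtual address it is mapped at.
struct TextSegment {
  uint32_t vaddr = 0;            // p_vaddr
  std::vector<uint8_t> bytes;    // p_filesz bytes copied from p_offset
};

// Finds the PT_LOAD entry whose flags are PF_R | PF_X and copies its bytes.
// Returns false if the image is not ELF32 or has no such segment.
bool extractText(const std::vector<uint8_t>& image, TextSegment& out) {
  if (image.size() < sizeof(Elf32_Ehdr)) return false;
  Elf32_Ehdr eh;
  std::memcpy(&eh, image.data(), sizeof eh);
  if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
      eh.e_ident[EI_CLASS] != ELFCLASS32)
    return false;
  for (unsigned i = 0; i < eh.e_phnum; ++i) {
    std::size_t off = eh.e_phoff + std::size_t(i) * eh.e_phentsize;
    if (off + sizeof(Elf32_Phdr) > image.size()) return false;
    Elf32_Phdr ph;
    std::memcpy(&ph, image.data() + off, sizeof ph);
    if (ph.p_type == PT_LOAD && ph.p_flags == (PF_R | PF_X)) {
      if (std::size_t(ph.p_offset) + ph.p_filesz > image.size()) return false;
      out.vaddr = ph.p_vaddr;
      out.bytes.assign(image.begin() + ph.p_offset,
                       image.begin() + ph.p_offset + ph.p_filesz);
      return true;
    }
  }
  return false;
}
```

The simulator would then copy out.bytes into its memory image starting at out.vaddr.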

3.3.2 Executing main and printf

One issue in working with ELF binaries is that they include a .init section and a .fini section. The .init section provides executable code for the initialization of the program. Since ELF assumes it is working in a multiprogramming environment, it uses this code to save registers and other system state information. Also, any shared object file included in the program has an opportunity to run its initialization code before main is called. Thus, a large amount of start-up code can be executed when using a shared library before the main program is even called. Since the simulator does not currently simulate a multiprogramming environment, this superfluous code does not need to be executed. The issue is that the pointer to the first executable instruction that ELF supplies in the ELF header, shown in Figure 6, references the beginning of the .init section. Another issue is that there is no way to tell when the main program ends. main is simply a function called from the .init code; when main terminates, control returns to that location and then proceeds to call the .fini code, which holds executable instructions that contribute to the process termination code. This is also extraneous code that does not need to be run in the simulator. However, if this code is circumvented to improve performance, then the issue becomes how to stop the simulation: the HLT instruction executes after the return from the .fini code, so if the .init code is eliminated it becomes difficult to know when the main function has truly ended.

For the reasons given above, we preferred to have the first simulated instruction be the first instruction in the main procedure. The label for the main procedure has the same virtual address as the first instruction in the main procedure. To find the virtual address of the main label, we parsed the ELF executable file, found the symbol table section, and retrieved the virtual address. In the same way, we retrieved the virtual address for the printf label. For reasons also explained above, we wanted to be able to identify when a call to printf occurred so that we could handle the request locally. We describe how we obtained the virtual addresses for the main and printf labels in the remainder of this section.

Figure 17 illustrates getMainPrintf(), the function used in Sim386 to determine the virtual addresses for the printf and main labels. First, we retrieve the data for the section that contains the null-terminated string names of the sections (.shstrtab). Next, we traverse all sections looking for the sections named ".strtab" and ".symtab". The .symtab section is an array of structures, each of which describes a symbol in the executable. Figure 18 shows the fields of the structures in the symbol table; for example, the first field in the structure, st_name, is an index into the .strtab section. The .strtab section contains the null-terminated string representations of the names of the symbols in the symbol table. Once these sections are found, we retrieve their associated data with the ELF access library function elf_getdata().

```cpp
#include <cstring>
#include <libelf.h>

// COUNT is Sim386's address type and failure() its error handler (not shown).
void getMainPrintf(Elf *ep, Elf32_Ehdr *eh, COUNT *pc, COUNT *pf) {
  Elf32_Shdr *shdr;            // Section header pointer
  Elf_Scn *scn;                // ELF section pointer
  Elf_Data *data;              // Section-name string table (.shstrtab) data
  Elf_Data *strdata = NULL;    // ELF data for symbol strings (.strtab)
  Elf_Data *symdata = NULL;    // ELF data for symbol table entries (.symtab)
  Elf32_Sym *sym;              // Symbol table entry pointer
  unsigned int nument = 0;     // Number of entries in the symbol table

  // STEP 1: Obtain the .shstrtab data buffer
  if (((scn = elf_getscn(ep, eh->e_shstrndx)) == NULL) ||
      ((data = elf_getdata(scn, NULL)) == NULL))
    failure();

  // STEP 2: Traverse the sections, looking for .strtab and .symtab
  for (scn = NULL; (scn = elf_nextscn(ep, scn)) != NULL; ) {
    if ((shdr = elf32_getshdr(scn)) == NULL) failure();
    const char *name = (char *)data->d_buf + shdr->sh_name;
    // if string table, save data buffer
    if (strcmp(name, ".strtab") == 0)
      if ((strdata = elf_getdata(scn, NULL)) == NULL) failure();
    // if symbol table, save data buffer and number of entries
    if (strcmp(name, ".symtab") == 0) {
      if ((symdata = elf_getdata(scn, NULL)) == NULL) failure();
      nument = shdr->sh_size / shdr->sh_entsize;
    }
  }
  if (strdata == NULL || symdata == NULL) failure();   // e.g. stripped file

  // STEP 3: Search the symbol table (entry 0 is the reserved null symbol)
  for (unsigned int i = 1; i < nument; i++) {
    sym = (Elf32_Sym *)symdata->d_buf + i;
    const char *symname = (char *)strdata->d_buf + sym->st_name;
    if (strcmp(symname, "main") == 0)
      *pc = sym->st_value;     // assign the starting executable instruction
    if (strcmp(symname, "printf") == 0)
      *pf = sym->st_value;     // record the virtual address for printf
  }
}
```

Figure 17: Parsing the Symbol Table section of an ELF executable. This is an excerpt from code in Sim386 that finds the main and printf labels in an ELF executable.

```cpp
typedef struct {
  Elf32_Word    st_name;    // Index into .strtab
  Elf32_Addr    st_value;   // Virtual address
  Elf32_Word    st_size;    // Symbol's size
  unsigned char st_info;    // Symbol's type and binding attributes
  unsigned char st_other;   // Currently undefined
  Elf32_Half    st_shndx;   // Index into section header table
} Elf32_Sym;
```

Figure 18: Symbol Table Structure.

Address | Contents
(inserted) | CALL 8048000
(inserted) | HLT
8048000 | main: PUSH ebp (start of the text segment)
... | ...
8049500 | RET

Figure 19: Memory image of ELF object file. Sim386 inserts a CALL and a HLT instruction in front of the text segment of the ELF executable to properly start and stop execution of the main procedure.

Lastly, we search through the symbol table entries looking for "printf" and "main", recording the field st_value when they are found; note that st_value is the second field in the structure depicted in Figure 18. The st_value of printf is used for later comparisons when a CALL instruction is executed. To run the main procedure, we insert a CALL instruction that calls the main procedure, using the st_value already obtained. The call to main is placed in front of the text segment that we loaded into our simulator, and the EIP is set accordingly. We then make the instruction that follows the call to main a HLT instruction, so that the program terminates after returning from main. Figure 19 illustrates the sequence of instructions that is added when Sim386 loads an ELF executable. Thus, the first instruction executed in our simulator is a call to the main procedure; the main procedure executes and returns to the next instruction, the HLT instruction. The execution of the HLT instruction then signals the simulator that the simulated program has completed.
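The stub in Figure 19 occupies six bytes: a CALL with a 32-bit relative operand (opcode E8) followed by HLT (opcode F4). The sketch below, whose function names are hypothetical, shows how such a stub could be encoded and how a CALL handler could recognize a call to printf by comparing the computed target with printf's st_value. It assumes the stub is placed immediately before main, as in Figure 19.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes CALL rel32 (E8 cd) followed by HLT (F4). The rel32 operand is
// relative to the address of the instruction after the CALL (stubAddr + 5).
std::vector<uint8_t> makeStartStub(uint32_t stubAddr, uint32_t mainAddr) {
  uint32_t rel = mainAddr - (stubAddr + 5);        // modular arithmetic
  std::vector<uint8_t> stub = {uint8_t(0xE8)};
  for (int i = 0; i < 4; ++i) stub.push_back(uint8_t(rel >> (8 * i)));  // little-endian
  stub.push_back(0xF4);                            // HLT stops the simulation
  return stub;
}

// Target of a CALL rel32 instruction located at callAddr.
uint32_t callTarget(uint32_t callAddr, uint32_t rel32) {
  return callAddr + 5 + rel32;
}

// Hypothetical CALL hook: a call whose target equals printf's st_value is
// handled locally by the simulator instead of being simulated.
bool isPrintfCall(uint32_t callAddr, uint32_t rel32, uint32_t printfAddr) {
  return callTarget(callAddr, rel32) == printfAddr;
}
```

For example, with main at 0x8048000 the stub would start at 0x8047FFA, and its CALL operand is 1, the distance from the end of the CALL (0x8047FFF) to main.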

4 Performance of Sim386

In this section, we report the results of experiments that gather timings for Sim386, our simulator for the Intel 80386 processor. Sim386 can accept either COM or ELF binary input. Since both Sim286 and Sim386 accept COM files as input, we compare timings for these two simulators using a test suite of nine programs with COM binary input. We also report timings for Sim386 using ELF binary input.

Program | No. instr, 286 (COM input) | No. instr, 386 (ELF input) | 286 COM (seconds) | 386 COM (seconds) | 386 ELF (seconds)
fibbk | 797 | 135 | 0.14 | 0.15 | 0.03
gauss | 9,336,673 | 13,436,169 | 1806.9 | 1828.5 | 2875.28
isort | 50,609 | 52,982 | 9.22 | 9.33 | 10.63
livermore | 8,815 | 5,218 | 1.67 | 1.7 | 1.07
matmult | 5,290,788 | 8,900,880 | 930.89 | 941.41 | 1905.74
normal | 3,613,580 | 8,633,740 | 651.57 | 660.57 | 1889.9
sieve | 224,961 | 186,875 | 46.06 | 46.5 | 43.12
tiling | 4,506,527 | 6,872,184 | 829.48 | 841.03 | 1449.97
transpose | 2,641,436 | 3,056,643 | 489.48 | 495.04 | 645.49

Figure 20: Performance results for the test suite of nine programs.

All experiments were conducted on a Gateway 2000 with a 200 MHz Pentium Pro processor running the Red Hat Linux 5.0 operating system. Each program was executed ten times, and the execution times reported in this section are averages over these ten executions. To create COM binary executables for both Sim286 and Sim386, we used the Borland 4.5 C compiler to produce 8086 assembly code, which we assembled with the Wolfware Assembler, WASM [9], to create an executable COM file. To create ELF binary executables for Sim386, we used the gcc C compiler, version 2.7.2.3, with -O2 optimization. The test programs listed in the first column of the table in Figure 20 are a program to compute Fibonacci numbers, fibbk; a program that uses Gaussian elimination without pivoting, gauss [10]; an insertion sort, isort; the first Livermore loop, livermore [5]; matrix multiplication, matmult [10]; a program to transform a matrix into Hermite normal form, normal [10]; the sieve of Eratosthenes, sieve; a program that uses tiling to optimize data cache references, tiling [2]; and a program to perform matrix transposition, transpose.

Our experiments indicate that for COM file input, Sim286 is, on average, 2.06 percent faster than Sim386 on the test suite of nine programs. This difference reflects the additional logic in Sim386 that checks operand size during processing: Sim386 includes an additional check for 32-bit operands that is not part of Sim286.

The table in Figure 20 also indicates that, on Sim386, six of the nine programs run more slowly with ELF binary input and the other three run faster. The six programs that slow down with ELF input are gauss, isort, matmult, normal, tiling, and transpose. Of these six programs, isort uses a one-dimensional array and the other five use two-dimensional arrays. When compiling array computations, gcc makes heavy use of the SIB byte. The additional addressing modes provided by the SIB byte are efficient when executed in hardware, but decoding them in software takes Sim386 more time. Thus, for this test suite, the programs that make heavy use of array computations run more slowly when Sim386 accepts ELF binary input.
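The slowdown comparison can be checked mechanically against the Sim386 columns of Figure 20. This sketch simply copies those timings and lists the programs whose ELF run took longer than their COM run; the Timing struct and the function name are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sim386 timings in seconds, copied from Figure 20.
struct Timing {
  std::string program;
  double com386, elf386;
};

const std::vector<Timing> kSim386Timings = {
    {"fibbk", 0.15, 0.03},        {"gauss", 1828.5, 2875.28},
    {"isort", 9.33, 10.63},       {"livermore", 1.7, 1.07},
    {"matmult", 941.41, 1905.74}, {"normal", 660.57, 1889.9},
    {"sieve", 46.5, 43.12},       {"tiling", 841.03, 1449.97},
    {"transpose", 495.04, 645.49}};

// Programs whose ELF run on Sim386 took longer than their COM run.
std::vector<std::string> slowerWithElf(const std::vector<Timing>& rows) {
  std::vector<std::string> slower;
  for (const Timing& t : rows)
    if (t.elf386 > t.com386) slower.push_back(t.program);
  return slower;
}
```

With the Figure 20 values it reports gauss, isort, matmult, normal, tiling, and transpose; fibbk, livermore, and sieve run faster with ELF input.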

5 Future Work

In this section, we suggest refinements to Sim386. The aim of these refinements is to facilitate modification and extension of Sim386 and to exploit more features of ELF object files. In Section 5.1, we suggest an approach for refactoring the class framework of Sim386; in Section 5.2, we discuss dynamic linking of shared libraries during the execution of an ELF object file.

5.1 Approaches for Improving the Design of Sim386

The design of Sim286 facilitated the extensions and modifications we made in incorporating features of the Intel386 processor. The use of inheritance in the design of Sim286 minimized the impact of changes to the simulator. For example, in extending Sim286 to accept ELF executables, only the Memory class needed modification; the modifications to the Memory class are outlined in Section 3.1. Another example of the extensibility of Sim286 is the addition of instructions to the simulator: to add an instruction, we only needed to derive its class from one of the existing classes, chosen according to the number of operands the instruction takes. Only the class CpuDecodeEvent and, occasionally, ExtendedCpuDecodeEvent also needed to be informed of the new instruction. Figure 4 illustrates the class framework of Sim286. However, in extending Sim286 to include 32-bit processing, we found some areas of the design that could be improved. For example, the extension to 32-bit processing required changes to 85% of the classes in the simulator. These deficiencies are a direct result of Sim286 adding the functionality of 16-bit processing without corresponding extensions to the design that would preserve extensibility and ease of modification when the processing size changes. Figure 21 illustrates the history of design and implementation extensions of processing sizes in Sim8088/86, Sim286, and Sim386.

[Diagram: design and implementation stages of Simx86, Sim8088/86, Sim286, and Sim386.]

Figure 21: Design and Implementation History. This figure captures the design and implementation progress from the framework of classes for the simulators, Simx86, to the current simulator, Sim386.

Our suggestions for future work on the design of Sim386 are aimed at making Sim386 more extensible to modifications that change the size of the processing environment. For example, we would like the design of Sim386 to better facilitate an extension to 64-bit processing. One improvement we suggest is the use of templates in classes that depend on the current processing size of the simulator.

Sim286 does contain one templated class, the Register class, which is a good example of the type of class that would benefit from templating: converting the registers in the simulator to 32 bits required only one change. A non-exhaustive list of other classes that would become more extensible through templating includes SegRegister, SegDescriptor, DescriptorTable, VirtualAddress, PhysicalAddress, and Instruction.

Another suggested improvement to the design of Sim386 is to redesign the classes that alter their processing depending on the current size of their arguments. A number of classes check the size of their operands (8-bit, 16-bit, or 32-bit) and perform disjoint operations based on this determination. We suggest moving the size-specific processing in these classes into subclasses, leaving the original class as a common interface to the rest of the simulator. For instance, the class used to determine the current addressing mode, MemOp, could have been greatly simplified and made more extensible had it been an abstract base class with one subclass, say MemOp8, to handle 8-bit processing and another, say MemOp16, to handle 16-bit processing. This would have made Sim286 easier to modify to include 32-bit addressing modes, by adding a third subclass, MemOp32, as shown in Figure 22.
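Both suggestions can be sketched in a few lines of C++. The code below is hypothetical and not taken from Sim386: a Register template parameterized by its word type, so that a 64-bit register would simply be Register<uint64_t>, and MemOp recast as an abstract interface whose size-specific behavior (here only the displacement width for mod = 10) lives in MemOp16 and MemOp32. MemOp8 from Figure 22 is omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// (1) A register parameterized by its width.
template <typename Word>
class Register {
 public:
  Word get() const { return value_; }
  void set(Word v) { value_ = v; }

 private:
  Word value_{};
};

// (2) MemOp as an abstract interface; size-specific decoding lives in subclasses.
class MemOp {
 public:
  virtual ~MemOp() = default;
  virtual unsigned addressBits() const = 0;
  // Displacement size in bytes for a ModR/M byte with mod == 2.
  virtual unsigned mod2DisplacementBytes() const = 0;
};

class MemOp16 : public MemOp {
 public:
  unsigned addressBits() const override { return 16; }
  unsigned mod2DisplacementBytes() const override { return 2; }  // disp16
};

class MemOp32 : public MemOp {
 public:
  unsigned addressBits() const override { return 32; }
  unsigned mod2DisplacementBytes() const override { return 4; }  // disp32
};

// Chooses the subclass from the current address size (e.g. the code segment D-bit).
std::unique_ptr<MemOp> makeMemOp(bool thirtyTwoBit) {
  if (thirtyTwoBit) return std::make_unique<MemOp32>();
  return std::make_unique<MemOp16>();
}
```

Under this design, supporting a new address size means adding one subclass and one branch in the factory rather than editing every size check inside MemOp.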

[Diagram: MemOp performs general processing; its subclasses MemOp8, MemOp16, and MemOp32 perform specific 8-bit, 16-bit, and 32-bit processing.]

Figure 22: A redesign of the MemOp class.

5.2 Dynamic Linking with ELF Object Files

A simulator that accepts binary object files as input must incorporate a technique for linking calls to functions that reside in a shared library. When the current implementation of Sim386 encounters a call to a function in a shared library, the simulator intercepts the function call and processes the request locally. For example, a call to the shared library function printf is intercepted and an actual printf is executed by the simulator. For simplicity, the current implementation restricts printf to a single integer parameter. To fully simulate an ELF binary, the simulator must incorporate dynamic linking. In general, an ELF executable file can specify, in its program header table, a program interpreter that is loaded into memory first; this interpreter then controls the environment for the application. The interpreter is specified by setting the p_type field to PT_INTERP in the appropriate segment structure of the program header table; Figure 7 illustrates the segment structure. The PT_INTERP value in the p_type field indicates to the system that this entry is a segment that specifies the path to a program interpreter. When the ELF executable uses dynamic linking, the program interpreter is the system's dynamic linker. The ELF executable stores data in the .dynamic, .hash, .got, and .plt sections that assist the dynamic linker in running the executable; Figure 9 gives a brief description of these sections. Linking calls to functions in the shared library is only a subset of the services that the simulator must perform to truly emulate dynamic linking: Sim386 must handle all services that the ELF executable requests from the shared object library.
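Locating the interpreter request is straightforward. This sketch, again using only <elf.h> definitions and assuming an in-memory ELF32 image in host byte order, scans the program header table for a PT_INTERP entry and returns the path it names; the function name is hypothetical. On Linux/i386 systems this path is typically the dynamic linker, for example /lib/ld-linux.so.2.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <string>
#include <vector>

// Returns the interpreter path named by a PT_INTERP program header, or an
// empty string if the image has none (for example, a statically linked file).
std::string interpreterPath(const std::vector<uint8_t>& image) {
  if (image.size() < sizeof(Elf32_Ehdr)) return "";
  Elf32_Ehdr eh;
  std::memcpy(&eh, image.data(), sizeof eh);
  if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) return "";
  for (unsigned i = 0; i < eh.e_phnum; ++i) {
    std::size_t off = eh.e_phoff + std::size_t(i) * eh.e_phentsize;
    if (off + sizeof(Elf32_Phdr) > image.size()) break;
    Elf32_Phdr ph;
    std::memcpy(&ph, image.data() + off, sizeof ph);
    if (ph.p_type != PT_INTERP) continue;
    if (std::size_t(ph.p_offset) + ph.p_filesz > image.size()) break;
    // The segment holds a NUL-terminated path; p_filesz includes the NUL.
    std::string path(reinterpret_cast<const char*>(image.data()) + ph.p_offset,
                     ph.p_filesz);
    return path.substr(0, path.find('\0'));
  }
  return "";
}
```

A simulator that supports dynamic linking would load and run the named interpreter first, instead of intercepting each shared-library call individually.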

6 Concluding Remarks

In this paper we have described the design and implementation of Sim386, a partial simulator for the Intel 80386 processor. Our simulator performs both 16-bit and 32-bit processing and accepts both COM and ELF binary input. We also provided an overview of previous simulators in the Simx86 family and of the layout and design of ELF binaries. Finally, we reported preliminary results of our experiments with nine test programs.

References

[1] Intel Corporation. Intel386 SX Microprocessor Programmer's Reference Manual. Intel Literature Sales, 1998.
[2] M. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. Proceedings of the Fourth Conference on Architectural Support for Programming Languages and Operating Systems, pages 63-74, April 1991.
[3] Hongjiu Lu. ELF: From the programmer's perspective. www.cinfo.ru:8030/linux/WWW/www debian.org/Documentation/elf/elf.html, 1998.
[4] B. A. Malloy and S. Chitre. Extending Simx86 to include prefetching, segmentation, virtual memory addressing and protection mode. Proceedings of the 1998 Conference on Object-Oriented Simulation, pages 39-44, January 1998.
[5] F. H. McMahon. FORTRAN CPU performance analysis. Lawrence Livermore Laboratories, 1972.
[6] James Rumbaugh, Michael Blaha, William Premerlani, Fredrick Eddy, and William Lorensen. Object-Oriented Modeling and Design. Prentice-Hall, 1991.
[7] A. R. Shealy, B. A. Malloy, and D. A. Sykes. Simx86: An extensible simulator for the Intel 80x86 processor family. Proceedings of the 30th Annual Simulation Symposium, pages 157-166, April 1997.
[8] Tool Interface Standards. ELF: Executable and Linkable Format. ftp://ftp.intel.com/pub/tis, 1998.
[9] Eric Tauck. WASM 1.0: Wolfware Assembler for the IBM Personal Computer. Wolfware, 1985.
[10] M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, first edition, 1996.
