CACHESIM: A GRAPHICAL SOFTWARE ENVIRONMENT TO SUPPORT THE TEACHING OF COMPUTER SYSTEM WITH CACHE MEMORIES
COSIMO ANTONIO PRETE Dipartimento di Ingegneria dell'Informazione: Elettronica, Informatica, Telecomunicazioni Facoltà di Ingegneria, Università di Pisa, Via Diotisalvi, 2 - 56126 PISA (Italy)
Abstract. We present an educational software package (Cachesim) used as a teaching tool to study and analyse computers with cache memory. Cachesim allows students to execute a program step by step, to observe the cache activity needed for a memory operation, to evaluate the system performance by varying the program and/or the cache parameters and, finally, to analyse the program behaviour through its memory references. The user interface is fully graphical: the architectural modules of the simulated computer are managed as graphical objects, and the main actions on them can be performed by mouse clicks. The environment requires a VGA card and can be used on both MS-DOS and Windows platforms. This paper describes the software package and the features of the simulated computer by examining a student's exercise.
Keywords: Education Techniques, Simulation, Computer Architecture, Cache memory
Proc. of 7-th SEI Conf. on Software Engineering Education, Springer-Verlag, January 1994, San Antonio, pp. 317-327
1. INTRODUCTION
Lessons based on physical systems sometimes produce poor results, because these systems give little or no useful information about how the system operates. More information can be obtained by adding ad hoc hardware that builds trace files recording the events and the actions of the system. However, this technique is not always applicable, since in certain systems it is not possible to detect all the events. The main reasons that prevent detection of the events are: i) the high number and high frequency of these events may require too expensive an acquisition system; and ii) several events happen within the chips. For these reasons, in basic and advanced courses of Computer and Electronics Engineering at the University of Pisa (Italy), teachers continue to use simulation environments as learning tools in computer science [Corsini88, Cinquini90a, Cinquini90b, Domenici89].
Simulators offer several advantages over experimenting on a physical system. In the teaching of Computer Architecture, a simulator allows us to observe and trace actions and events that usually cannot be observed in a physical system. In several cases, the observation of such actions and events makes it easier to learn the functional and design aspects of a computer system [Shmidt81, DeBlasi88]. Simulators offer other advantages, such as: i) the possibility to have several computer architectures and to compare them in various application areas; ii) the possibility to evaluate new solutions in a short time; and iii) the low cost of a virtual laboratory.
In this paper, as an example of teaching techniques based on simulation tools, we present a simulation environment for a computer with a cache memory [Smith82, Prete91, Prete92].
The simulator (called Cachesim) allows the student to observe the CPU and cache activities during the execution of a read or write memory operation, to evaluate the system performance, and to analyse the reference locality [Denning72] and the distribution of memory accesses produced by the program execution. A student organizes an exercise in three phases: configuration, simulation and analysis.
In the first phase, students write a program in an assembly language (similar to Intel 8086 assembly) and then configure the system. For the cache memory (see Appendix A) the student chooses: i) the cache capacity, ii) the placement policy (direct, fully associative or set associative mapping), iii) the cache block size, iv) the degree of set associativity (in the case of set associative mapping), v) the main memory update policy (write-through or copy-back) and, finally, vi) the block replacement policy (FIFO, random or LRU). For I/O devices, the student specifies the I/O type (monitor, keyboard or general purpose), the synchronisation scheme (none or handshake), the interrupt scheme (none, vectored interrupt or non-vectored interrupt) and the addresses of the corresponding device registers. Finally, the student indicates the main memory size.
At the end of the configuration phase, students start the simulation of a single memory operation or of a program. The simulator illustrates, by drawings, i) the sequence of actions the computer needs to perform the required memory operation and ii) the cache and main memory events. At the end of the simulation, Cachesim produces statistical results about the cache and processor activities. Finally, in the analysis phase, students can see the performance of their solution and analyse the cache use, the distribution of the memory locations accessed by the program and the program locality.
Cachesim is written in C and runs on a Personal Computer with MS-DOS or Windows. The assembler (see Appendix B) and the disassembler are built with the Lex and Yacc tools. In the following Sections we explain Cachesim through an illustrative example.
2. THE CONFIGURATION PHASE
The simulator starts with the command "CACHESIM": after the presentation screen (see Figure 1), the user can choose the cache memory structure, the main memory size, the kind of microprocessor (currently, only an Intel 8086-like microprocessor is available), the parameters of the simulated I/O devices and, finally, the source program. Figure 2 shows some of the choices made for our example: a 64-kbyte main memory and an 8-kbyte two-way set associative cache with 16-byte blocks, Least Recently Used (LRU) as replacement policy and copy-back as main memory update policy.
Fig. 1. Initial window of Cachesim.
Now the student can go ahead with the simulation phase, and Cachesim draws icons representing the main memory and the cache. The contents of the data field, tag field, valid bit and modified bit (for copy-back caches) of any cache block can be examined by clicking on the cache icon and then on the icon of a particular cache block. The processor registers can be examined and changed through the "PROCESSOR_INFO" sub-menu.
3. THE SIMULATION AND ANALYSIS PHASES
In simulation, Cachesim can work in one of the following three modes:
single: the student can ask for the execution of a memory operation by specifying the memory address and the operation type.
trace: the student can execute a program step by step, one memory operation or one instruction at a time.
exe: the student can ask for the execution of a program.
Fig. 2. Configuration of the cache memory.
Fig. 3. Example of Cachesim operating in single mode.
3.1 THE SINGLE MODE
Cachesim executes a single memory operation and shows, by drawings, the cache and main memory events and the sequence of actions necessary to perform the required memory operation. We now show the behaviour of Cachesim operating in single mode (Figure 3). The simulator expects a memory address and the memory operation type. Suppose that the student asks for a read on location (1FD4)16. The student begins the simulation by pressing the start button and, by pressing the step button, he/she may examine the sequence of actions performed by the cache. In the example, Cachesim shows that: 1) the memory block (1FD)16 is not present in cache memory (miss condition); 2) the victim cache block to be replaced is block 0 in set (FD)16; 3) it is not necessary to update the main memory block because the M bit is 0 (that is, the copy is not modified); 4) the cache loads the memory block (1FD)16; and so on.
At the bottom of the picture, Cachesim summarizes the operation information and shows how the cache uses the memory address. In our example, the address (1FD4)16 is split into (01)2, used as the tag field (which the cache compares with the tag field of both blocks of set (FD)16); (11111101)2, used as the set field (which the cache uses as set address); and, finally, (0100)2, used as the word offset in the block. To understand the cache structure and how it operates, the student can open a window representing the logical scheme of the cache and the corresponding use of the above address fields.
3.2 THE TRACE AND EXE MODES
In both the trace and the exe modes, Cachesim executes a program. In trace mode, it executes one instruction or one memory operation at a time and shows trace information. In the case of a memory trace, it shows, for each memory operation: i) the memory address, ii) the cache access result (hit or miss) and iii) the tag field, the valid bit and the modified bit (in copy-back mode) of the cache block involved in the operation. In the case of an instruction trace, it provides: i) the next instruction, ii) the fetch address, iii) the actual memory address and the operation type, and iv) the tag, set and disp fields of the address (Figure 4). In both cases, Cachesim also shows the cache and main memory icons, highlighting the cache and memory blocks involved in the operation.
Fig. 4. Cachesim operating in trace mode. The student can choose, using the mouse, between instruction trace and memory access trace.
At the end of the simulation, in both trace and exe modes, Cachesim presents three summary windows. The first one shows the following percentages (as bar charts): i) hit and miss conditions; ii) hit and miss conditions that occurred during fetch, data and stack operations; and iii) read and write operations. The window also summarizes the number of block read and block write operations performed in main memory. Figure 5 shows these percentages for the following sorting program.
PROG
CODE SECTION
BEGIN     MOV R1,#100
          SUB R1,#1
          MOV R2,R1
          SUB R2,#1
A         MOV R3,VETT[R1]
          CMP R3,VETT[R2]
          JNE B
          CALL EXCHANGE
B         SUB R2,#1
          JGE A
          SUB R1,#1
          JL C
          MOV R2,R1
          SUB R2,#1
          JMP A
C         STOP
EXCHANGE  PUSH R3
          PUSH R4
          MOV R3,VETT[R1]
          MOV R4,VETT[R2]
          MOV VETT[R1],R4
          MOV VETT[R2],R3
          POP R4
          POP R3
          RET
ENDC
BEGIN DATA SECTION
          STACK 20
ENDS
DATA SECTION
VETT      DATA 100 (15 6 8 ..... 5 78)
ENDS
ENDP
Fig. 5. Statistics of cache operations.
The second window shows seven diagrams, charting memory accesses, misses, hits, reads, writes, data accesses and fetches versus cache blocks. A specific diagram can be zoomed by clicking on its icon. In Figure 6, the student has zoomed in on the total accesses diagram and has opened a window to inspect the statistics of set number 4.
Fig. 6. Statistics of cache operations versus cache blocks.
The third window summarises the main statistics by specifying the total accesses and the accesses for fetch, data and stack operations. In each case, the accesses are also broken down into read operations, write operations, hit conditions and miss conditions. The number of memory blocks transferred into the cache, the blocks written back to memory (only for copy-back caches) and the bytes transferred on the bus during write operations (only for write-through caches) are also displayed.
Fig. 7. Distribution of memory accesses.
Finally, Cachesim records the address sequence produced by the program execution. The analysis of this sequence produces two windows. The first shows the frequency distribution (versus the memory address) of the addresses for write, data and fetch accesses. In Figure 7, the student analyses the address distribution of data accesses from (420)16 to (45E)16. Starting from the address sequence
a_0, a_1, a_2, ..., a_{i-1}, a_i, ..., a_{n-1}, a_n
Cachesim calculates the sequence of differences
a_1 - a_0, a_2 - a_1, ..., a_i - a_{i-1}, ..., a_n - a_{n-1}
The distributions of the difference sequences produced by data and fetch accesses are presented in a second window (Figure 8). These distributions give an idea of the program locality.
Fig. 8. Examination of the program locality in data area.
Locality can be examined in both the data and code areas. In particular, Figure 8 shows that it is more marked in the code area than in the data area.
4. CONCLUSIONS
Teaching tools based on simulators are able to show and analyse the features and the problems of complex systems. As an example, we have shown a teaching tool which simulates a standard computer system. The package is written in the C programming language and can be used on a Personal Computer with the MS-DOS or Windows operating systems. The user interface is fully graphical. We are now working to expand this environment by modelling the other computer modules in full detail and by also considering multiprocessor architectures [Domenici89, Prete91, Prete92] and memory management units.
ACKNOWLEDGEMENTS
This work has been supported by the "Ministero dell'Università e della Ricerca Scientifica e Tecnologica" (Ministry of University and Scientific and Technological Research), Italy. Many people contributed to the Cachesim design; the author thanks S. Cinquini for his help in the design phase, and M. Giusti, R. Storti and F. Vernia for their contribution to the coding phase. The author would also like to thank the students for their comments on improving the man-machine interface and for their contribution to program testing.
REFERENCES
Cinquini90a
Cinquini, S. and Prete, C.A., "An Interactive Software Environment to Help in the Teaching of Cache Memories", Proc. of Third biennial meeting on Microcomputers and their applications, Education and Application of Computer Technology, Community of Mediterranean Universities, M. De Blasi, E. Luque, E. Scerri (eds.), Spain, (1990) pp. 295-306.
Cinquini90b
Cinquini, S. and Prete, C.A., "Teaching in Computer Architecture based on simulation environments", Proc. of First World Conference on Parallel Computing: In Engineering and Engineering Education, UNESCO, Paris, (1990) pp. 39-43.
Corsini88
Corsini, P. and Prete, C.A., "SYNCONET: A Tutor for the Synthesis of Combinational Networks via Karnaugh Maps and Prime Implicant Charts", Proc. of Second biennial meeting on Microcomputers and their applications, Education and Application of Computer Technology, Community of Mediterranean Universities, M. De Blasi, J. Donio, E. Luque, E. Scerri (eds.), Malta, (1988) pp. 687-692.
DeBlasi88
De Blasi, M. and Tangorra, F., "A Prolog Simulator for the Teaching of Computer Architecture", Proc. of Second biennial meeting on Microcomputers and their applications, Education and Application of Computer Technology, Community of Mediterranean Universities, M. De Blasi, J. Donio, E. Luque, E. Scerri (eds.), Malta, (1988) pp. 263-278.
Denning72
Denning, P.J., "On modeling program behaviour", Proc. of the Spring Joint Computer Conference, AFIPS Press, Arlington, Va., vol. 40, (1972) pp. 937-944.
Domenici89
Domenici, A., Lazzerini, B. and Prete, C.A., "A Synthetic Trace Generator for Multiprocessor Performance Evaluation", Proc. of 3rd Inter. Symp. on Multiprocessor System, Stralsund, G.D.R., Vol. 1, (1989) pp. 242-253.
Prete91
Prete, C.A., "The RST cache memory design for a tightly coupled multiprocessor system", IEEE Micro, vol. 11, no. 2, (April 1991) pp. 16-19, 40-52.
Prete92
Prete, C.A., "A Process Cache Memory for Tightly Coupled Multiprocessor Systems", Proc. of 30-th Annual Southeast Conference, Cherri M. Pancake and Douglas S. Reeves, Eds., Raleigh, North Carolina, (April 1992), pp. 131-138.
Shmidt81
Shmidt, J.W., "Fundamentals of digital simulation modeling", Proc. of Winter Simulation Conference, T.I. Oren, C.M. Delfosse, C.M. Shub (eds.), Atlanta, GA, (1981) pp. 13-21.
Smith82
Smith, A.J., "Cache memories", ACM Computing Surveys, vol. 14, no. 3, (1982) pp. 473-530.
APPENDIX A: THE MEMORY CACHE
A cache memory is a high-speed buffer, placed between the processor and the main memory, that temporarily holds the data and the instructions which have been used most recently. Its success has been explained in relation to the property of locality. Generally, programs spend most of their time repeatedly executing a few tight loops of code; read access to that code and to the corresponding data is faster if the code and the data are held in cache memory.
Write accesses are more complex, because the cache must update both its own copy of the data and the one in main memory. There are two main memory update policies: write-through and copy-back. In the first case, during a write operation the data are copied both to the cache and to main memory; therefore, the cache and main memory are always consistent. In the copy-back policy, the data are copied only to the cache, and the main memory is updated only when a modified cached copy must be overwritten by another copy.
A cache memory is made up of a set of blocks (called cache blocks or cache lines), each of them able to store a copy of the contents of a memory block (i.e., a sequence of memory locations). A copy is stored in the cache according to a placement function (direct addressing, set associative or fully associative). The structure, the management and the performance of the cache itself depend on this function. In the case of direct addressing, a specific memory block can be stored in only one cache block; in the case of a set associative cache, it can be stored in any cache block of only one set. Every set can contain 2, 4 or 8 blocks, and the cache is called two-way, four-way or eight-way set associative, respectively. In a fully associative cache, a memory block can be placed anywhere in the cache.
In the case of a miss condition, set associative and fully associative caches choose a victim block to be replaced. The most used replacement policies are: Least Recently Used (LRU), First In First Out (FIFO) and Random. In a direct addressing cache, a miss does not raise the problem of victim block selection, because only one cache block can store the copy.
APPENDIX B: THE PROCESSOR FEATURES AND THE INSTRUCTION SET
The features of the processor simulated by Cachesim are the following: the processor can operate on a 64-kbyte main memory, each memory operation involves a 16-bit word, and the processor includes a program counter and a stack pointer (PC and SP), a flag register and four 16-bit general registers R1, R2, R3 and R4. The general registers are directly accessible; the flag register is used only by the "jump on condition" instructions and contains the flags needed to code the following conditions: less than 0, greater than 0 and equal to 0 (the overflow condition is not considered). The instruction set is a simplified version of the Intel 8086 set. Considering the teaching goal of the project, the instruction set includes the minimum number of instructions needed to write any program. The addressing modes are: immediate, register, direct and indexed. In two-operand instructions, one of the operands is a general purpose register. The language includes a set of directives to: i) indicate the program entry point; and ii) declare the stack and data areas. Table 1 summarizes the instructions and their formats.
INSTRUCTIONS
MOV dst src    ADD dst src    SUB dst src    MUL src
DIV src        CMP dst src    NOT dst        AND dst src
OR dst src     XOR dst src    PUSH src       POP dst
JMP label      JE label       JZ label       JNE label
JNZ label      JL label       JLE label      JG label
JGE label      CALL label     RET            IN dst
OUT src        NOP            STOP
Tab. 1. The instruction set.