Session T3E
ARMs for the Poor: Selecting a Processor for Teaching Computer Architecture
Alan Clements
University of Teesside, [email protected]

Abstract – Teachers of computer architecture and organization courses have to choose a target processor to illustrate the basic principles of instruction set design. In this paper we suggest that it is time to choose the ARM processor architecture that is markedly different to those used in most current courses. A specific computer architecture is required as a vehicle to teach about registers, addressing modes, instruction types, and so on. Resorting to a hypothetical teaching machine reduces the student’s learning burden and makes their learning curve shallow, but failing to introduce them to the complexities they will encounter in the real world can destroy their motivation. Teachers are concerned not only with covering a body of knowledge; they must motivate students and create a sense of excitement. In a discipline as rapidly changing as computer science, only those students who can adapt to change are likely to thrive over the four or more decades of their career. This paper explains why the ARM architecture is an excellent vehicle for teaching computer architecture; in particular, its predicated execution, inclusion of shifting in all data-processing instructions, and its compressed code (Thumb) mode. Moreover, the ARM has a RISC architecture with some traditional CISC architectural features.

Index Terms – Computer architecture education, microprocessor instruction sets, selecting a microprocessor.

COMPUTER ARCHITECTURE CURRICULUM
Computer architecture is a key component of degree courses in computer science; in particular, the joint ACM IEEE Computer Society Computing Curriculum spells out what should be included in the core curriculum for computer architecture [1] [2]. There is a widespread consensus on the content of computer architecture courses, although, in the UK, there is a growing tendency to combine architecture with computer networks or operating systems because of the way in which curricula overlap.
Table 1 lists the key components of the computer architecture curriculum proposed by the recently revised CC2001 report. Most of the topics refer to elements of the computer system other than the CPU itself. Table 2 expands the curriculum and includes the learning objectives for the CPU [1]. Note that no specific computer is specified and the individual teacher is free to choose a suitable example.
Table 2 demonstrates that the intent of the curriculum is to cover the underlying principles of the operation of the computer and not the details of either its low-level programming or the characteristics of a particular machine.

TABLE 1
PROPOSED ARCHITECTURE IN THE REVISED CURRICULUM 2001
Digital logic and computer arithmetic
Computer architecture
Interfacing and communication
Memory system organization and architecture
Functional organization
Multiprocessing and alternative architectures
Performance acceleration
Architecture for networks and distributed systems
Devices; New directions in computing

TABLE 2
PROPOSED ARCHITECTURE IN THE REVISED CURRICULUM 2001
Computer architecture [core]
1. Overview of the history of the digital computer
2. Introduction to instruction set architecture, microarchitecture and system architecture
3. Processor architecture – instruction types, register sets, addressing modes
4. Processor structures – memory-to-register and load/store architectures
5. Instruction sequencing, flow-of-control, subroutine call/return
6. Structure of machine-level programs
7. Limitations of low-level architectures
8. Low-level architectural support for high-level languages
Learning objectives:
• Describe the progression of computers from vacuum tubes to VLSI.
• Appreciate the concept of an instruction set architecture, ISA, and the nature of a machine-level instruction in terms of its functionality and use of resources (registers and memory).
• Understand the relationship between instruction set architecture, microarchitecture and system architecture, and their roles in the development of the computer.
• Be aware of the various classes of instruction: data movement, arithmetic, logical, and flow control.
• Appreciate the difference between register-to-memory ISAs and load/store ISAs.
• Appreciate how conditional operations are implemented at the machine level.
• Understand the way in which subroutines are called and returns made.
• Appreciate how a lack of resources in ISPs has an impact on high-level languages and the design of compilers.
• Understand how parameters are passed to subroutines and how local workspace is created and accessed.
978-1-4244-6262-9/10/$26.00 ©2010 IEEE October 27 - 30, 2010, Washington, DC 40th ASEE/IEEE Frontiers in Education Conference T3E-1

THE PROFESSOR’S DILEMMA
In practice computer architecture is taught by real professors to real students, and that rather complicates matters. A glance at typical computer architecture texts [4]-[5] demonstrates that authors usually select a real commercially available device as a vehicle to illustrate the course; for example, the Motorola 68K, the Intel Pentium 4, or MIPS. Why do professors make life difficult for themselves by using CPUs that were made by engineers wanting to maximize market penetration and company profits? Why don’t they specify a simple, hypothetical teaching machine and make it easier to teach the subject? Some professors do invent their own machines; I do myself for the first two weeks of the course. Most do not. Professors I have spoken to say that students do not want to use hypothetical hardware because they feel it is unrealistic and does not give a true picture of the real world they will soon be entering. Moreover, I find that students prefer to use hardware like the PC because they feel familiar with it. When I based my courses on the 68K microprocessor, it was well received by students in the days of the Apple Mac that incorporated it. When the 68K was dropped by Apple, students became less enthusiastic.
THE PROFESSOR ARMED
The most visible role of a professor is to teach a student a given body of knowledge and to examine the quantity and quality of their knowledge. The real job of the professor is to instill in the student a love of the subject [6]. Without that, it’s difficult to transform the student into an autonomous learner who will work independently and continue to build on the course after its end. I decided to change the processor I use to teach computer architecture from the Motorola 68K to the ARM family. The principal reasons for making the change are that the ARM covers the requirements of existing curricula, is easy to learn, and has an elegant and sophisticated architecture. Moreover, it is widely found in real systems.

ARM – the Background
Microprocessors used as vehicles to teach computer architecture are often mainstream industry-standard devices like the Motorola 68K or the Intel IA32. The processor that is the subject of this paper, the ARM, is beginning to appear in mainstream texts [4], [5], [7]. It is an unusual processor from a Cinderella of the microprocessor world rather than a giant like Intel or Texas Instruments. Although a RISC processor like the high-performance MIPS or PowerPC, it is found in low-cost consumer applications such as PDAs and cell phones. Its characteristics make it stand out from other processors. It has a delightfully simple core architecture, there are development tools freely available, and it fully supports the computer architecture curriculum. Advanced RISC Machines Ltd. was founded in the UK in 1990 and changed its name to ARM Ltd. Unusually, ARM does not manufacture microprocessors. It is an IP (intellectual property) company that designs systems and licenses other companies to make them; for example, ARM microprocessors are manufactured by Intel, Texas Instruments, and Samsung. Indeed, ARM 32-bit microprocessors now account for 75% of the world’s embedded 32-bit applications [8]. As we have already stated, students are more likely to be motivated if they study a processor that is in their cell phones.

ARM in a Nutshell
As there is insufficient space here to discuss all the ARM’s attributes, we list its key features and highlight how they differ from other processors, pointing out the pedagogical advantages.
• Register set: The ARM has 16 general-purpose registers r0 to r15 – the same as the 68K and fewer than typical RISC processors with 32 registers. Register r15 holds the program counter, which is very unusual (the program counter is normally hidden from the user and cannot be directly accessed). As the program counter is visible, the student can read its contents and even modify them to perform jumps. This feature allows a class discussion of the advantages and disadvantages of general-purpose register sets in contrast with special-purpose register sets. One of my prime teaching objectives is to demonstrate how machine manufacturers have to make choices and how those choices affect future performance and applications.
• Instruction set: The ARM has a RISC load/store (register-to-register) architecture. CISC processors are called 1 ½ address machines because they permit operations of the form ADD T1,D1 which adds the contents of memory location T1 to register D1 and puts the sum in D1, overwriting the old value. The term 1 ½ is used (sarcastically) to indicate a full memory address and a short register address. RISC processors permit data-processing operations only on registers and provide instructions of the form ADD r1,r2,r3 where register r2 is added to r3 and the sum put in r1 (r1 is the destination register). The only memory operations are load a register from memory and store a register’s contents in memory. A load/store computer reduces the student’s burden because he or she does not have to remember what addressing mode each instruction can use.
• Instruction types: The ARM has a conventional integer data-processing instruction set with traditional arithmetic, logical, and shift operations (although the shift is implemented in an unusual way). One special instruction is the MLA (multiply and add) that takes four operands and has the form MLA r1,r2,r3,r4. Its effect is r1 = r2*r3 + r4; that is, it calculates a product and adds it to a previous value. This seemingly innocuous instruction is at the heart of many signal-processing operations (used in audio and video applications). It is able to implement the inner product of two vectors efficiently (i.e., s = a0·b0 + a1·b1 + a2·b2 + …). The pedagogical advantage of this instruction is that it allows you to introduce modern applications such as multimedia and graphics.
• Subroutine call: Two subroutine call mechanisms are widely in use. CISC processors are stack-based. They push the return address before a subroutine call and end a call by restoring the address from the stack. This process is handled automatically in hardware and uses a call instruction, JSR, and a return instruction, RTS. RISC processors increase speed by saving the return address in a register prior to a call and moving the return address to the program counter to return. This mechanism is very fast because it does not access external memory. It does not permit nested subroutines unless the return address is saved in memory. The ARM implements a RISC-like call/return mechanism but it also provides a conventional stack mechanism, which gives the programmer the best of both worlds. The pedagogical advantage of these features is that you can compare and contrast the two call mechanisms and the students can investigate them for themselves.
• Shadow registers: Shadowing, where two physical memory locations share the same logical name, is an important concept. For example, the 68K has two physical stack pointers with the same name. One stack pointer is visible to the user programmer and one to the operating system. By using different physical pointers, an application program can’t corrupt the operating system stack. Shadowed registers allow the professor to mine a rich vein of system security and reliability. The ARM has several shadowed registers and the physical instance is determined by the interrupt and exception handling mechanism. When the ARM is interrupted, a new bank of shadowed registers is switched in. This allows an interrupt handler to access a clean set of registers and avoid saving pre-interrupt data that is in use elsewhere in the program. Shadowing enables the teacher to demonstrate how special-purpose hardware increases performance. It also provides an opportunity to discuss hardware-software tradeoffs.
• Literals: All computers provide a means of loading a literal (immediate value). The ARM deals with literals in a unique way by providing a 12-bit value where 8 bits specify the significant bits and 4 bits specify a multiplier; for example, the literals 84₁₆ or 8400₁₆ can both be specified in 12 bits. This mechanism reinforces notions of exponents and mantissas that appear in floating-point arithmetic, as well as the concepts of range and precision.
• Shift instructions: The ARM implements a zero-cycle shift by incorporating a shift as part of other data-processing instructions. Because of its unusual characteristics we deal with it separately.

HIGHLIGHTS OF THE ARM INSTRUCTION SET
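Two entries in the feature list above, the MLA multiply-and-add and the 12-bit literal, can be made concrete with a short sketch. The Python below is an illustration only, not ARM tooling. It assumes the classic 32-bit ARM encoding, in which the literal's "multiplier" is realised by rotating the 8-bit value right by twice the 4-bit field; the function names (mla, inner_product, decode_literal, encode_literal) are hypothetical and chosen for this example.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as unsigned 32-bit values


def mla(rm: int, rs: int, rn: int) -> int:
    """Model MLA rd,rm,rs,rn: return rm*rs + rn truncated to 32 bits."""
    return (rm * rs + rn) & MASK32


def inner_product(a, b) -> int:
    """Accumulate a0*b0 + a1*b1 + ... using one MLA per pair of elements."""
    acc = 0
    for x, y in zip(a, b):
        acc = mla(x, y, acc)
    return acc


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bit positions."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def decode_literal(imm8: int, rot4: int) -> int:
    """Expand a 12-bit literal field (assumed layout: imm8 plus rot4)."""
    if not (0 <= imm8 <= 0xFF and 0 <= rot4 <= 0xF):
        raise ValueError("imm8 must fit in 8 bits and rot4 in 4 bits")
    return ror32(imm8, 2 * rot4)


def encode_literal(value: int):
    """Return some (imm8, rot4) pair for `value`, or None if none exists."""
    value &= MASK32
    for rot4 in range(16):
        # Rotating left by 2*rot4 undoes the right rotation used in decoding.
        imm8 = ror32(value, 32 - 2 * rot4)
        if imm8 <= 0xFF:
            return imm8, rot4
    return None
```

For instance, decode_literal(0x84, 12) yields 8400₁₆, while encode_literal returns None for a value such as 101₁₆, whose set bits span more than eight positions. This is the range-versus-precision trade-off that the literal mechanism is meant to illustrate.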
Although we can’t cover all the ARM’s architectural features, there are three that are particularly important from a teaching point of view because they illustrate interesting and innovative features – some of which are excellent vehicles for engaging in class discussions with the students. These are: the shift, predicated execution, and addressing modes.

A shift operation moves a string of bits by one or more positions left or right. The difference between shift operations depends on:
• the direction of the shift (left or right),
• the number of shifts – one or more places,
• dynamic/static shifts (a dynamic shift permits the number of places shifted to be changed at run-time by using a variable in a register),
• the type of shift – arithmetic (preserves the sign), logical, circular (the bit shifted out at one end is shifted in at the other end), extended (the shift takes place through the carry bit to allow multiple-precision arithmetic).

The ARM implements shift instructions but in an entirely unusual way. The computer architect is engaged in an eternal struggle to minimize the time taken to perform operations. A designer’s ultimate goal is the zero-cycle instruction that takes no time to execute. Such an operation is impossible, but the effect of a zero-cycle instruction can be created by hiding the operation. Consider the following non-ARM code.

    ASL r0,#4       ;shift contents of register r0 left 4 places
    ADD r1,r1,r0    ;add the contents of r0 to r1

The time taken to execute this code is two cycles. The ARM implements shifts ingeniously by shifting the second operand during a data-processing instruction. High-speed on-chip logic implements the shift by directly routing bits from the source to their destination in a network called a barrel shifter. A typical ARM shift is written:

    ADD r1,r1,r0,ASL #3    ;shift r0 left before adding

and implements [r1] = [r1] + 8 * [r0]. The shift and addition are performed in a single cycle. To perform a shift without data processing, a shift can be placed in the data path of a move operation; that is,

    MOV r1,r1,ASL #3       ;shift r1 left before moving to r1

There is quite a pedagogical significance in this operation. A little thought and ingenuity in the design of the ARM’s architecture has significantly increased performance without incurring a lot of additional logic. This demonstrates that tried-and-tested systems can sometimes be improved by looking at the system in a new way. In class I point out that ARM’s invention is not entirely new – they have borrowed a technique from the realm of microprogramming that was popular in the 1970s. I stress that old tricks can be reused in new circumstances and that students should always appreciate the value of discussions about computer history.

The Delights of Predication
When teaching computer architecture it is important to let students know exactly why you are using a particular
architecture out of the many available. The aspect of the ARM that I find most appealing is its predicated execution ability where an instruction is executed if, and only if, certain conditions are met. Typical architectures used in teaching lack predicated execution and each op-code in the instruction stream is executed in turn unless a change in the flow-of-control (e.g., branch or jump) bypasses it. A suffix can be applied to an ARM op-code to define the condition under which it is executed; for example, ADDEQ performs an operation (addition) only if the result of a previous operation was zero – otherwise the instruction is not executed (it is said to be nullified or squashed). Consider the following fragment of pseudo-code.

    if ((x == 0) && (y < 5)) p = p + 1;

A conventional assembly language uses two conditional tests and generates the following (illustrative) code:

          CMP  r1,#0       ;test x in r1
          BNE  exit        ;if not zero then exit
          CMP  r2,#5       ;compare y in r2 with 5
          BGE  exit        ;if greater than 4 then exit
          ADD  r3,r3,#1    ;increment p in r3
    exit                   ;exit point
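The nullification rule described above, in which an instruction with a condition suffix executes only when the flags satisfy that condition, can be modelled in a few lines. The following Python sketch is an illustration under stated assumptions rather than a definitive model of the processor: it covers only the EQ, NE, LT and GE suffixes that appear in this paper, and the names Flags, cmp_flags, CONDITIONS and add_cond are hypothetical.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # registers are modelled as unsigned 32-bit values


@dataclass
class Flags:
    """The four condition flags set by a compare."""
    n: bool = False  # result was negative
    z: bool = False  # result was zero
    c: bool = False  # carry, meaning no borrow after a subtraction
    v: bool = False  # signed overflow


def cmp_flags(a: int, b: int) -> Flags:
    """Model CMP a,b: compute a - b in 32 bits and keep only the flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    return Flags(
        n=bool(sign_r),
        z=(result == 0),
        c=(a >= b),
        v=(sign_a != sign_b and sign_r != sign_a),
    )


# Assumed subset of condition suffixes; the real ISA defines more.
CONDITIONS = {
    "AL": lambda f: True,        # always (no suffix)
    "EQ": lambda f: f.z,         # equal, i.e. zero result
    "NE": lambda f: not f.z,     # not equal
    "LT": lambda f: f.n != f.v,  # signed less than
    "GE": lambda f: f.n == f.v,  # signed greater than or equal
}


def add_cond(cond: str, flags: Flags, rd: int, rn: int, operand: int) -> int:
    """Model ADD<cond> rd,rn,operand and return the new value of rd.

    When the condition fails the instruction is nullified: rd is unchanged.
    """
    if CONDITIONS[cond](flags):
        return (rn + operand) & MASK32
    return rd
```

With p held in a variable, add_cond("EQ", cmp_flags(x, 0), p, p, 1) increments p only when x is zero; for any other x the call returns p unchanged, which is the squashed case.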
Now consider the use of predicated code.

    CMP   r1,#0       ;test x in r1
    CMPEQ r2,#5       ;if zero then test y < 5
    ADDLT r3,r3,#1    ;if y