Teaching computer systems through common principles

Session S2G

Teaching Computer Systems through Common Principles

Mark A. Holliday
Department of Mathematics and Computer Science, Western Carolina University, [email protected]

Abstract - Computer system subjects ranging from computer organization and operating systems to computer networking and database systems form an integral part of a computer science or computer engineering major. Because the subjects are usually taught as separate courses, students may not recognize that they share many design principles. We identify a set of these principles and demonstrate how they apply to all these aspects of a computer system. In our experience, students' understanding of these subjects and how they are inter-related improves when we identify and illustrate these common principles.

Index Terms - Computer Systems, Design Principles, Computer Organization, Operating Systems, Computer Networking, Database Systems.

INTRODUCTION

The report on Computer Science Curricula of the Joint Task Force [1] identifies a number of aspects of a computer system as central to a computer science or computer engineering curriculum. This paper addresses the teaching of four of these aspects: computer organization, operating systems, computer networking, and database systems.

Often we teach these aspects of computer systems in separate courses. Excellent textbooks and well-defined sets of course topics have been developed for each of these subjects. However, students do not always understand that all these aspects of a computer system share certain principles that influence design decisions. This is unfortunate because, in our experience, identifying and illustrating common features not only enhances awareness of the commonality of these subjects but also aids in developing a better understanding of each subject itself.

We encourage instructors to use common design principles as a unifying means of teaching these subjects. To assist instructors, we present an example set of principles and illustrations. In addition, we illustrate how each design principle appears in settings besides the computer system subjects.

Each of the next nine sections of this paper identifies and illustrates one design principle. Section Eleven discusses related work. We conclude in Section Twelve.

LAYERS OF ABSTRACT MACHINES

Principle: A computer system is a series of layers of abstract machines. Each abstract machine uses the interface provided by the abstract machine below it and provides an interface to the abstract machine above it. Alternative implementations of an abstract machine interface are possible.

FIGURE 1

THE LAYERS OF A COMPUTER'S ABSTRACT MACHINES.

• Computer organization: The machine instruction set is the interface provided by the processor. Different processors may implement the same machine instruction set. In Figure 1, note that the border of the hardware is exposed to program executables and to the operating system kernel. This illustrates how programs and the kernel use the instruction set interface but are independent of what lies below that interface. For example, Intel and Advanced Micro Devices (AMD) developed processors that are very different internally while supporting the same instruction set. As a result, the same program executables will run on either processor.
• Operating systems: The interface to the operating system kernel is its set of system calls. The interface to the complete operating system distribution also includes its set of library calls (that is, its application programmer interface (API)). Different operating system kernels (for example, Linux, FreeBSD, and UNIX) may implement the same system call interface. Figure 1 illustrates the system call and library interfaces and implementations.
• Computer networking:
  o Each layer of the protocol stack depends solely on the layer below supporting the agreed-upon interface [2].
  o The socket interface is a set of system calls and thus, as part of the operating system kernel, can be implemented differently in different kernels.
• Database systems: Relational database systems typically implement a common interface, the Structured Query Language (SQL) standard.
• Non-computer systems: An example is that in an object-oriented programming language the interface of an object is independent of its implementation.

978-1-61284-469-5/11/$26.00 ©2011 IEEE October 12-15, 2011, Rapid City, SD 41st ASEE/IEEE Frontiers in Education Conference S2G-1

LATENCY VERSUS BANDWIDTH

Principle: Improving the performance of a computer system is an ambiguous concept, since improving performance could mean increasing bandwidth or it could mean decreasing latency. In general, increasing bandwidth is relatively easy, with parallelism being a common technique. It is difficult to decrease latency because of fundamental physical constraints (such as the speed of light). Consequently, latency avoidance and latency tolerance techniques are important in computer system design.

• Computer organization: Disk bandwidth, that is, the number of bytes that can move to or from a disk system per time unit, can be increased through the use of parallel disks (as in a Redundant Array of Independent Disks (RAID) system). Decreasing latency, that is, disk access time, requires making disk head movement faster.
• Operating systems: File system bandwidth is closely related to disk bandwidth. Similarly, file system latency is closely related to disk access time latency.
• Computer networking: Bandwidth of a network is the number of bits per second the network can transfer. The transmission delay of a packet, that is, the time needed for the entire packet to be placed on the link, is inversely proportional to the link bandwidth. The propagation delay of a packet, that is, the time for the first bit (or the last bit) to propagate from the source to the destination, is the latency. The two types of delay are independent and combine to form the total packet delay. Figure 2 illustrates this point from the perspectives of the sender and the receiver [2]. The sender perspective shows the transmission delay from the first bit leaving the sender to the last bit leaving the sender, followed by the propagation delay until the last bit reaches the receiver. The receiver perspective shows the propagation delay until the first bit reaches the receiver and then the transmission delay from the first bit reaching the receiver to the last bit reaching the receiver.
• Database systems: Methods of implementing a join operation that involve materializing an intermediate table cause the latency of the operation to be visible to the next relational operator in the evaluation plan.
• Non-computer systems: An example is that the time until the light from a distant star reaches the earth is independent of the amount of light that star is emitting.
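The relationship between the two delays in the computer networking example can be sketched numerically. The following Python fragment is only an illustration and is not taken from the paper: the function names, the 1500-byte packet, the 10 Mb/s and 20 Mb/s links, the 1000 km distance, and the signal speed of 2e8 m/s are all assumed values chosen for the example.

```python
def transmission_delay(packet_bits, bandwidth_bps):
    """Time to place the entire packet on the link, in seconds."""
    return packet_bits / bandwidth_bps


def propagation_delay(distance_m, signal_speed_mps=2e8):
    """Time for one bit to travel from source to destination, in seconds.

    The default signal speed is an assumed value for illustration.
    """
    return distance_m / signal_speed_mps


def total_packet_delay(packet_bits, bandwidth_bps, distance_m):
    """Total delay: the two independent delays add."""
    return (transmission_delay(packet_bits, bandwidth_bps)
            + propagation_delay(distance_m))


# Assumed example values: a 1500-byte packet sent over a 1000 km link.
packet_bits = 1500 * 8
distance_m = 1_000_000

slow = total_packet_delay(packet_bits, 10e6, distance_m)  # 10 Mb/s link
fast = total_packet_delay(packet_bits, 20e6, distance_m)  # 20 Mb/s link

# Doubling the bandwidth halves only the transmission component;
# the propagation component (the latency) is unchanged.
saved = slow - fast
```

Under these assumptions, doubling the bandwidth removes half of the transmission delay but leaves the propagation delay untouched, which is why added bandwidth alone cannot reduce latency.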
FIGURE 2

BANDWIDTH AND LATENCY AS THE TRANSMISSION DELAY AND PROPAGATION DELAY, RESPECTIVELY, OF A PACKET ON A LINK.

LATENCY AVOIDANCE: CACHING

Principle: Keep copies of a subset of the storage objects in a closer location as well as in their standard location. Which subset of objects is being cached changes over time automatically according to which objects are referenced.

• Computer organization: The processor caches and the translation lookaside buffer (TLB) use this technique.
• Operating systems: The use of main memory as a cache for virtual memory and as a cache for the file system are examples.
• Computer networking: Caching occurs in the Domain Name System (DNS) entries in local DNS servers, the cache in the web browser, and the Address Resolution Protocol (ARP) cache.
• Database systems: The main memory buffer cache is an example.
• Non-computer systems: An example is keeping a list of the names and phone numbers of people you often call instead of always looking up the phone number when needed.

LATENCY AVOIDANCE: PREFETCHING AND SPECULATIVE RETENTION

Principle: In anticipation of needing a resource, prefetch or create the resource ahead of time or retain a previous copy, so that when it is needed, you can avoid the latency of fetching or creating the resource.

• Computer organization: Prefetching instructions from beyond a branch point by predicting whether the branch will be taken is an example.
• Operating systems: Prefetching a page in a virtual memory system is a case.
• Computer networking: Maintaining a set of open connections so that the overhead of creating a connection is avoided is an example of speculative retention.
• Database systems: An illustration is prefetching pages into the main memory buffer cache.
• Non-computer systems: An example of prefetching is when a cook at a fast-food restaurant prepares what he expects customers to order before the customers arrive.

LATENCY TOLERANCE: PIPELINING

Principle: Instead of waiting for a job to complete, start a second job as soon as the first job leaves the first of the series of steps.

• Computer organization: The data path in the Central Processing Unit is pipelined.
• Operating systems: Asynchronous message sends and asynchronous file writes are examples of pipelining.
• Computer networking: Error control in the transport layer using either Go-Back-N or Selective Repeat [2] uses pipelining.
• Database systems: Pipelined evaluation of a relational operator is an example. The initial result tuples are sent to the next relational operator in the query evaluation plan before the evaluation of the first relational operator is finished.
• Non-computer systems: An oil pipeline or an assembly line uses the concept of pipelining.

FIGURE 3

RESOURCE CONTENTION AS JOBS ARRIVE AT A SERVER AND ITS QUEUE.

LATENCY TOLERANCE: CONTEXT SWITCHING

Principle: When one operation requires a long time, switch to performing another operation instead of being idle waiting for the first operation to finish.

• Computer organization: A multithreaded processor uses context switching. The threads are hardware contexts that can be switched between after every machine instruction and differ from operating system threads.
• Operating systems: Examples include multiprogramming (switching between processes on disk I/O), switching between user-level threads within a process by the process, and switching between system-level threads within a process by the operating system.
• Computer networking: An example is supporting the Transmission Control Protocol (TCP) by maintaining multiple connections, each with its own thread, and switching to a different thread as needed.
• Database systems: The apparent concurrent execution of multiple transactions is actually context switching between the transactions.
• Non-computer systems: An example is the multitasking that people do when talking on the phone while opening the U.S. Mail.
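The context-switching principle can be sketched with a small round-robin scheduler. This is a simplified illustration under stated assumptions, not a model of any real kernel or processor: the names `job` and `run_with_context_switching` are hypothetical, and each `yield` stands in for the point at which a job would start a long operation (such as disk I/O) and give up the processor.

```python
from collections import deque


def job(name, steps):
    """A hypothetical job; each yield marks where it would block on a long operation."""
    for i in range(1, steps + 1):
        yield f"{name}: step {i}"


def run_with_context_switching(jobs):
    """Run jobs round-robin, switching whenever the current job would block."""
    ready = deque(jobs)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            # Run the job until it reaches its next long operation.
            trace.append(next(current))
        except StopIteration:
            continue  # the job has finished, so it is not re-queued
        # Rather than idling, put the job back and switch to another one.
        ready.append(current)
    return trace


trace = run_with_context_switching([job("A", 2), job("B", 3)])
```

In this sketch the steps of jobs A and B interleave in the trace, so the processor always has work to do while either job is waiting.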


THROUGHPUT VERSUS RESPONSE TIME

Principle: Throughput of a system is the number of jobs that complete per time unit. Response time in a system is the time from when a job enters the system until it leaves the system. Response time is composed of service delay and queueing delay. Figure 3 illustrates the key abstractions when jobs arrive and queue to wait for service. As shown in Figure 4, due to congestion and queueing delays, maximizing throughput and minimizing response time are competing performance goals. Increasing the job arrival rate increases throughput, but also increases response time.

• Computer organization: The server is the disk subsystem and a job is an I/O (Input/Output) request. Throughput can be increased by using multiple servers (a RAID system). Response time can be decreased by using a better disk scheduling policy.
• Operating systems: The server is the central processing unit and a job is a process.
• Computer networking: The server is a switch and a job is an arriving packet.
• Database systems: The server is the database system and a job is an arriving transaction.
• Non-computer systems: Waiting in a doctor's office is an example. Doctors often schedule many appointments at the same time so that the doctor is never idle because a patient did not show up. Unfortunately, this can result in patients having to wait a long time to be seen.

FIGURE 4

THE TRADEOFF BETWEEN THROUGHPUT AND RESPONSE TIME

INDIRECTION

Principle: Add a level of indirection in the procedure for accessing an object so that the actual location of the object can be changed without preventing continued use of the object by its users.

• Computer organization: An example is the indirect addressing mode in many machine instruction sets.
• Operating systems: Two examples are 1) soft (symbolic) links in file systems (instead of hard links) and 2) the use of logical addresses instead of physical addresses in machine instructions (the referenced code and data can be moved around in main memory).
• Computer networking: One example is using a URL instead of an IP address, which allows the IP address of a machine to be changed without the user noticing. Another is using an IP address instead of a MAC address (such as an Ethernet address), which allows the network interface card to be changed without users of the IP address noticing.
• Database systems: An example is the concept of a view in a relational database. The mapping from the virtual set of tables to the real set of tables is done when a query using the view executes.
• Non-computer systems: The use of a P.O. Box mailing address instead of a physical address is indirection.

HASHING

Principle: A hash function uniformly distributes values among the storage locations within a hash table in a deterministic manner. This allows lookup of individual values in constant time on average.

• Computer organization: Set-associativity in processor caches is a simple form of hashing.
• Operating systems: A hash table maps virtual addresses to an index in the inverted page table.
• Computer networking: Distributed Hash Tables (DHTs) in peer-to-peer networks and the lookup process used in caches of many types (such as web caches) are examples.
• Database systems: An example is hash tables as a data structure for holding data records and access methods. Hashing is also used in implementations of relational operators, such as hash joins and the duplicate elimination phase of projection.
• Non-computer systems: An example is that hash functions are used in computational geometry for many proximity problems such as finding closest pairs in a set of points.

RELATED WORK

Textbooks tend to focus on each of these four subjects separately and do not explain how they are inter-related. Teaching each as a separate course is the most likely cause of this omission. However, Nisan and Schocken [3] and Bryant and O'Hallaron [4] address both computer organization and some operating system topics in one textbook. The book by Saltzer and Kaashoek [5] does address common design principles across many areas of computer systems. However, their use of the term design principles differs from the use in this paper. They identify fifteen principles such as “Avoid excessive generality”, “Be explicit”, and “Principle of Least Astonishment”.

Nisan and Schocken [3] and Bryant and O'Hallaron [4] also address compilation issues. Language translators (compilers, interpreters, and runtime environments) fit naturally with the common design principles approach advocated in our paper. For example, the use of a virtual machine, as in the Java interpreter, is an example of using an abstract machine layer. The use of an intermediate language that allows the front-end of a compiler to be used with multiple back-ends, each for a different target machine, is an example of indirection. Lookup in the symbol table of the front-end of a compiler is an example of using a hash table.

Design patterns have become an important concept in software engineering [6]. While the identification of common design principles in computer system subjects is related, the differences are significant. We are considering components of a computer system, not the development of general-purpose software.
This in turn causes the nature of the specific principles (such as hashing or latency tolerance through context switching) to be different.

As mentioned and illustrated above, these common design principles also appear in other settings. Such examples are a type of analogy that should help the student understand the meaning of the principles as well as their significance. The role of analogy in teaching computer science has been noted in the teaching of beginning programming [7, 8], but we are not aware of discussions in the literature about the use of analogy when teaching about computer systems.

Besides illustrating shared design principles, analogies can be used to illustrate important concepts that are specific to one aspect of computer systems. Three such analogies that we use in teaching involve the implementation of recursion, the implementation of the fork system call, and access control rights.

• Implementation of Recursion. Recursion occurs when one invocation of a particular procedure creates another invocation of that same procedure. As shown in Figure 5, recursion is implemented by representing each of those invocations with another activation frame pushed on the stack in the process address space. Each of these frames is identical upon creation except for the return address and the values of the arguments. This is illustrated on pages 37 and 49 of The Cat in the Hat Comes Back [9]. The cat takes his hat off, showing on top of his head another identical cat who is taking his hat off, and on top of his head is another identical cat who is taking his hat off, and so on. Each of the cats is identical except for the letter displayed on the side of his hat, with the first cat having the letter A on his hat, the second cat having the letter B, and so on. Each new cat represents a new activation frame being pushed on the stack for a new recursive call of the procedure. Each cat is identical (except for the letter on his hat), just as each activation frame is identical (it is the same procedure after all) except for the return address and the values of the arguments. The stack of cats growing upward represents the stack of activation frames growing downward.

FIGURE 5

THE STACK AND OTHER SEGMENTS WITHIN A PROCESS ADDRESS SPACE.

• Access Control Rights. The operating system determines whether a particular user has the right to access a particular resource by determining whether that user has access control rights to that resource. Access control rights are represented either by a capability list or an access control list. When purchasing a ticket to a performance (for example, a sporting event), a person is allowed to enter the venue by presenting the ticket to an agent. A capability list serves the same purpose. When voting, a person arrives at the voting center and authenticates himself or herself (for example, with a driver's license). The poll worker checks to see if that name appears on the list of people who are registered voters in that district. An access control list functions in the same manner.
• Creating a Process. Starting a program causes a process to be created to execute that program. This happens often (almost every time a command is entered at the command line). In Unix/Linux operating systems, process creation is done by the fork system call, which makes an identical copy of the current process. Each copy of the process starts executing at the machine instruction after the fork system call. The two process copies can determine which one is the parent process because the return value from the system call for the parent is different from the return value for the child process. The Second Chances episode of Star Trek: The Next Generation [10] begins ten years before the time of the other episodes. A Federation starship carrying Lieutenant Riker arrives at a planet with a strong ionosphere. Riker transports down and after a time transports back up to the ship. Because of the strength of the ionosphere, his transported image also bounces back to the planet's surface, so there are now two Rikers, the only difference being where they returned to (the return value). A fork system call's creation of a second process is analogous to the creation of a second Lieutenant Riker.

CONCLUSIONS

The design of the different components of a computer system, from the underlying computer organization and operating system through computer networking and the implementation of a database system, is to a large extent based on common design principles. The range of applicability of these principles is even wider, as indicated by examples from other settings. To our knowledge, identifying common system design principles and emphasizing their importance have only rarely been done.
The book by Saltzer and Kaashoek [5] is the most noteworthy example, but, as mentioned above, the types of design principles identified here and in that book are substantially different.

We conjecture that student learning in the computer systems courses can be enhanced by the instructor incorporating these design principles in the course structure. An interesting next step would be to develop an assessment of this conjecture. One issue is “How would the design principles be incorporated in the course structure?” One approach is to maintain the current structure of separate courses on computer architecture, operating systems, computer networking, and database systems, but to identify the design principles in each course, show how those design principles occur in the system area being studied, and make brief references to how those principles are used in the other system areas. The second approach is to introduce a new course that is organized by design principles and that, for each design principle, includes examples from all the system areas, much like how this paper is organized. The first approach is more amenable to assessment since each course can be offered in the traditional form and in an enhanced version with the design principles added. The students in each section would take a pre-test at the start of the semester and a post-test at the end of the semester.

ACKNOWLEDGMENT

This work was supported by the National Science Foundation under grant CPATH EAE 0722313.

REFERENCES

[1] ACM/IEEE Computer Society. Computer science curricula 2008: An interim revision of CS 2001. www.acm.org/education/curricula/, 2008.
[2] Holliday, M. “Animation of computer networking concepts.” ACM Journal of Educational Resources in Computing, 3(2):1-26, June 2003.
[3] Nisan, N. and S. Schocken. The Elements of Computing Systems: Building a Modern Computer from First Principles. MIT Press, Cambridge, MA, 2004.
[4] Bryant, R. and D. O'Hallaron. Computer Systems: A Programmer's Perspective. Prentice Hall, Upper Saddle River, NJ, 2002.
[5] Saltzer, J.H. and M. Frans Kaashoek. Principles of Computer System Design: An Introduction. Morgan Kaufmann, 2009.
[6] Astrachan, O., G. Mitchener, G. Berry, and L. Cox. “Design patterns: An essential component of CS curricula.” In Proceedings of the Twenty-Ninth SIGCSE Technical Symposium of Computer Science Education, pages 153-160. ACM, March 1998.
[7] Matocha, J., T. Camp, and R. Hooper. “Extended analogy: An alternative lecture method.” In Proceedings of the Twenty-Ninth SIGCSE Technical Symposium of Computer Science Education, pages 262-266. ACM, March 1998.
[8] Davis, J. and S. Rebelsky. “Food-first computer science: Starting the first course with pb&j.” In Proceedings of the Thirty-Eighth SIGCSE Technical Symposium of Computer Science Education, pages 372-376. ACM, March 2007.
[9] Dr. Seuss. The Cat in the Hat Comes Back. Beginner Books, Random House, New York, NY, 1958.
[10] Star Trek: The Next Generation. Second Chances. Season Six, Episode 24, May 1993.

AUTHOR INFORMATION Mark A. Holliday, Professor, Department of Mathematics and Computer Science, Western Carolina University, Cullowhee, NC 28723, [email protected].
