A Taxonomy of Active Code

D. Scott Alexander1, Michael Hicks2, Angelos D. Keromytis2, Jonathan T. Moore2, Scott M. Nettles2, and Jonathan M. Smith2

1 Bell Labs, Lucent Technologies, Murray Hill, NJ 07974, USA, [email protected]
2 University of Pennsylvania, Department of CIS, Philadelphia, PA 19104, {mwh,angelos,jonm,nettles,[email protected]
Abstract. We have developed a classification of active networking architectures which includes active packets and active extensions. This taxonomy is based on the nature of the programs used by programmers. Pure active packets, like pure datagrams, carry sufficient information for their evaluation. Pure active extensions prepare network elements for evaluation of subsequent packets. We develop a continuum based on the location of the code used to process a packet. The taxonomy allows us to predict the existence of a variety of new architectures. For example, caching code both improves performance and makes a system appear to be an active extension with high probability. Attractive hybrids can be constructed, with heavily authenticated code in an active extension invoked by lightweight active packets.
1 Introduction

In 1997, Tennenhouse, Smith, Sincoskie, Wetherall, and Minden [18] suggested that active networking research could be divided into two models: programmable networks and capsules. We believe that subsequent research has shown the landscape to be somewhat more complex. In this paper, we hope to provide an updated taxonomy of active code. We find that active code can be viewed along a continuum based on the amount of executable code carried in the packet itself and the amount of code referenced on the switch. We also examine distribution mechanisms and have divided active code into two basic models: active extensions (which correspond to some programmable switches and some capsules), and active packets (which correspond to some of the more extreme capsules). We have chosen new names both to signify the break from the old model and because we feel that the old names can be misleading. (Is a commercial router with flash memory a programmable switch?)

This paper is divided into three parts. We first present our taxonomy of active code. We then discuss the ramifications of these definitions and introduce how the caching of active code muddies the water. We follow with a discussion of the tradeoffs inherent in the various approaches. Such tradeoffs involve the security, performance, robustness, and flexibility of the system. We finish up by classifying some of the mature active network implementations using our taxonomy.

This work was supported by DARPA under Contract #N66001-96-C-852.
2 The Taxonomy

We believe that there are three elements of an active datagram: an executable portion, a name or entry point, and a data portion. These are shown in Figure 1. The name portion is found in any well-designed datagram service. In traditional datagram services, it appears as a type identifier or protocol identifier. In active code, this name in some way tells the interpreter where to begin execution. This entry point may point into code resident on the switch or may point into the executable portion of the active datagram. Similarly, the data portion of the active datagram is like the payload of a traditional datagram with the exception that the code may act on the data within the network.
[Figure 1: an active datagram with fields labeled code, name, and data, and an annotation marking the executable part.]
Fig. 1. An Active Datagram

The executable portion of the active datagram is critical to comparing active networking systems because it determines the autonomy of the active datagram. This autonomy can be placed along a continuum. At one extreme is a completely dependent packet which contains only data. When the data arrives at the switch, all processing is determined by the switch without examining the packet at all. This corresponds to a circuit-switched network. The other end of the spectrum is a theoretical system in which the datagram is able to transport and process itself without regard to any details of the switch. The most extreme system which could be built would be one which loaded the active packet into processor memory and then started executing the code at the entry point with the processor in supervisor mode. In such a system, the active packet would have rather complete control over its destiny.

Another way of considering this continuum is to look at the source of the code that processes a packet. Thus, in a system with a protocol identifier, the code to process the packet is resident in the switch and is chosen by the protocol identifier. Protocol processing begins and additional data may be acquired from the packet (e.g., the destination address), but only the protocol code resident on the switch is executed. In systems where executable code is included in the active datagram, that code can be executed in concert with code resident on the switch to process the packet.
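The three elements of an active datagram and the two possible sources of code can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the class and function names are hypothetical, handlers are modeled as plain Python callables, and the rule that code carried in the datagram takes precedence over switch-resident code is one possible policy rather than something the taxonomy prescribes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A handler takes the data portion and returns the processed result.
Handler = Callable[[bytes], bytes]


@dataclass
class ActiveDatagram:
    name: str                       # entry point or protocol identifier
    data: bytes                     # payload
    code: Dict[str, Handler] = field(default_factory=dict)  # executable part, may be empty


def process(datagram: ActiveDatagram, resident: Dict[str, Handler]) -> bytes:
    """Resolve the entry point, preferring code carried in the datagram (assumed policy)."""
    if datagram.name in datagram.code:
        handler = datagram.code[datagram.name]    # autonomous end of the continuum
    elif datagram.name in resident:
        handler = resident[datagram.name]         # dependent end: switch-resident code
    else:
        raise LookupError(f"no code for entry point {datagram.name!r}")
    return handler(datagram.data)


resident_code = {"upper": lambda d: d.upper()}
dependent = ActiveDatagram(name="upper", data=b"abc")
autonomous = ActiveDatagram(name="rev", data=b"abc", code={"rev": lambda d: d[::-1]})
```

The `dependent` datagram carries no code and relies entirely on the switch, while `autonomous` brings its own handler; real systems fall somewhere between the two.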
2.1 Code Distribution

It is often useful to examine code distribution techniques to understand where a system falls within the continuum. Among current systems, the dependent end of the continuum is populated by systems in which code is loaded at "start up" time and does not change until the system is restarted. A second category of systems is based on what we call active extensions. These systems also load code into the switch to process packets, but can load code dynamically while the system is running. The third category consists of systems based on active packets. An active packet carries some of the code needed for processing the packet at a switch. Obviously, these categories overlap; systems exist which use active extensions to process active packets. Moreover, all the systems with which we are familiar have some bootstrap code which is also used during the running system. This code corresponds to our first category.

Active Extensions. An active extension is some code that may be loaded on a running switch to alter the processing of future packets. It should be possible to load an extension without replacing all of the code in the switch. It should be possible to load an extension by sending it across the network to a remote location; if this is not possible, i.e., code may only be loaded from a local persistent store, then the extension is referred to as a local extension. Extensions may make use of other extensions already loaded on the node; they need not be independent. Extensions reside on the node, e.g., in memory or on local disk, until they are unloaded.

Active Packets. An active packet is one which contains both code and data needed to process the packet in the network. Active packets may invoke, or may be processed by, active extensions that reside on the nodes they traverse. Thus, packets are programmed assuming a particular active extension interface. Like traditional packets, active packets are, by nature, ephemeral. In a traditional router, a packet is received, temporarily occupying resources, and is sent on its way or discarded, freeing up those resources. Some special packets may indirectly affect router state, such as routing notices, but this state is a function of the code processing the packet, not the packet itself. In our definition, we make the similar assumption that active packets will only occupy resources while they are executing and that any state left on the router is the responsibility of some active extension.
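The distinction drawn above between node-resident extensions and ephemeral packets can be sketched as follows. This is a toy model under stated assumptions: the `Node` class, its method names, and the `count_flow` extension are all hypothetical, and a packet's program is represented as a Python function that is discarded as soon as it returns.

```python
from typing import Callable, Dict


class Node:
    """Toy active node: extensions persist, packets do not."""

    def __init__(self) -> None:
        self.extensions: Dict[str, Callable[..., object]] = {}

    def load_extension(self, name: str, fn: Callable[..., object]) -> None:
        self.extensions[name] = fn           # stays until explicitly unloaded

    def unload_extension(self, name: str) -> None:
        del self.extensions[name]

    def run_packet(self, program: Callable[["Node", bytes], object], data: bytes) -> object:
        # The packet's program exists only for the duration of this call.
        return program(self, data)


node = Node()
counts: Dict[str, int] = {}                  # state owned by the extension, not the packet


def count_flow(flow_id: str) -> int:
    counts[flow_id] = counts.get(flow_id, 0) + 1
    return counts[flow_id]


node.load_extension("count_flow", count_flow)


def packet_program(n: Node, data: bytes) -> object:
    # Programmed against the assumed extension interface "count_flow".
    return n.extensions["count_flow"](data.decode())


first = node.run_packet(packet_program, b"flow-a")
second = node.run_packet(packet_program, b"flow-a")
```

The counter survives between the two packets only because it belongs to the extension; neither packet leaves anything of its own behind.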
2.2 A Comparison of Active Code Types

Some clear tradeoffs are evident in the current taxonomy. Because code resident in a switch may be used to process multiple packets, it has the advantage that any one-time costs can be amortized over many uses. For example, before an active extension is loaded, it might be type-checked and its owner authorized to guarantee certain safety or security properties. Such costs may be prohibitive if applied on a per-packet basis. However, extensions occupy router state even while not in use. They may also consume CPU cycles unnecessarily if they require periodic maintenance, such as timing out stored information, even when that information is not in regular use. It may also be difficult to safely unload or replace an extension if some subsequently loaded extension refers to it. All of these reasons dictate that a reasonable policy should be used to manage the space occupied by extensions. We discuss the idea of using a cache for this purpose below.

Active packets have the advantage that after each packet has been processed, it no longer occupies space on the router. But this may be a disadvantage as well. For example, it is redundant to transmit the processing code in each packet of a flow when only the payload portion of the packet changes. And, as mentioned earlier, one-time costs, such as unmarshalling, type-checking, and authentication, must be performed for each packet. This suggests that some form of caching may be desirable for active code in order to achieve the amortization properties of active extensions. We discuss the ramifications of caching for the taxonomy below.

These tradeoffs suggest the careful choice of hybrid systems. In practice, we have found that there are parts of our systems that change relatively frequently and others that change rarely. An example of such a hybrid system is a protocol which loads a routing algorithm into the switches along the path between two communicating nodes, but which sends, in each packet, an error processing algorithm chosen according to the importance of the payload.

Caching. Both active packets and active extensions may make use of caching to take better advantage of resources. In effect, a cache provides a policy for managing the space on an active router. This policy will alter the "default" lifetime of each active entity: in the case of extensions that lifetime will be reduced, while for packets it will be extended.

An active extension cache might work as follows: when a packet arrives, the system checks to see if the active extensions required to process it are present on the system. If not, it uses some algorithm to determine the location of a repository which can be expected to have a copy of the missing active extension. It then retrieves a copy of the active extension from the repository, loads it, and uses it to process the packet. Since active extensions may make use of other extensions, it may be the case that multiple extensions must be loaded before the packet is processed. In fact, it may be that the initial extension needed to process the packet is present, but that executing it reveals the need for another extension.
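The extension cache just described can be sketched in a few lines of Python. The sketch makes several assumptions that are not part of the description above: the repository is a local dictionary standing in for a network fetch, each extension declares its dependencies up front (so they are loaded eagerly rather than discovered during execution), and the dependency graph is acyclic. The names `ExtensionCache`, `ensure_loaded`, and the two sample extensions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical repository: name -> (declared dependencies, code).
Repository = Dict[str, Tuple[List[str], Callable[[bytes], bytes]]]


class ExtensionCache:
    def __init__(self, repository: Repository) -> None:
        self.repository = repository
        self.loaded: Dict[str, Callable[[bytes], bytes]] = {}
        self.fetches: List[str] = []         # record of repository retrievals

    def ensure_loaded(self, name: str) -> Callable[[bytes], bytes]:
        if name not in self.loaded:
            deps, code = self.repository[name]   # stands in for a network fetch
            self.fetches.append(name)
            for dep in deps:                     # load prerequisites (assumes no cycles)
                self.ensure_loaded(dep)
            self.loaded[name] = code
        return self.loaded[name]

    def process(self, ext_name: str, payload: bytes) -> bytes:
        return self.ensure_loaded(ext_name)(payload)


repo: Repository = {
    "checksum": ([], lambda p: p),
    "route": (["checksum"], lambda p: b"routed:" + p),
}
cache = ExtensionCache(repo)
out1 = cache.process("route", b"x")   # misses: fetches "route", then "checksum"
out2 = cache.process("route", b"y")   # hit: no further fetches
```

A real cache would also need an eviction policy, which is what shortens the lifetime of an extension relative to the uncached case.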
An active code cache is much the same. In fact, even after reflection, it is difficult to distinguish between active packets and active extensions in the presence of caching. For the purposes of the taxonomy, therefore, we consider a code cache to be a function from active packets to active extensions: once cached, an active packet in effect becomes an active extension. This highlights the essence of the difference between the two abstractions: active packets are ephemeral, while active extensions are node-resident.
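The amortization argument of Section 2.2 can be made concrete with a toy cost model. The function below and all of its numbers are illustrative assumptions, not measurements: it simply charges the one-time work (unmarshalling, type-checking, authentication) either once per flow, as with an extension or cached code, or once per packet, as with uncached active packets.

```python
def total_cost(n_packets: int, one_time: float, per_packet: float, cached: bool) -> float:
    """Cost of processing a flow of n_packets packets.

    one_time   -- unmarshalling, type-checking, authentication (arbitrary units)
    per_packet -- cost of actually running the code on one packet
    cached     -- True if the one-time work is paid once and then reused
    """
    checks = min(1, n_packets) if cached else n_packets
    return checks * one_time + n_packets * per_packet


# Illustrative numbers only: 50 units of checking, 2 units of processing.
uncached = total_cost(100, one_time=50, per_packet=2, cached=False)   # 100*50 + 100*2
with_cache = total_cost(100, one_time=50, per_packet=2, cached=True)  # 1*50 + 100*2
```

Under these assumed numbers the uncached flow costs 5200 units and the cached flow 250; for a single-packet flow the two are identical, which is why caching matters only when code is reused.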
3 The Security Perspective

In all active networking schemes, safety and security are two important factors for the correct operation of the network. Code in the network (in the form of active extensions or packets) needs protection from other pieces of code. In closed systems, such protection ("isolation") is provided by either the operating system (e.g., memory protection), the language runtime [9, 6], or some other mechanism [14, 13]. While similar techniques can be applied inside an active switch, additional mechanisms are necessary to enforce isolation across a network. Due to the network's distributed nature, principal authentication and authorization are not as easy as in an operating system, where all the necessary information is available inside the kernel. Further complications are introduced by the existence of different administrative entities, with correspondingly differing access policies.

Flexible security mechanisms are also necessary for resource management, which will be more complicated and potentially more important than in present-day networks, as it will involve additional real and abstract resources (e.g., CPU cycles and line scheduling disciplines, respectively). The best-effort resource-usage model of today's Internet is naturally biased towards users with faster connections; since the processing time of an IP [15] packet at a router is more or less fixed (even taking into consideration option processing), users who can transmit more packets in the same amount of time will use more of the router's resources. Conversely, a user can consume at most the resources that his local bandwidth allows. With the introduction of CPU cycles as a resource available to users, more stringent control is necessary. Without such controls, the user of an active network could effectively monopolize CPU cycles (and other active resources).

Thus, while safety is an important feature of an active node, additional mechanisms are generally necessary to allow a collection of nodes to act as an active network. In particular, unless there are no privileged operations and the resource allocation model is best-effort (as in the Internet), there needs to be some way of distinguishing between network entities (nodes, users, etc.) and of determining what they are allowed to do. These two tasks are typically accomplished through authentication and authorization mechanisms, respectively. In the following two paragraphs, we examine the requirements for such mechanisms in different active network scenarios.
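One simple form that the more stringent control over CPU cycles could take is a per-principal budget. The sketch below is only an illustration of that idea under our own assumptions: the `CpuAccountant` class is hypothetical, CPU use is measured in abstract "ticks", the principal is assumed to have been authenticated already, and budgets never replenish.

```python
from collections import defaultdict
from typing import Dict


class CpuAccountant:
    """Charges abstract CPU ticks to the principal that sent each packet."""

    def __init__(self, budget_per_principal: int) -> None:
        self.budget = budget_per_principal
        self.used: Dict[str, int] = defaultdict(int)

    def charge(self, principal: str, ticks: int) -> bool:
        """Return True and record usage if the principal stays within budget."""
        if self.used[principal] + ticks > self.budget:
            return False                     # request refused; nothing is charged
        self.used[principal] += ticks
        return True


acct = CpuAccountant(budget_per_principal=10)
results = [acct.charge("alice", 4) for _ in range(3)]   # the third request would exceed 10
bob_ok = acct.charge("bob", 4)                          # bob is unaffected by alice's usage
```

Because usage is tracked per principal, one user exhausting a budget does not prevent others from being served, which is the property the best-effort model alone does not provide for CPU cycles.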
Authentication. In general, authentication can be performed in a way similar to IPsec [4]: there is a negotiation phase, during which the two parties (in an active network, typically an active switch and the sender of an extension or packet) authenticate each other and establish a shared secret key. The authorizations of the two parties are also typically determined at this time (e.g., which extensions the user can load on the switch, or which networks the switch interconnects). Subsequent communications between the two parties can be cryptographically secured with the shared secret. The negotiation-based model of authentication lends itself equally well to either type of active networking (active extensions or active packets).

There are some cases where this mutual authentication is not necessary. This is typically the case when the user is transmitting some type of mobile agent packet (e.g., for gathering network statistics), where only the user need authenticate to the switch. If the number of packets transmitted is fairly small (no more than 3 or 4), then this approach is faster than a full negotiation. Particular care has to be taken to avoid replay attacks [17] in such a setting.

Authorization. In contrast to authentication, authorization is highly dependent on the active networking model in use. In particular, we need to distinguish between the following environments:
- Active-Packets-only network. In this scenario, active packets need to establish their credentials (e.g., what services resident on an active node they are allowed to invoke, how many resources they can consume, etc.). There are no dynamic extensions, allowing for a simple security model.
- Active Extensions as libraries. Here active extensions may be dynamically installed, but are only executed as a result of invocation from an active packet. The main consideration is extension naming; in particular, how to avoid (potentially malicious) name collisions between extensions installed by different users. Since active packets invoke extensions on the node, an attacker could install a module under a fake name that would (for example) transmit a copy of the active packet to the attacker. One solution described in [1] involves use of a decentralized naming scheme based on the principals' public keys, imposing a "self-regulated" hierarchy (footnote 1). Notice that in this scenario, active extensions are in effect part of the active packet, cached on the node. Thus, an extension's privileges (and the resources it is allowed to consume) are those of the active packet that invokes it (and thus may change between different invocations). Additional packet authorizations may be necessary if code-sharing among different users of the network is not desired.
- Active Extensions as packet handlers. This is a variation of the previous case, except that an extension is now invoked by the runtime system when a particular type of inactive packet is received (e.g., telnet session packets between two hosts). The extension would then process the received packet and return to inactivity, waiting for the next packet. In some sense, this variation is an extreme case of active code caching. The naming issue previously mentioned exists only insofar as function calls between active extensions are permitted. Unlike the previous scenario, however, the owner of the active extension needs to establish his authority to receive and process certain (classes of) inactive packets. Also unlike the library situation, extensions here have a consistent set of privileges, typically determined at installation time.
- Active Extensions as processing servers. Here active extensions act as library extensions, but with their own consistent set of privileges. The authorization model is similar to that of active packets. The extensions themselves need to perform authorization checks on active packets that invoke them, to determine whether a request should be honored or not. In a more advanced version of this model, active packets may be able to "lend" (delegate) their privileges to active extensions, so that the required task can be completed.

Footnote 1: Impersonations and collisions in that scheme require successfully breaking the underlying public key cryptosystem.
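The idea of avoiding name collisions by tying extension names to the installer's public key, mentioned above for the library scenario, can be sketched as follows. This is not the scheme of [1]; it is a minimal illustration under our own assumptions: names are qualified by a SHA-256 fingerprint of the owner's key, the key bytes are placeholders, and the installer is assumed to have proven possession of the key during authentication. `qualify` and `Registry` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, Tuple

QualifiedName = Tuple[str, str]      # (fingerprint of owner's public key, local name)


def qualify(owner_public_key: bytes, local_name: str) -> QualifiedName:
    fingerprint = hashlib.sha256(owner_public_key).hexdigest()
    return (fingerprint, local_name)


class Registry:
    def __init__(self) -> None:
        self.table: Dict[QualifiedName, Callable[[bytes], bytes]] = {}

    def install(self, authenticated_key: bytes, local_name: str,
                code: Callable[[bytes], bytes]) -> QualifiedName:
        # Possession of the key is assumed to have been proven during authentication.
        name = qualify(authenticated_key, local_name)
        self.table[name] = code
        return name


registry = Registry()
alice_key, mallory_key = b"alice-public-key", b"mallory-public-key"   # placeholder bytes
good = registry.install(alice_key, "forward", lambda p: p)
evil = registry.install(mallory_key, "forward", lambda p: b"copy:" + p)
```

Both principals install an extension called "forward", but the two land in different namespaces, so a packet that names Alice's extension by its qualified name cannot be redirected to Mallory's.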
4 Classification of Existing Projects

In this section, we examine several of the existing active networking projects and analyze where they fit into our taxonomy.
4.1 ALIEN

ALIEN [2] is an example of a hybrid system. It contains a small portion of code, the loader, which is started when ALIEN is started and cannot be replaced without restarting the system. It then loads active extensions (including the Core Switchlet) so as to process packets from the network. One of the available active extensions processes active packets. When all of these pieces are running, the user is able to choose to a large extent where in the spectrum his protocol will lie. At the same time, there are limitations in ALIEN to ensure that resources are fairly divided. This means that certain decisions cannot be made by the active datagram.

The Active Bridge [3] was an early application of ALIEN. It is an example of an active extension system. Several extensions are loaded into the system. These extensions process packets according to the requirements of the 802.1D bridging standard [12]. By loading additional active extensions, it is possible to change the way in which packets are processed.
4.2 ANTS

ANTS [20] provides a (cached) active extension model. Each active extension is named by computing its MD5 [16] checksum. When a packet arrives at a node, in addition to some payload it includes a field containing the checksum of the active extension to be called. If this extension is not currently on the node, a protocol is used to attempt to retrieve it from the node which most recently processed the packet. In this way, if packets form a flow, the first packet in the flow will "pull" the active extension into the nodes along the path of the flow and subsequent packets will find the extension already in place in the caches of the nodes along the path.
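The "pull" behavior described for ANTS can be mimicked with a short simulation. It is not ANTS code: the `AntsLikeNode` class is hypothetical, the retrieval protocol is replaced by a direct lookup in the previous node's cache, and re-checking the MD5 checksum of the retrieved code against its name is our own assumption about a sensible safeguard.

```python
import hashlib
from typing import Dict, List, Optional


class AntsLikeNode:
    def __init__(self) -> None:
        self.cache: Dict[str, bytes] = {}    # checksum -> code bytes
        self.pulls = 0

    def receive(self, checksum: str, previous: Optional["AntsLikeNode"]) -> None:
        if checksum not in self.cache:
            if previous is None or checksum not in previous.cache:
                raise LookupError("code unavailable upstream")
            code = previous.cache[checksum]  # stands in for the retrieval protocol
            if hashlib.md5(code).hexdigest() != checksum:
                raise ValueError("retrieved code does not match its name")
            self.cache[checksum] = code
            self.pulls += 1


def send_along(path: List[AntsLikeNode], checksum: str) -> None:
    previous = None
    for node in path:
        node.receive(checksum, previous)
        previous = node


code = b"def handle(pkt): ..."               # opaque code bytes for illustration
name = hashlib.md5(code).hexdigest()
source, hop1, hop2 = AntsLikeNode(), AntsLikeNode(), AntsLikeNode()
source.cache[name] = code                    # the sender already holds the code
for _ in range(3):                           # three packets of one flow
    send_along([source, hop1, hop2], name)
```

Only the first packet of the flow triggers a retrieval at each hop; the second and third find the code already cached, so each downstream node performs exactly one pull.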
4.3 NetScript
NetScript [8, 21] provides active extensions as packet handlers. It is a programming-language-based project being developed at Columbia University. The main object in this system is a box, which is a stream-processing element. NetScript can then be used to specify the interconnections between boxes to compose more complex protocols. The main strength of the project is the ability to dynamically compose, reorder, substitute, add, or remove processing elements from a protocol stack. Boxes may be sent to remote machines but the system operates on non-active packets.
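The notion of composing and reordering stream-processing elements can be illustrated without reference to NetScript's actual language or API, neither of which is reproduced here. In the sketch below, which is entirely our own construction, a "box" is modeled as a Python function from a packet stream to a packet stream, and a protocol stack is an ordered list of such functions.

```python
from typing import Callable, Iterable, Iterator, List

# A "box" here is any function from a packet stream to a packet stream.
Box = Callable[[Iterable[bytes]], Iterator[bytes]]


def drop_empty(stream: Iterable[bytes]) -> Iterator[bytes]:
    return (p for p in stream if p)


def add_header(stream: Iterable[bytes]) -> Iterator[bytes]:
    return (b"H|" + p for p in stream)


def compose(boxes: List[Box]) -> Box:
    def pipeline(stream: Iterable[bytes]) -> Iterator[bytes]:
        result: Iterable[bytes] = stream
        for box in boxes:                    # connect the boxes in order
            result = box(result)
        return iter(result)
    return pipeline


stack: List[Box] = [drop_empty, add_header]
out_before = list(compose(stack)([b"a", b"", b"b"]))
stack.reverse()                              # reorder the processing elements
out_after = list(compose(stack)([b"a", b"", b"b"]))
```

Reordering changes the behavior of the stack: once the header is added first, the empty packet is no longer empty and is not dropped.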
4.4 PLANet
PLANet [11] provides an integrated active environment based around active packets. All packets in the network contain programs written in PLAN [10], the Packet Language for Active Networks. The language provides some basic control constructs and datatypes such that interesting computations may be performed but without posing a security risk. For example, all PLAN programs must terminate and none may directly leave any resident state. PLAN programs may additionally call node-resident services. Services are implemented as active extensions.
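To illustrate how restricting what a packet program can express makes termination easy to guarantee, consider the following toy evaluator. It is far simpler than PLAN and is not based on PLAN's syntax or semantics: a program is assumed to be a finite, straight-line list of calls to node-resident services, and all names (`run_program`, `node_services`, the two services) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Instruction = Tuple[str, str]                # (service name, argument)


def run_program(program: List[Instruction],
                services: Dict[str, Callable[[str], str]]) -> List[str]:
    """Evaluate a straight-line program; one pass over a finite list always ends."""
    results = []
    for service_name, argument in program:
        if service_name not in services:
            raise PermissionError(f"service {service_name!r} not offered by this node")
        results.append(services[service_name](argument))
    return results                           # nothing is stored on the node


node_services = {"echo": lambda s: s, "length": lambda s: str(len(s))}
output = run_program([("echo", "hi"), ("length", "hello")], node_services)
```

Any state that outlives the packet would have to be kept by a service, that is, by an active extension, which matches the division of responsibility described in Section 2.1.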
4.5 PLAN-P
PLAN-P [19] is another system which uses active extensions as packet handlers. It is an active networking language being developed at IRISA. Although its syntax resembles that of PLAN (see above), it is used to specify node-resident processing elements and therefore has an entirely different evaluation model. Its developers apply dynamic code-generation techniques to achieve efficiency for their extensions. In addition, the limited expressibility of PLAN-P permits them to prove properties about their protocol processors such as global termination and an absence of exponential packet duplication.
4.6 Smart Packets
Smart Packets [5] makes use of active packets but not active extensions. Smart Packets programs are written in a language called Spanner, which is the compiled form of a higher-level, C-like language called Sprocket. Like PLAN, Sprocket code may call routines with more complex functionality, but unlike PLAN, these complex functions are statically compiled into the router and are thus not extensible.
4.7 Traditional networking

It is actually possible to classify currently-available networking technology in our taxonomy as well. For example, some commercial routers contain Flash ROMs which can be upgraded remotely; if it is possible to load part of the ROM and begin executing code from this part of the ROM while running, they can be viewed as active extension-based systems. Another example is the well-known Simple Network Management Protocol [7]. Here, network elements' operation is governed by the values of variables in a Management Information Base (MIB). In turn, SNMP packets contain commands which essentially consist of assignments to these variables. Thus, they can be viewed as active packets which carry code (albeit of limited expressibility).
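The view of a management packet as a tiny program of assignments can be sketched as below. This is not an SNMP implementation and uses no SNMP encoding; the MIB is modeled as a dictionary, the variable names are invented for the example, and applying a packet's assignments all-or-nothing is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

Assignment = Tuple[str, int]          # (variable name, new value)


def apply_set(mib: Dict[str, int], packet: List[Assignment]) -> Dict[str, int]:
    """Apply a packet of assignments all-or-nothing and return the new MIB."""
    unknown = [var for var, _ in packet if var not in mib]
    if unknown:
        raise KeyError(f"unknown variables: {unknown}")
    updated = dict(mib)               # the original MIB is left untouched
    for var, value in packet:
        updated[var] = value
    return updated


# Hypothetical variable names, chosen only for illustration.
mib = {"interface_enabled": 1, "ttl_default": 64}
new_mib = apply_set(mib, [("ttl_default", 32)])
```

The "language" carried by such a packet has only one statement form, assignment to an existing variable, which is what makes its expressibility so limited.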
5 Conclusions

We have described a classification of active networking architectures using the relative amount of code in the packet and resident in the switch and using the code distribution mechanism. The location of the code provides a continuum which describes how dependent the active code is upon the switch. The means of code distribution allows us to divide active code into active extensions and active packets and to make predictions about performance. We have also described how caching affects this classification. Caching can provide a means by which the programmer sees an active packet system while programming, but sees the performance of an active extension system. Nonetheless, the programmer should be aware of the implications of the caching.
References

1. D. S. Alexander, W. A. Arbaugh, A. D. Keromytis, and J. M. Smith. A secure active network environment architecture: Realization in switchware. IEEE Network Magazine, special issue on Active and Programmable Networks, 12(3):37-45, 1998.
2. D. Scott Alexander. Alien: A Generalized Computing Model of Active Networks. PhD thesis, University of Pennsylvania, December 1998.
3. D. Scott Alexander, Marianne Shaw, Scott M. Nettles, and Jonathan M. Smith. Active Bridging. In Proceedings, 1997 SIGCOMM Conference. ACM, 1997.
4. R. Atkinson. Security Architecture for the Internet Protocol. RFC 1825, August 1995.
5. Smart packets. http://www.net-tech.bbn.com/smtpkts/smtpkts-index.html.
6. Caml home page. http://pauillac.inria.fr/caml/index-eng.html.
7. J. Case, M. Fedor, M. Schoffstall, and J. Davin. A Simple Network Management Protocol (SNMP). RFC 1157, May 1990.
8. Sushil da Silva. Netscript tutorial. http://www.cs.columbia.edu/~dasilva/pubs/netscript-0.10/doc/tutorial.html, October 1998.
9. James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison Wesley, 1996.
10. Michael Hicks, Pankaj Kakkar, Jonathan T. Moore, Carl A. Gunter, and Scott Nettles. PLAN: A packet language for active networks. In Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming Languages, pages 86-93. ACM, 1998. Available at www.cis.upenn.edu/~switchware/papers/plan.ps.
11. Michael Hicks, Jonathan T. Moore, Scott Alexander, Carl A. Gunter, and Scott Nettles. Planet: An active internetwork. To appear, 1999.
12. IEEE. Media access control (MAC) bridges. Technical Report ISO/IEC 10038, ISO/IEC, 1993.
13. George C. Necula. Proof-Carrying Code. In Proceedings of the 24th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '97). ACM Press, 1997.
14. George C. Necula and Peter Lee. Safe Kernel Extensions Without Run-Time Checking. In Second Symposium on Operating System Design and Implementation (OSDI '96), 1996.
15. J. Postel. Internet protocol. Technical report, IETF RFC 791, September 1981.
16. R. Rivest. The MD5 Message-Digest Algorithm. Internet RFC 1321, April 1992.
17. P. Syverson. A Taxonomy of Replay Attacks. In Proceedings of the Computer Security Foundations Workshop VII (CSFW7), June 1994.
18. D.L. Tennenhouse, J.M. Smith, W.D. Sincoskie, D.J. Wetherall, and G.J. Minden. A survey of active network research. IEEE Communications Magazine, pages 80-86, January 1997.
19. Scott Thibault, Charles Consel, and Gilles Muller. Safe and Efficient Active Network Programming. In 17th IEEE Symposium on Reliable Distributed Systems, October 1998.
20. David J. Wetherall, John Guttag, and David L. Tennenhouse. ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols. In IEEE OPENARCH, April 1998.
21. Y. Yemini and S. daSilva. Towards programmable networks. In IFIP/IEEE International Workshop on Distributed Systems: Operations and Management, October 1996. http://www.cs.columbia.edu/~dasilva/netscript.html.