Spotlight
Editor: Siobhán Clarke • siobhan.clarke@cs.tcd.ie
Mobile Code Paradigms and Security Issues
R.R. Brooks • Clemson University
Programs are no longer constrained to execute on the nodes where they reside, and many systems therefore support code mobility. Although mobile code has yet to fully realize its promise of increased system flexibility, scalability, and reliability, the marketplace has embraced mobile code implementations such as Java, Jini, PostScript, and .NET. Several mobile code paradigms exist, and mobile code use raises many security concerns. Here, the author maps a taxonomy of mobile code paradigms to a taxonomy of network security vulnerabilities, revealing that many important security issues are being ignored.
Mobile code has a long and varied history, beginning with computing pioneer John von Neumann’s seminal concept of one automaton controlling another. In the 1960s, the mobile code idea was evident in remote job-entry terminals that transferred programs to mainframe computers. Ten years later, Ukrainian researcher Peter Sapaty introduced the Wave system, which offered full mobile code functionality.1 In the 1980s, Scandinavian packet-radio enthusiasts developed a Forth-based approach to remotely transferring and executing programs through a wireless infrastructure. In the 1990s, Sun Microsystems introduced Java, marking the first widely used mobile code implementation. Along the way, mobile code has been viewed using different perspectives and paradigms.

Unlike mobile computing, in which hardware moves,2 mobile code changes the machines where the program executes.3 Mobility lets vendors reconfigure software without shipping a physical medium. Sun researchers initially designed Java to reprogram cable TV boxes to avoid the cost of sending technicians to physically upgrade cable TV software. Microsoft uses mobile code to promptly distribute software patches. PostScript documents are another type of mobile program that tells printers how to create images.

Mobile code can also help distributed systems adapt autonomously. Adaptation can balance loads or compensate for hardware failures. It can also include downloading and installing software for new features.

Mobile code promises to increase system flexibility, scalability, and reliability. To date, however, this promise has been only partially fulfilled. Among the reasons for the technology’s unmet potential are security concerns and incomplete knowledge of the possible consequences of mobile code use. To address these issues, I map a taxonomy of mobile code paradigms to a taxonomy of network security vulnerabilities. This illustrates how many important mobile code security issues are being ignored.

MAY • JUNE 2004
Published by the IEEE Computer Society
Mobile Code Paradigms
Several researchers have offered overviews of mobile code paradigms.2-6 As Table 1 shows,4 the established paradigms offer a clear progression of technology:
• Client–server: The client invokes code resident on another node.
• Remote evaluation: A remote node downloads code before executing; examples include the common object request broker architecture (Corba) and SOAP.
• Code-on-demand: Local clients download and execute code as needed.
1089-7801/04/$20.00 © 2004 IEEE
IEEE INTERNET COMPUTING
Table 1. Common mobile code paradigms.

Paradigm           Example          Description
Client–server      Corba            Client invokes code resident on another node.
Remote evaluation  Corba Factory    Client invokes a program on remote node. Remote node downloads code.
Code-on-demand     Java, Active X   Client downloads code and executes it locally.
Process migration  Mosix, Sprite    Operating system transfers processes from one node to another for load balancing.
Mobile agents      Agent-TCL        Client launches a program that moves from site to site.
Active networks    Capsules         Packets moving through the network reprogram network infrastructure.
• Process migration: Processes move from one node to another to balance the load.
• Mobile agents: A program moves from site to site according to its own internal logic.
• Active networks: Packets moving through the network reprogram the network infrastructure.6

Paradigms differ on where code executes and who determines when mobility occurs. Consider an example scenario: data file f resides on node nf, and program p resides on node np. The user u is on node nu. Given this, the following actions would occur2:
• Client–server: transfer file f from nf to np. Program p executes on np and the results are transferred to nu.
• Remote evaluation: transfer program p to nf and execute there. Results are returned to nu.
• Code-on-demand: transfer data file f and program p to nu and execution occurs there.
• Mobile agents: transfer program p to nf and execute there. Program p carries the results to nu.

Each approach will vary in its efficiency, depending on network configuration and the size of p and f.4

Strong and weak code mobility differ.2 Weak mobility transfers limited initialization data, but no state information, with the code. Strong mobility migrates both code and execution state; programs move while executing. Mobility might even be transparent to the program itself. The utility of strong migration is debatable. It increases the volume of data transmitted as a process migrates. For load balancing, strong migration is worthwhile only for processes with long lifetimes.3 Mobile agents can be implemented using either weak or strong mobility. Some researchers view distributed systems with transparent migration as mobile code systems3 while others do not.2 I consider them mobile code systems.

Despite differences, all mobile code systems have things in common. For example, they must have a network-aware execution environment. For Java applets, a Web browser with a virtual machine downloads and executes the code. Other implementations use a network operating system layer coupled with a computational environment.5

Several mobile code implementations warrant further discussion because they do not fit well into established paradigms.
• Although rarely recognized as mobile code, PostScript is one of the technology’s most successful applications. PostScript files execute on printers to produce graphic images. Many users are unaware that these files are mobile code packages that are fully capable of performing malicious activities on local file systems.
• Wave is perhaps the earliest successful implementation of network-aware mobile code.1 In the Wave programming environment, network nodes correspond to graph nodes, and network connections correspond to edges. Wave offers an elegant approach for presenting distributed computing problems in terms of graph theory.
• Tube extends a Lisp interpreter to distributed applications.7 As an interpreted system, Lisp can metaprogram, generating and modifying code on the fly. Tube uses metaprogramming to offer robust computation by compensating for network errors.
• Messenger is similar to active networks, but focuses on the semiotics of message passing rather than the mechanics of communication.8

As these descriptions indicate, these paradigms are primarily oriented toward producing prototypes or commercial applications.
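To make the earlier file-and-program scenario concrete, the following minimal Python sketch counts the bytes each paradigm would move between nodes. It is an illustration only, not a model taken from the cited work: it assumes that nf, np, and nu are three distinct nodes, that request messages and agent state are negligibly small, and that cost is simply the sum of bytes transferred. The names Scenario, bytes_transferred, and cheapest, along with all sizes, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Scenario:
    """Sizes in bytes; every value used below is an illustrative assumption."""
    file_size: int      # data file f, resident on node nf
    program_size: int   # program p, resident on node np
    result_size: int    # result produced by running p on f


def bytes_transferred(s: Scenario) -> Dict[str, int]:
    """Total bytes moved between nodes under each paradigm.

    Assumes nf, np, and nu are distinct nodes and ignores the size of
    request messages and of any agent execution state.
    """
    return {
        # f moves nf -> np; results move np -> nu
        "client-server": s.file_size + s.result_size,
        # p moves np -> nf; results move nf -> nu
        "remote evaluation": s.program_size + s.result_size,
        # f moves nf -> nu and p moves np -> nu
        "code-on-demand": s.file_size + s.program_size,
        # p moves np -> nf, then p carries the results nf -> nu
        "mobile agents": 2 * s.program_size + s.result_size,
    }


def cheapest(s: Scenario) -> str:
    """Name of the paradigm that moves the fewest bytes for this scenario."""
    costs = bytes_transferred(s)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    # Large data file, small program: moving the code to the data wins.
    big_data = Scenario(file_size=10_000_000, program_size=50_000, result_size=1_000)
    for name, cost in bytes_transferred(big_data).items():
        print(f"{name:18s} {cost:>12,d} bytes")
    print("cheapest:", cheapest(big_data))
```

Under these assumptions, a large f and a small p favor remote evaluation, while a small f and a large p favor the client–server arrangement. Real deployments would also need to account for link speeds and topology, which this sketch leaves out.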
www.computer.org/internet/

Mobile Code Taxonomy
The taxonomy9 I developed with my student Nathan Orr is shown in Figure 1. It characterizes mobile code paradigms. Each paradigm places constraints on system behavior. In the taxonomy, a transmission is a set of messages sent between threads on hosts. A system’s behavior is defined as the itineraries followed by its transmissions. Figure 2 shows the definition of a message. Each message has an instruction signifying some action and a payload signifying the (possibly empty) target of that action. In this model, resources, threads, and programs can be either fixed or mobile.

Figure 1. Taxonomy of mobile code paradigms. Behavior is defined by the itineraries of transmissions. A transmission is a sequence of messages sent between threads on machines. (Diagram labels: Behavior, Itinerary, Transmission, Initiating Entity, Target Entity, Message0, Message1, ..., Messagen.)

Figure 2. Message definition. Each message sent between threads has an instruction and a payload.
Instruction: code request, resource request, reference request, thread request, execution request, code migrate, resource migrate, reference migrate, thread migrate.
Payload: empty, code, resource, reference, execution state.

The paradigms and mobile code implementations I have discussed thus far are all limited instances of the taxonomy in Figure 1. Code-on-demand, for example, is limited to code requests moving from the initiating host to the target, which returns a code-migrate message. Another example
is mobile agents, which are a series of code (and state) migration requests in which the agent determines the itinerary. My students and I have used the mobile code taxonomy as the basis of an API for a flexible mobile code execution environment.

Using our taxonomy,9 we can further group the common paradigms into two families. Figure 3 shows the client–server family. In the client–server model (Figure 3a), the client thread (X) transmits two concatenated messages to the remote thread (Y). One message requests the program resource, providing data if needed. The second requests program execution. After execution, Y transmits execution results to X. In remote evaluation (Figure 3b), local thread (X) transmits three concatenated messages to remote thread (Y). A message containing the executable code is concatenated to a client–server-style transmission. After execution, Y sends possibly NULL execution results to X. Java applets use the code-on-demand paradigm (Figure 3c), in which local thread X transmits a single message to Y, requesting a code download. Thread Y then transmits a message to X that contains the code, and X executes the code locally.

In contrast to the client–server family, which is characterized by users initiating action and by a reactive infrastructure, the agent family supports autonomy and adaptation within the infrastructure. Figure 4 shows the agent family. The mobile agent paradigm (Figure 4a) uses two threads for each hop. Thread X executes locally and composes a thread-migrate message containing agent code and state. This message is transmitted to thread Y on the remote host, where execution continues. A set of n hops requires n transmissions between up to n + 1 threads. The agent decides when and where to migrate. The process-migration paradigm differs from the agent paradigm in one way. The local host, rather than the agent, decides when and where the process migrates.

Active networks include many paradigms. In one, packets execute while traversing the network, which is a type of process migration. In another paradigm, packets reprogram the network infrastructure. As Figure 4b shows, this combines the mobile-agent and code-on-demand paradigms.
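The message-based view of the paradigms can be sketched as a small data model. The Python below is an assumption-laden illustration, not the API mentioned above: the class and function names are hypothetical, the instruction and payload values are taken from the labels in Figure 2, and the order of the two requests in the client–server transmission follows the prose description (resource request first, then execution request).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Tuple


class Instruction(Enum):
    CODE_REQUEST = "code request"
    RESOURCE_REQUEST = "resource request"
    REFERENCE_REQUEST = "reference request"
    THREAD_REQUEST = "thread request"
    EXECUTION_REQUEST = "execution request"
    CODE_MIGRATE = "code migrate"
    RESOURCE_MIGRATE = "resource migrate"
    REFERENCE_MIGRATE = "reference migrate"
    THREAD_MIGRATE = "thread migrate"


class Payload(Enum):
    EMPTY = "empty"
    CODE = "code"
    RESOURCE = "resource"
    REFERENCE = "reference"
    EXECUTION_STATE = "execution state"


@dataclass(frozen=True)
class Message:
    """One message between threads: an instruction applied to a payload."""
    sender: str
    receiver: str
    instruction: Instruction
    payload: Tuple[Payload, ...]  # a tuple, because agents carry code and state


def client_server(x: str, y: str) -> List[Message]:
    """X asks Y for a resource and for execution; Y returns the result."""
    return [
        Message(x, y, Instruction.RESOURCE_REQUEST, (Payload.EMPTY,)),
        Message(x, y, Instruction.EXECUTION_REQUEST, (Payload.EMPTY,)),
        # Figure 3 labels the reply [resource migrate, resource].
        Message(y, x, Instruction.RESOURCE_MIGRATE, (Payload.RESOURCE,)),
    ]


def remote_evaluation(x: str, y: str) -> List[Message]:
    """A code-migrate message prepended to a client-server-style exchange."""
    return [Message(x, y, Instruction.CODE_MIGRATE, (Payload.CODE,))] + client_server(x, y)


def code_on_demand(x: str, y: str) -> List[Message]:
    """X requests code from Y, receives it, and runs it locally."""
    return [
        Message(x, y, Instruction.CODE_REQUEST, (Payload.EMPTY,)),
        Message(y, x, Instruction.CODE_MIGRATE, (Payload.CODE,)),
    ]


def mobile_agent(itinerary: Sequence[str]) -> List[Message]:
    """One thread-migrate message per hop along the agent's itinerary."""
    cargo = (Payload.CODE, Payload.EXECUTION_STATE)
    return [
        Message(src, dst, Instruction.THREAD_MIGRATE, cargo)
        for src, dst in zip(itinerary, itinerary[1:])
    ]


if __name__ == "__main__":
    for msg in mobile_agent(["X", "Y", "Z"]):
        print(msg.sender, "->", msg.receiver, msg.instruction.value)
```

An itinerary of n + 1 hosts yields n thread-migrate messages, which matches the hop count given in the text. Process migration would produce the same messages; the difference is who chooses the itinerary, which this sketch does not model.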
Mobile Code Security
Now that we understand what mobile code is, we can briefly examine its security implications (for more details, see my colleague John Zachary’s article10). There are currently four main approaches to mobile code security11:
• Sandboxes limit the instructions available for use.
• Code signing ensures that code originates from a trusted source.
• Firewalls limit the machines that can access the Internet.
• Proof-carrying code (PCC) carries explicit proof of its safety.

The first three approaches are in widespread use. Netscape and Sun browsers use a hybrid approach that combines use of a sandbox and code signing.11 Firewalls are also in widespread use, but they are seriously limited in their ability to detect malicious code. Finally, it’s not clear that generic implementations of PCC will ever be possible.

These approaches solely protect target machines (or networks) from malicious code. Little has been done to protect code from malicious hosts. Methods being investigated include:
• Computing with encrypted functions: it’s possible, in some cases, to execute encrypted functions on encrypted data.12
• Code obfuscation: deliberately scramble the object code in a way that keeps it functional, but makes it difficult to reverse engineer.12
• Itineraries: keep itineraries of the nodes that a mobile code package has visited.13
• Redundancy: run multiple code packages in parallel on multiple hosts and compare their results.13
• Audit trail: log partial results throughout a distributed computation.13
• Tamper-proof hardware: viruses or other methods cannot corrupt tamper-proof hosts.14
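Of the methods listed above, redundancy is the easiest to illustrate in a few lines. The following Python sketch simulates running the same task on several hosts and comparing the results by strict majority vote. It is a toy under stated assumptions: hosts are plain functions rather than networked machines, results are assumed to be directly comparable values, and the names honest_host, malicious_host, and majority_result are hypothetical rather than taken from the cited report.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Optional, Tuple


def honest_host(task: Callable[[int], int], argument: int) -> int:
    """A well-behaved host runs the code it is given."""
    return task(argument)


def malicious_host(task: Callable[[int], int], argument: int) -> int:
    """A hostile host can ignore the code and return whatever it likes."""
    return -1


def majority_result(
    results: Dict[str, Hashable]
) -> Tuple[Optional[Hashable], List[str]]:
    """Compare per-host results.

    Returns (agreed value, dissenting hosts). If no value is reported by a
    strict majority of hosts, returns (None, all hosts): nobody is trusted.
    """
    if not results:
        raise ValueError("need at least one result to compare")
    value, votes = Counter(results.values()).most_common(1)[0]
    if votes <= len(results) // 2:
        return None, sorted(results)
    dissenters = sorted(host for host, r in results.items() if r != value)
    return value, dissenters


if __name__ == "__main__":
    def sum_of_squares(n: int) -> int:
        return sum(i * i for i in range(n))

    hosts = {"host-a": honest_host, "host-b": malicious_host, "host-c": honest_host}
    results = {name: run(sum_of_squares, 10) for name, run in hosts.items()}
    agreed, suspects = majority_result(results)
    print("agreed result:", agreed)
    print("suspect hosts:", suspects)
```

The vote only helps while honest hosts form a majority, and it multiplies the computation and traffic by the number of replicas, which is one reason this approach operates on aggregate behavior rather than on any single host.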
Figure 3. The client–server family. This paradigm family consists of the (a) client–server, (b) remote evaluation, and (c) code-on-demand models. In each panel, X is the user thread at the initiating client and Y is the host thread at the target server.
(a) Client–server: (1) [execution request, empty] [resource request, empty]; (2) [resource migrate, resource].
(b) Remote evaluation: (1) [code migrate, code] [execution request, empty] [resource request, empty]; (2) [resource migrate, resource].
(c) Code-on-demand: (1) [code request, empty]; (2) [code migrate, code].
Figure 4. The agent family. This paradigm family consists of (a) mobile agent and (b) active network models. It also includes the process-migration paradigm, which differs from the agent paradigm only in that the local host, rather than the agent, decides when and where the process migrates. In both panels, X is the user thread at the initiating site, Y is the host thread at the target site, and Z is the host thread at the next target site.
(a) Mobile agent: (1) [thread migrate, code and execute state], after which X is at Y’s location and executes for a while; (2) [thread migrate, code and execute state], after which X is at Z’s location.
(b) Active network: (1) [thread migrate, code and execute state], after which X is at Y’s location and executes for a while; (2) [code request, empty]; (3) [code migrate, code].

Widespread network attacks tend to involve some type of mobile code. Viruses and worms are a danger almost entirely due to their ability to migrate from host to host. That we’re still confronted by viruses and worms illustrates that widespread security measures are not working. They might be inadequate or just poorly implemented.

Mapping Security to the Taxonomy
Our mobile code taxonomy was based on a security-incident taxonomy developed at Sandia National Laboratories.15 According to Sandia’s taxonomy, each security incident is a combination of one or more attacks, which use tools to exploit system vulnerabilities and create an unauthorized result. Each unauthorized result is produced by an event, which is the action an attacker takes to exploit a specific target’s vulnerability. Figure 5 enumerates the most common possibilities for each taxonomy element.

Figure 5. Sandia’s security-incident taxonomy. Attackers use tools to exploit vulnerabilities, then take actions against targets to produce unauthorized results and fulfill their objectives. Events in this taxonomy correspond to messages in Figure 1.
Attackers: Hackers, Spies, Terrorists, Corporate raiders, Professional criminals, Vandals, Voyeurs.
Tool: Physical attack, Information exchange, User command, Script or program, Autonomous agent, Distributed tool, Data tap.
Vulnerability: Design, Implementation, Configuration.
Action: Probe, Scan, Flood, Authentication, Bypass, Spoof, Read, Copy, Steal, Modify, Delete.
Target: Account, Process, Data, Component, Computer, Network, Internetwork.
Unauthorized result: Increased access, Disclosure of information, Corruption of information, Denial of service, Theft of resources.
Objectives: Challenge, status, thrill; Political gain; Financial gain; Damage.
The diagram brackets these columns into nested groups labeled Incident, Attack(s), and Event.

With mobile code, a malicious package’s overall behavior constitutes a single security incident. The package behavior’s itinerary is a set of transmissions that the malicious code uses in an attack; each message constitutes a separate security event. Each instruction is an action applied to a payload, which is a potential target. Unauthorized mobile code executions produce unauthorized results.

Where do mobile code security measures fit in? A sandbox contains code execution. It protects a target machine from unauthorized access. A firewall’s goal is to protect a target subnetwork from unauthorized access. PCC’s goal is to allow a target machine to reject offensive code before executing it.
Although a case could be made that such approaches remove vulnerabilities, in essence they simply protect target machines, or networks, from attacks. Code signing works at a different level. Because it identifies a program’s source, unsafe code can be rejected. Alternatively, if code is found to be malicious, the signature can be a forensics tool for proving culpability.

Other approaches for protecting code also concentrate on fortifying components. Code obfuscation and computing with encrypted functions, for example, protect mobile code programs by making them difficult to decipher. Tamper-proof hardware makes system corruption impossible, removing an entire class of vulnerabilities. This allows both host and code to trust the tamper-proof component. In the ideal case, this protects both from being targets of attack.

The use of itineraries, redundancy, and audit trails works at an entirely different level. Although each single event in a mobile code intrusion is of relatively minor importance, the consequences of the aggregate behavior can easily become catastrophic. These approaches look at aggregates of messages, and thus work closer to the taxonomies’ incident or behavior levels.
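The correspondence between the two taxonomies can be sketched in code. In this hedged Python illustration, each message becomes one event (its instruction is the action, its payload the target), and all events produced by one package are grouped into a single incident. The package_id field is an assumption introduced for the example; neither taxonomy says how messages are attributed to a package, and the names Message, Event, to_event, group_incidents, and hosts_reached are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Message(NamedTuple):
    package_id: str   # hypothetical tag identifying one mobile code package
    source: str
    destination: str
    instruction: str  # for example "thread migrate"
    payload: str      # for example "execution state"


class Event(NamedTuple):
    action: str
    target: str
    host: str


def to_event(msg: Message) -> Event:
    """One message is one event: instruction -> action, payload -> target."""
    return Event(action=msg.instruction, target=msg.payload, host=msg.destination)


def group_incidents(messages: Iterable[Message]) -> Dict[str, List[Event]]:
    """Collect every event caused by the same package into one incident."""
    incidents: Dict[str, List[Event]] = defaultdict(list)
    for msg in messages:
        incidents[msg.package_id].append(to_event(msg))
    return dict(incidents)


def hosts_reached(incident: List[Event]) -> int:
    """Aggregate measure: number of distinct hosts touched by an incident."""
    return len({event.host for event in incident})


if __name__ == "__main__":
    trace = [
        Message("w1", "a", "b", "thread migrate", "execution state"),
        Message("w1", "b", "c", "thread migrate", "execution state"),
        Message("w1", "b", "d", "thread migrate", "execution state"),
        Message("u7", "a", "e", "code request", "empty"),
    ]
    for package, events in group_incidents(trace).items():
        print(package, "events:", len(events), "hosts reached:", hosts_reached(events))
```

Each event in the trace looks minor when inspected alone, while the per-incident aggregate shows one package spreading across several hosts. That is the level at which itineraries, redundancy, and audit trails operate.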
Conclusion
Most security measures fortify potential targets of attack. While this is important and necessary, we must consider the larger picture. Many email viruses perform actions allowed by a sandbox. Worms primarily exploit software-implementation errors. It’s unlikely that software design will soon (if ever) advance to the point where we’ll automatically foresee abuses or consistently produce bug-free systems. The Internet infrastructure enables distributed attacks. Fortifying individual processors now is akin to fortifying individual positions after the Blitzkrieg: it will not solve our problems. Distributed attacks have become widespread, and we need distributed countermeasures to defend against them. Concentrating on fortifying individual processors is like building a stronger Maginot Line after World War II. Let’s not make that mistake.

Acknowledgments
The content of this article is based on work supported by the US Office of Naval Research under award no. N00014-01-10859. The opinions, findings, and conclusions are those of the author and do not necessarily reflect the views of the Office of Naval Research.
References
1. P. Sapaty, Mobile Processing in Distributed and Open Environments, Wiley & Sons, 1999.
2. A. Fuggetta, G.P. Picco, and G. Vigna, “Understanding Code Mobility,” IEEE Trans. Software Eng., vol. 24, no. 5, 1998, pp. 342–361.
3. D. Milojicic, F. Douglis, and R. Wheeler, eds., Mobility: Processes, Computers, and Agents, Addison-Wesley, 1999.
4. R.R. Brooks and N. Orr, “A Model for Mobile Code Using Interacting Automata,” IEEE Trans. Mobile Computing, vol. 1, no. 4, 2002, pp. 313–326.
5. D. Wu, D. Agrawal, and A. Abbadi, “StratOSphere: Unification of Code, Data, Location, Scope, and Mobility,” Proc. Int’l Symp. Distributed Objects and Applications, ACM Press, 1999, pp. 12–23.
6. D.L. Tennenhouse et al., “A Survey of Active Network Research,” IEEE Comm. Magazine, vol. 35, no. 1, 1997, pp. 80–86.
7. D.A. Halls, Applying Mobile Code to Distributed Systems, doctoral dissertation, Dept. of Computer Science, Univ. of Cambridge, 1997.
8. C.-F. Tschudin de Bâle-ville, On the Structuring of Computer Communications, doctoral dissertation, Informatique, Université de Genève, 1993.
9. N. Orr, A Message-Based Taxonomy of Mobile Code for Quantifying Network Communication, master’s thesis, Dept. of Computer Science and Eng., Pennsylvania State Univ., 2002.
10. J.M. Zachary, “Protecting Mobile Code in the Wild,” IEEE Internet Computing, vol. 7, no. 2, Mar./Apr. 2003, pp. 78–82.
11. A.D. Rubin and D.E. Geer, “Mobile Code Security,” IEEE Internet Computing, vol. 2, no. 6, Nov./Dec. 1998, pp. 30–34.
12. T. Sander and C.F. Tschudin, “Towards Mobile Cryptography,” Proc. IEEE Symp. Security and Privacy, IEEE CS Press, 1998, pp. 215–224.
13. W. Jansen and T. Karygiannis, Mobile Agent Security, NIST Special Publication 800-19, Aug. 1999; http://csrc.nist.gov/mobileagents/publication/sp800-19.pdf.
14. S. Loureiro and R. Molva, “Mobile Code Protection with Smartcards,” Proc. 6th ECOOP Workshop on Mobile Object Systems, Springer-Verlag, 2000; http://citeseer.nj.nec.com/408410.html.
15. J.D. Howard and T.A. Longstaff, A Common Language for Computer Security Incidents, tech. report SAND98-8867, Sandia Nat’l Labs, 1998; www.cert.org/research/taxonomy_988667.pdf.
R.R. Brooks is an associate professor of electrical and computer engineering at Clemson University in Clemson, South Carolina. His research interests include network security, sensor networks, and self-organizing systems. He has a BA in mathematical sciences from the Johns Hopkins University and a PhD in computer science from Louisiana State University. He is a senior member of the IEEE. His books Disruptive Security Technologies with Mobile Code and Peer-to-Peer Networks and Frontiers in Distributed Sensor Networks (with S.S. Iyengar) will be published by CRC Press in 2004. Contact him at
[email protected].
Write for Spotlight
Spotlight focuses on emerging technologies, or new aspects of existing technologies, that will provide the software platforms for Internet applications. Spotlight articles describe technologies from the perspective of a developer of advanced Web-based applications. Articles should be 2,000 to 3,000 words. Guidelines are at www.computer.org/internet/dept.htm. To check on a submission’s relevance, please contact department editor Siobhán Clarke at [email protected].