THE UCSD ACTIVE WEB
A Proposal to the National Science Foundation CISE Research Infrastructure Program1
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0114
Principal Investigators: Joseph Pasquale, Rik Belew, Jeanne Ferrante, Russell Impagliazzo, Venkat Rangan
Faculty Investigators: Scott Baden, Mihir Bellare, Fran Berman, Walt Burkhard, Brad Calder, Larry Carter, C.K. Cheng, Gary Cottrell, Flaviu Cristian, Charles Elkan, Joseph Goguen, William Griswold, William Howden, T. C. Hu, Ramesh Jain, Keith Marzullo, Alex Orailoglu, Yannis Papakonstantinou, George Polyzos, Ben Rosen, Dean Tullsen, Victor Vianu, Rich Wolski, Bennett Yee
October 26, 1997
ABSTRACT
The UCSD Department of Computer Science and Engineering recently submitted a proposal for large-scale Research Infrastructure funding to the National Science Foundation. The theme of the proposal is the “Active Web”, a next-generation World Wide Web premised on support for active content, content that is rich in multimedia and in references to other objects, and for mobile agents, programs that can move about and execute on remote servers, carrying out requests at a distance on behalf of users. These servers are no longer passive databases as in today's Web, but context-sensitive "knowledge networks" that contain all kinds of active content; between the servers themselves there is a constant exchange of agents, which add to, refine, form interconnections among, and make consistent, the distributed content. In the Active Web, there is a high degree of resource sharing, usage is bought and sold as in a market economy, and security is paramount. To realize this vision, the Department is taking a coordinated approach, focusing and integrating its strengths in network and operating systems design, security, multimedia, content-based search, scientific metacomputing, and computer and software engineering.
1 This document is a revised version of the actual proposal submitted to NSF on October 17, 1997. Please direct questions or comments to Joseph Pasquale, who can be contacted by email at
[email protected], and by telephone at 619 534-2673.
A. TABLE OF CONTENTS
B: Executive Summary
C: Research Infrastructure Description
    C.1 Summary of Requested Experimental Facilities
    C.2 Five-Year Development of Research Infrastructure
D: Resource Allocation
    D.1 Current Departmental Research Equipment
    D.2 Description of Requested Equipment
    D.3 Rationale for Requested Equipment
    D.4 Equipment Access Strategy
    D.5 Space for Equipment
    D.6 Institutional Cost-Sharing
E: Management Plan
F: Budget
G: Research
    G.1 Introduction
    G.2 The UCSD Active Web Architecture
    G.3 Faculty Research
H: Staff Credentials
    H.1 Curriculum Vitaes of Principal Investigators
    H.2 Biographies of Faculty Investigators
I: Results from Prior Awards
B. EXECUTIVE SUMMARY
The UCSD Department of Computer Science and Engineering is seeking its first NSF Research Infrastructure grant, totaling $2.4 million, with $1.8 million requested from the NSF. We are a department of approximately 30 faculty, many of whom work in the following experimental areas that require large-scale computer, communication, and storage resources:
• Systems: involves the development of new processing and communication structures for operating systems, distributed systems, and networking that support scalability, heterogeneity, fault tolerance, and high performance
• Security: involves the development of methods for maintaining the privacy and integrity, and determining the authenticity, of both communicated messages and mobile programs, to support large-scale resource sharing and electronic commerce between untrusted domains
• Multimedia: involves the design of distributed applications that process, communicate, and store video, images, graphics, audio, and text, and the design of systems for their support, which generally requires quality-of-service provisions
• Content-based Search: involves finding new ways to search for patterns in large data objects that reside in disparate databases, where the search is often content-based, requiring significant processing to determine relevance
• Scientific Metacomputing: involves the development of tools that support large-scale distributed scientific computations, many of which are highly data-intensive and require large numbers of intercommunicating processors
• Computer and Software Engineering: involves the analysis, simulation, and testing of very large and complex systems, both hardware (VLSI circuits) and software (multi-component programs)
We are seeking a heterogeneous mix of equipment that will serve multiple projects, and whose combinations can be configured dynamically to meet individual project demands. A summary of the equipment includes:
• Cluster of 4 Digital “Rawhide” 4100 (quad) multiprocessors, addressing the requirements of our more computation-intensive applications
• Sun “Enterprise” 3000 (quad) multiprocessor, with 1 terabyte RAID and 8 terabyte tape tertiary storage from ISS Corp., acting as a hierarchical storage (with advanced compute
capabilities) server to address the requirements of our more memory-intensive applications, and for long-term storage
• PC-based (quad) multiprocessor with special-purpose multimedia devices (MPEG-II encoder/decoder, DVD writable disk), providing advanced multimedia processing capabilities
• 30 PC-based multimedia network computers (which rely on network servers for their larger computation and memory resource requirements), equipped with MPEG decoders to support multimedia user interfaces
• Hierarchical network connecting all the equipment and connecting to external networks: a high-speed central hub that supports numerous types of network connections (e.g., fast Ethernet, ATM, etc.), connecting a set of 10 programmable routers (very fast dual-processor PCs) that will run our experimental protocols, one per laboratory, each of which connects laboratory machines using fast 100 Mbps Ethernet
• 43 secure coprocessors that will be installed on all servers, routers, and network computers, to support secure agent-based computing, a fundamental feature of our approach to decentralized access of resources
Our infrastructure design is premised on our vision for a more active (World Wide) Web. While today’s Web has enabled the sharing of information by users on a truly global scale, its simple passive model of operation, while powerful and easy to implement and therefore propagate, is limiting its capabilities in content management, large-scale integrated resource sharing, and ultimately, user interactivity and productivity. We envision a much more dynamic, Active Web, supporting a much greater degree of interactivity between users and the active content it will provide. Everything in the Active Web is active, including the units of communication, which are programs. The Active Web offers capabilities and modes of computing that are at best in primitive development in today’s Web: network computing, where users rely on power contained in the network to carry out their computing, and secure computing, which provides fine-grained programmable confinement of and access to content. To realize this vision, a flexible architecture for resource sharing between untrusted domains is required. This architecture must allow resources to be aggregated into higher-level computational and communication structures, with predictable levels of performance and reliability. We view the development of our infrastructure as a microcosm of how resource sharing can successfully be implemented on a World Wide Active Web.
We will organize our infrastructure as an Active Web of resource servers that supply the power of resources to applications via a “middleware kernel,” i.e., a user-level distributed software control system that supports common mechanisms. The architecture of the UCSD Active Web supports a high degree of programmability, so that the system itself can be reconfigured, allocating any number (including all) of the resources into experiment-specific computation and communication structures to meet the needs of our researchers. It also supports flexible security and a token economy for resource sharing. An important feature of the UCSD Active Web is support for agent-based computing. Agents are simply mobile programs that visit servers to access resources; they are the basis for transporting control and aggregating resources in our system. Examples of the research issues include: compiler support for mobility and efficient execution of Java agents; operating system support for application-controlled fine-grained resource allocation to achieve predictable performance; security of agents given malicious servers, and security of servers given malicious agents.
Figure 1: The UCSD Active Web infrastructure will support the requirements of our multimedia, content-based search, scientific metacomputing, and computer and software engineering groups. The system developers will be our systems and security groups, who will also carry out their research using this testbed.
The system will support the needs of our researchers in the multimedia, content-based search, scientific metacomputing, and computer and software engineering groups. As shown in Figure 1, these groups are the users of the system, driving its requirements, while the remaining groups, systems and security, will be the developers of the system’s design and implementation to meet these requirements.
In addition to supporting individual research projects, the UCSD Active Web will promote a much greater level of collaboration between the groups. The demanding requirements of our users drive the system design, and the availability of a common experimental testbed will promote the transfer and sharing of tools and results between the projects. The opportunities for collaboration are certainly present: the content-based search group seeks to incorporate more aspects of multimedia computing; the multimedia group seeks more help from the systems group to support better quality of service (i.e., guarantees on performance such as delay and throughput bounds); the scientific metacomputing group is already developing agent-based schedulers, and will benefit from a more systematic approach to supporting mobile agents; and all the groups are concerned about security, reliability, and performance.
C. RESEARCH INFRASTRUCTURE DESCRIPTION
C.1 Summary of Requested Experimental Facilities
We are requesting four general categories of equipment:
• Processing power: a set of three multiprocessor-based compute servers of heterogeneous architectures (a 4-node cluster of Digital Alpha “Rawhide” 4100 quad multiprocessors, one Sun Ultrasparc-based “Enterprise” 3000 quad multiprocessor, and an Intel Pentium-based quad multiprocessor) to support the variety of applications we have
• End-user access: 30 multimedia network computers (built out of PCs) for our end users (faculty and students conducting research), whose power will be devoted primarily to multimedia user interface functions, and which will rely on the compute servers for executing applications
• Network connectivity: a hierarchical network of routers, whose first-level central hub (a high-performance switch) can support a variety of networks (e.g., fast and standard Ethernet, ATM, FDDI), and will connect second-level programmable routers (high-performance dual-processor PCs) that service each laboratory via fast Ethernet (100 Mbps). Each laboratory will also use fast Ethernet internally (as well as existing networks, which include FDDI and ATM). The central hub also provides connections to the large-scale servers, and to external networks, the most important one being the new campus ATM network, giving us significantly better access to the UCSD Library, the San Diego Supercomputer Center, and other research groups, as well as being our link to the rest of the Internet
• Storage space: one of the compute servers, the Sun Enterprise 3000, will also serve as a hierarchical storage server, to which will be attached a 1 terabyte RAID system and an 8 terabyte tape library system
C.1.1 Computing
The three compute servers are targeted for two primary classes of applications:
• large compute-intensive applications: these include scientific metacomputing (i.e., the use of clusters of computers, multiprocessors in our case, to execute an application whose tasks have been distributed), and various design, analysis, and testing tools developed by our computer engineering (e.g., CAD) and software engineering groups
• large memory-intensive applications: these include multimedia applications that process multiple streams of video and audio, often in real time, and content-based search applications that try to construct indexes for large semi-structured databases or try to find patterns in large datasets
These categories certainly overlap: many of the memory-intensive applications require large amounts of processing, and vice versa. For example, many of our scientific metacomputing applications are data-intensive simulations of physical processes that use grids of fine resolution, and the finer the grid, the better; many of our content-based applications involve searching for patterns in large datasets (e.g., genomes), where the determination of a “match” involves a very complex computation; many of our multimedia applications involve the aggregation (or “mixing”) of multiple video streams, a computationally-intensive process operating on very large data objects. In addition, these applications have been developed for different architectures, as some architectures are more suited for the type of processing required, e.g., MMX processors and an MPEG-II encoder/decoder for video.
However, these compute servers will not be allocated to support only a single application (even though, as we will describe in Section D.3, we have applications that individually justify the power of each compute server); they will be a resource that is part of our Active Web and available to all the groups. This does not mean the resources will be accessed in a “free-for-all” and unmanaged manner that can lead to chaos, nor will they be under a centralized management regime that can be highly bureaucratic and inflexible, and that inevitably discourages use. Our approach to sharing and arbitration when there is high competition for resources is an important issue for us, and is based on a set of highly decentralized mechanisms, including agent-based computing and a token economy.
As part of our plan to mimic a real decentralized Active Web, security is an important added ingredient to our requirements. All our machines (servers, end-user network computers, routers) will have a “secure coprocessor”, which enables various security functions to support mobile agent-based computing. A primary function of the Active Web’s middleware kernel is to support mobile agents; they are the primary transport for distributing computations in a secure way. Their protection is achieved via the secure coprocessors.
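To make the role of the secure coprocessors concrete, the sketch below shows, in Java (the agent language named elsewhere in this proposal), one way a server could refuse to host an agent whose code image is not signed by a key held in the sender's secure coprocessor. The class and method names are illustrative assumptions for exposition, not our committed design.

    import java.security.*;

    // Hedged sketch: verifying an agent's code image before hosting it.
    public class AgentVerification {
        public static void main(String[] args) throws GeneralSecurityException {
            // In the envisioned design, this key pair would live inside the
            // tamper-proof secure coprocessor, never exposed to the host OS.
            KeyPair coprocessorKey =
                    KeyPairGenerator.getInstance("DSA").generateKeyPair();

            byte[] agentCode = "agent bytecode image".getBytes();

            // Sender side: the coprocessor signs the agent's code image.
            Signature signer = Signature.getInstance("SHA1withDSA");
            signer.initSign(coprocessorKey.getPrivate());
            signer.update(agentCode);
            byte[] signature = signer.sign();

            // Receiver side: the hosting server checks the signature against
            // the sender's public key before admitting the agent.
            Signature verifier = Signature.getInstance("SHA1withDSA");
            verifier.initVerify(coprocessorKey.getPublic());
            verifier.update(agentCode);
            System.out.println("agent accepted: " + verifier.verify(signature));
        }
    }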
C.1.2 End User Network Computers
The end user will access the resources of the Active Web via multimedia-capable network computers. In today’s market, the “network computer” is only just emerging, and is not mature enough for the kinds of experimentation we wish to pursue, especially regarding multimedia end-system applications. Consequently, we have chosen PCs as the basis for our network computer, outfitted with secure coprocessors and with MPEG-II decoders. PCs are low in cost, and also offer programming environments that are becoming relatively mature.
C.1.3 Networking
The networking equipment must support the following goals: (1) provide fast connectivity for the UCSD Active Web infrastructure; (2) provide programmability, so that we can test and evaluate experimental network protocols, e.g., protocols that provide real-time guarantees or support new forms of multicasting for inter-agent group communication; (3) provide good connectivity to the rest of the world, e.g., the campus network, the Internet, SDSC, and the UCSD Library (with the possibility of supporting direct links).
Figure 2: The UCSD Active Web network will connect all laboratory machines via programmable routers that connect to a central switching hub. The hub also connects to the large-scale servers, as well as the campus ATM network (and other external networks).
Consequently, we have chosen a hierarchical approach (see Figure 2), where a high-speed hub serves as a central switch to laboratory routers, large-scale servers, and to our new high-speed campus network ATM switch. The hub necessarily supports a variety of networks, as our external connections are ATM, FDDI, and in some cases, specially-designed links; our internal networks (those connecting our laboratories) will be based on fast Ethernet (100 Mbps2), because of the maturity, capability, and low cost of this technology. The laboratory routers will be constructed using fast dual-processor PCs, and will execute our experimental protocols. The two processors will allow one to execute non-network-related operating system functions and the other to be devoted to network protocol processing (which may execute in user space). These PCs will also execute experimental locally-developed operating system kernels that support fast inter-domain data transfers (required to achieve high performance for user-space protocol processing distributed over multiple protected domains). C.1.4 Storage The storage requirements for our applications are massive. For example, the primary focus of our multimedia group is video processing. A typical video object such as a movie requires approximately one gigabyte of storage. Thus, one quickly arrives at the requirement of one terabyte to store a video database of 1000 movies. Other examples include the indexing of large symbolic content databases used by our content-based search group, large datasets used in our scientific metacomputing applications, large log files created by traces in our experimental systems and security work, and the large programs and circuit descriptions used by our software engineering and computer engineering groups, respectively. Indeed, the sum of these requirements greatly exceeds one terabyte, and so our terabyte RAID secondary storage will act as a cache for our 8 terabyte tape-based tertiary storage. The server that will manage these storage systems, the Sun Enterprise 3000 quad multiprocessor, has substantial computing power. This is important, as we expect compression agents to execute on this server, to reduce the size of transmitted data objects. Using agents, users can develop their own, or use others’, compression schemes (appropriately designed for their application, as what works best for video does not work best for text); consequently, the ability of our storage server to support these compute-intensive agents is an important requirement.
2 Our convention is to use a small “b” when we mean bits, and a capital “B” when we mean bytes. So, “Mbps” means megabits per second, whereas “MB” means megabytes.
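As one concrete illustration of such a compression agent, the sketch below uses Java's standard java.util.zip machinery; an application would substitute a scheme suited to its own data. The CompressionAgent name and its compress() contract are our own assumptions for exposition, not a committed interface.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.DeflaterOutputStream;

    // Hedged sketch: an application-supplied agent that the storage server
    // could host to shrink objects before they migrate to tape or cross
    // the network.
    public class CompressionAgent {
        public static byte[] compress(byte[] data) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DeflaterOutputStream deflate = new DeflaterOutputStream(out);
            deflate.write(data);       // feed the object through the deflater
            deflate.finish();          // flush all pending compressed output
            return out.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] object = new byte[64 * 1024];  // stand-in for a stored object
            System.out.println(object.length + " -> "
                    + compress(object).length + " bytes");
        }
    }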
C.2 Five-Year Development of Research Infrastructure
Our Active Web research infrastructure will be developed in stages. The first stage is to get the base system software running, including the middleware kernel that will support system-wide agent-based computing and security. This will require the major equipment items, i.e., the various servers and a portion of the network, to be in place. We expect the first stage to take one year, and so only these items (which still form the bulk of the funds) will be purchased during year 1. The groups that will be most involved at this stage are the systems and security groups; they will develop the infrastructure software, much of which has already been developed on a much smaller scale. (However, this does not mean the equipment cannot be used on a stand-alone basis, i.e., not as part of a coherent working Active Web, by other groups during this early phase. Indeed, they will be getting experience with using the systems, and use will be coordinated by the PI management team. However, such managed coordination will not be necessary once the UCSD Active Web is in place.)
In year 2, the infrastructure will be ready for preliminary use by the various “user” groups: multimedia, content-based search, scientific metacomputing, and computer and software engineering. We will acquire a portion of the end-user multimedia network computers for the initial set of users, generally to support one representative from each group. The necessary additional networking infrastructure will also be obtained. During this stage, we will evaluate how well our systems and security mechanisms are working, and allow the users to get their first impression of a working Active Web. (Actually, we expect some more-than-willing/more-than-patient users to be involved during stage 1, to get immediate feedback as we develop the various mechanisms.)
In year 3, the Active Web infrastructure will be ready for all our users. In fact, each of years 3, 4, and 5 will have a different theme, focusing on a different class of applications. Year 3 will focus on multimedia computing, as these applications, e.g., collaborative tools such as videoconferencing, video file retrieval and playback, etc., are of wide interest to all our researchers. In year 4, the focus will be content-based search; the tools developed by this group, e.g., agent-based searches of large symbolic databases using queries-by-content, are also of wide interest. Finally, year 5 will focus on our computationally-intensive applications, both scientific metacomputing and the tools developed by our computer and software engineering groups. The details of exactly when each equipment item will be purchased are described in Section D.2.
D. RESOURCE ALLOCATION
D.1 Current Departmental Research Equipment
Our current research equipment comprises a variety of computers connected mostly by 10 Mbps Ethernet. The computers are as follows:
• 65 Sun Sparcstation I and II class uniprocessor workstations or servers
• 56 PCs, the majority being pre-Pentium (80386 and 80486) and a small number of Pentiums
• 34 Digital DECstation and Alpha class uniprocessor workstations
• 22 X-terminals
• 25 Macintosh computers, ranging from 68030-based Mac IIci’s to PowerPC-based machines
• 4 Hewlett-Packard workstations
• 2 SGI workstations
This equipment is distributed throughout our ten major research laboratories (there are additional laboratories that are smaller in scope, and do not have major resource requirements). The research in the laboratories spans our six major research groups. All of the requested equipment supersedes the corresponding existing equipment significantly, by 1-2 orders of magnitude, in terms of rates, capacities, etc. The same holds true for existing networking equipment (other than in the Computer Systems Laboratory, which has some more advanced networks for experimentation). Connectivity between laboratories is based on 10 Mbps Ethernet. The cumulative storage capacity of all the research laboratories is approximately 250 gigabytes.
Some of our faculty, especially those in scientific metacomputing, make use of facilities at the San Diego Supercomputer Center (SDSC), the leading-edge site for the National Partnership For Advanced Computational Infrastructure (NPACI). The mission of SDSC/NPACI is “the development and deployment of production computational infrastructure for the nation's academic research community, serving all disciplines.” While access to this infrastructure is valuable to us, we need the flexibility of a research infrastructure within our department to allow exploration of early-stage ideas; these ideas often involve the use of locally-developed system-support software (e.g., operating system modifications, experimental message-passing systems) that is difficult or impossible to install on SDSC systems, since they are necessarily run as production systems and are not amenable to kernel-tampering.
However, when such ideas turn into successful research results, SDSC offers a natural mechanism for converting them into supported production infrastructure, thus making them available to the wider community. A recent example of this synergy is the incorporation of the work on MEME (see Section G.3.4.1) by Charles Elkan, one of our faculty in the Content-based Search group, into a production computational biology project now running at SDSC. In the meantime, new experimental implementations of MEME involving highly-distributed heterogeneous computing and the use of mobile agents are desired, and require interactions with our systems and security researchers; such interactions will be greatly facilitated by a common departmental high-performance experimental research infrastructure, and supporting them is a primary goal for the design of the UCSD Active Web. The end result is that our requested departmental infrastructure will enhance and accelerate such examples of technology transfer, thus directly leveraging NSF's funding for both the PACI program and the CISE Research Infrastructure program.
D.2 Description of Requested Equipment
What follows is an itemized list of all our requested equipment, software, maintenance, and technical personnel.
D.2.1 Servers
• 4 (four) Rack Mounted “Rawhide” 4100 Digital Alpha Servers
Each 4100 Alpha Server consists of EV5.6 600 MHz processors, 2GB of memory, and 36GB of local disk storage (four 9GB drives); the cluster is interconnected by 8 (eight) Digital Memory Channel 2.0 adapters connected to 1 (one) 8-port crossbar switch. To be purchased in Year 1 (required for startup).
• 1 (one) Sun Microsystems E3002-C50 “Enterprise” E3000 Server
This includes 1 (one) Sun Microsystems E3002-C50 E3000 Server Based Package, tower enclosure, SunCD12, power/cooling module, CPU/memory board, two 250MHz/4MB UltraSPARC, Solaris server license; 1 (one) 2601A CPU/Memory Board for additional CPUs; 1 (one) 954A Power/Cooling Module, 300W; 2 (two) 2550A UltraSPARC modules, 250MHz, 4MB Cache; 4 (four) 7022A 256MB ECC memory expansions (8 x 32MB SIMMs); 2 (two) 2610 SBus I/O Boards, three empty SBus slots, two empty 25MB/sec. Fibre Channel sockets, one Fast/Wide SCSI interface, one 10/100 Mb/sec. Ethernet (Twisted Pair or MII) interface; 1 (one) 5251A Internal 9.1GB 7200 RPM Fast/Wide SCSI-2 Disk Drive. To be purchased in Year 1 (required for startup).
• 1 (one) Compaq ProLiant 5000 6/200-1X multiprocessor
This includes 1 (one) Compaq ProLiant 5000 6/200-1X w/ 128MB memory; three additional PPro-200 processors #219043-001; Process Board #219420-001; 512MB kit #219285-001;
9GB Disk #199882-001; Optibase Forge MPEG-2 encoder; VideoPlex MPEG-2 decoder; DVD-Writable Pioneer, bundled w/ Gear Software. To be purchased in Year 1 (required for startup).
• Software Licenses (campus-wide)
These are software “site licenses” for various operating systems and applications that are applicable to the purchased equipment. To be purchased each year.
D.2.2 End-User Multimedia Network Computers
• 30 (thirty) Compaq DeskPro 6000 P-II 266 32MB #284050-002
In addition to the base unit, each PC contains a 100BaseT module #225435-001, 32MB additional Mem. #243013-001, 2MB Video RAM #223335-001, ViewSonic 17GS (17") Monitor, and a RealMagic Hollywood MPEG-2 decoder. Purchasing plan: 10 in year 1 (systems group), 10 in year 2 (initial users), 10 in year 3 (extended set of users).
• 43 (forty-three) IBM Secure Coprocessors
These are coprocessors that will be placed in all the multimedia network computers, as well as all the servers and all the routers. They support security functions, and enable the building of “tamper-proof” agents. They form the basis of our security strategy. Currently, they are only available for the PCI bus. If, at the time of purchase, they are not available for other buses, we will use a PC front-end strategy. Purchasing plan: 23 in year 1 (all servers, routers, and the initial set of multimedia network computers), 10 in year 2 and 10 in year 3 (for the remaining multimedia network computers).
D.2.3 Networking
• 1 (one) Fore PowerHub 8000 #8105-01
In addition to the 5-slot chassis and packet engine base unit, this includes a 16 port 100baseT #7380-00, ATM - 2port #7401-00, Media adapter #7280 (multimode). To be purchased in Year 1 (required for startup).
• 10 (ten) 3Com SuperStack-II 3000 100baseT Switches
Each switch has 12 ports and 10/100 autosense. Purchasing plan: 5 in Year 1 (required for startup), 5 in Year 2 (these and those purchased in Year 1 now to connect labs).
• 10 (ten) ALR Revolution 2X PC-based Routers
Each router is an ALR Revolution 2X PC with dual Pentium-II/266MHz processors, 32MB Memory DIMM - ECC, 2 Fast Ethernet NICs, Seagate Cheetah 4.6GB Disk, 15" Monitor, and an additional 64MB of memory. Purchasing plan: 5 in Year 1 (required for startup), 5 in Year 2 (these and those purchased in Year 1 now to connect labs).
D.2.4 Storage
• 1 (one) 1,000 GByte RAID System (from ISS Corp.)
Includes 1 (one) R-19-60 19" Heavy Duty Rack, 60" high; EK-9 Kingston Technology DS500, 9 Bay RAID enclosure including Dual 300 Watt Load Sharing Hot Swappable Power Supply; two hot-swappable cooling fans, 4 x cooling fans; 1 (one) CRD05500-8 CMD, CRD5500 w/8 drive channels and one host channel (wide differential SCSI); 1 (one) IC256 256MB Internal Cache for CRD 5500; 61 (sixty-one) RKDE1000WD Kingston Technology Removable Drive mechanisms, wide differential SCSI; 61 (sixty-one) IBM-18 IBM 18GB, 3.5" 7200 RPM wide differential SCSI disk drives; 6 (six) IKDS500 Interconnection kits for DS500; 7 (seven) LUN Cables (internal multiple LUN cable set); 1 (one) CW3W SuperFlex 3' Round Shielded WideSCSI cable (68 pin High-Density connector); 1 (one) Active terminator, wide single-ended; plus miscellaneous items (e.g., documentation). To be purchased in Year 1 (required for startup).
• 8TB AIT Tape Library System (from ISS Corp.)
Includes Qualstar 46120 Library with 1 tape drive; 120 tape capacity: 3.15TB Native / 8.19TB w/compression; additional tape drive, Sony SDX-300C; 120 tapes Sony SDX-T3C; Platinum Technology NetArchive SW system. To be purchased in Year 1 (required for startup).
D.2.5 Technical Staff
• Programmer/Analyst, Step IV (Ph.D. level staff programmer/administrator), 100% time
• Programmer/Analyst, Step II (B.S. level programmer): the actual yearly percentage time will increase as follows: 25% (y1), 33% (y2), 50% (y3), 67% (y4), 75% (y5)
The primary technical staff-person will aid in the programming and management of the UCSD Active Web infrastructure, and will work especially closely with the systems and security groups. This includes assisting in the implementation of the middleware kernel and resource servers. This staff-person will also work with faculty and students in the other groups to help them develop Active Web applications and services. The secondary technical staff-person will aid in day-to-day operations. This includes monitoring system operation, overseeing backups, and identifying faulty equipment and working with vendor maintenance personnel.
D.3 Rationale for Requested Equipment
In C.1, we provided an overview of how the various classes of equipment we are requesting, i.e., compute servers, end-user computers, networking, and storage, satisfy the broad requirements of our faculty research. We now present some actual examples of faculty research requirements to
justify our requested infrastructure. In G.3, we describe our faculty experimental research and further motivate the infrastructure requirements and uses.
• Dean Tullsen and Brad Calder, members of our Systems group, are working on multi-threaded architecture design (see Section G.3.1.4), which requires simulating the steady-state computational behavior of a workload consisting of up to a thousand applications, each of which requires 1 billion instructions, on multi-threaded architectures under their design. Since each emulated instruction requires roughly 1000 real instructions, a single experiment requires the execution of 10¹⁵ real instructions. This would take about 60 days on a 200 MIPS workstation; using the processing power of our requested infrastructure, this collapses to a day or less (still a lot, but a significant improvement). With the requested complex of faster network-integrated multiprocessors, their ability to evaluate multiple architectural options is greatly enhanced by the faster turn-around times for these simulation runs during the design and analysis phases of the project.
• Ramesh Jain, a member of our Multimedia group, is researching Multiple Perspective Interactive Video (see Section G.3.3.4), where multiple video streams that are different-angled views from separate sources of the same scene are processed and aggregated to construct a single coherent 3-D view. This aggregation is currently done on a single machine (first, because the algorithms are still not distributed, and second, because of the lack of multiple high-performance machines), and requires many hours to aggregate a few minutes worth of video from just three sources. The goal is to process up to eight sources within human response times, i.e., tens of milliseconds. This system will push the limits of the video-processing (MMX processors, MPEG-II encoder/decoders, and general-purpose processors), large-storage, and high-speed networking (MPEG-compressed video can consume 1-10 Mbits per second per stream) capabilities of our Active Web infrastructure.
• Scott Baden, a member of our Scientific Metacomputing group, has developed the KeLP system (see Section G.3.5.1), a toolkit for scientific applications. One such application is a first-principles simulation of real materials, which is a Computational Grand Challenge, and is being developed with Professor John Weare of the UCSD Dept. of Chemistry and Biochemistry. These computations are extremely memory-intensive and benefit greatly from multi-processing. Our requested 4-node Digital Rawhide cluster, with 2 GB memory per node, would match the memory capacity of 32 nodes of a Cray T3E (with 256 MB per node), a machine to which the application has already been ported (but for which access is highly limited). The Rawhide
cluster has a more favorable balance of memory and processing capacity than the T3E, and as a result will compute a medium-scale problem more efficiently. This enables a sensible strategy of carrying out performance tuning on local resources, and then using more advanced machines for larger-scale computations.
• Gary Cottrell and Rik Belew, members of our Content-based Search group, are working on Adaptive Lenses (see Section G.3.4.5), an information retrieval tool based on the idea of matching a system ranking of documents with an interested user’s or user community’s ranking. Each lens needs to maintain an index from K keywords to D documents. Assuming K = 50 thousand and D = 10 million, an index would require 1 GB, assuming the use of sparse representations. Typical indexing algorithms require O(N³) time, with N scaling with keyword/vocabulary size. This suggests 10¹⁵ instructions per indexing operation (again, achievable in roughly a day with our requested infrastructure). The fact that we are exploring adaptive lenses implies that we must watch how these indices change over time, through use. Assuming a year-long experiment during which an index is entirely rebuilt once/month and incrementally updated weekly (at 0.25 full indexing expense), we will require 144 indexing operations over the year if each lab, along with the entire department, has a lens. Statistical analysis and cross-validation of inductive methods like these can easily increase this number by another order of magnitude!
• Bennett Yee, a member of our Security group, is using secure coprocessors, i.e., tamper-proof chips with processing capabilities appropriate for security functions, as a cornerstone for building electronic payment systems in physically exposed environments (see Section G.3.2.3). The idea of secure coprocessors, as developed in Yee’s thesis, potentially has wide applicability to many other areas of security for the Active Web. An example is the mutual distrust problem for agent-based computing: just as servers fear agents will attempt to subvert them, the agent’s user is concerned that the server might subvert the agent. All of our machines will be equipped with secure coprocessors.
• C. K. Cheng, a member of our Computer and Software Engineering group, is working on interconnect-dominated analysis and physical layout synthesis (see Section G.3.6.1). It is estimated that the number of transistors per chip will increase to 100 million in the next six years. Assuming each transistor uses 1 KB, high-performance layout synthesis tools will require at least 100 GB of memory to work with such large circuits. Our storage infrastructure will allow us to develop large-scale fast-access hierarchical memory servers to support such requirements.
• Numerous other projects (described in Section G.3) require a heterogeneous network-integrated system to experiment with agent-based solutions to their problems. This is a key provision of our Active Web architecture. D.4 Equipment Access Strategy The hierarchical network, as described in Section C.1.3, will provide high-speed access within the department. The central hub also connects to the high-speed campus network ATM switch, providing fast (10-100 Mbps) access to the entire campus and to SDSC, which is a major node on the Internet. Off-campus connectivity is provided via an extensive university dial-up service (and most cable companies in San Diego are now offering cable-modem services). D.5 Space for Equipment The CSE department has adequate space to house the proposed infrastructure. The servers and network central hub will be housed in our 1300 sq. ft. machine room, which is properly outfitted with adequate power and air conditioning. Our laboratories, whose combined space is close to 10,000 sq. ft., will house the end-user equipment and the routers. D.6 Institutional Cost-Sharing UCSD is providing $601,050 in matching funds, which meets the 33% requirement. These matching funds are broken down as follows: • $145,000 by CSE Department for equipment • $276,050 by CSE Department for personnel (technical staff) • $180,000 by School of Engineering for equipment
E. MANAGEMENT PLAN
The project will be managed by the five PIs, who as a group are responsible for representing the interests of the various research groups of the faculty:
• Joseph Pasquale: systems
• Rik Belew: content-based search
• Jeanne Ferrante: scientific metacomputing and computer/software engineering
• Russell Impagliazzo: security
• Venkat Rangan: multimedia
The PIs will serve as a “board of directors” to establish broad policy guidelines on hardware equipment usage. However, specific policies for “resource servers” (which are the logical resource objects of the UCSD Active Web; see Section G.2.1) will be established by the developers of these servers, supporting the goal of decentralized management of resources. For example, a broad policy guideline may be: a fraction of a server’s local disk must be reserved for system performance monitoring (part of the low-level workings of the UCSD Active Web middleware kernel); the remainder is available for the creation of disk block resource servers. Using those remaining resources, two logical disk block servers might then be established, one designed to meet the needs of specific applications, and another to provide a general low-level disk block service. Their management policies may be different, the users to whom they cater may be different, etc.
To experiment with resource sharing, we are developing a “token economy” where users can buy/sell resources based on supply and demand. Thus, continuing with the example above, different resource servers may even have different pricing policies (including not charging, pay-per-use, flat-rate, we-pay-you, etc.). The PIs will serve as a limited “federal reserve board” for this token economy, determining the value of tokens and how many are circulating. Furthermore, as each PI represents a different group (one represents two groups that are closely related regarding infrastructure requirements), the interests and concerns of each group will be heard so that the broad policy guidelines can be improved over time. The PIs will meet every 1-2 weeks, and have already developed a good working relationship in writing the proposal and bringing their groups’ views “to the table”.
Pasquale will oversee overall project management. This means presiding over meetings, working with the technical staff, taking responsibility for the overall project being on track, seeing
that students receive mentorship, working with vendors, etc. Leadership will be shared regarding the development of various policies: usage, management, maintenance, evaluating new technology for mid-course corrections in equipment purchases, etc. This means that the PIs will develop consensus for the various policies, having gotten input from their respective groups, and making final determinations by majority vote.
Furthermore, each year will have a different research focus. Year 1 will focus on putting in place the systems and security mechanisms to establish a working Active Web, and year 2 will focus on getting experience with using the system with an initial set of users (distributed over the various research groups). Breaking this down further, we expect year 1 to emphasize systems development, and year 2 to emphasize security. Year 3 will focus on multimedia; year 4 on content-based search; and year 5 on the high-performance computing groups, i.e., scientific metacomputing and computer/software engineering. Consequently, research leadership will rotate among the PIs, corresponding to their areas, i.e., Pasquale in year 1 (systems), Impagliazzo in year 2 (security), Rangan in year 3 (multimedia), Belew in year 4 (content-based search), and Ferrante in year 5 (scientific metacomputing and computer/software engineering).
F. BUDGET3
• Equipment and maintenance: $1,359,062
• Salaries (and benefits): $686,795
• University overhead (51.5% of salaries and benefits): $353,700
• TOTAL: $2,399,557
• NSF request: $1,798,507
• UCSD matching funds: $601,050
3 This is a coarse summary of the budget submitted to NSF.
G. RESEARCH
G.1 Introduction
Today, the Web may be viewed as a network of information servers that simply wait for client requests and respond to them. These requests are simple messages, and servers generally respond only when requests are made. While the Web has enabled the sharing of information by users on a truly global scale, its simple passive model of operation, while powerful and easy to implement and therefore propagate, is limiting its capabilities in content management, large-scale integrated resource sharing, and ultimately, user interactivity and productivity.
We envision a much more dynamic, Active Web. The Active Web supports a greater degree of interactivity between users and the “active content” it will provide. This active content is typically rich in multimedia components (parts of which are dynamic, such as audio and video), and rich in references to other objects. This active content will change over time, refining itself, growing new references, deleting old ones, and managing its own implementation (e.g., how it is stored to achieve higher levels of reliability or efficiency, how and when processing is applied to refine it, etc.). Everything in the Active Web is active, even the units of communication, which are no longer just data but mobile programs, or agents. This concept is similar to that of emerging Active Network research [131], but at a higher level. These agents act on behalf of users; they move about autonomously and execute on servers to carry out user requests. Servers are no longer passive databases, but context-sensitive “knowledge networks” that contain all kinds of active content. Between the servers themselves there is a constant exchange of agents, which add to, refine, form interconnections among, and make consistent, the distributed content. The servers provide services beyond information, such as compute time and memory space for remote execution.
G.1.1 Features and Capabilities
The Active Web is characterized by the following features:
• active objects, including content that is rich in multimedia data types, and agents that process and transport this content in user- (or server-) specific ways, exploiting the use and location of distributed resources
• programmability, where the “instruction set” is defined by servers that host agents, through (1) the instruction sets or scripting commands they support, and (2) the higher-level methods (forming “services”) they export, both of which are made available via the execution environments provided to agents
• decentralized control and administration of resources, to realize active objects and programmability by supporting resource sharing without necessarily assuming mutual trust between separate interacting administrative domains
Users may view the Active Web in two ways, which follow from the first two features. First, it may be viewed as a wide-area content storage/retrieval system, providing users with relevant content from the appropriate sources, in both user-directed and source-directed ways. Second, it may be viewed as a large-scale virtual computing system that can execute user programs using resources, possibly large numbers of them, owned by (and possibly rented from) others. Our researchers in content-based search and multimedia, who deal with content or the form of content, emphasize the first view, while those in scientific metacomputing and computer/software engineering, who deal with designing, executing, and analyzing large programs, emphasize the latter view. The third feature implies that these views are supported by system mechanisms that support the decentralized aggregation of resources, provided incrementally by many different organizations. Our security and systems groups are concerned with how to support these features via decentralized control structures.
The Active Web offers capabilities and modes of computing that are at best in primitive development in today’s Web: network computing, and secure computing. In network computing, users rely on power contained in the network to carry out their computing. The reliance may be so great that the user requires only a simple access device for communication and the user interface. Secure computing provides fine-grained programmable confinement of and access to content. The basic idea is, rather than sending out sensitive content upon request to the user as is done today, the user is expected to send an agent to the content’s server. This agent will execute in an environment under the complete control of the content’s owner. We will say more about the value of agents below.
To realize this vision, a flexible architecture for resource sharing between untrusted domains is required. This architecture must allow resources to be aggregated into higher-level computational and communication structures, with predictable levels of performance and reliability. These structures will be the basis for the high-level services of the Active Web. We intend to organize the infrastructure we obtain according to this architecture.
G.1.2 Why Use Agents?
Agent-based computing is an important technology for the Active Web, and will play a key role in supporting decentralized management of resources and the ability to configure them into distributed computational structures. Agents allow applications to capitalize on, and have finer control over, the resources that are most appropriate for carrying out a job. We define an agent as simply a program that (1) can execute locally (on the machine where it originates) or remotely (on a server); (2) operates asynchronously relative to the process that initiated it; (3) is “network-aware” in that it can move about (from node to node) autonomously, i.e., based on decisions determined by the execution of its own code; (4) can maintain (and modify) its state when it moves from one node to another. The sketch following this paragraph illustrates the definition.
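As a minimal illustration only (the names are our assumptions, not a committed API), the four-part definition above might be rendered in Java as:

    import java.io.Serializable;

    // (1)+(2): run() may be executed locally or by a remote server,
    // asynchronously with respect to the process that dispatched the agent.
    public interface MobileAgent extends Serializable {
        void run(Host currentHost);

        // (3): network-awareness -- the agent itself decides where to go
        // next; returning null means "stay and finish here".
        String nextHop();

        // (4): extending Serializable lets the agent's state travel with
        // it when the middleware moves it from one node to another.
    }

    // The execution environment a server offers to visiting agents.
    interface Host {
        void send(String destination, MobileAgent agent);
    }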
Agent-based computing provides valuable capabilities. Some of the more important ones include:
• Real-time interactions at a distance: Agents can support more effective real-time interactions that are to occur far from the originating node. One can send an agent so that it operates close to the object it is to interact with, reducing the interaction time by reducing distance. This is an important consideration because communication latency is becoming an increasingly important problem. As processor speeds continue to improve, the performance of multi-process distributed programs will be limited by communication speeds between nodes, which are bounded by the speed of light. The agent approach is to move processes so that communication takes place on nodes that are close to each other, or better yet, on the same node.
• Meeting bursty resource demands: Agent-based computing is an effective way of structuring applications that have bursty resource-usage behavior. Consider an application that, over time, uses resources in varying amounts. During some intervals, the application may require very large amounts; many of our scientific applications fall into this category. During these resource demand bursts, it is advantageous to access resources beyond those that are locally available on the desktop; indeed, what is available locally may be inadequate. The application can dispatch agents to go out, find free resources, and use them according to the application’s needs.
• Controlling the distribution of information: Agent-based computing can be an effective solution in controlling the distribution of content (we called this “secure computing” above). While the Web has enhanced the ability for users to share code and data, there is a desire to have better control over distribution. Traditional approaches, where access control is assumed to be under the operating system’s control, are often inappropriate. The Active Web is a highly decentralized environment, where different networks and nodes are “owned” by different groups/organizations who determine their own access-control policies, and where one cannot make assumptions about the underlying access-control mechanisms because different nodes may execute different operating systems. The agent approach is to avoid distributing the information, instead allowing access by hosting foreign agents so that all access is local. One can then limit communication so that perhaps only results of computations on the accessed data, and not the data itself, can be sent back; and one can limit the communication data rate so that a covert channel, while theoretically possible, is ineffective. This approach applies to code also.
G.1.3 Security Issues
The Active Web concept envisions a world-spanning collection of autonomously controlled computational resources and content accessible by anyone, anywhere. In such an environment, what mechanisms can be adopted to compensate servers and to ensure fairness of access to users? How can users be protected from fraud and servers from vandalism and theft of intellectual property? How can support for critical applications (e.g., large financial transactions; electronic stock brokerages; medical, police and military databases), which demand accuracy, reliability, and confidentiality, be guaranteed?
Today’s Web (and the Internet that underlies it) allows users and servers to operate autonomously and hides the details of message delivery from the user. These are some reasons for its tremendous success. However, autonomy and anonymity also cause potential problems. Servers and communication links provide resources voluntarily, without recompense. Although they are generally reliable, they have no incentive to be so. Users likewise have no real disincentive against over-utilizing resources, since most services are free or priced independently of the resources used. No central body monitors the quality or accuracy of the information available. No one can be held responsible if a service is not reliable or timely.
The need to establish trust between anonymous parties and to guarantee access to and quality of resources will only increase in the Active Web. The potential risk of providing or using anonymous services increases sharply when these services are no longer merely passively transmitting data, but allowing active agents to utilize their computational resources. Mechanisms need to be in place to ensure the security of servers from Trojan Horse agents that disguise viruses or attacks on
the system. Since agents might otherwise spawn other agents without bound, there need to be guarantees that this cannot happen. Less obviously and equally importantly, users also need protection against malicious servers. For example, consider an agent whose mission is to search for the lowest-cost and most convenient airline ticket and to then purchase it. Servers for ticket vendors would have a motivation to give such an agent misleading information, to prevent it from having access to other servers, to use whatever information the agent contains about the user’s keys authorizing purchase, and even to spy on the competition by examining the other deals the agent has been offered.
Clearly, security and robustness are crucial to the design of the Active Web. In addition, the Active Web should support mechanisms for rationing and prioritizing access to resources. The simplest such mechanism would be a token economy, where users and servers could trade credit for access. The security mechanisms need to be flexible, because different applications will require different, and sometimes mutually exclusive, forms of security. They must also be simple and efficient so that they are actually used (e.g., an expensive but complicated lock will be left open for convenience). We envision a collection of compatible security options implemented at the systems and agent level that includes a micro-payment system. We describe some steps to address this question below.
While the Active Web presents many security challenges, we can also take advantage of the many features this framework presents to establish security. Previously, implementation of cryptographic protocols was limited by communication: it was basically assumed that the parties were totally paralyzed by message delays. In an agent-based, distributed system, this need not be the case. Instead of viewing the protocol as requiring extensive communication between participants, view the protocol as itself an agent, moving between the participants to collect information. When not actively engaged with the protocol, participants could perform other tasks. The Active Web could also be used to establish a free market of security servers. These would act on behalf of users, as would an agent or lawyer. They could be authorized by a central governing agency, with some sort of legal bond that would be forfeited if they failed to act in their clients’ interests or failed to cooperate with law enforcement.
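To ground the token economy idea, the sketch below shows the simplest imaginable form of it: a server debits a user's token balance before granting an agent access to a resource. The TokenLedger name and its operations are our illustrative assumptions; a real design would add cryptographic protection (e.g., via the secure coprocessors) and distributed settlement.

    import java.util.HashMap;
    import java.util.Map;

    // Hedged sketch: per-principal token accounts at a resource server.
    public class TokenLedger {
        private final Map<String, Long> balances = new HashMap<>();

        // Tokens are issued by the "federal reserve board" described
        // in Section E.
        public synchronized void credit(String principal, long tokens) {
            balances.merge(principal, tokens, Long::sum);
        }

        // Debit the account only if the principal can pay the price the
        // resource server currently charges; otherwise refuse the request.
        public synchronized boolean debit(String principal, long price) {
            long balance = balances.getOrDefault(principal, 0L);
            if (balance < price) return false;    // insufficient tokens
            balances.put(principal, balance - price);
            return true;
        }

        public static void main(String[] args) {
            TokenLedger ledger = new TokenLedger();
            ledger.credit("alice", 100);
            System.out.println(ledger.debit("alice", 30));  // true: admitted
            System.out.println(ledger.debit("alice", 90));  // false: over budget
        }
    }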
G.2 The UCSD Active Web Architecture
The UCSD Active Web is based on the vision presented above. It is a network of resource servers that supply resources to applications via a middleware kernel, i.e., a user-level distributed software control system that supports common mechanisms. The main function of the middleware kernel is to support basic communication (including multicasting), security (e.g., cryptographic protocols), and an agent/server model of interaction between applications and resource servers, hiding the complexity of the underlying interconnecting networks. In this section, we describe the architecture and its features, and the systems and security research we are carrying out to support them.
[Figure 3 diagram: content-based search, multimedia, scientific, and computer/software engineering applications sit atop the UCSD Active Web middleware kernel (security and system support mechanisms), whose agent interface provides resource access and agent hosting; beneath it, higher-level servers are built on resource servers, each comprising a virtual machine and the resource units it manages.]
Figure 3: Logical structure of the UCSD Active Web of resources. A resource server is composed of resource units, e.g., memory pages, disk blocks, etc., that it manages, and a virtual machine to execute foreign agents, the primary transports of control. Resource servers are basic servers, and will generally provide support to higher-level servers.
G.2.1 Resource Servers
A resource server encapsulates resources such as processors, memories, and disks, and accepts requests to allocate units such as processor quanta, memory pages, and disk blocks. Figure 3 shows the logical structure of the UCSD Active Web and the components of a resource server. A resource server exports an interface that allows access to the functionality of the resource and supports the hosting of agents, which execute directly on the resource server to achieve a more controlled interaction with the resource.
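To make this two-sided interface concrete, here is a minimal sketch in Java (the lingua franca of the UCSD Active Web, per Section G.2.3.1). All names are illustrative assumptions of ours; the proposal does not fix a concrete API.

    // Hypothetical sketch of the interface a resource server exports: access to
    // resource units plus agent hosting. All names are illustrative.
    interface Agent {
        void run(ResourceServer host);   // executes on the server's virtual machine
    }

    interface ResourceServer {
        // Resource access: allocate and operate on basic resource units
        // (e.g., disk blocks); returns a handle to the allocated units.
        int allocate(int units);
        byte[] readUnit(int handle, int index);
        void writeUnit(int handle, int index, byte[] data);

        // Agent hosting: accept a foreign agent so that all of its further
        // interaction with the resource is local to the server.
        void host(Agent agent);
    }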
The physical realization of a resource server will generally be a general-purpose computer plus the resource itself, e.g., disks. The computer executes a portion of the middleware to accept messages, return replies, and accept agents and execute them on a virtual machine. Part of our operating systems research focuses on the design of fine-grained control over resources to better support resource servers. A simple example of a resource server is a disk block server. It provides access methods to store and retrieve individual disk blocks, which are the basic resource units managed by the server. It might support a variety of different block sizes, and allow its storage to be reconfigured. Finally, it will execute agents sent to it by applications or by higher-level servers (i.e., servers that provide a higher-level, more abstract service than simply access to basic resources). For example, a client may wish to deposit a caching program that uses an application-specific replacement policy (in fact, this is something our researchers working on multimedia file servers would like to do). The ability to host agents allows one to build higher-level servers that are composed of a complex of lower-level ones, and to carefully control the distributed resources by locating agents near the resources they control. These agents communicate to achieve coordinated operation. The middleware kernel's support for multicasting facilitates such group communication. Note that a single physical computer may actually support multiple logical resource servers. For example, the Sun multiprocessor, to which our RAID system and tertiary tape storage will be connected, will logically support one or more compute servers, disk storage servers, and tape storage servers. The Digital Rawhide multiprocessor cluster and the Compaq PC-based multiprocessor will each support multiple compute servers. The network that interconnects the resource servers also allows its resources to be allocated. While the middleware "hides" the underlying physical networks, logical components can be made visible. This is exactly why we are using the programmable ALR dual-processor PCs as routers: they can operate as resource servers, allowing individual link bandwidths to be allocated (in portions) and hosting agents. The main actors in the UCSD Active Web architecture are agents. They are the primary vehicle for accessing remote resources and realizing dynamic distributed computing in a highly heterogeneous environment under decentralized control. Our faculty who will be users (rather than builders) of our Active Web infrastructure see agent-based computing as a key capability for their work.
Consequently, in the following sections we describe the agent-related aspects of the UCSD Active Web.
G.2.2 The Agent/Server Model
In the UCSD Active Web, interactions between clients (applications or higher-level servers) and servers are based on a powerful extension of the traditional client/server model, which we call the agent/server model. That it is an extension (rather than a departure) is an important convenience: it limits the generality of the distributed programming structures we need to consider, and builds on what is clearly established as a successful model for structuring distributed programs.
[Figure 4 diagram: the client/server model (a client and a server with a server-defined interaction) becomes the agent/server model (a light client that moves an agent to the server; the agent interacts with the server locally).]
Figure 4: Client/Server vs. Agent/Server: In the agent/server model, client-defined interactions are independent of server-defined interactions.
Consider a client that interacts with a server: the client makes remote procedure calls based on a well-known interface published by the server. While this promotes modularity and allows the client and server to be built independently, there may be differing requirements for the communication protocol between the client and server that depend on where they are situated, e.g., on the same fast local area network versus separated by multiple routers interconnected by networks of differing performance and reliability. Note that these issues generally have nothing to do with the functionality provided by the server, and in fact may be difficult to deal with using the remote procedure call paradigm. Yet, we do not wish to complicate the server interface by extending it to solve these problems. In the agent/server model (see Figure 4), the functionality that is commonly embodied in a client of a traditional client/server system is partitioned into two parts: a light client and an agent. The light client acts as the interface to the user or to local programs with which it communicates, and the agent is that part for which remote execution (and mobility) is worth considering. For
example, the agent and server may require a large number of interactions before generating a final answer for the user; in this case, having the agent execute near or at the server's location may significantly reduce latency as well as bandwidth. In fact, this is commonly argued as a benefit of agent-based computing. But perhaps an even more important advantage is that the interactions between the light client and the agent are completely determined by the client programmer. Thus, a client-specific protocol can be designed to deal with the communication issues between the light client and agent mentioned above, without affecting the server's interface. For example, consider a video server that supports a procedural interface, including a GetFrame call that returns a particular frame within the video. If the user is located far from the server, a special-purpose protocol, perhaps supporting a streaming communication model with compression and maybe even forward error correction (e.g., for error-prone wireless links) at the server side, may be most appropriate. The protocol might even support buffering, at the server side to deal with periods of disconnectivity, and at the client side to filter out jitter for smooth playback. This protocol can be carried out without effect on the interaction between the agent and server, which is completely determined by the server's interface (a concrete sketch of this split appears below). Agents can support one-shot or long-lived interactions between users and servers. In a one-shot interaction, a user requests a service and a server responds. A long-lived interaction represents a long-term relationship between a user and a service. The agent encapsulates the state of the relationship, and over time, the service becomes more and more customized for the user (based on how the user and server interact). In addition, these interactions may be pull- or push-oriented. Pull-oriented interactions are generally user-directed, i.e., the user sends a request and a server responds. In push-oriented interactions, servers initiate the provision of content to users, often as a result of an already established relationship, i.e., a long-lived interaction. Push-oriented technologies (e.g., Castanet, BackWeb, Diffusion) have indeed been developed for the Web, but these generally assume a broadcast model of communication, whereby content is pushed by a publisher interested in selling content to the broadest possible audience. We believe author-based push technologies allow a much more focused type of communication, directly from the content producer to the audience they have in mind. For example, when a scientist wishes to publish a new result, rather than placing it in a passive archive hoping that interested readers may happen upon it, author-push mechanisms will be directed by the author to actively seek out readers sharing common interests.
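To make the video example concrete, the following Java sketch shows the light-client/agent split. Only the GetFrame call comes from the text; all other names and the client-defined protocol are hypothetical illustrations of ours.

    // Hypothetical sketch of the agent/server split for the video example.
    interface VideoServer {
        byte[] getFrame(int frameNumber);   // server-defined interface
    }

    interface ClientChannel {               // protocol chosen entirely by the
        boolean wantsMore();                // client programmer: it could add
        void send(byte[] data);             // compression, FEC, or buffering
    }

    class VideoAgent implements Runnable {
        private final VideoServer server;   // accessed locally, at the server's site
        private final ClientChannel backToLightClient;

        VideoAgent(VideoServer server, ClientChannel channel) {
            this.server = server;
            this.backToLightClient = channel;
        }

        public void run() {
            // Many cheap local interactions with the server; only the
            // client-defined protocol crosses the wide-area network.
            for (int f = 0; backToLightClient.wantsMore(); f++) {
                byte[] frame = server.getFrame(f);
                backToLightClient.send(compress(frame));
            }
        }

        private byte[] compress(byte[] frame) {
            return frame;                   // client-specific policy goes here
        }
    }

Note that changing the client-side protocol (say, adding forward error correction) touches only VideoAgent and ClientChannel, never the server's published interface.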
The result of all this is more dynamic control. There can be a clear separation of mechanism and policy, and the latter can be dynamically determined and made user-specific. Further, performance can be enhanced by allowing the code executing the policy to be located as close to the resource as possible. This, along with our approach to access control, promotes our goal of supporting the development of decentralized usage and management policies.
G.2.3 System Support for Agent-based Computing
System tools are needed to support agent-based computing in the UCSD Active Web; this includes compiler support, operating system support, and network communication support. We briefly describe our approach to each.
G.2.3.1 Compiler Support
Agents will require the following compiler support: (1) the representation of and support needed for mobility; (2) techniques for resource allocation and revocation during execution; and (3) just-in-time compilation and performance support for efficient code on the destination architectures. We are using Java as the "lingua franca" for the UCSD Active Web. The popularity of Java and the widespread availability of the Java Virtual Machine for different architectures make it a convenient language for experimentation on heterogeneous systems. Our approach to agent mobility is to use mobility translators of two types, Java source-to-source and Java bytecode-to-bytecode. One of our faculty, Brad Calder, who is leading the compiler research effort, was part of the development team for the Intel x86 architecture version of ATOM [129] and OM [130], which are binary-to-binary translation tools that allow instrumentation and optimization of executables on DEC Alpha and Intel x86 architectures. Our mobility translator allows us to instrument any compiled Java program to gather statistics about the program and to modify it for mobility, resource control, security, optimization, and many other research problems. Furthermore, our translation technology is Java-safe, meaning that we do not violate any of Java's safety rules, as we want our agents to work not only on the UCSD Active Web, but also on the future (World Wide) Active Web. Regarding resource allocation and revocation, we use compiler analysis to determine live resource regions in programs, and automatically generate operating system resource allocation requests. Mechanisms are provided to allow code to be interrupted for resource revocation at certain points in these regions, and to allow the programmer to address the reason for revocation (similar to handling an exception).
Regarding performance, current Java interpreters are roughly 10 to 20 times slower than compiled C or C++ programs. We are examining techniques for increasing performance using just-in-time compilation, translating the Java bytecode directly to the native architecture, and adaptive execution. Others have investigated the performance of directly compiling a Java applet to native code and bypassing the Java interpreter [74], showing that they can compile and produce fairly efficient native code from Java bytecodes. We are investigating more advanced optimizations such as code layout, data layout, and method optimizations, which we have researched in the past in other contexts [23] [25] [69] [24].
G.2.3.2 Operating System Support
An operating system controls local resources, and allocates them to processes. When an agent arrives at an Active Web resource server, it will generally expect a certain level of performance, especially if it is willing to pay. For example, an agent may need to execute for a certain amount of time and have access to a certain amount of memory every 33 milliseconds, if it is to retrieve and send back video at 30 frames per second from a video server to its client. The agent may need to compress the video frames, or extract reduced-resolution images, and then send the result back for real-time display. Another example would be a distributed scientific computing application that requires a coordinated execution of tasks that periodically communicate to exchange partial results. Agents on each processing node must schedule the use of processing time, memory space, and communication bandwidth, so that the computing and exchanging of partial results occurs at regular, agreed-upon intervals. The UCSD Active Web will use operating system mechanisms being developed by Joseph Pasquale's group (see Section G.3.1.1) that allow agents to control their level of performance. Rather than pursuing the more common approach of designing a specification language for performance and then determining how to translate specifications into schedules, we allow agents to build their own "performance-controlled environments." This is done by providing an agent that is working on behalf of some application-specific task with the ability to request the direct use of physical resources such as processor time and memory space, and allowing it to control how they are used. It is then up to the agent to determine what resources are needed and when, to meet whatever performance objective it has. A good example of how such agents are used is in the work of one of our faculty, Fran Berman, who has been investigating "application-level schedulers," i.e., application-tailored agents that help scientific metacomputing (distributed) applications make good run-time decisions concerning where their tasks should execute [21].
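A hypothetical Java sketch of an agent building such a performance-controlled environment follows. The reservation interface and all names are assumptions of ours, since the proposal describes the mechanism rather than a concrete API.

    // Hypothetical sketch: an agent reserving the physical resources it needs
    // to filter video at 30 frames/s (i.e., work to do every 33 ms). The
    // ResourceControl interface is our invention for illustration.
    interface ResourceControl {
        boolean reserveCpu(int microsPerPeriod, int periodMicros);
        boolean reserveMemory(int pages);
        void release();
    }

    class VideoFilterAgent {
        private final ResourceControl rc;

        VideoFilterAgent(ResourceControl rc) { this.rc = rc; }

        // Pay only for what is currently needed; renegotiate as the sizes of
        // compressed frames (and hence the work per frame) vary over time.
        void ensureResources(int expectedFrameBytes) {
            int pages = expectedFrameBytes / 4096 + 1;
            boolean ok = rc.reserveCpu(5000, 33000)   // 5 ms of CPU every 33 ms
                      && rc.reserveMemory(pages);
            if (!ok) {
                rc.release();   // fall back: degrade quality or migrate elsewhere
            }
        }
    }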
The ability to get specific, fine-grained amounts of resources, under process control, is lacking in most popular operating systems. However, new research systems are beginning to address this problem, including Exokernel [47], L5 [91], SPIN [22], Cache Kernel [31], Fluke [54], and Scout [102]. Our approach differs in the level of abstraction provided, in the granularity of resource control, or in both.
G.2.3.3 Network Communication Support
Agents need to communicate with the client they left behind, with servers they visit, and, of course, with other agents. We expect the creation (and destruction) of agents in our Active Web to be highly dynamic. Agents will create other agents to subcontract work, which in turn will create other agents. To allow these agents to work together, there is a need to support high-performance and robust group communication structures for inter-agent communication. The new problem is in dealing with mobility, where the communicants may not stay in the same place during an extended communication, causing delays in message delivery. As an example of how performance can be improved, one approach is to use anticipatory protocols that deliver messages by trying to predict the most likely location of agents (based on their past and most recent locations, and agent-provided "itineraries"). We are well-equipped to address this problem, as our systems faculty have strong expertise in the area of group communication. Flaviu Cristian and Keith Marzullo have focused on the reliability aspects [40][42][41][43][8][37][94], while Joseph Pasquale and George Polyzos have focused on the performance aspects, especially in the context of real-time multimedia communications [114][115][84][116][148][85][149][147].
G.2.4 A Token Economy for a Computational Marketplace
There are major economic benefits to resource sharing. In our case, a small number of powerful multiprocessors and high-capacity storage systems, connected by a high-speed network, will benefit many projects, each of which will easily make use of a large fraction of them for the relatively short durations they are needed. However, there will be times when resources are scarce. How do we deal with periods of over-demand? One major experiment we will carry out is to view the UCSD Active Web resources as a "computational marketplace": resources are available for use, and time on them can be bought and sold. Tokens, which act as currency, will be distributed to the various groups according to
department-developed policies (via the PIs' representation of the various groups). The tokens will then be used to automatically purchase time on resources. This model fits very well with agent-based computing, as agents are the appropriate transport for the tokens. To use a set of resources, a user provides the application agent(s) with tokens. When an agent arrives at a resource server, it may have to pay. Importantly, the UCSD Active Web itself will not impose any particular pricing policy. Rather, it will only provide mechanisms (e.g., a token micro-payment system as the exchange medium, mechanisms for metering resource usage, electronic contracts, fine-grained resource control for predictable, and in some cases guaranteed, performance, etc.); it is up to the manager of the server to develop a pricing policy. These policies can certainly differ from server to server, just as one would expect in a World Wide Active Web, where different service providers would want to dictate and experiment with their own policies. For example, one of our research groups developing a new service may wish to make it free initially (or even pay users to use it), and later establish a flat-rate or pay-per-use policy.
G.2.5 Security Issues
G.2.5.1 Access Control: Secure Capabilities
Access control in the UCSD Active Web is based on access control lists and secure capabilities. Support for these mechanisms is provided by the middleware kernel. Access control works as follows. Servers contain traditional access control lists, which simply describe who is allowed to do what. To establish usage of a resource, an initial request is sent to a resource server. The sender of the request is authenticated (by the middleware kernel), and then checked against the access control list. If permission is granted, a secure capability is returned. The secure capability is an unforgeable token that accompanies all future requests for actual resource usage. A secure capability encodes the following information: (1) an identifier to uniquely identify the resource server from which it came; (2) what functions are permitted; (3) what portion, or how much, of the resource can be accessed; (4) an expiration date (after which the secure capability becomes useless); and (5) transference, i.e., whether the secure capability can be transferred to someone else.
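A minimal Java sketch of such a token follows. The field names are hypothetical (the proposal fixes only the five items above), and the authenticator field suggests one plausible way, a server-keyed MAC over the other fields, to make the token unforgeable.

    // Hypothetical sketch of the five items a secure capability encodes.
    import java.io.Serializable;
    import java.util.Date;

    public class SecureCapability implements Serializable {
        String serverId;       // (1) issuing resource server
        String[] functions;    // (2) permitted operations, e.g., {"read", "write"}
        long portion;          // (3) how much of the resource, e.g., disk blocks
        Date expiration;       // (4) useless after this date
        boolean transferable;  // (5) may the holder pass it to someone else?
        byte[] authenticator;  // keyed MAC computed by the issuing server
    }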
Secure capabilities form the basis for secure, limited-time access to resources. To actually use a resource, a client sends a request that includes the secure capability to the resource server. Such a transfer is made secure via encryption-based protocols. The secure capability tells the server whether the request is permitted (e.g., permission to execute the function, to access the resources, etc.). For example, the user may have bought a certain number of disk blocks via the initial request by transferring funds to the server, receiving in return a secure capability encoding the number of blocks that can be accessed and an expiration date beyond which the blocks can no longer be used.
G.2.5.2 Protecting Agents and Servers from Each Other
There has been a great deal of activity, both in academia and industry [55][56][89][117], to provide assurances that foreign mobile code cannot harm the servers on which it runs. However, just protecting servers from potentially malicious agents is not enough to make mobile code a trustworthy reality. The converse problem to server security is agent security: protecting agents from potentially malicious servers. The risks of depending on the results of mobile code computation could be dramatic, especially if the results are in turn inputs to commercial decisions. Secure mobile code is an extremely good problem area for motivating advanced security research! Two of our security faculty, Bennett Yee and Mihir Bellare, have defined the Forward Integrity (FI) property for message authentication codes (MACs), designed several data authentication schemes that have this property [152], and provided proofs of the exact security of one of these schemes [19]. The security of normal MAC functions [14] completely breaks down if the privacy of the MAC generator is violated and the secret keys are revealed. In contrast, MACs with the FI property provide the following assurance: data for which a MAC was generated in an earlier "epoch" is tamper-proof (other than by complete deletion) even if the adversary completely violates the privacy of the log generator. FI MACs form the basis for our approach to secure agent-based computing. Our goal is to provide integrity of computation. The key is to generate an FI MAC on the result of the computation at each server prior to moving to the next one, so that intermediate results cannot be modified by a later server without being detected.
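The following Java sketch shows one way such a forward-integrity MAC chain could work. This is our illustration under stated assumptions, not the published schemes of [152][19]: the per-epoch key is derived from its predecessor with a one-way hash and the old key is erased, so a server that compromises the agent in epoch i cannot forge MACs over results from earlier epochs.

    // Hypothetical sketch of a forward-integrity MAC chain. The key evolves
    // one-way between epochs (K_{i+1} = H(K_i)) and the old key is erased.
    import java.security.MessageDigest;
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    public class FiMacChain {
        private byte[] key;   // current epoch key K_i

        public FiMacChain(byte[] initialKey) {
            key = initialKey.clone();
        }

        // MAC the partial result computed at the current server.
        public byte[] tag(byte[] partialResult) throws Exception {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            return mac.doFinal(partialResult);
        }

        // Advance the epoch before migrating to the next server.
        public void nextEpoch() throws Exception {
            byte[] next = MessageDigest.getInstance("SHA-1").digest(key);
            java.util.Arrays.fill(key, (byte) 0);   // erase the old key
            key = next;
        }
    }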
G.2.5.3 Electronic Money
To support our token economy, agents need to carry with them electronic money or some other credentials, e.g., cryptographic keys or passwords, to pay for resources from the servers. This motivates the need for agent privacy: neither malicious servers nor network-based adversaries should be able to rob an agent of its electronic money or credentials. Running all or part of the agent within a physically secure environment, in the form of secure coprocessors [140][141][142][151], is a strong and practical method of achieving agent privacy from malicious servers.
G.3 Faculty Research
Now that we have provided an overview of the basic UCSD Active Web architecture and its features, we describe the faculty research projects that will make use of this system, and explain why they need it. We have organized these projects into the six areas presented earlier. This grouping is simply to emphasize the different types of demands being made on the UCSD Active Web. As far as the actual work being carried out, there is a high degree of overlap and collaboration between these groups. For example, the Content-based Search and Multimedia groups have much in common, as might be expected, and some faculty in the latter group are actually also doing content-based search (but focusing on video and images, rather than more traditional symbolic data). The Security and Systems groups have integrated efforts on security mechanisms that can be efficiently implemented in the network and operating system software currently being developed. Many more examples can be cited, and will be evident in the descriptions that follow.
G.3.1 SYSTEMS
Faculty: Brad Calder, Flaviu Cristian, T. C. Hu, Keith Marzullo, Joseph Pasquale, George Polyzos, Dean Tullsen
The Systems group is researching a variety of issues in operating systems, network communications, distributed systems, computer architecture, and compilers. As related to the UCSD Active Web, these issues include: operating system mechanisms that provide fine-grained control over low-level resources to support performance-controlled environments for agents; efficient multicast for inter-agent group communication; improving availability and providing fault tolerance for network-accessible distributed services; high-performance multithreaded computer architectures, which are especially appropriate for hosting large numbers of processes for parallel applications and for hosting agents; and compilation of Java agent code to support mobility over multiple heterogeneous architectures.
G.3.1.1 Operating Systems Providing Fine-Grained Resource Control
Joseph Pasquale is currently investigating operating system mechanisms to support fine-grained allocation of resources. These mechanisms are required for resource servers that need to partition resources over many processes requiring a high degree of control over their performance. This control allows such processes to flexibly adapt their behavior over time, based on changing requirements. This is especially important for the UCSD Active Web, where agents will arrive at servers and possibly pay for resources to achieve a certain level of performance. Since resource requirements will change over time (e.g., the work required by an agent that filters video will change based on the varying sizes of compressed frames), agents will want to pay only for what they need. Our work is rooted in research we have carried out over the past ten years on supporting applications that have predictable performance requirements. We have investigated new operating system software structures to improve I/O and IPC performance for applications requiring communication of very large data objects, e.g., hundreds of megabytes or more. The more important classes of applications supported by this work include multimedia (e.g., images, video, and audio) and scientific (e.g., large state-based simulations of physical processes). We have explored a number of interesting points in the design space of I/O (including networking) and IPC system software, including:
• Multi-structured File System (MFS) design, which integrates three separately organized disk-array storage structures (mirrored for fast response, striped for high throughput, and logging for reliability) to achieve high performance, reliability, and scalability [103]
• Container Shipping, which provides mechanisms that allow a process to efficiently and safely transfer data between protection domains without physical copying, using virtual memory remapping techniques in a user-controllable way [113]
• Peer-to-peer I/O, based on a new paradigm for computing that is I/O-centric. In peer-to-peer I/O, mechanisms are provided that allow the construction of in-kernel data paths, directly connecting source and sink devices [48]
• Leave-in-Time, a new packet-switching service discipline that provides distributional end-to-end performance (e.g., delay) bounds [50], building on new results for performance bounds of other service disciplines that were unknown, and in some cases, believed not to exist [51]
• Software implementation techniques for improving throughput and latency in TCP/IP, based on one of the most detailed studies of where time is spent in a commonly used TCP/IP implementation [83]
• The Multimedia Multicast Channel (MMC), a group communication abstraction unique for its cable-TV-style communication paradigm, which allows a source and a set of heterogeneous receivers to cooperate effectively while operating independently, and is naturally supported by efficient open-loop, reduced-feedback control methods [114]
• Filter Propagation, another unique feature of the MMC, whereby a data-stream filtering software module can propagate from a receiver toward a source along a path in a multicast tree, e.g., to reduce bandwidth consumption [115].
Some of these projects will directly support the UCSD Active Web. For example, Container Shipping and Leave-in-Time will serve as building blocks for the UCSD Active Web's operating system and network software, respectively. Other projects will benefit from the infrastructure. A Multi-Structured File System requires large numbers of disks (tens to hundreds) to be effective; in the past, we could only simulate the performance improvements. The large storage capabilities of the UCSD Active Web will allow us to build real MFS-based designs. Our past implementations of filter propagation were based on ad hoc designs for code movement; the UCSD Active Web's support for agents and their ability to execute on network routers will allow a more systematic approach to implementing filter propagation.
G.3.1.2 Networking and Multicasting
George Polyzos has considered problems associated with the real-time dissemination of multimedia information through heterogeneous networks [116], including wireless networks and mobile hosts [148][149][147]. His research has addressed issues related to quality-of-service-based multicast routing [84][85], terminal and path heterogeneity [114][118], traffic analysis and modeling [34], prioritized traffic and congestion control [114][68][118], and error control for interactive applications over wireless networks [68][67]. Much of this work is experimental, using diverse technologies such as ATM, the Internet, and wireless channels. However, because of the lack of appropriate network testbeds and specialized multimedia terminals, previous experiments have been at a very small scale. The availability of a high-speed programmable network connecting a heterogeneous mix of resources will allow the testing of ideas and designs in a more realistic environment.
In addition, T. C. Hu is doing theoretical work on network connectivity and communication, developing new telephone routing algorithms based on distributed control. To experimentally evaluate these algorithms, he would like to execute them on a simulated large-scale telephone network using traces containing real data from past phone call connections, all of which requires a large amount of computing power and memory space.
G.3.1.3 Availability and Fault Tolerance
One of the goals of the Active Web is to support access to critical applications. Such applications have requirements for high reliability and high availability, which implies that failures in the service must be masked. The TEAM research project of Flaviu Cristian and Keith Marzullo has been exploring protocols that support masking failures in replicated services. An important facet of this research is to provide efficient implementations of these protocols that are based on sound and realistic system models. We consider three kinds of applications: (1) those that must provide their timely service in spite of a bounded number of failures per time unit; (2) those that can transition in a timely manner to a "failsafe" state when too many failures occur per time unit; and (3) those that can offer "best effort" degraded services when too many failures occur per time unit. For each paradigm, we have implemented a suite of system services that allow for easy construction of highly available services [49][43][41][37][36][35]. We are currently investigating their integration into one approach that would support complex, wide-area real-time applications. We are also extending our work to failure models that occur in the more wide-open environment of Web-based applications, including Byzantine failures (modeling Trojan horses) and failures of trust (modeling misuse and anomaly detection resulting from intrusion). The TEAM project is also interested in building systems that can manage their own execution. This is an important function of large distributed applications, where failures must be repaired rather than simply masked. Our early work on the problem explored techniques based both on table-driven recovery [39] and on code instrumentation [94]. Both techniques were successful and were incorporated into products (by IBM and Stratus, respectively), but the area is still very poorly understood and researched. We are now examining embedding tools into programming languages (both Erlang and Legion) for similar management purposes.
Since we are interested in distributed real-time applications, the UCSD Active Web's fast computers and low-latency network are needed for evaluating systems that purport to achieve low end-to-end delays. To provide and test new hard real-time communication protocols, the availability of programmable routers is advantageous.
G.3.1.4 High-Performance Processor Architecture
This group, led by Brad Calder and Dean Tullsen, does research in computer architecture, focusing on next-generation high-performance processor architectures. The current focus is on (1) multithreaded architectures, including simultaneous multithreading, utilizing both software-generated and hardware-generated threads, (2) branch penalty hiding, and (3) aggressive instruction and thread speculation. Simultaneous multithreading [132][133] is a technique that allows the processor to significantly increase processor utilization by dynamically sharing nearly all processor resources each cycle among several running threads. This work is heavily experimental in nature, based on detailed cycle-by-cycle instruction-level simulation. Critical to this work are good simulation tools and fast and abundant compute resources. An illustration of our resource requirements is described in Section D.3: we require the execution of 10^15 instructions just to emulate a single configuration of an architecture and a workload.
G.3.1.5 Compiling Agents
To support mobile agents at the application level, compilers must enable agent migration, and provide the needed support for security and resource management, at a sufficient level of performance for the user. Research by Brad Calder will explore these issues for the Java language. The ability to migrate will be added to Java using bytecode-to-bytecode and source-to-source translation tools. See Section G.2.3.1 for additional information. Other work being considered is just-in-time adaptive computing and migration for Java, such as the reordering of code and of the object file format to permit immediate execution and faster compilation upon migration [57][69].
G.3.2 SECURITY
Faculty: Mihir Bellare, Russell Impagliazzo, Bennett Yee
The Active Web brings a host of security concerns to be addressed. How can a user verify the authenticity of information carried by an agent, or the origin of an agent? How can you protect yourself against agents that carry viruses into your computer? How can you prevent people from collecting information about you and selling it to others? How can you equip an agent with the ability to pay for goods? How can resources be shared globally while still ensuring availability to critical applications? These questions go beyond the scope of simple cryptographic patches laid over the current Internet protocols. (In particular, the cryptographic standards of Internet II, while a first step toward network security, will not address these issues.) Our security group brings the required integrated expertise in protocol design, systems security, electronic payments, and fault tolerance to address these problems. The UCSD Active Web will be a unique and ideal testbed for experimenting with new approaches to security for the real Web in a relatively tame, low-risk environment. Additionally, the UCSD Active Web will support new mechanisms for implementing protocols, such as independent security servers and agent-based protocols. One of our most important decisions in designing the UCSD Active Web is determining how access to resources is shared among users. As discussed earlier, we will experiment with a token economy, for which electronic payments are central. The UCSD Security Group has been exploring a variety of electronic payment mechanisms.
G.3.2.1 iKP
Mihir Bellare was part of the group that designed a protocol suite known as iKP [15], for the purpose of credit-card-based payments over the Internet. Using these protocols, a user can pay an Internet merchant with a standard credit card, but in such a way that even the merchant doesn't learn the credit card number. Nonetheless, the merchant can verify the transaction with the acquiring bank. Thus the security of both user and merchant is protected. The iKP suite influenced the development of SET, a standard for electronic credit-card-based payments proposed by Visa and Mastercard. iKP received IBM's "Outstanding Technical Achievement Award" in August 1996. OAEP [17] is a key ingredient of both iKP and SET. It is a public-key-based encryption scheme, providing a means to encapsulate data before transmission in such a way that no one except the
intended recipient can recover it. The salient feature of this protocol is that it is fast and also proven secure. OAEP has been quite widely adopted, and is being proposed as an IEEE standard.
G.3.2.2 Revocable Privacy and Security
Markus Jakobsson, a Ph.D. student of Russell Impagliazzo, explored another approach in his thesis [77] and related conference papers [79][80]. In this approach, independent servers, some chosen by the customer, others by the bank, work together to ensure revocable privacy and security. This approach can prevent electronic payments from being used to gather information on customers' spending habits, while at the same time preventing money laundering and other criminal activities. The approach presents an intriguing framework for security via distributed computation by security servers that represent the customer's interests while being accountable to the legal system (much like attorneys). Jakobsson and Keith Marzullo were both part of an SDSC project to implement a payment system for the U.S. Patent Office.
G.3.2.3 Secure Coprocessors
Bennett Yee has used a third approach, a tamper-proof chip with processing power called a "secure coprocessor," as a cornerstone for building electronic payment systems in physically exposed environments. He has applied this idea to develop a secure postal metering system for the U.S. Postal Service, reported in [70][134]. Secure coprocessors are a central part of our UCSD Active Web security strategy. The idea of secure coprocessors, developed in Yee's thesis [151], potentially has wide applicability to many other areas of security for the Active Web. An example is the mutual distrust problem for agent-based computing: just as servers fear agents will attempt to subvert them, the agent's user might be concerned that the server might subvert the agent. For example, if an agent is to find the cheapest airline ticket, it would be in the airline server's interest to "brainwash" a running agent so that it believes that all earlier quoted prices from competitors are higher. This problem is being addressed by Yee's Sanctuary project [152], which, among other mechanisms, is considering the use of secure coprocessors to ensure agent reliability.
G.3.2.4 HMAC
Imagine what would happen if your e-mail were modified on its way to the recipient; hackers could put words in your mouth that you never intended. The problem becomes even worse when
the message contains code to be run on the recipient's machine, code for which you are responsible. Providing "integrity" of data transmitted over the net is an important problem, to which we have devoted extensive research. Bellare and others have designed a message authentication code (MAC) called HMAC [86], described and proved secure in [14]. It enables a sender to tag his data with a value computed as a function of the data and a secret key. The receiver, holding the same key, can verify the correctness of the tag upon receipt. HMAC has been adopted as an Internet standard [64], so that it can be part of the next generation of secure Internet protocols. It is also implemented in a host of current commercial software security packages, including B-SAFE (of RSA Corporation) and SSL. Finally, it received IBM's "Outstanding Innovation Award" in March 1997. As mentioned above, a MAC requires the sender and receiver to share, a priori, a secret key under which the correctness of the MAC is verified. Stronger is a digital signature, which, like a handwritten signature, is publicly verifiable and non-repudiable. Efficient and secure mechanisms are much sought; one such is PSS [18]. Again designed using the provable-security approach, it combines high speed with theoretically justified security. Several standards bodies have expressed an interest in the technology. Recently, Bellare and Yee have developed a special kind of MAC with the Forward Integrity property. In the context of mobile agents, this means that partial results computed prior to visiting a malicious server cannot be tampered with by later malicious servers, even though those servers have complete access to an agent's internal state. Such forward-integrity MACs have wider applications. In the context of systems security, forward integrity is a highly desirable property for system audit logs: the audit log contains, among other things, records of intrusion attempts, and the first target of an intruder is the audit log, to erase all evidence of the intrusion. By using forward-integrity MACs to protect the audit logs, any attempt to modify log entries will be detected [19]. Other issues concerning establishing identity and authenticity are examined in [58][66].
G.3.3 MULTIMEDIA
Faculty: Walt Burkhard, Ramesh Jain, Yannis Papakonstantinou, P. Venkat Rangan
The Multimedia group at UCSD carries out research in the areas of video servers, video streaming, content-based search of video, and video mixing. Video is rapidly emerging into the computing
mainstream. Technological advances in hardware, standards, pattern recognition, networks, and database systems provide the necessary basis for systems in which multimedia data accompany conventional numeric and textual data. However, the size of video data, the numerous sites from which it will be available, and the complexity of the queries pose a challenge to both the hardware and the software. While video is typically accompanied by audio, text, images, and perhaps other forms of information, the problems of dealing with video dominate (as far as our work is concerned) because of its huge size. For example, a single movie, after compression, can easily consume a gigabyte of storage. The data rate for MPEG-compressed video ranges from 1-10 Mbps. These demands require the power of systems at the levels provided by our requested infrastructure, and good strategies for how to use this power. This includes MMX processing capabilities, MPEG-II encoding/decoding, fast network rates (many tens of Mbps for multiple video streams), and large storage capacities (a terabyte for storing a 1000-unit video library).
G.3.3.1 Video Servers and Streaming
Venkat Rangan is investigating digital video-on-demand servers, focusing on the following requirements of media playback: intra-media continuity and inter-media synchronization [119][122][124]. Multimedia presentations such as lectures, news, and documentaries can all be represented as multimedia ropes, each of which may consist of audio and video strands (a strand is a continuously recorded sequence of media units; a collection of strands intertwined by synchronization information is a rope [125]). A large-scale multimedia server must service multiple user requests simultaneously, hence the need for a multiprocessor-based server [123]. In the best scenario, all the users request the retrieval of the same media strand, and the multimedia server only needs to retrieve the media strand once from the disk and then multicast it to all the users. However, more often than not, different users may request retrieval of different media strands; even when the same media strand is being requested by multiple users, there may be phase differences among their requests (such as each user retrieving a different portion of the strand at the same time). We have developed a Quality Proportional Servicing algorithm, which sets the number of media blocks retrieved during each round for each request to be proportional to its playback rate, and then uses a dynamic staggered toggling technique by which the numbers of media blocks retrieved during successive rounds are fine-tuned individually to achieve the servicing of the maximum number of users simultaneously [137]. The Quality Proportional Servicing algorithm is highly efficient: whereas the initial computation of the number of media blocks retrieved during each round takes a constant amount of time (i.e., O(1)), the complexity of the staggered toggling algorithm varies linearly with the number of users. The UCSD Active Web infrastructure will allow us to experimentally evaluate algorithms such as this using realistic video workloads.
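The proportional part of the algorithm can be sketched in a few lines of Java (our illustration of the idea described above, not the authors' implementation; the staggered toggling refinement is omitted):

    // Sketch of quality-proportional servicing: each request gets a share of
    // the per-round disk budget proportional to its playback rate.
    class Request {
        double playbackRate;    // media units per second
        int blocksThisRound;
    }

    class QpsScheduler {
        void assignBlocks(Request[] requests, int blocksPerRound) {
            double totalRate = 0;
            for (Request r : requests) {
                totalRate += r.playbackRate;
            }
            for (Request r : requests) {
                r.blocksThisRound =
                    (int) (blocksPerRound * r.playbackRate / totalRate);
                // Dynamic staggered toggling (not shown) fine-tunes these
                // counts across successive rounds to service the maximum
                // number of users simultaneously.
            }
        }
    }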
G.3.3.2 Personal Service Agents
As the multimedia content on the Web grows in size, video information will be distributed amongst several storage servers: smaller servers will serve as temporary caches for larger servers that act as the primary repositories for information. Since different groups of users may be interested in different classes of information, optimal distribution of information, close to potentially interested users, can reduce the frequency of necessary network transmissions. Moreover, information caching can also amortize storage costs amongst users with similar interests, thereby yielding important performance benefits. Storage and network resource optimizations are most likely when users specify their requests in advance. Alternatively, we have recently started work on intelligent Personal Service Agents (PSA's) that can monitor users' preferences and predict their future needs [121][120]. Based on these predictions, PSA's can judiciously schedule the retrieval of media strands from storage servers and their subsequent transmission so as to minimize costs borne by users. For instance, a PSA can retrieve media strands a priori and cache them either at the user's site or at a neighboring server, during a period when the network and server are relatively underutilized [106]. Resource optimizations can also be performed by considering requests from multiple users simultaneously. For instance, when several users have similar preferences, both in their choice of strands to display and in their viewing times, a storage server can multicast strands of the users' common choice. In cases when users' preferred viewing times do not match, PSA's can cache the strand of interest at a neighborhood server rather than incur the cost of repeated transmission all the way from a central server to the users. In the process, PSA's amortize retrieval and transmission costs amongst several users, thereby reducing the cumulative service costs borne by users. In practice, there are interesting tradeoffs between the cost of renting storage space at a neighboring server and the cost of repeated transmissions from a distant central server, which must be evaluated by PSA's before deciding to cache media strands at neighborhood servers. For instance, if the hourly cost of renting storage at a neighboring server is 0.03 cents/MByte, the storage rental charges for an hour-long, MPEG-II encoded video strand that utilizes 1.8 Gbytes of storage space
will be 54 cents/hour. On the other hand, if the cost of leasing a 155 Mbit/sec link to a distant, central server is 50 cents/minute, every time the video strand is transported between the two servers, the cost incurred will be $1. In this case, whether or not a PSA should cache the video strand at the neighboring server will depend upon the anticipated pattern of user accesses to that segment. If the video strand is expected to be accessed once every hour on average, it would be prudent to cache it at the neighborhood server. However, if user accesses are expected to be less frequent, retransmission of the video from the central server would be preferred. The storage-network optimization is carried out at the time when a PSA schedules the retrieval of multimedia ropes from storage servers. In general, storage costs may vary from server to server, depending upon storage capacity, transfer rate, current utilization, etc. Likewise, transmission costs may vary from one network link to another. Factors such as link bandwidth and real-time performance requirements, such as delay, delay jitter, and loss bounds, will determine the transmission costs. To provide multimedia services at an attractive cost, we are investigating efficient information distribution and caching strategies by means of which PSA's can determine when, where, and for how long media strands must be cached, so as to minimize the cumulative storage and transmission cost, amortized over all the users interested in viewing the strands. Such an optimal caching schedule must not only account for differences in storage and transmission costs, but must also adapt to any changes that may occur in these costs (for instance, servers and networks are in greater demand during "peak hours", and hence more expensive to use). Cost changes may occur even on-the-fly; for example, a storage server, upon anticipating an overcommitment of server resources, may increase its storage cost so as to discourage further demand. The UCSD Active Web's distributed structure, large storage capacity, multimedia server and end-user capabilities, token economy, support for agents, and programmable network will allow us to experimentally determine the effectiveness of these different strategies. Walt Burkhard is also investigating large-scale data layout for multimedia applications, focusing on reliability issues [7][38]. In addition to experimenting with its storage facilities, the UCSD Active Web will provide a common testbed for both of these approaches to distributed multimedia storage research, supporting a large number of (possibly competing) storage-intensive applications that can be observed and used to test the various new storage techniques being developed.
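The caching tradeoff worked through above can be restated as a few lines of Java (the figures come from the text; the decision rule and all names are our illustration):

    // Worked restatement of the PSA caching tradeoff: cache at a neighborhood
    // server when the rental cost per expected access is below the cost of
    // re-transmitting the strand from the distant central server.
    public class CachingDecision {
        public static void main(String[] args) {
            double rentalCentsPerMBytePerHour = 0.03;
            double strandMBytes = 1800;            // 1.8 Gbytes, MPEG-II, one hour
            double rentalCentsPerHour =
                rentalCentsPerMBytePerHour * strandMBytes;   // 54 cents/hour
            double transferCents = 100;            // ~$1 per transport over the link

            double accessesPerHour = 1.0;          // anticipated user demand
            double rentalPerAccess = rentalCentsPerHour / accessesPerHour;

            System.out.println(rentalPerAccess < transferCents
                ? "cache at the neighborhood server"    // 54 < 100: prudent to cache
                : "retransmit from the central server");
        }
    }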
G.3.3.3 Content-based Search of Video
Yannis Papakonstantinou is conducting research in (1) extraction of object features from video, (2) indexing of distributed video information, (3) querying video by content, and (4) the use of agents in information collection. The agent paradigm is well suited to accomplishing these goals. Instead of transferring voluminous video data to the client site, the client can send agents to the sites where the video resides; the agents parse the video, extract content, and send summarized metadata back to the client site. Our agents will be based on the COMIX system, under development, which extracts objects and logical states, collectively called metadata, from MPEG-2-encoded video streams. The metadata are useful for user queries. The system's highlight is that both extraction and querying are expressed and directed by user-provided high-level statements, which are modifications of SQL statements. In particular, user-provided condition-action statements, called hints, control the use of the basic image recognition, object extraction, and comparison modules. Hints allow the implementor to develop agents that control the extraction process in a high-level way, and at the same time enhance quality and efficiency in two ways. First, hints focus the extraction routines on the objects of interest to the query. Second, hints allow the user to encode semantic knowledge of the particular domain and use it to enhance both the efficiency and the quality of extraction. However, in order to realize the efficiency of hints, we will have to avoid the bottleneck of having to fetch voluminous video data to the client site. This is made possible by the agent architecture, which allows agents carrying the hints to be evaluated at the site where the video originates, or at intermediate "video server" sites. Our prototypes have already demonstrated the importance of state-of-the-art hardware for achieving reasonable performance. Our eventual goal is to provide real-time feature extraction, querying, and indexing of video streams of non-trivial size, something that will finally be possible with our requested infrastructure.
G.3.3.4 Multiple Perspective Interactive Video
Ramesh Jain is developing Multiple Perspective Interactive Video (MPI Video) [101] as a framework for the management of, and interactive access to, multiple streams of video data capturing different perspectives of related events. At any given time, we can only see our immediate
environment from one perspective. To get other perspectives, we must move our eyes. To explore the environment from other viewpoints, we have to physically move. Similarly, a scene is captured from a limited perspective using a camera. Using a powerful information system to mediate between viewers and multiple cameras, it is possible to provide "gestalt" vision, which is more than any individual camera can offer [76]. A viewer can then see the scene from any position, and may walk through a dynamic scene without disturbing the events in the scene [81]. MPI Video has dominant database and hypermedia components that allow a user not only to interact with live events, but also to browse the underlying database for similar [128] or related events, or to construct interesting queries [63]. MPI Video is extremely compute- and memory-intensive. At this point in time, the entire system is implemented on a single high-performance SGI machine, and requires many hours to process only minutes of 3-stream video. Our ultimate goal is to have real-time processing, so users can view scenes as they are acquired and can interact, e.g., modify the viewing angle or move about, and see the resulting changes within human reaction times (tens of milliseconds). MPI Video processing lends itself naturally to parallelism and distribution. In a real Active Web environment, the processing would actually take place near the sources of each video stream, thereby limiting the amount of network bandwidth required. The UCSD Active Web infrastructure will allow us to test various strategies for distributing the computations on the different multiprocessors (and to take advantage of multiple MMX processors).
G.3.4 CONTENT-BASED SEARCH
Faculty: Rik Belew, Gary Cottrell, Charles Elkan, Yannis Papakonstantinou, Victor Vianu
The search for content is difficult because rich media mean different things to different people. No single data attribute, for example the presence of a particular word in a textual document, is sufficient for us to infer that content is guaranteed to be relevant to a user. A number of researchers in CSE are approaching this fundamental problem from a variety of perspectives, including artificial intelligence, databases, and information retrieval. The UCSD Active Web infrastructure will allow research on a number of fronts to be applied on unprecedented scales, as well as be combined on shared datasets and problems. A common theme in the following sections is the desire to use agent-based computing, as provided by the UCSD Active Web, for content-based search.
G.3.4.1 Activating Genome Databases
As an example of an Active Web application, consider the existing MEME system [11][10] designed by Charles Elkan and Tim Bailey. This software discovers patterns called motifs in protein or DNA sequences provided by biologists. MEME uses a highly computationally expensive, mathematically sophisticated learning algorithm to converge automatically on the motifs that are most likely to be biologically significant. Since it is unknown in advance where the motifs appear, and indeed whether a motif appears at all in a given protein or DNA sequence, O(N) different candidate motifs must be considered for a dataset of size N. With the processing and memory capabilities of our requested Digital multiprocessor cluster, these candidate motifs can be examined concurrently, which makes it feasible to discover motifs in datasets of thousands of sequences, the size of the whole genome of a bacterium. A related tool called MAST searches for instances of these motifs in standard molecular biology databases such as Genbank. Making MEME available as an interactive Active Web application converts the passive data filling genomic databases into an interactive partnership for biologists. As the databases' contents change, MEME's networked design allows it to provide a consistent, up-to-date Active Web interface. In the future, biologists will be able to receive automatic notification whenever a new sequence of interest is published.
G.3.4.2 Querying Distributed, Semistructured Data
Many applications require handling a mix of data of diverse types, ranging from highly structured relational data to "semistructured" data whose structure is very loose or even unknown. While semistructured data occurs in various contexts, the fast-growing set of Web-based applications places particular emphasis on a globally distributed environment. Victor Vianu is investigating languages for querying semistructured, globally distributed data, and agent-based evaluation techniques for such queries. The requirements and abilities of current Web "crawler" agents are quite restricted, and some evaluation strategies require agent capabilities not yet available. For example, some query evaluation strategies require sites to make persistent storage available to software agents; others require extended communication capabilities among agents [2]. Future research will study the interplay between distributed query evaluation strategies, agent capabilities, and new measures of query complexity appropriate to a distributed, agent-based model of computing. The UCSD Active Web infrastructure's support for agents will allow us to experiment with such strategies.
G.3.4.3 Distributed Query Mediation
The focus of Yannis Papakonstantinou’s research has been on mediator systems that provide integrated access to heterogeneous information found in Web sources. Mediators collect information relevant to the user request from the sources and combine it, resolving inconsistencies and removing redundancies [108][109][1][107]. For example, a mediator may collect information from multiple bibliographic sites and present an integrated view to the user [108]. We are investigating agent-based mediators as a solution to the performance problems that the number of sites, the weak structure of the Web, and the volume of data cause for centralized mediators. First, the agents will perform as many of the expensive query processing and filtering operations as possible at the site of the information provider, thus avoiding the transmission of voluminous data to a single site [107]. Second, the agents can efficiently move to an alternative information provider whenever their original target site is unavailable or inefficient. Finally, agents provide a powerful solution to the problem of the limited query capabilities of the sources [136][112][135][107][111][110]. Optimal solutions to the above problems typically involve impractical NP-complete algorithms, making the development and experimental testing of suboptimal solutions necessary [3]. Our Active Web infrastructure, with its processing capabilities and distributed structure, will provide an experimental foundation for this work that would otherwise be unavailable to us.

G.3.4.4 Adaptive Search Agents
Neural networks and genetic algorithms have both proven to be robust, general-purpose machine learning techniques. It is not surprising that both can become important technologies for allowing information-seeking agents to adapt to the rapidly changing contents of the Web, as well as to the changing needs of the users they serve. In a system called ARACHNID, Rik Belew and Filippo Menczer have used previous work from Artificial Life to characterize the Web as an environment within which agents, controlled by simple neural networks, navigate over links (i.e., HTML anchors) to find documents relevant to a user [99][98][97]. As these agents “travel” across the Web, their neural networks allow them to learn which document features are most correlated with the user’s likes and dislikes. Over time, those that have been particularly successful at satisfying the user’s requests are allowed to “reproduce” according to a modified genetic algorithm; offspring agents begin searching on documents near their parent. The system thus evolves agents whose neural networks are especially sensitive to the document features most salient to the portion of the Web in which they live. To date, all of the ARACHNID experiments have been done on test corpora resident on a single host, with all the agents’ processes sharing this single computational resource. However, ARACHNID agents were designed from the outset in anticipation of the ability to run their (neural network) programs remotely on the servers actually containing the documents. The UCSD Active Web provides a real testbed for executing remote ARACHNID processes. Running ARACHNID on the UCSD Active Web will also provide a serious test of the various system and security issues our other faculty are investigating.

G.3.4.5 Adaptive Lenses
Gary Cottrell and Rik Belew are developing a methodology for automatically adapting the parameters of an information retrieval system [12][138][13]. We have applied this technique to the construction of user lenses, a user community-centered view of the available information. Lenses “focus” the user’s query and the representation of the documents based on relevance feedback from the user community. The approach builds on prior work in our lab on an objective function for information retrieval based on a rank-order statistic: the system’s ranking of the documents should match the user or user community’s ranking. By performing gradient descent on this objective function, we can adjust the parameters of the retrieval system based on relevance feedback; a toy sketch of this adjustment appears below. As described in Section D.3, Adaptive Lenses have large-scale processing and memory requirements (roughly 10^15 instructions per indexing operation), justifying the processing power and memory capacities of the requested infrastructure. We also wish to apply agent-based computing to Adaptive Lenses. Individual agents would not each have to process as many documents, nor deal with the entire “global” vocabulary. Our current design calls for agents that focus on approximately one thousand keywords each and process 10,000 documents each, so their analog of the indexing operation can be done using 10 MB of memory per agent. At the same time, populations of hundreds of such agents are needed to satisfy each user, each population constantly changing as it forages over new documents. Assuming 10 of our faculty engage in this experiment, 1000 agents can be expected to be simultaneously active, with each agent requiring roughly 100 billion instructions before communicating relevant leads back to the user’s client.
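The toy sketch below shows the kind of parameter adjustment involved; it is hypothetical code, not the Adaptive Lens system. A linear term-weight function scores documents, and a pairwise logistic loss stands in for the rank-order objective: gradient steps pull documents preferred by relevance feedback above the others.

    import math

    def score(w, doc):
        """Linear retrieval function: sum of term weights times term features."""
        return sum(w.get(t, 0.0) * x for t, x in doc.items())

    def train_step(w, preferred, other, lr=0.1):
        """One gradient step on loss = log(1 + e^(s_other - s_preferred))."""
        margin = score(w, other) - score(w, preferred)
        g = 1.0 / (1.0 + math.exp(-margin))        # d(loss)/d(margin)
        for t in set(preferred) | set(other):
            grad = g * (other.get(t, 0.0) - preferred.get(t, 0.0))
            w[t] = w.get(t, 0.0) - lr * grad

    # Feedback: the user community prefers d1 over d2 for this query.
    w = {"genome": 1.0, "web": 1.0}
    d1 = {"genome": 2.0}
    d2 = {"web": 2.0}
    for _ in range(50):
        train_step(w, d1, d2)
    print(score(w, d1) > score(w, d2))   # True: the lens has refocused

In the full system the parameters number in the millions and the feedback comes from a whole user community, which is what drives the processing and memory requirements cited above.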
G.3.4.6 Reasoning about Verification
Concerning the trustworthiness of agents, we must develop methods for verifying resource requests in real time. Currently, the best example of this is the logical reasoning performed by a Java interpreter as it checks a downloaded applet to ensure that it uses only authorized resources on the machine receiving the download. Charles Elkan has shown how efficient specialized algorithms can verify logically whether the accesses and updates requested by different users of a database system could possibly interfere [46]. An important advantage of this approach to security is that it can be built on top of existing software for implementing accesses and updates, without requiring any internal modifications of that software. This work is in collaboration with our security group. A minimal sketch of this style of admission check appears below.
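The sketch conveys only the flavor of such a check; the policy format and resource names are illustrative assumptions of ours, and the reasoning in [46] is logical and far more general. An arriving agent declares the resources it intends to use, and the host admits it only if every request is covered by policy.

    # Hypothetical host policy: which targets each action may touch.
    POLICY = {
        "net.connect": {"origin-host"},   # an agent may only call home
        "fs.read":     {"/pub"},          # public data only
        "fs.write":    set(),             # no writes at all
    }

    def verify(requests):
        """Return the list of policy violations (empty means admit)."""
        violations = []
        for action, target in requests:
            if target not in POLICY.get(action, set()):
                violations.append((action, target))
        return violations

    agent_requests = [("fs.read", "/pub"), ("fs.write", "/etc")]
    bad = verify(agent_requests)
    print("admit" if not bad else "reject: " + str(bad))

The check runs before the agent executes, so an ill-behaved agent is rejected at the door rather than monitored after the fact.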
G.3.5 SCIENTIFIC METACOMPUTING
Faculty: Scott Baden, Fran Berman, Larry Carter, Jeanne Ferrante, Ben Rosen, Rich Wolski
Many researchers who work with applications that require huge computational resources share a common vision: harnessing the collective power of cooperating computers on a large network to work on a single application. The potential is for cheap (by using otherwise idle resources), efficient (since computation can be moved to the data) computation on a scale that is simply impossible on any single platform available to the researchers. Today’s realizations of this vision are limited to three somewhat restrictive types of demonstrations:
• “Embarrassingly parallel” problems with low communication requirements. Notable examples include finding the 35th Mersenne prime and cracking the 48-bit RSA and 56-bit DES challenges.
• Tightly-coupled homogeneous workstations, such as Berkeley’s NOW or Wisconsin’s COW, or more loosely-coupled intra-departmental systems, e.g., [62].
• Carefully scheduled couplings of two or three supercomputers, like those demonstrated by NPACI partners.
Metacomputing in the future, however, holds the promise of distributing complex applications among the heterogeneous, dynamically changing resources of the Active Web. Before this vision becomes a reality, we must solve many research issues. Some of these (such as security, reliability, and accounting) are common to all Active Web applications, but some are unique to large applications, i.e., applications that raise significant synchronization issues, that involve important communication-computation tradeoffs, and that may have a single job run for days or weeks. Since the requirements are so large, it is important to make an economical choice of computational resources. How do you schedule resources for such an application? How do you write a program that can run on an unspecified, and possibly changing, computing configuration?

G.3.5.1 Faculty Projects
The AppLeS project [20][21], led by Fran Berman and Rich Wolski, is developing both methodology and tools for building application-level schedulers for programs executed in wide-area heterogeneous systems. AppLeS uses available application-specific information and application-driven performance criteria to perform adaptive scheduling. Since predictions are only valid in a given time period, AppLeS uses dynamic system information provided by the Network Weather Service.
The Network Weather Service (NWS) [143][144], developed by Rich Wolski, performs active or passive monitoring of resource availability. Based on this information, it forecasts future performance, adaptively selecting from a variety of performance models. The forecast information is available to users and to autonomous agents such as application-level schedulers.
The KeLP project [52][53], led by Scott Baden, provides high-performance run-time support for adaptive data structures that can be customized to classes of scientific applications. Adaptive data structures arise in irregular applications that concentrate computational power in the “interesting” parts of a problem, and even in uniform applications that are run on complex computing environments such as collections of SMPs. By managing the data decompositions and underlying data motion, KeLP permits applications to adapt to the operating conditions of both the application and the architecture.
Larry Carter has helped develop the Parallel Memory Hierarchy (PMH) model of computation [4][5] to quantify complex, heterogeneous computing platforms. The resources relevant to scheduling large applications include not only computation power, but also memory hierarchy characteristics, multiple levels of parallelism, communication latencies and bandwidths, and secondary and tertiary storage. The PMH model is used in writing generic programs [6] that adjust to the characteristics of the target computer, and the model is also used to guide hierarchical tiling.
The Hierarchical Tiling project [27], led by Larry Carter and Jeanne Ferrante, is developing compiler techniques for exploiting increasingly complex computer architectures, with their trends towards multiple levels of memory and parallelism. Hierarchical Tiling combines partitioning of computation for locality and parallelism with greater compiler control of data movement and storage. Since the UCSD Active Web will be even more complex and heterogeneous than current single architectures, studying multi-level tradeoffs [71][100] is increasingly important. A schematic example of two-level tiling appears below.
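The following sketch makes the tiling idea concrete; it is an illustration only, not output of the Hierarchical Tiling compiler, and the tile sizes are placeholders that would in practice be matched to two levels of the memory hierarchy or of parallelism.

    def tiled_matmul(A, B, n, outer=8, inner=4):
        """C = A*B on n-by-n matrices (lists of lists), tiled at two levels."""
        C = [[0.0] * n for _ in range(n)]
        for ii in range(0, n, outer):                 # outer (shared-level) tiles
            for kk in range(0, n, outer):
                for i2 in range(ii, min(ii + outer, n), inner):   # inner tiles
                    for k2 in range(kk, min(kk + outer, n), inner):
                        for i in range(i2, min(i2 + inner, n)):
                            for k in range(k2, min(k2 + inner, n)):
                                a = A[i][k]           # reused across the j loop
                                for j in range(n):
                                    C[i][j] += a * B[k][j]
        return C

    n = 16
    Id = [[float(i == j) for j in range(n)] for i in range(n)]   # identity
    B = [[float(i + j) for j in range(n)] for i in range(n)]
    assert tiled_matmul(Id, B, n) == B               # Id * B == B

The (outer, inner) pair partitions the i and k loops without changing the arithmetic, so the same code can be retargeted by adjusting the tile sizes; this is the essence of generating programs that adjust to the target machine.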
G.3.5.2 Application Experience
Primarily through our participation in the NPACI partnership, our faculty has access to and experience with many large applications. To highlight a few:
• AppLeS prototypes are being developed for the molecular interaction program DOT (in cooperation with Scripps Research Institute and SDSC) and the Synthetic Aperture Radar Atlas (SARA), which gathers, maintains and disseminates global satellite imaging data.
• KeLP Application Program Interfaces have been built and applied to structured adaptive finite difference methods for first-principles simulation of real materials (with chemist John Weare, UCSD), and to particle methods.
• Ben Rosen, together with Ken Dill (Pharmaceutical Chemistry, UCSF) and Andy Phillips (Computer Science, USNA, and SDSC), is developing new parallel algorithms and software for computational biology applications [126][44]. An example is predicting molecular structure by computing the global minimum of the molecular energy function.
• Keith Marzullo is working on fault tolerance in the Nile project [95][104][96], an NSF National Challenge project initiated at Cornell to build a wide-area execution environment for CLEO high-energy physicists. The applications process massive amounts of data culled from tens of terabytes of experimentally gathered measurements resulting from electron-positron collisions.
• Scott Baden and Larry Carter are coordinating their work by applying KeLP and memory hierarchy optimizations to a computational fluid dynamics program developed in UCSD’s Applied Mechanics and Engineering Sciences department.

G.3.5.3 Resource Needs
By necessity, our experiments have used a disparate variety of computing platforms. The proposed infrastructure would allow us to conduct experiments that are more realistic and more relevant to future metacomputing architectures, to do so more conveniently, and, by providing a single location for our currently diverse experimental platforms, would naturally lead to a closer and more synergistic cooperation. The features we need for our experimental work, and which would be provided by this proposal, include:
• Multiple levels of parallelism, such as provided by the cluster of Digital Rawhide SMP nodes. This is particularly relevant to our KeLP, Hierarchical Tiling, and generic programming research efforts. This experimental platform is attractive because four processors at each level (that is, a total of 16 processors) are the minimum needed for there to be contention between two pairs of communication paths, something we wish to observe and study.
• Heterogeneity, particularly for AppLeS, but also for advancing Hierarchical Tiling. We need a research environment that is a microcosm of the diversity of resources that will be part of the future Active Web.
• Determining the impact of protocols for high-speed communication networks (for AppLeS, KeLP, NWS). This is supported by the programmable routers we are requesting.
It is necessary that we have a high degree of control over the system, so we can install experimental schedulers, introduce artificial loads, do extensive system monitoring, and respond to many other unpredictable requirements of systems research. We view the presence of other research groups competing for resources as desirable, since it will allow us to gain experience with, and evaluate, the system management and resource sharing policies and mechanisms of the UCSD Active Web. With a common platform, we will be able to work together to understand how to write and support large-scale scientific applications, and how to aggregate and schedule large numbers of resources to support their requirements. The research that will be facilitated by this project includes the following:
• Adaptive programs: A user’s application program has a competitive advantage if it can run efficiently on a wide variety of different computing configurations, since it will be able to use the least expensive resources available. It is an even greater advantage if a long-running program can run on a dynamically-changing configuration. Portable libraries such as SCALAPACK allow some degree of retargetability, but new techniques are needed for irregular computations and for a dynamic platform. We will experiment with (at least) two methods: a multi-tier KeLP model that facilitates Active Web-based programming on a dynamic heterogeneous computing environment, and an extension of hierarchical tiling that generates programs that adjust to dynamically changing computing resources.
• Application schedulers: For a large application to run on the Active Web, a coordinated collection of resources needs to be allocated to the task. Rather than using traditional system-provided schedulers, we will experiment with the “computational marketplace” model. The AppLeS approach is to allow each application to supply a scheduling agent that represents the requirements and preferences of the user. In addition to providing a challenging testbed for AppLeS experiments, the common infrastructure will raise the important system issue of having multiple, active scheduling agents (a “Bushel of Apples”) competing for available resources.
• Middleware for Scientific Metacomputing: Interaction between AppLeS, KeLP, and Hierarchical Tiling will be facilitated by a common model of the processing, memory, and communication capabilities of a heterogeneous system. A standard model will be developed, using the PMH model as a starting point.
In a multithreaded programming environment, a “choreographer” thread can be used to monitor the progress of a computation and look for opportunities to perform run-time optimizations. Brad Calder, Larry Carter, Jeanne Ferrante and Dean Tullsen will be investigating the design of such choreographer threads in connection with their joint work on the Tera MTA; the results will be incorporated into the UCSD Active Web. The NWS will be installed on our infrastructure to monitor current resource usage and to predict future availability; a small sketch of this style of adaptive forecasting follows.
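In miniature, the adaptive selection works as follows; the class and the two predictors here are illustrative assumptions, not the NWS implementation. Simple forecasting models run side by side, each is charged for its error on every new measurement, and the forecast comes from whichever model has been most accurate so far.

    class Forecaster:
        def __init__(self):
            self.history = []
            self.errors = {"last": 0.0, "mean": 0.0}   # cumulative error per model

        def _predict(self, model):
            if not self.history:
                return 0.0
            if model == "last":
                return self.history[-1]                # last-value predictor
            return sum(self.history) / len(self.history)   # running-mean predictor

        def observe(self, value):
            # Charge each model for its error on this measurement, then record it.
            for model in self.errors:
                self.errors[model] += abs(self._predict(model) - value)
            self.history.append(value)

        def forecast(self):
            best = min(self.errors, key=self.errors.get)
            return self._predict(best)

    f = Forecaster()
    for bw in [88, 90, 87, 60, 62, 61]:   # bandwidth series with a regime shift
        f.observe(bw)
    print(f.forecast())   # the last-value model wins after the shift: 61

Autonomous clients such as AppLeS scheduling agents would consume forecasts of this kind rather than raw measurements.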
G.3.6 COMPUTER AND SOFTWARE ENGINEERING
Faculty: C.K. Cheng, Joseph Goguen, William Griswold, William Howden, Alex Orailoglu
The faculty in the Computer and Software Engineering group need to execute tools for the design, analysis, simulation, and testing of large programs (i.e., descriptions of hardware/VLSI and software). They view the UCSD Active Web infrastructure as providing previously unavailable levels of computing power and memory storage for their tools, and multimedia-enabled end-user computers for their graphical user interfaces.

G.3.6.1 High Performance Interconnect-Dominated Circuit Analysis and Synthesis
The recent advent of deep sub-micron technology and new packaging schemes demands advanced CAD tools that can yield high-performance designs for Multi-Chip Modules (MCMs), PCBs and high-density ICs. The growing size of circuits on the one hand and the shrinking design rules on the other have shifted the global character of circuit designs from being dominated by active devices to being dominated by interconnect. The key problem with such designs is that wires can no longer be considered perfect conductors. This problem alone has such a rippling effect on the entire design process that it calls for a substantially different approach to design automation. C. K. Cheng is working on interconnect-dominated analysis and physical layout synthesis, extending constraints and broadening problem formulations to deal with such real-world requirements. With respect to analysis, we have accelerated the simulation of interconnects and the estimation of power consumption at multiple levels [93][145]; the increased speed of the analysis enables the optimization of the whole system design. For layout synthesis, we are investigating floorplanning and mixed-mode placement [30][29][64][153][65][90][45][146]. We are also working on high-performance routing synthesis to reduce interconnect delay [92][72][75] and guarantee signal integrity in terms of clock skew [33][32], crosstalk, and differential pair net routing requirements [26]. Circuit complexity is expected to keep growing over the next ten years, and is growing faster than design capabilities [127]. For example, the number of transistors per chip will increase to 100 million by the year 2004. To work with such large circuits, our high-performance layout synthesis tools described above will require 100 Gbytes of memory, assuming each transistor uses 1 KB. The UCSD Active Web infrastructure will allow the development of hierarchical memory servers that would support such requirements, and will facilitate our investigation of hierarchical and distributed design methodologies to reduce complexity [64][139][150][88][87].

G.3.6.2 Reconfiguration and On-Line Diagnosis Tools
It is essential that large-scale computing systems be hardened not only against a single failure but also against multiple failures. Large-scale computing systems should be able to survive as long as there is even a single unit capable of executing system functionality (albeit at reduced performance). Previously, embedded fault-resilient reconfiguration was achieved through the insertion of spare units. In contrast to spare-based systems, Alex Orailoglu is identifying reconfiguration schemes capable of fully utilizing all system resources, in any possible configuration, thus simultaneously optimizing performance and resilience [28][105]. We are also developing online diagnosis techniques, based on judicious mapping of computations to computational resources [66]. Such resilient, polymorphic configurations, capable of self-diagnosis and self-healing, are typically considered inordinately expensive. While redundancy is required for fault detection, no additional redundancy is required for fault identification.
By judiciously mapping operations to appropriate computational units, fault identification is achieved at almost-zero cost. By duplicating the algorithmic description instead of the structural instantiation, compiler techniques can be used to obviate even this minimal cost of fault detection, resulting in “zero-cost resilience to an infinite number of faults”. These techniques require large-scale processing and memory resources, which our requested infrastructure will provide.

G.3.6.3 Adapting and Restructuring of Large Software Applications
William Griswold is studying the problem of adapting existing software applications so that the value of their software assets can be preserved in a dynamic marketplace. The principal difficulty with such adaptations is that they are often global in nature, because the original designers were unable to anticipate the present changes and modularize them. Two areas of investigation are being emphasized: global analysis and coordination of multiple global changes. Global analysis of software is expensive not only because of the algorithms used (e.g., iterative dataflow analysis), but also, and perhaps primarily, because of the memory resources required for performing such an analysis over an entire program, as opposed to a single file, as a compiler typically does [9]. The UCSD Active Web will provide an excellent infrastructure for building such memory-intensive applications, and will also allow experimenting with parallelism in global analysis. Coordination of multiple changes is important because of the labor-intensive nature of reengineering an application. Hundreds of restructuring changes might be required in order to impose an adequate design on an aging system. When divided among many programmers, these changes can easily overlap with each other on a file-by-file or function-by-function basis due to the lack of modularity of the changes. Such overlaps can result in programmers attempting simultaneous changes to the same component, wasting time and risking conflicting changes. One approach to overcoming this problem is to increase a programmer’s awareness of potentially conflicting changes, perhaps through a visualization of every other programmer’s work with respect to this programmer’s work. We will use the UCSD Active Web’s support for mobile agents to facilitate such awareness, by having agents actively track each programmer’s work and build a per-programmer view of others’ work; a minimal sketch of the idea follows. The end-user multimedia network computers we are requesting will support such interfaces.
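The sketch below is a hypothetical design, not an existing tool: each programmer’s agent reports the components on which work has begun, and a shared view flags components where plans overlap before conflicting changes are made.

    from collections import defaultdict

    class AwarenessBoard:
        def __init__(self):
            self.editors = defaultdict(set)   # component -> programmers at work

        def report(self, programmer, component):
            """An agent tracking `programmer` saw work begin on `component`."""
            self.editors[component].add(programmer)

        def conflicts(self):
            """Components that more than one programmer is changing."""
            return {c: sorted(p) for c, p in self.editors.items() if len(p) > 1}

    board = AwarenessBoard()
    board.report("alice", "parser.c")
    board.report("bob", "parser.c")    # overlapping restructuring work
    board.report("bob", "lexer.c")
    print(board.conflicts())           # {'parser.c': ['alice', 'bob']}

In the envisioned system, the reports would arrive from mobile agents tracking each programmer, and the conflict view would be rendered on the multimedia network computers described above.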
G.3.6.4 Software Confidence
Software confidence measures the extent to which a software product can be expected to fulfill user expectations. One approach to measuring it involves the use of different kinds of testing methods. William Howden is working on the development of more effective confidence measurement formulae and on confidence acceleration techniques. Acceleration methods reduce the time required to measure confidence, using ideas such as parallel testing and fault modeling. Research in confidence measurement requires flexible computing resources. For example, the fault modeling approach depends on sophisticated tools for automatically generating and running tests and measuring fault coverage; this involves large numbers of tests and depends on large memory resources and fast cycle times. Alternatively, the parallel approach depends on the availability of large numbers of separate processes that can run tests in parallel. The theory of confidence measurement and acceleration is still under development, but the basic groundwork has been laid [73], and it is now necessary to carry out extensive experimentation. The UCSD Active Web will make it possible to execute millions of tests per experiment with fast turnaround (within minutes), an improvement of at least two orders of magnitude over our current capabilities. The sketch below illustrates the statistical side of such measurements.
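The sketch rests on assumptions we supply here, not on the formulae of [73]: if N tests drawn at random from an operational profile all pass, then the per-run failure probability is below eps with confidence 1 - (1 - eps)^N. The tests run in parallel, in the spirit of the acceleration methods described above.

    from concurrent.futures import ProcessPoolExecutor

    def run_test(case):
        """Stand-in for generating and executing one test; True means passed."""
        return (case * case) >= 0          # trivially true placeholder oracle

    def confidence(n_passed, eps):
        """Confidence that the failure rate is below eps after n_passed clean runs."""
        return 1.0 - (1.0 - eps) ** n_passed

    if __name__ == "__main__":
        cases = range(10000)
        with ProcessPoolExecutor() as pool:
            passed = sum(pool.map(run_test, cases, chunksize=256))
        print(passed, "tests passed; confidence(failure rate < 0.001) =",
              confidence(passed, 0.001))

With ten thousand passing tests, the confidence that the failure rate is below one in a thousand already exceeds 0.9999, which suggests why experiments at the scale of millions of tests are worthwhile.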
G.3.6.5 Tools for Distributed Cooperative Work
Joseph Goguen is concerned with human aspects of computer technology, as well as with technical aspects of meaning. Much of his current research is focused on tools to support distributed cooperative work over the Web, especially the design, construction and verification of computer systems. An important technique is algebraic semiotics, a combination of algebraic semantics and social semiotics, which is used to develop new data structures to support the logical, cognitive and social aspects of distributed cooperative work, especially theorem proving [58][61][59]. We are also concerned with requirements capture and analysis, particularly its social side; with social, cognitive, and mathematical models of computation; and with social aspects of science and technology, including ethical issues. In addition, we are developing a new approach to the specification and verification of concurrent, distributed systems, called hidden algebra, which is intended to ease mechanical proofs [60]. An important use of the UCSD Active Web will be to carry out experiments in distributed cooperative proving, in which ten to twenty people work together on proofs using our tools, which include: a proof assistant that generates proof websites, integrating technical, cognitive and social aspects; the OBJ3 theorem prover; a remote OBJ3 server; and a specialized database. Computer support for distributed cooperative proving is a new concept, and experiments with so many people have never been tried before; these experiments are expected to stress our tools and methods in unexpected ways, and thus suggest how to improve them. The proposed experiments will require multimedia network computers to support the graphical user interface, and powerful multiprocessor-based compute servers to support the computationally intensive theorem proving.

References
[1] S. Abiteboul, H. Garcia-Molina, Y. Papakonstantinou, R. Yerneni, “Fusion Query Optimization,” submitted for publication.
[2] S. Abiteboul and V. Vianu, “Regular Path Queries with Constraints,” In 16th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS 97), 1997. (Full paper invited to special issue of JCSS, to appear.)
[3] S. Adali, S. Candan, Y. Papakonstantinou, V.S. Subrahmanian, “Query Caching and Optimization in Mediator Systems,” In ACM SIGMOD 96.
[4] B. Alpern, L. Carter, E. Feig, and T. Selker, “The Uniform Memory Hierarchy Model of Computation,” Algorithmica 12(2-3), August-September 1994.
[5] B. Alpern, L. Carter, and J. Ferrante, “Modeling Parallel Computers as Memory Hierarchies,” Programming Models for Massively Parallel Computers, W. K. Giloi, S. Jahnichen, and B. D. Shriver, eds., IEEE Press, 1993.
[6] B. Alpern, L. Carter, and J. Ferrante, “Space-Limited Procedures: A Methodology for Portable High-Performance,” In Proc. of International Working Conference on Massively Parallel Programming Models, 1995.
[7] G.A. Alvarez, W.A. Burkhard, F. Cristian, “Tolerating Multiple Failures in RAID Architectures with Optimal Storage and Uniform Declustering,” In Proc. 24th Annual Intl. Symp. on Computer Architecture, pages 62-72, 1997.
[8] S. Armstrong, A. Freier, K. Marzullo, “Multicast Transport Protocol,” Internet RFC 1301, February 1992.
[9] D. C. Atkinson and W. G. Griswold, “The Design of Whole-Program Analysis Tools,” In Proceedings of the 18th International Conference on Software Engineering, March 1996.
[10] T. L. Bailey and C. Elkan, “Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers,” In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (ISMB’94), pp. 28-36. Stanford, California, August 1994.
[11] T. L. Bailey and C. Elkan, “The Value of Prior Knowledge in Finding Motifs with MEME,” In Proceedings of the Third International Conf. on Intelligent Systems for Molecular Biology (ISMB’95). Cambridge, England, July 1995. [12] Brian Bartell, Garrison W. Cottrell, and Richard K. Belew, “Automatic combination of multiple ranked retrieval systems,” In Proceedings of the Special Interest Group on Information Retrieval, Dublin, Ireland, ACM Press, July 1994. [13] Brian Bartell, Garrison W. Cottrell, and Richard K. Belew, “Representing documents using an explicit model of their similarities,” Journal of the American Society for Information Science, 1995. [14] M. Bellare, R. Canetti and H. Krawczyk, “Keying hash functions for message authentication,” In Advances in Cryptology - Crypto 96 Proc., Lecture Notes in Computer Science Vol. 1109, N. Koblitz ed., Springer-Verlag, 1996. [15] M. Bellare, J. Garay, R. Hauser, A. Herzberg, H. Krawczyk, M. Steiner, G. Tsudik and M. Waidner. iKP - A Family of Secure Electronic Payment Protocols. In Proc. 1st USENIX Workshop on Electronic Commerce, 1995. [16] M. Bellare, R. Impagliazzo, and M. Naor, “Does parallel repetition reduce the error in computationally sound protocols?” In Proceedings of 38th Annual Symposium on Foundations of Computer Science, IEEE, 1997. [17] M. Bellare and P. Rogaway, “Optimal asymmetric encryption,” In Advances in Cryptology Eurocrypt 94 Proceedings, Lecture Notes in Computer Science Vol. 950, A. De Santis ed., Springer-Verlag, 1995. [18] M. Bellare, P. Rogaway, “The exact security of digital signatures: How to sign with RSA and Rabin,” Advances in Cryptology-Eurocrypt 96 Proc., Lecture Notes in Computer Sci. Vol. 1070, U. Maurer ed., Springer-Verlag, 1996. [19] Mihir Bellare and Bennet S. Yee, “Forward Integrity for Secure Audit Logs,” in preparation. [20] F. Berman and R. Wolski, “The AppLeS Project: A Status Report,” In Proc. 8th NEC Research Symp., May 1997. [21] F. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao, “Application-Level Scheduling on Distributed Heterogeneous Networks,” In Proc. Supercomputing ’96, November 1996. [22] B. Bershad, S. Savage, P. Pardyak, E. Sirer, M. Fiuczynski, D. Becker, C. Chambers, and S. Eggers, “Extensibility, Safety and Performance in the SPIN operating system,” Proc. 15th ACM Symp. on Operating System Principles (SOSP), December 1995, 267--284. [23] B. Calder, D. Grunwald, and B. Zorn, “Quantifying Behavioral Differences Between C and C++ Programs,” Journal of Programming Languages, Vol. 2, No. 4, 1994. [24] Brad Calder and Dirk Grunwald, “Reducing indirect function call overhead in C++ programs,” In Proceedings of the 21st Annual ACM Symposium on Principles of Programming Languages (POPL ‘94), pp. 397-408, Jan. 1994.
[25] B. Calder and D. Grunwald, “Reducing Branch Costs via Branch Alignment,” In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pp. 242-251, Oct. 1994. [26] R.J. Carragher, C.K. Cheng, X.M. Xiong, M. Fujita, and R. Paturi, “Solving the Net Matching Problem in High-Performance Chip Design,” IEEE Trans. on CAD, pp. 902-911, Aug. 1996. [27] L. Carter, J. Ferrante, S. Flynn Hummel, B. Alpern, and K.S. Gatlin, “Hierarchical Tiling: A Methodology for High Performance,” UCSD Tech Report CS96-508, Nov 1996. [28] W. Chan and A. Orailoglu, “High Level Synthesis of Gracefully Degradable ASICs,” In European Design & Test Conference, March 1996. [29] C.K. Cheng, “Linear Placement Algorithms and Applications to VLSI Design,” Networks, vol. 17, pp. 439-464, Winter 1987. [30] C.K. Cheng and E.S. Kuh, “Module Placement Based on Resistive Network Optimization,” IEEE Trans. on Computer-Aided Design, vol. CAD-3, pp. 218-225, July 1984. [31] D. Cheriton and K. Duda, “A caching model of operating system kernel functionality,” Proc. 1st USENIX Symp. on Operating Systems Design and Implementation(OSDI), November 1994, pp. 179--194. [32] N.C. Chou and C.K. Cheng, “Wire Length and Delay Minimization in General Clock Net Routing,” In IEEE Int. Conf. on Computer-Aided Design, pp. 552-555, Nov. 1993. [33] J. Chung and C.K. Cheng, “Skew Sensitivity Minimization of Buffered Clock Tree,” In IEEE ICCAD Conf., pp. 280-283, Nov. 1994. [34] K.C. Claffy, H.-W. Braun, and G.C. Polyzos, “A Parameterizable Methodology for Internet Traffic Flow Profiling,” IEEE Journal on Selected Areas in Communications, vol. 13, no. 8, pp. 1481-1494, October 1995. [35] M. Clegg and K. Marzullo, “Clock Synchronization in Hard Real-Time Distributed Systems,” University of California, San Diego Department of Computer Science and Engineering Technical Report CS96-478, 1996. [36] M. Clegg and K. Marzullo, “Predicting Physical Processes in the Presence of Faulty Sensor Readings,” In Proc. 27th Annual Intl. Symposium on Fault-Tolerant Computing, Seattle, WA, USA, 24-27 June 1997, pp. 373-378. [37] M. Clegg and K. Marzullo, “A Low-Cost Processor Group Membership Protocol for a Hard Real-Time Distributed System,” In Proceedings of 1997 Real-Time Systems Symposium, San Francisco, CA, USA, December 1997. [38] A. Cohen and W.A. Burkhard, "Segmented Information Dispersal (SID) for Efficient Reconstruction in Fault-Tolerant Video Servers", In Proceedings of the ACM International Multimedia Conference, pages 277-286, 1996. [39] F. Cristian, “Automatic reconfiguration in the presence of failures,” Software Engineering Journal, IEE and British Computer Society, March 1993, pp 53-60.
[40] F. Cristian, “Synchronous and Asynchronous Group Communication,” Communications of the ACM, April 1996, pp. 88-96. [41] F. Cristian, “Group, Majority and Strict Agreement in Timed Asynchronous Distributed Systems,” In Int. Symp. on Fault-Tolerant Computing, Sendai, Japan, June 1996. [42] F. Cristian, “On the Semantics of Group Communication,” 4th International Symposium on Formal Techniques in Real Time and Fault Tolerant Systems, Sept 11-13, 1996, Uppsala, Sweden. [43] F. Cristian, S. Mishra and G. Alvarez, “High-performance asynchronous atomic broadcast,” Distributed Systems Engineering, June 1997, 4(2):109-128. [44] K. A. Dill, J. B. Rosen and A.T.Phillips, “Protein Structure and Energy Landscape Dependence on Sequence using a Continuous Energy Function,” Journal of Computational Biology 4, 1997, pp. 227-239. [45] J. Dufour, R. McBride, P. Zhang, and C.K. Cheng, “A Custom Cell Placement Tool,” In ASP/DAC, Jan. 1997, pp. 271-276, Chiba, Japan. [46] C. Elkan, “Reasoning about Action in First-Order Logic,” In Proc. of the 9th Biennial Conf. of the Canadian Society for Computational Studies of Intelligence (CSCSI’92). Vancouver, May 1992. Morgan Kaufmann Publishers. [47] D. Engler, M. Kaashoek, and O'Toole, Jr., J., “Exokernel: an operating system architecture for application-level resource management,” In Proc. 15th ACM Symp. on Operating System Principles (SOSP), December 1995, pp. 251--266. [48] K. Fall and J. Pasquale, “Improving continuous-media playback performance with in-kernel data paths,” In Proc. IEEE Intl. Conf. on Multimedia Computing and Systems (ICMCS), Boston, MA, June 94, pp. 100-109. [49] C. Fetzer and F. Cristian, “Fail-Awareness: An Approach to Construct Fail-Safe Applications,” In 27th Annual International Symposium on Fault-Tolerant Computing, Seattle, Washington, June 25-27, 1997. [50] N. Figueira and J. Pasquale, “Leave-in-time: a new service discipline for real-time communications in a packet-switching network,” In Proc. ACM Comm. Archit. and Protocols Conf. (SIGCOMM), Cambridge, MA, Sept. 95. [51] N. Figueira and J. Pasquale, “An upper bound on delay for the VirtualClock service discipline,” IEEE/ACM Transactions on Networking, 3(4), August 95, pp. 399-408. [52] S. J. Fink and S. B. Baden, “Runtime Support for Multi-Tier Programming of Block-Structured Applications on SMP Clusters,” In Proc. 1997 Intl. Scientific Computing in ObjectOriented Parallel Environments Conf. (ISCOPE ‘97), Dec. 1997. [53] S. J. Fink, S. R. Kohn, and S. B. Baden, “Flexible Communication Mechanisms for Dynamic Structured Applications,” In 3rd Intl. Workshop on Parallel Algorithms for Irregularly Structured Problems, Aug. 1996, pp. 203-215. [54] B. Ford, M. Hibler, J. Lepreau, P. Tullmann, G. Back, and S. Clawson, “Microkernels Meet Recursive Virtual Machines,” Proceedings of OSDI '96, 1996.
[55] J. Steven Fritzinger and Marianne Mueller, “Java Security,” Sun Microsystems, 1996, Published as http://www.javasoft.com/security/whitepaper.ps. [56] General Magic, Inc, “An Introduction to Safety and Security in Telescript,” General Magic, Inc., 1995, Published as http://www.genmagic.com/Telescript/security.html. [57] Nikolas Gloy, Trevor Blackwell, Michael D. Smith, Brad Calder, “Procedure Placement Using Temporal Ordering Information”, In IEEE 30th International Symposium on Microarchitecture, Dec. 1997. [58] Joseph Goguen, “Extended Abstract of `Semiotic Morphisms’,” In Intelligent Systems: A Semiotic Perspective, Volume II, ed. John Albus, Alex Meystel and Richard Quintero, National Institute of Standards and Technology, Gaithersberg MD, 20-23 October 1996, pages 26-31. [59] Joseph Goguen, Kai Lin, Akira Mori, Grigore Rosu, and Akiyoshi Sato, “Distributed Cooperative Formal Methods Tools,” In Proceedings, Automated Software Engineering, Lake Tahoe, 3-5 November 1997. [60] Joseph Goguen and Grant Malcolm, “A Hidden Agenda,” UCSD Technical Report CS97538, May 1997. [61] Joseph Goguen, Akira Mori, and Kai Lin, “Semiotics, ProofWebs and Distributed Cooperative Proving,” In Proceedings, User Interfaces for Theorem Provers `97, Sophia Antipolis, 1--2 September 1997, pages 24-34. [62] A. S. Grimshaw, “Easy to Use Object-Oriented Parallel Programming with Mentat,” IEEE Computer, May 1993 , pp. 39-51. [63] Amarnath Gupta and Ramesh Jain, “Visual Information Retrieval.” Communications of the ACM, 40 (5), pp. 71-79, May 1997. [64] T. Hamada, C.K. Cheng and P. Chau, “An Efficient Multi-Level Placement Technique Using Hierarchical Partitioning,” In IEEE Trans. on Circuits and Systems, vol. 39, pp. 432-439, June 1992. [65] T. Hamada, C.K. Cheng, and P. Chau, “PRIME: A Timing-Driven Placement Using A Piecewise Linear Resistive Network Approach,” In ACM/IEEE Design Automation Conf., pp. 531-536, June 1993. [66] S.N. Hamilton and A. Orailoglu, “Microarchitectural Synthesis of ICs with Embedded Concurrent Fault Isolation,” In Proceedings of the 27th International Symposium on Fault-Tolerant Computing, pages 329--338, June 1997. [67] J.K. Han, and G.C. Polyzos, “Networking Applications of the Hierarchical Mode of the JPEG Standard,” In Proc. IEEE IPCCC’96, Phoenix, AZ, pp. 58-64, March 1996. [68] J.K. Han, and G.C. Polyzos, “Multi-Resolution Layered Coding for Real-Time Image Transmission: Architectural and Error Control Considerations,” Real-Time Imaging, Academic Press, in press.
[69] Amir H. Hashemi, David R. Kaeli, and Brad Calder, “Efficient Procedure Mapping Using Cache Line Coloring”, In Proc. of the ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1997. [70] N. Heintze, J. D. Tygar, and B. Yee, “Designing cryptographic postage indicia,” In Proc. of ASIAN-96, Invited paper, 1996. [71] K. Hogstedt, L. Carter, and J. Ferrante, “Determining the Idle Time of a Tiling,” In Proc. of the ACM Symposium on Principles of Programming Languages, January 1997. [72] X. Hong, T. Xue, C.K. Cheng, E.S. Kuh, and J. Huang “Performance-Driven Steiner Tree Algorithms for Global Routing,” In ACM/IEEE Design Automation Conf., pp. 177-181, June 1993. [73] W.E. Howden, “Confidence based reliability and statistical coverage estimation,” In Proceedings of the International Symposium on Software Reliability Engineering, IEEE, Albuquerque, November, 1997. [74] C.A. Hsieh, J.C. Gyllenhaal, and W.W. Hwu, “Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results,” In Proc. of the 29th Annual Intl. Symposium on Microarchitecture (Micro ‘96), Dec. 1996. [75] J. Huang, X. Hong, C.K. Cheng, and E.S. Kuh, “An Efficient Timing-Driven Global Routing Algorithm,” In ACM/IEEE Design Automation Conf., pp. 596-600, June 1993. [76] Ramesh Jain, “Telepresence in Education--Building the Universal University,” Educom Review, pp. 49-55, Volume 32, No. 3, May/June 1997. [77] M. Jakobsson, “Privacy vs. Authenticity,” Ph. D. Dissertation, U.C. San Diego, 1997. [78] M. Jakobsson, K. Sako and R. Impagliazzo, “Designated verifier proofs and their applications,” In Advances in Cryptology - EUROCRYPT ‘96, Springer-Verlag, 1996. p. 143-54. [79] M. Jakobsson, M. Yung, “Revokable and versatile electronic money,” In 3rd ACM Conference on Computer and Communications Security, 1996. pp. 76-87. [80] M. Jakobsson and M. Yung, “Distributed `magic ink’ signatures,” In Advances in Cryptology - EUROCRYPT ‘97, Springer-Verlag, 1997. [81] Arun Katkere, Saied Moezzi, Don Y. Kuramura, Patrick Kelly and Ramesh Jain, “Towards Video-based Immersive Environments,” Multimedia Systems, pp. 69-85, Spring 1997. [82] Arun Katkere, Jennifer Schlenzig, Amarnath Gupta and Ramesh Jain, “Interactive Video on WWW: Beyond VCR-like Interfaces,” In Fifth International World Wide Web Conference (WWW5), Paris, France, May 6-10 1996. [83] J. Kay and J. Pasquale, “The importance of non-data touching processing overheads in TCP/ IP,” In Proc. ACM Communications Architectures and Protocols Conf. (SIGCOMM), San Francisco, CA, September 93, pp. 259-269. [84] V.P. Kompella, J. Pasquale, and G. Polyzos, “Multicast Routing for Multimedia Communication,” IEEE/ACM Transactions on Networking, vol. 1, no. 3, pp. 286-292, June 1993.
[85] V.P. Kompella, J. Pasquale, and G.C. Polyzos, “Optimal Multicast Routing with Quality of Service Constraints,” Journal of Network and Systems Management, vol. 4, no. 2, pp. 107-131, June 1996. [86] H. Krawczyk, M. Bellare, R. Canetti, “HMAC: Keyed-Hashing for message authentication,” Internet RFC 2104, Feb. 1997. [87] M.T. Kuo and C.K. Cheng, “A New Network Flow Approach for Hierarchical Tree Partitioning,” In ACM/IEEE Design Automation Conf., pp. 512-517, June 1997. [88] M.T. Kuo, L.T. Liu, and C.K. Cheng, “Network Partitioning into Tree Hierarchies,” In ACM/IEEE Design Automation Conf., June 1996, pp. 477-482. [89] Jacob Y. Levy and John K. Ousterhout, “A Safe Tcl Toolkit For Electronic Meeting Places (Extended Abstract),” In Proceedings of the First USENIX Workshop in Electronic Commerce, pp. 133-135, July 1995. [90] J. Li, J. Lillis, L.T. Liu, and C.K. Cheng, “New Spectral Linear Placement and Clustering Approach,” In ACM/IEEE Design Automation Conf., June 1996, pp. 88-93. [91] J. Liedtke, “On micro-kernel construction,” Proc. 15th ACM Symp. on Operating System Principles (SOSP), December 1995. [92] J. Lillis, C.K. Cheng, T.T. Lin, and C.Y. Ho, “New Performance Driven Routing Techniques with Explicit Area/Delay Tradeoff and Simultaneous Wire Sizing,” In ACM/IEEE Design Automation Conf., June 1996, pp. 395-400. [93] F.J. Liu, J. Lillis, and C.K. Cheng, “Design and Implementation of a Global Router Based on a New Layout Driven Timing Model with Three Poles,” In IEEE Int. Symp. on Circuits and Systems, 1997. [94] K. Marzullo, R. Cooper, M. Wood, K. Birman, “Tools for Monitoring and Controlling Distributed Applications,” IEEE Computer, Aug. 1991, 24(8):42-51. [95] K. Marzullo, M. Ogg, A. Ricciardi, A. Amoroso, F. Calkins, and E. Rothfus, “Nile: Wide-Area Computing for High Energy Physics,” In Proc. of Seventh ACM SIGOPS European Workshop, 1996, pp. 49-54. [96] K. Marzullo, M. Ogg, and A. Ricciardi, “The Nile System Architectures,” In Proc. of the Eleventh International Conference on Systems Engineering, July 1996, pp. 414-419. [97] F. Menczer, “ARACHNID: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods for Information Discovery,” Ed. D. Fisher, Morgan Kaufman: Machine Learning: Proc. of the 14th Intl. Conf. (ICML97), 1997. [98] F. Menczer, R. K. Belew, and W. Willuhn, “Artificial Life Applied to Adaptive Information Agents,” Ed. C. Knowblock, A. Levy, S-S. Chen, and G. Wiederhold, AAAI Spring Symposium Series: Information gathering from heterogeneous, distributed environments, 1995. [99] F. Menczer, W. Willuhn, and R. K. Belew, “An Endogenous Fitness Paradigm for Adaptive Information Agents,” In Proc. Third International Conference on Information and Knowledge Management (CIKM’94) - Workshop on Intelligent Information Agents, December 1994, NIST, Gaithersburg, MD, (http://www.cs.umbc.edu/~cikm/1994/iia/). [100] N. Mitchell, L. Carter, J. Ferrante, and K. Hogstedt, “Quantifying the Multi-level Nature of Tiling Interactions,” In Proc. of Tenth International Workshop on Languages and Compilers for Parallel Computing, August 1997, to be published as Springer-Verlag Lecture Notes in Computer Science. [101] Saied Moezzi, Arun Katkere, Don Y. Kuramura and Ramesh Jain, “Reality Modeling and Visualization from Multiple Video Sequences,” IEEE Computer Graphics & Applications, pp. 58-63, Nov. 1996. [102] A. B. Montz, D. Mosberger, S. W. O'Malley, L. L. Peterson, T. A. Proebsting, J. H. Hartman, “Scout: A communications-oriented operating system,” Dept. of Computer Science, U. of Arizona, Tech. Rept. 94-20, June 1994. [103] K. Muller and J. Pasquale, “A high-performance multi-structured file system design,” In Proc. 13th ACM Symposium on Operating System Principles (SOSP), Asilomar, CA, October 91, pp. 56-67. [104] M. Ogg and A. Ricciardi, “Nile: Reliable, Large-Scale Cluster Computing with Replicated CORBA Objects,” In Proc. of OOPSLA ‘97 Workshop on Dependable Objects, 1997. [105] A. Orailoglu, “Microarchitectural Synthesis of Gracefully Degradable, Dynamically Reconfigurable ASICs for Multiple Faults,” In 1996 International Conference on Computer Design, pp. 112-117, October 1996. [106] C.H. Papadimitriou, Srinivas Ramanathan, and P. Venkat Rangan, “Information Caching for Delivery of Personalized Video Program on Home Entertainment Channels,” In Proceedings of IEEE International Conference on Multimedia Computing and Systems, Boston, pages 214-223, May 14-19, 1994. [107] Y. Papakonstantinou, “Query Processing in Heterogeneous Information Systems,” Stanford Univ. Thesis, 1997. [108] Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina, “Object Fusion in Mediator Systems,” In VLDB 96. [109] Y. Papakonstantinou, H. Garcia-Molina, J. Ullman, “MedMaker: A Mediation System Based on Declarative Specifications,” In ICDE 96. [110] Y. Papakonstantinou, A. Gupta, H. Garcia-Molina, J. Ullman, “A Query Translation Scheme for Rapid Implementation of Wrappers,” In DOOD 95. [111] Y. Papakonstantinou, A. Gupta, L. Haas, “Capabilities-Based Query Rewriting in Mediator Systems,” In PDIS 96. Selected in the “Best of PDIS”; extended version to appear in DAPD. [112] Y. Papakonstantinou, A. Gupta, L. Haas, “Capabilities-Based Query Rewriting in Mediator Systems,” DAPD ‘98. [113] J. Pasquale, E. Anderson, and K. Muller, “Container Shipping: operating system support for intensive I/O applications,” IEEE Computer, Vol. 27, No. 3, March 94, pp. 84-93.
[114] J. Pasquale, G. Polyzos, E. Anderson, and V. Kompella, “The Multimedia Multicast Channel,” Journal of Internetworking: Research and Experience, Vol. 5, No. 4, December 1994, pp. 151-162. [115] J. Pasquale, G. Polyzos, E. Anderson, and V. Kompella, “Filter propagation in dissemination trees: trading off bandwidth and processing in continuous media networks,” In Proc. 4th International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Eds. D. Shepherd, G. Blair, G. Coulson, N. Davies and F. Garcia, Lecture Notes in Computer Science, Vol. 846, Springer-Verlag, 1994. [116] J. Pasquale, G. C. Polyzos, and G. Xylomenos, “The Multimedia Multicast Problem,” Multimedia Systems, ACM/Springer-Verlag, in press. [117] Holger Peine and Torsten Stolpmann, “The Architecture of the ARA Platform for Mobile Agents,” In Proceedings of the First International Workshop on Mobile Agents, In Kurt Rothermel and Radu Popescu-Zeletin, Editors, Springer-Verlag: Lecture Notes in Computer Science, Volume 1219, April, 1997. [118] G.C. Polyzos, , and K. Taylor, “A Prototype Video Dissemination Application over ATM,” In Proc. IEEE ICC’95, Seattle, WA, pp. 1262-1266, June 1995. [119] Srinivas Ramanathan and P. Venkat Rangan, “Adaptive Feedback Techniques for Synchronized Multimedia Retrieval over Integrated Networks,” IEEE/ACM Transactions on Networking, 1(2), pages 246-260, April 1993. [120] Srinivas Ramanathan and P. Venkat Rangan, “System Architectures for Personalized Multimedia Services,” IEEE Multimedia, 1(1), pages 37-46, February 1994. [121] Srinivas Ramanathan, Harrick M. Vin, and P. Venkat Rangan, “Towards Personalized Multimedia Dial-Up Services,” Computer Networks and ISDN Systems, Pages 1305-1322, 1994. [122] P. Venkat Rangan and Srinivas Ramanathan, “Feedback Techniques for Continuity and Synchronization in Multimedia Information Retrieval,” ACM Transactions on Information Systems, 13(2), Pages 145-176, April 1995. [123] P. Venkat Rangan and Harrick M. Vin, “Designing File Systems for Digital Video and Audio,” In Proc. of the 13th Symposium on Operating Systems Principles (SOSP'91), Operating Systems Rev., 25(5), pages 81-94, Oct 1991. [124] P. Venkat Rangan, Harrick M. Vin, and Srinivas Ramanathan, “Designing an On-Demand Multimedia Service,” IEEE Communications Magazine, 30(7), pages 56-65, July, 1992. [125] P. Venkat Rangan and Harrick M. Vin, “Efficient Storage Techniques for Digital Continuous Multimedia,” IEEE Transactions on Knowledge and Data Engineering, 5(4), pages 564573, August, 1993. [126] J. B. Rosen, K. A. Dill and A. T. Phillips, “CGU:An Algorithm for Molecular Structure Prediction,” IMA Vols.in Mathematics and its Applications, Large Scale Optimization, Part III:Molecular Structure and Optimization, Springer-Verlag, 1997. [127] Semiconductor Industry Association National Technology Roadmap, 1994.
[128] Simone Santini and Ramesh Jain, “Similiarity Queries in Image Databases.” In IEEE International Conference on Computer Vision and Pattern Recognition, San Francisco, CA, June 1996. [129] Amitabh Srivastava and Alan Eustace, “A System for Building Customized Program Analysis Tools,” 1994 Programming Language Design and Implementation, ACM, pp. 196-205, June 1994. [130] Amitabh Srivastava and David W. Wall, “A Practical System for Intermodule Code Optimizations at Link-time,” Journal of Programming Languages, March 1992. [131] D. L. Tennenhouse, J. M. Smith, W. D. Sincoskie, D. J. Wetherall, and G. J. Minden, “A Survey of Active Network Research,” IEEE Communications Magazine, January 1997. [132] D.M. Tullsen, S.J. Eggers, and H.M. Levy, “Simultaneous Multithreading: Maximizing OnChip Parallelism,” In 22nd Annual International Symposium on Computer Architecture (ISCA), pp. 392-403, June 1995. [133] D.M. Tullsen, S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, and R.L. Stamm, “Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor,” In 23nd Annual International Symposium on Computer Architecture (ISCA), pp. 191-202, May 1996. [134] U. S. Postal Service, “Information Based Indidia Program (IBIP),” Postal Secure Device (PSD) Spec., 1996. [135] V. Vassalos, Y. Papakonstantinou, “Describing and Using Query Capabilities of Heterogeneous Sources”. In VLDB 97. [136] V. Vassalos, Y. Papakonstantinou, “Expressive Capabilities Description Languages and Query Rewriting Algorithms, ” Submitted for publication. [137] Harrick M. Vin and P. Venkat Rangan, “Designing a Multi-User HDTV Storage Server,” IEEE Journal on Selected Areas in Communications, 11(1), pages 153-164, January 1993. [138] C.C. Vogt, G.W. Cottrell, R.K. Belew, and B.T. Bartell, “Using Relevance to Train a Linear Mixture of Experts,” In The Fifth Text Retrieval Conference, Ed. D. Harman, Gaitherberg, MD, 1996. [139] Y.C. Wei and C.K. Cheng, “Ratio Cut Partitioning for Hierarchical Designs,” IEEE Trans. on Computer-Aided Design, vol. 10, pp. 911-921, July 1991. [140] Steve H. Weingart, “Physical Security for the mu ABYSS System,” In Proceedings of the IEEE Computer Society Conference on Security and Privacy, pp. 52-58, 1987. [141] Steve R. White and Liam Comerford, “ABYSS: A Trusted Architecture for Software Protection,” In Proceedings of the IEEE Computer Society Conference on Security and Privacy, pp. 38-51, 1987. [142] Steve R. White, Steve H. Weingart, William C. Arnold, and Elaine R. Palmer, “Introduction to the Citadel Architecture: Security in Physically Exposed Environments,” IBM Thomas J. Watson Research Center, Distributed security systems group, RC16672, Version 1.3, March 1991.
[143] R. Wolski, “Dynamically Forecasting Network Performance to Support Dynamic Scheduling Using the Network Weather Service,” In Proc. of the 6th High-Performance Distributed Conference, August 1997. [144] R. Wolski, N. Spring and C. Peterson, “Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service,” In Proc. Supercomputing ’97, November 1997. [145] X.M. Xiong and C.K. Cheng “Interconnect and Output Driver Modeling of High Speed Designs,” In IEEE Int. Conf. on ASIC, pp. 507-510, Sept. 1993. [146] J. Xu, P.N. Guo, and C.K. Cheng “Cluster Refinement for Block Placement,” In ACM/IEEE Design Automation Conf., pp. 762-765, June 1997. [147] G. Xylomenos, and G.C. Polyzos, “IP Multicasting for Wireless Mobile Hosts,” In Proc. MILCOM’96, pp.933-937, Washington, DC, November 1996. [148] G. Xylomenos, G. and G. C. Polyzos, “IP Multicast for Mobile Hosts,” IEEE Communications Magazine, vol. 35, no. 1, pp. 54-58, January 1997. (Special Issue on Internet Technology). [149] G. Xylomenos, and G.C. Polyzos, “IP Multicasting for Point-to-Point Local Distribution,” In Proc. INFOCOM’97, Kobe, Japan, pp. 1382-1389, April 1997. [150] S. Z. Yao, C. K. Cheng, D. Dutt, S. Nahar, and C. Y. Lo, “A Cell-Based Hierarchical Pitchmatching Compaction Using Minimal LP,” IEEE Trans. on CAD, pp. 523-526, April 1995. [151] B. Yee. “Using secure coprocessors,” Ph.D. Dissertation, CMU Tech Report CMU-CS-94149. [152] B. Yee. “A Sanctuary for Mobile Agents,” UCSD Tech Report CS97-537. An earlier version appeared in Proceedings of the DARPA Workshop on Foundations for Secure Mobile Code, 1997. [153] D. Zaleta, J. Fan, B.C. Kress, S.H. Lee, and C. K. Cheng, “Optimum Placement for Optoelectronic MultiChip Modules and the Synthesis of Diffractive Optics for Multichip Module Interconnects”, Applied Optics, pp. 1444-1456, March 1994.
H. STAFF CREDENTIALS
H.1 Curriculum Vitaes of Principal Investigators
JOSEPH PASQUALE (Principal Investigator)
Professor, Department of Computer Science and Engineering, University of California, San Diego
CONTACT INFORMATION
• Postal: Dept. of Computer Science and Engineering, U.C. San Diego, La Jolla, CA 92093
• Telephone: (619) 534-2673 (office), (619) 534-7029 (fax)
• Internet: [email protected] (email), http://www-cse.ucsd.edu/users/pasquale (www)
RESEARCH INTERESTS
• Operating systems: kernel structure, multimedia and agent-based computing, I/O, file systems
• Networks: Internet computing, real-time communications, multicasting
• Decentralized control: multi-agent coordination, effects of delayed communication
EDUCATION
• Ph.D. Computer Science, University of California, Berkeley, 1988
• M.S. Electrical Engineering & Computer Science, Massachusetts Institute of Technology, 1982
• B.S. Electrical Engineering & Computer Science, Massachusetts Institute of Technology, 1982
EMPLOYMENT
• Professor, University of California, San Diego, 1996 to present
• Associate Professor, University of California, San Diego, 1993 to 1996
• Assistant Professor, University of California, San Diego, 1987 to 1993
• Member of Technical Staff, Bell Communications Research (Bellcore), Summer 1984
• Member of Technical Staff, AT&T Bell Laboratories (MIT Co-op), Summers 1981, 82, 83
AWARDS
• J. Robert Beyster Chair in Engineering, 1996-present
• Defense Science Study Group V, Institute for Defense Analyses, 1996-1997
• CSE Best Teacher Award, UCSD, 1992
• IBM Faculty Development Award, 1991
• TRW Young Investigator Award, 1991
• PYI: Presidential Young Investigator Award, National Science Foundation, 1989
• IBM Doctoral Fellowship, 1985 and 1986
• Eta Kappa Nu Outstanding Teaching Assistant of the Year, 1983
SERVICE
• ACM Distinguished Doctoral Dissertation Award Committee, 1993-1997 (Chairman in 1996)
• Program Committees: ACM SIGMETRICS (1998); IEEE INFOCOM (1997); Network and Operating System Support for Digital Audio and Video, NOSSDAV (1997); ACM Computer-Supported Cooperative Work, CSCW (1995 and 1996); IEEE Autonomous Decentralized Systems, ISADS (1995 and 1996); ACM SIGCOMM (1995); ACM Multimedia (1994)
• Contributor, “Strategic Directions in Computing Research,” ACM Computing Surveys, 1996
• Contributor, “Research and Development for the National Information Infrastructure: Technical Challenges,” Interuniversity Communications Council Inc., 1994
• Contributor, “Scope and Future Direction of Computer Science and Technology,” National Academy of Sciences Computer Science and Telecommunications Board, 1994
• Chairman, NASA Science User Network (NSUN) Working Group, 1993-1995
• NSF Division of Information, Robotics, and Intelligent Systems Advisory Committee, 1992
• Member of ACM (‘84 to present) and IEEE (‘84 to present)
RELATED PUBLICATIONS
• Joseph Pasquale, “Towards Internet Computing,” ACM Computing Surveys, Special Issue: Strategic Directions in Computing Research, Vol. 28 (4ES), December 1996, http://www.acm.org/pubs/citations/journals/surveys/1996-28-4es/a215-pasquale.
• David D. Clark, Joseph Pasquale, et al., “Strategic Directions in Networks and Telecommunications,” ACM Computing Surveys, Special Issue: Strategic Directions in Computing Research, Vol. 28, No. 4, December 1996, pp. 679-690.
• Joseph Pasquale, Eric Anderson, and Keith Muller, “Container-shipping: Operating System Support for Intensive I/O Applications,” IEEE Computer, Vol. 27, No. 3, March 94, pp. 84-93.
• Keith Muller and Joseph Pasquale, “A High-Performance Multi-Structured File System Design,” Proc. 13th ACM Symp. on Operating System Principles (SOSP), 1991, pp. 56-67.
• Jonathan Kay and Joseph Pasquale, “Profiling and Reducing Processing Overheads in TCP/IP,” IEEE/ACM Transactions on Networking, December 1996, pp. 817-828 (earlier version appeared in Proc. SIGCOMM ‘93).
OTHER PUBLICATIONS
• Joseph Pasquale, “Decentralized Control Using Randomized Coordination: Dealing with Uncertainty and Avoiding Conflicts,” In G.M. Olson, T.W. Malone and J.B. Smith (Eds.), Coordination Theory and Collaboration Technology, Mahwah, NJ: Lawrence Erlbaum Assoc., 1998.
• Joseph Pasquale, George Polyzos, and George Xylomenos, “The Multimedia Multicast Problem,” to appear in ACM Multimedia Systems.
• Norival Figueira and Joseph Pasquale, “Leave-in-time: a new service discipline for real-time communications in a packet-switching network,” Proc. SIGCOMM, 1995, pp. 207-218.
• Joseph Pasquale, George Polyzos, Eric Anderson, and Vachaspathi Kompella, “The Multimedia Multicast Channel,” Journal of Internetworking: Research and Experience, Vol. 5, No. 4, December 1994, pp. 151-162.
• Vachaspathi Kompella, Joseph Pasquale, and George Polyzos, “Multicast routing for multimedia applications,” IEEE/ACM Trans. on Networking, Vol. 1, No. 3, June 93, pp. 286-292.
72
RECENT COLLABORATORS
• Brad Calder (UCSD), Yannis Papakonstantinou (UCSD), George Polyzos (UCSD), Bennett Yee (UCSD)
GRADUATE STUDENTS
• 8 Ph.D. graduates: Eric Anderson, ‘95; Edward Billard, ‘92; Kevin Fall, ‘94; Norival Figueira, ‘95; Alexander Glockner, ‘91; Jonathan Kay, ‘95; Vachaspathi Kompella, ‘93; Keith Muller, ‘90
• 17 M.S. graduates: Eric Anderson, ‘91; Sylvie Bessette, ‘89; Edward Billard, ‘90; Barry Brown, ‘97; Mark Bubien, ‘91; Fred Chong, ‘96; Kevin Fall, ‘91; Norival Figueira, ‘93; Vachaspathi Kompella, ‘90; Daniel Kraiman, ‘91; Scott McMullan, ‘93; Tom Nguyen, ‘95; Vishesh Parikh, ‘90; Dipti Ranganathan, ‘92; Keith Swensen, ‘89; Robert Terek, ‘91; Inge Winkler, ‘91
• 7 current graduate students: Jee Hea An, Stephane Belmon, Jack Dietz, Eugene Hung, David Moore, Max Okumoto, Anthony Tang
GRADUATE ADVISORS
• Ph.D. advisor: Domenico Ferrari (U.C. Berkeley); M.S. advisor: Marvin Minsky (MIT)
73
RICHARD K. BELEW (Co-Principal Investigator)
Associate Professor, Dept. of Computer Science and Engineering, Univ. of California, San Diego
CONTACT INFORMATION
• Postal: Dept. of Computer Science and Engineering, U.C. San Diego, La Jolla, CA 92093-0114
• Telephone: (619) 534-2601 (office), (619) 534-7029 (fax)
• Internet: [email protected] (email), http://www-cse.ucsd.edu/users/rik (www)
RESEARCH AREAS
• Machine learning, especially in subsymbolic representations such as connectionist networks and Genetic Algorithms
• Artificial Intelligence
• Free-text information retrieval
• Legal and library applications
EDUCATION
• Ph.D. Computer Science, Univ. Michigan, Ann Arbor, MI, 1986
• M.S. Computer Science, Univ. Michigan, Ann Arbor, MI, 1984
• B.S. Mathematics, Univ. Minnesota, Minneapolis, MN, 1979
PROFESSIONAL ACTIVITIES
• Associate Editor, Evolutionary Computation Journal
• Editorial Board, Journal of Adaptive Behavior
• Editorial Board, Artificial Life Journal
• Program Co-chair, Artificial Life (1998, forthcoming)
• Program Co-chair, Foundations of Genetic Algorithms (1996)
• Program Co-chair, Intl. Conf. on Genetic Algorithms (1991)
• Program Committee, Intl. Conf. on R&D in Information Retrieval (1992, 1994, 1995)
• Program Committee, 14th Intl. Conf. on R&D in Information Retrieval (SIGIR ‘92)
RELATED PUBLICATIONS
• F. Menczer, W. Willuhn, and R. K. Belew, “An endogenous fitness paradigm for adaptive information agents,” in Proc. Third International Conference on Information and Knowledge Management (CIKM'94) - Workshop on Intelligent Information Agents, NIST, Gaithersburg, MD, December 1994 (http://www.cs.umbc.edu/cikm/1994/iia/).
• F. Menczer, R. K. Belew, and W. Willuhn, “Artificial life applied to adaptive information agents,” in C. Knoblock, A. Levy, S-S. Chen, and G. Wiederhold, editors, AAAI Spring Symposium Series: Information gathering from heterogeneous, distributed environments, AAAI, 1995.
74
• F. Menczer and R. K. Belew, “From complex environments to complex behaviors,” Adaptive Behavior, 4(3-4), 1996.
• B. T. Bartell, G. W. Cottrell, and R. K. Belew, “Automatic combination of multiple ranked retrieval systems,” in Proceedings of SIGIR, Dublin, 1994.
• B. T. Bartell, G. W. Cottrell, and R. K. Belew, “Learning the optimal parameters in a ranked retrieval system using multi-query relevance feedback,” in Proceedings of the Symposium on Document Analysis and Information Retrieval, Las Vegas, 1994.
OTHER PUBLICATIONS
• R. K. Belew and M. Vose, Foundations of Genetic Algorithms IV, Morgan Kaufmann, 1997.
• R. K. Belew and M. Mitchell, “Adaptive Individuals in Evolving Populations: Models and Algorithms,” volume XXVI of Santa Fe Institute Studies in the Sciences of Complexity, Addison-Wesley, 1996.
• T. Kammeyer, R. K. Belew, and S. Gill Williamson, “Evolving compare-exchange networks using grammars,” Artificial Life, 2(2):199-237, 1995.
• C. Rosin and R. K. Belew, “A competitive approach to game learning,” in A. Blum and M. Kearns, editors, Proc. 9th Annual ACM Workshop on Computational Learning Theory, pages 292-302, ACM, 1996.
• C. D. Rosin, R. S. Halliday, W. E. Hart, and R. K. Belew, “A Comparison of Global and Local Search Methods in Drug Docking,” in T. Baeck, ed., Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA97), Morgan Kaufmann, San Francisco, CA, 1997.
LIST OF RECENT COLLABORATORS
• Federico Cecconi (CNR - Rome), Marti Hearst (Univ. California - Berkeley), Melanie Mitchell (Santa Fe Inst.), Art Olson (The Scripps Research Inst.), Jude Shavlik (Univ. Wisconsin), Chuck Taylor (Univ. California - Los Angeles)
LIST OF GRADUATE STUDENTS
• Daniel Rose (Apple Computer), John McInerney (Encyclopedia Britannica), William Hart (Sandia National Labs), Amy Steier (Encyclopedia Britannica), John Hatton (SIL), Chris Rosin (The Scripps Research Inst.)
GRADUATE ADVISORS
• Stephen Kaplan (Univ. Michigan), Paul Scott (Univ. Essex)
75
JEANNE FERRANTE (Co-Principal Investigator)
Professor and Chair, Dept. of Computer Science and Engineering, Univ. of California, San Diego
CONTACT INFORMATION
• Postal: Dept. of Computer Science and Engineering, U.C. San Diego, La Jolla, CA 92093-0114
• Telephone: (619) 534-8406 (office), (619) 534-7029 (fax)
• Internet: [email protected] (email), http://www-cse.ucsd.edu/users/ferrante (www)
RESEARCH INTERESTS
• Intermediate representations for optimizing and parallelizing compilers
• High performance optimizations for machines with multiple levels of memory and parallelism, and for multithreaded machines
EDUCATION
• Ph.D. in Mathematics, Massachusetts Institute of Technology, 1974
• B.A. in Mathematics, New College of Hofstra, 1969
EMPLOYMENT
• Professor, University of California, San Diego, 1994 to present
• Research Staff Member, IBM T.J. Watson Research Center, 1978-1994
• Visiting Research Scientist, University of Colorado, Boulder, 1992-1993
• NSF Visiting Professorship for Women, U.C. Berkeley, 1985-1986
• Assistant Professor, Mathematics, Tufts University, 1974-1978
RECENT ACCOMPLISHMENTS
• ACM Fellow, 1996
• Co-inventor of Explicit Data Placement (XDP), a set of intermediate language constructs that allow compiler control of both data and data ownership transfers
• Co-inventor of Sparse Data Flow Evaluation Graphs, which allow efficient demand-driven solution of forward or backward data flow problems
• Co-developer of an algorithm to efficiently compute Static Single Assignment form, a representation that yields faster, more compact and powerful program optimizations
• Co-inventor of the Program Dependence Graph, a program representation of the essential control flow and data flow order of a program that exposes maximal parallelism
RELATED PUBLICATIONS
• N. Mitchell, L. Carter, J. Ferrante, and K. Hogstedt, “Quantifying the Multi-level Nature of Tiling Interactions,” Tenth International Workshop on Languages and Compilers for Parallel Computing, August 1997.
76
• K. Hogstedt, L. Carter, and J. Ferrante, “Determining the Idle Time of a Tiling,” Conference Record of the ACM Symposium on Principles of Programming Languages, January 1997.
• L. Carter, J. Ferrante, S. Flynn Hummel, B. Alpern, and K.S. Gatlin, “Hierarchical Tiling: A Methodology for High Performance,” UCSD Tech Report CS96-508, November 1996.
• L. Carter, J. Ferrante and S. Flynn Hummel, “Hierarchical Tiling for Improved Superscalar Performance,” Ninth International Parallel Processing Symposium, Santa Barbara, CA, April 1995.
• L. Carter, J. Ferrante and S. Flynn Hummel, “Efficient Multiprocessor Parallelism via Hierarchical Tiling,” SIAM Conference on Parallel Processing for Scientific Computing, February 1995.
OTHER PUBLICATIONS
• R. Cytron and J. Ferrante, “Efficiently computing phi-nodes on-the-fly,” ACM Transactions on Programming Languages and Systems, (17:3), May 1995.
• J.-D. Choi, R. Cytron, and J. Ferrante, “On the Efficient Engineering of Ambitious Program Analysis,” IEEE Transactions on Software Engineering, vol. 20, no. 2, February 1994.
• L. Carter, J. Ferrante and V. Bala, “XDP: A compiler intermediate language extension for the representation and optimization of data movement,” International Journal of Parallel Programming, (22:5), October 1994.
• R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, “Efficiently Computing Static Single Assignment Form and the Control Dependence Graph,” ACM Transactions on Programming Languages and Systems, October 1991.
• J. Ferrante, K. Ottenstein, and J. Warren, “The Program Dependence Graph and its Use in Optimization,” ACM Transactions on Programming Languages and Systems, vol. 9, pp. 319-349, July 1987.
RECENT COLLABORATORS
• Brad Calder (UCSD), Val Donaldson (UCSD), Dirk Grunwald (U. Colorado), Harini Srinivasan, Dean Tullsen (UCSD)
GRADUATE STUDENTS
• Val Donaldson (Postdoc), Karin Hogstedt (Advisee), Nick Mitchell (Advisee)
GRADUATE ADVISOR
• Albert Meyer (MIT)
77
RUSSELL IMPAGLIAZZO (Co-Principal Investigator)
Associate Professor, Dept. of Computer Science and Engineering, Univ. of California, San Diego
CONTACT INFORMATION
• Postal: Dept. of Computer Science and Engineering, U.C. San Diego, La Jolla, CA 92093-0114
• Telephone: (619) 534-1332 (office), (619) 534-7029 (fax)
• Internet: [email protected] (email), http://www-cse.ucsd.edu/users/russell (www)
RESEARCH INTERESTS
• Cryptography, security
EDUCATION
• Ph.D. in Mathematics, University of California, Berkeley, 1984-1989
• B.S. in Mathematics, Wesleyan University, 1981-1984
EMPLOYMENT
• Associate Professor, University of California, San Diego, 1996 to present
• Assistant Professor, University of California, San Diego, 1991 to 1996
• Postdoctoral Fellow, University of Toronto, 1989-1991
GRANTS AND AWARDS
• Alfred P. Sloan Research Fellow, 1994
• Israel-US Bi-National Science Foundation Grant (for collaboration with Prof. Noam Nisan, Hebrew University), 1993-1995
• NYI: NSF Young Investigator Award, 1992-1997
• IBM Faculty Development Award, 1991
SERVICE
• Program Committees: Foundations of Computer Science, 1992; Crypto, 1993; Crypto, 1994; Structures in Complexity Theory, 1995; Foundations of Computer Science, 1996; Symposium on the Theory of Computing, 1998
• Member, NSF CAREER Advisory Committee, 1995
RELATED PUBLICATIONS
• Bellare, M., Impagliazzo, R., and Naor, M., “Does parallel repetition lower error in computationally sound protocols?,” to appear in FOCS ‘97.
• T. Carson and R. Impagliazzo, “Exploring search spaces of graph bisection problems with go-with-the-winners algorithms,” in preparation.
78
• Di Crescenzo, G., Ferguson, N., Impagliazzo, R., and Jakobsson, M., “How to Forget a Secret,” manuscript, 1997.
• R. Impagliazzo and M. Naor, “Efficient Cryptographic Schemes Provably as Secure as Subset Sum,” Journal of Cryptology, Vol. 9, No. 4 (Autumn 1996), pp. 199-216.
• Jakobsson, M., Sako, K., and Impagliazzo, R., “Designated verifier proofs and their applications,” EUROCRYPT ‘96.
OTHER PUBLICATIONS
• A. Dimitriou and R. Impagliazzo, “Towards a rigorous analysis of local optimization algorithms,” STOC Proceedings, 1996, pp. 304-313.
• J. Hastad, R. Impagliazzo, L. Levin, and M. Luby, “Construction of a pseudo-random generator from any one-way function,” to appear in SIAM Journal on Computing.
• R. Impagliazzo, R. Paturi, and M. Saks, “Size-depth trade-offs for threshold circuits,” SIAM Journal on Computing, Vol. 26, No. 3, pp. 693-707.
• Impagliazzo, R., and Rudich, S., “Limits on the Provable Consequences of One-Way Permutations,” to appear in Journal of Cryptology.
• Impagliazzo, R., and Wigderson, A., “P=BPP if E requires exponential circuits: Derandomizing the XOR lemma,” STOC Proceedings, 1997, pp. 220-229.
RECENT COLLABORATORS
• D. Aharonov (Hebrew U.), M. Agarwal, E. Allender (Rutgers), P. Beame (U. Wash.), M. Bellare (UCSD), M. Ben-Or (Hebrew U.), S. Buss (UCSD), M. Clegg (UCSD), A. Clementi (U. Roma), S. Cook (U. Toronto), J. Edmonds (York), N. Ferguson (DigiCash), F. Fich (U. Toronto), A. Gupta (Simon Fraser), J. Hastad (Royal Acad.), B. Kapron (U. Victoria), V. King (U. Victoria), J. Krajicek (Math. Inst. Prague), M. Kutylowski, L. Levin (Boston U.), M. Luby (ICSI), N. Nisan (Hebrew U.), M. Naor (Weizmann), R. Paturi (UCSD), T. Pitassi (U. Arizona), P. Pudlak (Math. Inst. Prague), R. Raz (Weizmann), S. Rudich (CMU), K. Sako, M. Saks (Rutgers), J. Sgall (Math. Inst. Prague), A. Urquhart (U. of Toronto), A. Wigderson (Hebrew U.), and T. Yamakami (Princeton)
GRADUATE STUDENTS
• 1 Post-doctoral fellow: Toniann Pitassi (now at U. of Arizona, co-advised with C. Papadimitriou)
• 3 Ph.D.s: Markus Jakobsson (Ph.D., UCSD, 1997; now at Bell Labs), Anastasios Dimitriou (Ph.D., UCSD, 1996; now at CTI, U. of Patras), Goran Gogic (Ph.D., UCSD, 1996; co-advised with C. Papadimitriou, U.C. Berkeley)
• 1 M.S.: Thomas Tillinghast (M.S., UCSD, 1997)
• 2 current Ph.D. advisees, UCSD: Giovanni Di Crescenzo and Theodore Carson
GRADUATE ADVISOR
• Manuel Blum (U.C. Berkeley)
79
P. VENKAT RANGAN (Co-Principal Investigator)
Professor, Department of Computer Science and Engineering, University of California, San Diego
CONTACT INFORMATION
• Postal: Dept. of Computer Science and Engineering, U.C. San Diego, La Jolla, CA 92093-0114
• Telephone: (619) 534-5419 (office), (619) 534-7029 (fax)
• Internet: [email protected] (email), http://www-cse.ucsd.edu/users/venkat (www)
BRIEF BIOGRAPHY
Dr. P. Venkat Rangan founded the Multimedia Laboratory at the University of California, San Diego. Dr. Rangan and the UCSD Multimedia Laboratory are well known for research contributions in the areas of:
• Multimedia on-demand Servers
• Media Synchronization
• Multimedia Mixing and Communication
• Multimedia Collaboration
Dr. Rangan's research results have been reported in numerous publications in leading conferences and journals. He has also carried out experimental implementations, has demonstrated digital video on-demand servers and tele-conferencing systems over a high-speed metropolitan area network, and holds two patents for optimal video on-demand delivery systems for metropolitan area networks. His research has been supported by the NSF, a consortium of industrial sponsors, and California's MICRO program.
EDUCATION
• Ph.D. in Computer Science, University of California, Berkeley, 1988
• B.Tech., Indian Institute of Technology, Madras, India, 1984
EMPLOYMENT
• Professor, University of California, San Diego, 1996 to present
• Associate Professor, University of California, San Diego, 1994 to 1996
• Assistant Professor, University of California, San Diego, 1989 to 1994
AWARDS
• ACM Fellow (1997)
• NYI: NSF Young Investigator Award (1993)
• NCR Research Innovation Award (1991)
• President of India Gold Medal (1984)
80
SERVICE
• Program Chairman, 1996 International Conference on Multimedia
• Program Chairman of ACM Multimedia ‘93: First International Conference on Multimedia
• Member of Program Committees of IEEE Multimedia, VLDB (Very Large Data Bases), ACM Sigmetrics, SPIE, IWACA, DMSA, Photonics, Networks, MDMS, TINA, Visual Information Systems, MMSD, NOSSDAV, and Digital Libraries, to name a few
• Editor-in-Chief and Managing Editor of the ACM/Springer-Verlag journal Multimedia Systems
• Member of Editorial Board of IEEE Multimedia, MTA, IJAST, IEEE Network, JOCS, Interactive Multimedia, and the Morgan Kaufmann multimedia series
• Multimedia Technology Advisor to the Secretary, Federal Department of Electronics (DOE), Government of India
• Visiting Professor, Supercomputer Education and Research Center, Indian Institute of Science
• Program Chairman, 1997 Indo-US Bilateral Conference on Multimedia
PATENTS
• P. Venkat Rangan, “System for Efficient Delivery of Multimedia Information,” December 1996
• Christos H. Papadimitriou and P. Venkat Rangan, “System for Multimedia Information Delivery,” January 1997
RELATED PUBLICATIONS
• P. Venkat Rangan, Srihari Sampath-Kumar, and P. Sreeranga Rajan, “Continuity and Synchronization in MPEG,” IEEE Journal on Selected Areas in Communications, special issue on Multimedia Synchronization, January 1996.
• V. Shastri, V. Rajaraman, H. Jamadagni, and P. Venkat Rangan, “Design Issues and Caching Strategies for CDROM-based Interactive Multimedia Applications,” Proceedings of SPIE: Multimedia Computing and Networking, San Jose, California, January 29-31, 1996.
• Srinivas Ramanathan and P. Venkat Rangan, “System Architectures for Personalized Multimedia Services,” IEEE Multimedia Magazine, Vol. 1, No. 1, February 1994, pp. 37-46.
• Srinivas Ramanathan, Harrick M. Vin, and P. Venkat Rangan, “Towards Personalized Multimedia Dial-up Services,” Computer Networks and ISDN Systems, 1994, pp. 1305-1322.
• Jim Gemmell, Harrick M. Vin, Dilip Kandlur, P. Venkat Rangan, and Larry Rowe, “Multimedia Storage Servers: A Tutorial and A Survey,” IEEE Computer, Vol. 28, No. 5, May 1995, pp. 40-51.
OTHER PUBLICATIONS
• Srinivas Ramanathan, P. Venkat Rangan, Harrick M. Vin, and Srihari SampathKumar, “Enforcing Application-Level QoS by Frame-Induced Packet Discarding in Video Communications,” Computer Communications, special issue on Multimedia Systems, 1995.
• Srinivas Ramanathan and P. Venkat Rangan, “Adaptive Feedback Techniques for Synchronized Multimedia Retrieval over Integrated Networks,” IEEE/ACM Transactions on Networking, Vol. 1, No. 2, April 1993, pp. 246-260.
81
• Harrick M. Vin and P. Venkat Rangan, “Designing a Multi-User HDTV Storage Server,” IEEE Journal on Selected Areas in Communications, Vol. 11, No. 1, January 1993, pp. 153-164.
• P. Venkat Rangan, Srinivas Ramanathan, and Srihari SampathKumar, “Feedback Techniques for Continuity and Synchronization in Multimedia Information Retrieval,” ACM Transactions on Information Systems, Vol. 13, No. 2, April 1995, pp. 145-176.
• P. Venkat Rangan and Harrick M. Vin, “Efficient Storage Techniques for Digital Continuous Multimedia,” IEEE Transactions on Knowledge and Data Engineering, Special Issue on Multimedia Information Systems, Vol. 5, No. 4, August 1993, pp. 564-573.
RECENT COLLABORATORS
• None that are not already indicated as co-authors above
GRADUATE STUDENTS
• Ph.D. students: Harrick Vin, Srinivas Ramanathan
• M.S. students: John Lindwall, Eric Pilmore, Brian Steuer, Dong-Young Oh, Art Villanueva, Jon Garrett
GRADUATE ADVISOR
• Domenico Ferrari (U.C. Berkeley)
82
H.2 Biographies of Faculty Investigators
Scott B. Baden is an Associate Professor of Computer Science and Engineering at the University of California, San Diego and is also a Senior Fellow at the San Diego Supercomputer Center. He received the B.S. degree magna cum laude in electrical engineering from Duke University in Durham, North Carolina in 1978, and the M.S. and Ph.D. degrees in computer science from the University of California, Berkeley, in 1982 and 1987, respectively. He was a post-doc in the Mathematics Group at the University of California’s Lawrence Berkeley Laboratory between 1987 and 1990. Dr. Baden’s current research interests are in software support for high-performance parallel scientific computation: programming methodology, adaptivity techniques, load balancing, and performance. Dr. Baden has developed software infrastructure for adaptive applications, including the two software tools KeLP and LPARX.
Rik Belew, Co-Principal Investigator, see page 74.
Mihir Bellare is an Associate Professor in the Department of Computer Science and Engineering at UCSD, specializing in cryptography, computer security, and complexity theory. Mihir received his B.S. (in Mathematics) from the California Institute of Technology in 1986, and his Ph.D. (in Computer Science) from the Massachusetts Institute of Technology in 1991. From 1991 to 1995 he was at the IBM T.J. Watson Research Center, Hawthorne, New York. He joined UCSD in 1995. He is a recipient of a 1996 Packard Foundation Fellowship in Science and Engineering, an NSF CAREER award, an IBM Outstanding Technical Achievement award, and an IBM Outstanding Innovation award. He has published about 30 papers in cryptography and 15 in complexity theory. His main current interest is the application of provable security to the design of practical protocols. Several of his protocols have become industrial or Internet standards.
Francine Berman is Professor of Computer Science and Engineering at U.C. San Diego and a Senior Fellow at the San Diego Supercomputer Center. Her research focuses on metacomputing and on programming environments and tools that support high-performance computing. Dr. Berman served as Program Chair of the 1995 Heterogeneous Computing Workshop and has participated as a member of numerous conference committees. Dr. Berman serves on the External Advisory Committee for the Center for Research on Parallel Computation, the Review Committee for the Math and Computer Science Division at Argonne National Laboratory, the Steering Committee for the CRA Committee on the Status of Women in Computer Science and Engineering, and on the editorial boards of IEEE Transactions on Parallel and Distributed
83
Systems, the Journal of Parallel and Distributed Computing, and SIAM Review, and as Area Editor for Metacomputing at The Journal of Supercomputing.
Walter A. Burkhard, Professor of Computer Science and Engineering, founded and directs the Gemini Storage Systems Laboratory. Students and faculty of the Gemini Laboratory have carried out research in the areas of replicated files and redundant arrays of independent disks (RAID) data organizations; recent work has centered on efficient video movie delivery from disk arrays. Other noteworthy results include dynamic voting algorithms and witness copies within replicated files, as well as the use of MDS error-correcting codes in disk array data organizations to obtain improved storage reliability. The segmented information dispersal (SID) scheme for video servers provides excellent fault-free and degraded performance and efficient use of storage resources. Recent work on disk array declustering holds promise of excellent performance during both fault-free and degraded modes of operation. He has served as a visiting scientist at the IBM Almaden Research Center in San Jose. Dr. Burkhard served as founding chairman of the UCSD Computer Science and Engineering Department. He is an IEEE Senior Member.
Brad Calder is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. His research interests include compilers, computer architecture, and the interaction between compilers, architecture, and systems. Brad was a Principal Engineer at Digital Equipment Corporation’s Western Research Lab from August 1995 through December 1996. He graduated from the Department of Computer Science at the University of Colorado, Boulder with a Ph.D. in Computer Science in 1995. In addition, he graduated from the University of Washington with a B.S. in Computer Science in 1991 and a B.S. in Mathematics in 1991.
Larry Carter received the A.B. degree from Dartmouth College in 1969 and the Ph.D. degree from the University of California at Berkeley in 1974. He worked at IBM’s T.J. Watson Research Center for nearly 20 years in the areas of probabilistic algorithms, compilers, VLSI testing, and high-performance computation. In September 1994, Dr. Carter took a joint appointment as a Professor in the CSE Department at UCSD and a Senior Fellow at the San Diego Supercomputer Center. He has over 50 conference and journal publications, holds three U.S. patents, and is the co-inventor of universal hashing and the parallel memory hierarchy model of computation. His current research interests include scientific computation, performance programming, parallel computation, and machine and system architecture for high-performance computing.
84
Chung-Kuan Cheng received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley in 1984. From 1984 to 1986 he was a senior CAD engineer at Advanced Micro Devices Inc. In 1986, he joined the University of California, San Diego, where he is currently a Professor in the Computer Science and Engineering Department. He has been an associate editor of IEEE Trans. on Computer-Aided Design since 1994, and is a recipient of the 1997 best paper award of IEEE Trans. on Computer-Aided Design. His research interests include network optimization and design automation of microelectronic circuits.
Garrison W. Cottrell is Professor of Computer Science and Engineering at the University of California, San Diego. He obtained the B.S. degree in Mathematics and Sociology from Cornell University in 1972, and M.S. and Ph.D. degrees from the University of Rochester in 1981 and 1985, respectively. After a postdoctoral position with David Rumelhart at UCSD, he joined the CSE Department there. His research is strongly interdisciplinary: it concerns using neural networks as a computational model applied to problems in cognitive science, artificial intelligence, engineering, and biology. He has had success using them for such disparate tasks as modeling how children acquire words, studying how lobsters chew, and performing adaptive information retrieval.
Flaviu Cristian is Professor of Computer Science and Engineering at the University of California, San Diego. He received his Ph.D. from the University of Grenoble, France, in 1979. After carrying out research in operating systems and programming methodology in France and working on the specification, design, and verification of fault-tolerant software in England, he joined IBM Research in 1982. While at IBM, he worked in the area of fault-tolerant distributed systems and protocols. His leadership in the design of the Advanced Automation System, a real-time fault-tolerant distributed system for Air Traffic Control, was recognized by the highest IBM technical award: a Corporate Award in 1989. After joining UCSD in 1991, he founded the Dependable Systems Laboratory, where he and his collaborators design and build support services for providing high availability in distributed systems.
Charles Elkan is Associate Professor of Computer Science and Engineering; his main research interests are in artificial intelligence. With students and colleagues, he has worked recently on algorithms for DNA and protein sequence analysis, data mining methods for business applications, methods of formalizing commonsense knowledge, and other topics. A unifying theme in this work is scalability through parallelism. A Cray T3E version of the MEME software developed by Prof. Elkan and his Ph.D. student Timothy Bailey is now used daily by biologists worldwide to find motifs in large DNA and protein datasets. Among other earlier
85
work, a paper with his Ph.D. student A. Hekmatpour entitled “Categorization-Based Diagnostic Problem Solving in the VLSI Design Domain” won the best paper award at the 1993 IEEE Conference on Artificial Intelligence for Applications. This year Prof. Elkan’s BNB software took first place out of 45 entries in the data mining competition at the International Conference on Knowledge Discovery and Data Mining (KDD’97).
Jeanne Ferrante, Co-Principal Investigator, see page 76.
Joseph Goguen is Professor of Computer Science & Engineering at the University of California, San Diego, and, since 1996, Director of its Program in Advanced Manufacturing ($1.2M from NSF) and of its Meaning and Computation Lab. From 1988, he was the Professor of Computing Science, Fellow of St. Anne’s College, and Director of the Centre for Requirements and Foundations at Oxford University. Previously he was Senior Staff Scientist at SRI International and a Senior Member of the Center for the Study of Language and Information at Stanford; before that he was a full professor at UCLA; he has also taught at Berkeley and Chicago, and held fellowships at IBM Research and Edinburgh University. Professor Goguen’s bachelor’s degree is from Harvard and his Ph.D. from Berkeley, and he has been a distinguished lecturer at Syracuse University and the University of Texas. He is best known for his research on abstract data types, requirements, and specification, and is now also working on distributed cooperative work, user interfaces, and a large ($13.2M) Japanese national project building an environment for his OBJ specification language.
William Griswold is an Associate Professor in the Department of Computer Science and Engineering at the University of California, San Diego. He received his Ph.D. in Computer Science from the University of Washington in 1991, and his B.A. in Mathematics from the University of Arizona in 1985. He is a member of the program committee for the International Conference on Software Engineering in 1997 and 1998. His research interests include software evolution and design, compiler technology, and programming languages.
William Howden has been a faculty member at UCSD since 1974, where he is a full Professor in the Department of Computer Science and Engineering. He has published journal papers in the areas of artificial intelligence, system design, program understanding, and software testing and analysis. His contributions to testing and analysis include substantial work in symbolic evaluation, fault-based testing, functional testing, statistical testing, and static analysis. His current work involves the development of methods for providing formal measures of confidence in a software system. Dr. Howden has published numerous journal and conference articles, was the co-editor of the best-selling tutorial Software Testing and Validation Techniques, and was the
86
author of the book Functional Program Testing and Analysis. He has served on national and international committees and is a founding member of the IFIP Working Group on Dependable Computing and Fault Tolerance.
T. C. Hu has been a faculty member at UCSD since 1974, where he is a full Professor in the Department of Computer Science and Engineering. He received the M.S. degree from the University of Illinois in 1956, and the Ph.D. degree from Brown University in 1960. During 1960-66, he was a Research Mathematician at IBM Research, and during 1966-74, he was a Professor at the University of Wisconsin. His interests are in networks, operations research, and VLSI circuit layout.
Ramesh Jain is currently a Professor of Electrical and Computer Engineering, and of Computer Science and Engineering, at the University of California at San Diego. His current research interests are in multimedia information systems, interactive video, image databases, machine vision, and intelligent systems. He was the founder and Chairman of Imageware Inc. and is the founding chairman of Virage. Currently he is the President and CEO of Praja Inc., a company commercializing MPI Video technology. Ramesh is a Fellow of IEEE, AAAI, and the Society of Photo-Optical Instrumentation Engineers, and a member of ACM, the Pattern Recognition Society, the Cognitive Science Society, the Optical Society of America, and the Society of Manufacturing Engineers. He has been involved in the organization of several professional conferences and workshops, and has served on the editorial boards of many journals. Currently, he is the Editor-in-Chief of IEEE Multimedia, and is on the editorial boards of Machine Vision and Applications, Pattern Recognition, the ACM/Springer Journal of Multimedia Systems, Multimedia Tools and Applications Journal, Journal of Digital Libraries, and Image and Vision Computing. He received his Ph.D. from IIT, Kharagpur in 1975 and his B.E. from Nagpur University in 1969.
Keith Marzullo is an Associate Professor in the Computer Science and Engineering Department at the University of California, San Diego. Professor Marzullo has also been an assistant and associate professor at Cornell University in the Computer Science department and a member of the technical staff at Xerox in Palo Alto. His research interests are in fault-tolerant distributed systems. His work has found its way into industrial standards and into commercial products, and he has consulted for several visible projects including the IBM Air Traffic Control System. Professor Marzullo is also an associate editor for IEEE Transactions on Software Engineering.
87
Alex Orailoglu received the S.B. degree from Harvard College, cum laude in Applied Mathematics, in 1977, and the M.S. and Ph.D. degrees in Computer Science from the University of Illinois, Urbana, in 1979 and 1983, respectively. Prof. Orailoglu has been a member of the faculty of the Computer Science and Engineering department at the University of California, San Diego since 1987. He has published more than 50 papers in the areas of Computer-Aided Design, Test, and Fault Tolerance. Prof. Orailoglu leads research efforts at UC San Diego in the area of Synthesis of VLSI Designs. His research interests include Design Automation of Self-Testable VLSI Designs and CAD of Fault-Tolerant ICs.
Yannis Papakonstantinou is an Assistant Professor in the Computer Science and Engineering department at the University of California, San Diego. He received a B.S. in electrical engineering from the National Technical University of Athens, Greece, in 1990, and, from Stanford University, an M.S. in 1994 and a Ph.D. in 1997, both in Computer Science. His main focus is the integration of heterogeneous information sources. His research interests also include query processing issues in multimedia and on-line analytical processing.
Joseph Pasquale, Principal Investigator, see page 71.
George C. Polyzos is an Associate Professor of Computer Science and Engineering at UCSD. He received the Diploma in Electrical Engineering from the National Technical University, Athens, Greece, and the M.A.Sc. in Electrical Engineering and the Ph.D. in Computer Science from the University of Toronto. His research interests include communication network design and computer and communications systems performance evaluation. His current research focuses on wireless multimedia networks and efficient multicast. He has published papers in the areas of random access protocols, local and metropolitan area networks, supercomputer I/O characterization, Internet traffic and performance characterization, network and operating system support for multipoint multimedia communications, and efficient multicast routing for delay-sensitive applications. He has served as guest editor for IEEE JSAC and Computer Networks and ISDN Systems, and on program committees for many conferences and workshops. He was the area TPC member in charge of Multimedia Networking for IEEE INFOCOM’97.
P. Venkat Rangan, Co-Principal Investigator, see page 80.
J.B. Rosen is Professor Emeritus, Computer Science & Engineering, University of Minnesota, and Adjunct Professor, Computer Science and Engineering, UCSD. He is also a Senior Fellow at SDSC. He received his Ph.D. in Applied Mathematics from Columbia University in 1952. He
88
has done extensive research on large-scale numerical nonlinear optimization methods and applications. His current interests include parallel algorithms for global optimization, with application to molecular structure prediction using simplified continuous energy functions to model the molecules. He is also doing research on structure-preserving approximation and parameter estimation in nonlinear systems, based on minimizing the residual error in the one, two, or infinity norm. A third area of current research is numerical optimal control and parameter estimation in systems described by nonlinear parabolic PDEs. This research is currently supported by two NSF grants. He has supervised the research of 20 Ph.D. students in Computer Science, and has published over 100 papers.
Dean Tullsen is an Assistant Professor in the Computer Science and Engineering department at the University of California, San Diego. His research in computer architecture includes simultaneous multithreading, novel branch architectures, compiling for multithreaded and other high-performance processor architectures, and cache and memory subsystem design. He received a 1997 CAREER award from NSF. He received his Ph.D. from the University of Washington in 1996 for work on simultaneous multithreading, aided by fellowships from Intel, Microsoft, and the Computer Measurement Group. He also has an M.S. from UCLA and spent four years as a computer architect for Bell Labs.
Victor Vianu received his Ph.D. in Computer Science from USC in 1983. Since then, he has been on the faculty of UC San Diego and is now Professor of Computer Science. His interests include database theory, logic and complexity, and knowledge bases. His most recent research focuses on querying globally distributed data, spatial databases, and active databases. Vianu’s publications include over 60 refereed research articles and a graduate textbook on database theory. He has given numerous invited talks and served as General Chair of SIGMOD and Program Chair of the PODS conference.
Bennett S. Yee is an Assistant Professor in the Department of Computer Science and Engineering at the University of California at San Diego. He received his B.S. degrees in Computer Engineering and in Mathematics from Oregon State University, and his Ph.D. degree in Computer Science from Carnegie Mellon University. Dr. Yee's research interests are in electronic commerce, computer security, cryptography, and operating systems. His security work is wide-ranging and includes postal indicia/revenue security, security architecture design, cryptographic protocol design, and operating system security. His work on secure coprocessors has opened a new architectural sub-arena and has drawn considerable interest from IBM, Microsoft, National Semiconductor, and other companies. Dr. Yee is the recipient of a Faculty Development Award from National Semiconductor Corp.
89
I. RESULTS FROM PRIOR AWARDS
UCSD has never received a prior NSF CISE Research Infrastructure award; consequently, there are no results to report.
90