PUNCH: A Software Infrastructure for Network-Based CAD
Nirav H. Kapadia, Mark S. Lundstrom, and José A. B. Fortes
{kapadia, lundstro, fortes}@purdue.edu

1. Introduction

PUNCH, the Purdue University Network Computing Hubs, is an infrastructure for network-based VLSI CAD and TCAD that allows users to access and run existing software tools via standard world-wide web browsers. Tools do not have to be written in any particular language, and access to source-code is not required. The PUNCH infrastructure is geographically distributed, but this is transparent to users, who can run tools wherever they reside.

PUNCH can be logically divided into multiple discipline-specific "hubs" (see Figure 1). Currently, there are four hubs that contain tools for semiconductor technology, VLSI design, computer architecture, and parallel processing. A fifth hub is devoted to tools that were developed with the support of the Semiconductor Research Corporation. These hubs contain over thirty tools from eight universities and four vendors, and serve more than 500 users from Purdue, across the US, and in Europe. The current system configuration uses a dedicated HP-9000/C110 for the front-end, and distributes most runs among approximately ten shared compute-servers located at Purdue, Illinois at Urbana-Champaign, Maryland, and Texas at Austin. During the past three years, PUNCH users have logged approximately one million hits and have performed over fifty thousand simulations. PUNCH can be accessed at http://www.ecn.purdue.edu/labs/punch/; courtesy accounts are available.

Our talk will focus on the system architecture, design philosophy, functionality, experiences, and the ways in which the project can be leveraged by the SRC community.
Figure 1: The Purdue University Network Computing Hubs: home page and list of hubs.
2. Motivation

One of the key results of SRC projects such as ours is the development of software for use in the semiconductor industry. We have observed several barriers to the transfer of university software to industry. One is simply that engineers in industry must become aware of the software and of its capabilities. Technology-transfer short-courses can help, but they reach only a small portion of the intended audience, and it is difficult to follow through after the course. Also, the tools may be needed only on a short-term basis for solving problems of immediate concern. At such a point, there is rarely time to acquire, install, and learn to operate a university-developed tool. The effort required to install a tool and learn how to use it can be justified only if an engineer in industry is convinced of its usefulness. Even then, the software must be continually maintained and updated as the code develops and as computer systems evolve. In many cases, much of this effort could be avoided if the software could be easily accessed and operated remotely.

With the advent of ubiquitous networks, the model in which each user has a private copy of the software installed on his or her own machine can be replaced by a model in which specialized software is installed and maintained at specific sites and accessed through the network. We believe that this model has the potential to encourage collaboration and promote the use of university-developed software.

Many of the systems that currently allow computing on the web target specific tools. Such solutions tend to be non-reusable even though they involve a significant amount of duplicated effort. From this perspective, it is highly desirable to separate the network-computing infrastructure from the tools, with a view to creating a generic core that can be reused for a large class of tools.
Functionally, this is equivalent to designing a multi-user operating-system for networked resources that provides user-transparent file, process, and resource management functions, handles security and access-control across multiple administrative domains, and manages state information (session management). PUNCH currently provides these services for tools with text-based and graphical (currently limited to X-based tools) user-interfaces. Over the years, we have found it to be an extremely useful resource for students and collaborators, and a highly flexible testbed for network-computing research.

3. PUNCH

From a user's perspective, PUNCH is a WWW-accessible collection of simulation tools and related information (see Figure 2). It allows geographically dispersed tools to be indexed and cross-referenced, and makes them available to users world-wide. The infrastructure hides all details associated with the remote invocation of tools from its users. Functionally, PUNCH allows users to: a) upload and manipulate input-files, b) run programs, and c) view and download output, all via standard WWW browsers. It provides a context-sensitive help facility that assists users in the use of the tools and the infrastructure itself. Access to information and resources via PUNCH can be personalized and/or restricted according to user-specific needs and access-rights. Finally, the use of artificial intelligence technology allows PUNCH to predict run-times for tools. This information is primarily used for on-demand resource-management (e.g., longer runs are routed to faster machines). A detailed description of PUNCH is available in [1, 2, 3].

Running a typical simulation on PUNCH is a three-step process. The first step involves the creation of the input file(s) required for the relevant simulation. In the second step, users define the input parameters (e.g., command-line arguments) for the program and start the simulation.
Finally, after the simulation is complete, users can see, post-process (see Figure 2), and download the results via the web-based front-end. PUNCH runs programs in a "background" mode by default. This means that the user's browser-window is freed up as soon as the run has been successfully initiated.

Figure 2: The Silicon Lab on the Semiconductor Simulation Hub; and a sample output post-processor for a device simulator. The plot shows the electron concentration profile for a simulated MOSFET.

The PUNCH infrastructure can be divided into two parts (see Figure 3). The front-end primarily deals with data management and user-interface issues. It allows users to interact with the network-computing infrastructure (SCION) via standard web browsers, and generates customized views of available resources for each class of users. The "hub engine" serves as PUNCH's user-transparent middleware. It consists of a collection of hierarchically distributed servers that co-operate to provide on-demand network-computing. This part of the infrastructure addresses the following issues: management of the run-time environment, security, control of resource access and visibility, and demand-based scheduling of available resources. The infrastructure can support arbitrary hardware and software resources (the current implementation provides limited support for GUI-based tools).

PUNCH allows tools to be organized and cross-referenced according to their domain. Resources can be added incrementally using a resource-description language specifically designed to facilitate the specification of tool and machine characteristics. For example, a new machine can be incorporated into PUNCH simply by specifying its architecture (make, model, operating system, etc.) and starting a server on it. Similarly, a new tool can be added by "telling" PUNCH the tool's location, its input behavior (e.g., command-line arguments), what kinds of machines it can run on (e.g., Ultra-SPARC), and how it fits into the logical organization of software resources (e.g., device simulation tool). The web pages needed for the tool are generated automatically from HTML templates. Each of these tasks is typically accomplished in less than thirty minutes.

4. Accomplishments

The earliest implementation of PUNCH was operational in April 1995. Since Fall 1996, PUNCH has been used to provide access to tools in several undergraduate and graduate courses, including a distance-education course at Purdue, a course each at Berkeley and Illinois at Chicago, and another in Israel. In addition, PUNCH was successfully employed for a four-university technology-transfer short course (May 1997) for the SRC. Usage has grown over the past few months; April 1998 alone accounted for 163,302 hits and 12,979 runs. Results from user-surveys indicate that the system performs well under the highly peaked usage patterns (very high usage in the hours before homework assignments and projects are due) characteristic of an academic environment. (The current system configuration uses a dedicated HP-9000/C110 for the front-end, and distributes most runs among approximately ten shared compute-servers located at Purdue and other universities.)

5. Research Issues

Current PUNCH-related network-computing research topics include demand-based scheduling, resource monitoring, metacomputing and resource-aggregation, fault-management, and user-interface design.
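To make the demand-based scheduling topic concrete, the following Python sketch illustrates one way a run could be bound to a machine only at run time, using declarative machine and tool descriptions together with a predicted run-time. It is a minimal illustration under stated assumptions, not PUNCH's resource-description language or scheduler: the names Machine, Tool, select_machine, MACHINES, and TOOL, all fields, the ten-minute threshold, and the sample values are hypothetical. Routing long runs to the fastest eligible machine follows the policy mentioned in Section 3; sending short runs to the slowest eligible machine is an added assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine description (not PUNCH's actual syntax)."""
    name: str
    architecture: str      # e.g. "ultra-sparc"
    memory_mb: int
    relative_speed: float  # larger means faster
    free_minutes: int      # predicted time the machine stays available


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool description."""
    name: str
    architectures: FrozenSet[str]  # architectures the tool can run on
    category: str                  # e.g. "device simulation"


def select_machine(
    tool: Tool,
    machines: Sequence[Machine],
    memory_mb: int,
    predicted_minutes: float,
    long_run_minutes: float = 10.0,  # assumed cut-off between short and long runs
) -> Optional[Machine]:
    """Choose a machine for one run at the moment the run is requested."""
    # Keep only machines that satisfy the requirements of this particular run.
    candidates = [
        m for m in machines
        if m.architecture in tool.architectures
        and m.memory_mb >= memory_mb
        and m.free_minutes >= predicted_minutes
    ]
    if not candidates:
        return None
    if predicted_minutes >= long_run_minutes:
        # Longer runs are routed to faster machines.
        return max(candidates, key=lambda m: m.relative_speed)
    # Assumption: short runs take the slowest eligible machine,
    # leaving faster machines free for long runs.
    return min(candidates, key=lambda m: m.relative_speed)


# Illustrative data only; these are not real PUNCH resources.
MACHINES = [
    Machine("m1", "ultra-sparc", 256, 2.0, 45),
    Machine("m2", "ultra-sparc", 128, 1.0, 120),
    Machine("m3", "hp-pa", 512, 3.0, 60),
]
TOOL = Tool("device-sim", frozenset({"ultra-sparc"}), "device simulation")
```

With these sample values, select_machine(TOOL, MACHINES, memory_mb=100, predicted_minutes=30) considers m1 and m2 and picks the faster one, while the same request with predicted_minutes=5 picks the slower one. Because the choice is made per request, adding a machine amounts to appending one more description to the list.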
Work on scheduling is aimed at exploring the impact of a demand-driven environment on existing scheduling policies. (PUNCH allows on-demand management of existing software and hardware resources by delaying the binding of a user's command to a specific implementation and machine until run-time, at which point the requirements of the given run can be analyzed.) Resource monitoring research is geared towards the design of a scalable system that can monitor and predict the availability and reliability characteristics of a large number of resources; the goal is to allow a demand-driven scheduler to request resources with specific characteristics (e.g., an Ultra-SPARC with 250 MB of memory that is likely to be free for the next 30 minutes). Metacomputing work is geared towards allowing users to logically "chain" distributed tools. Work on fault-management addresses the detection of abnormal conditions and the automatic application of corrective actions. Finally, work on user-interface design is aimed at making network-computing transparent to end-users and tool installers, and providing support for tools with graphical interfaces (a proof-of-concept implementation has already been demonstrated).

Figure 3: Organization of the PUNCH infrastructure.

6. Conclusions

The PUNCH infrastructure has been successfully implemented and applied to education, research, and technology-transfer. Over the years, we have found it to be an extremely useful resource for students and collaborators, and a highly flexible testbed for network-computing research. The ideas and solutions presented in this paper are based on (and validated by) our experiences in scaling PUNCH from a research project to a "live" system that is regularly used by several hundred students each semester.

Acknowledgements.
The development of PUNCH was partially funded by the AT&T Foundation, the National Science Foundation under grants DMR-9400415, CDA-9617372, EEC-9700762, and MIPS-9500673, and by equipment grants from Intel (under the Technology for Education 2000 program), Microsoft, and the SRC.
References

[1] Nirav H. Kapadia, Carla E. Brodley, José A. B. Fortes, and Mark S. Lundstrom. "Resource-Usage Prediction for Demand-Based Network-Computing". In Proceedings of the Workshop on Parallel and Distributed Systems (APADS). October 1998. West Lafayette, Indiana. To appear.
[2] Nirav H. Kapadia and José A. B. Fortes. "On the Design of a Demand-Based Network-Computing System: The Purdue University Hubs". In Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing. July 1998. Chicago, Illinois. To appear.
[3] Nirav H. Kapadia, José A. B. Fortes, and Mark S. Lundstrom. "The Semiconductor Simulation Hub: A Network-Based Microelectronics Simulation Laboratory". In Proceedings of the 12th Biennial IEEE University Government Industry Microelectronics Symposium. July 1997, pages 72-77. Rochester, New York.