THE PROFESSION
Reverse Engineering and the Computing Profession Cristina Cifuentes, Sun Microsystems Laboratories
Reverse engineering, the process of analyzing an existing system to identify its components and interrelationships so that we can create a representation of the system at a higher abstraction level, raises complicated legal, ethical, and technical issues for our profession. Unfortunately, many people first heard of reverse engineering in the context of piracy: Programmers would disassemble a third-party software program’s object code, then copy those instructions into their own software, catching a free ride on someone else’s effort. Yet how realistic is this scenario today, when programs sprawl across hundreds of megabytes? If we disassembled Microsoft Word, for example, we would get thousands or millions of instructions and endless reams of data. Which of those millions of instructions would we want to copy? And how do we guarantee that the code integrates well into our program? It may often be simpler to study the program’s behavior and write suitable code to implement it.
Recent actions against reverse engineering threaten to remove this valuable practice from the computing profession’s toolkit.
SOFTWARE REVERSE ENGINEERING
Although nearly any manufactured product can be reverse engineered, the practice generates the most controversy when it targets software. Developers often go through a process of understanding and refining the software’s source base to write a new piece of code or debug a program, whether in their own source base or a colleague’s. When making large changes to a system, managers evaluate existing software to determine the interrelationships between different parts of that system. When purchasing a software company, investors evaluate software to determine its worth and maintainability in the long run. When buying a database for a business need, a company evaluates the available options to determine their maintainability and scalability. Therefore, many aspects of software development and evaluation deal with some form of program comprehension and thus reverse engineering.
Practical reasons for reverse engineering include the following:
• the original programmers have long since departed,
• the developers wrote the application in an obsolete language and it must now be migrated to a newer one,
• the system lacks documentation,
• the business relies on software that no one understands,
• the company acquired the program as part of a corporate acquisition and thus lacks access to all the source code,
• a program requires adaptations or enhancements, or
• the software doesn’t function as anticipated.
These examples imply both high- and low-level reverse engineering. High-level reverse engineering refers to abstracting design, architecture, or documentation from source code. Low-level reverse engineering refers to abstracting source code, whether in assembly or high-level form, from object code or assembly code, and thus to the disassembly or decompilation of that code.
In essence, reverse engineering involves program comprehension, program transformation, and information abstraction. Most available tools aid program comprehension and visualization, or parse legacy languages or dialects of existing languages for which a language specification may not even exist. Researchers in the Americas and Europe have devised several such tools, which the “State of the Art” sidebar describes.
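To make the low-level case concrete, the following minimal sketch shows what disassembly amounts to: mapping raw object-code bytes back to readable mnemonics. The instruction set, opcode values, and sample program are hypothetical and invented purely for illustration; real disassemblers must also handle variable-length encodings, the separation of code from data, and much else.

```python
# Hypothetical one-byte opcodes for a made-up stack machine (not a real ISA).
# Each entry maps an opcode to (mnemonic, number of operand bytes that follow).
OPCODES = {
    0x01: ("PUSH", 1),
    0x02: ("ADD", 0),
    0x03: ("MUL", 0),
    0x04: ("PRINT", 0),
    0xFF: ("HALT", 0),
}


def disassemble(code: bytes) -> list[str]:
    """Translate raw bytes into one line of assembly-like text per instruction."""
    lines = []
    pc = 0  # offset of the instruction currently being decoded
    while pc < len(code):
        opcode = code[pc]
        if opcode not in OPCODES:
            # Unknown bytes are kept as raw data (DB) rather than guessed at.
            lines.append(f"{pc:04x}  DB 0x{opcode:02x}")
            pc += 1
            continue
        mnemonic, operand_count = OPCODES[opcode]
        operands = code[pc + 1 : pc + 1 + operand_count]
        text = " ".join([mnemonic] + [str(b) for b in operands])
        lines.append(f"{pc:04x}  {text}")
        pc += 1 + operand_count
    return lines


# Sample "object code": push 2, push 3, add, print, halt.
program = bytes([0x01, 2, 0x01, 3, 0x02, 0x04, 0xFF])
listing = disassemble(program)
for line in listing:
    print(line)
```

Decompilation would go one step further and recover a high-level statement such as `print(2 + 3)` from this listing, which is the harder abstraction step the column describes.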
REVERSE ENGINEERING AND COPYRIGHT
A lawyer will tell you that reverse engineering consists of examining or pulling an article or piece of machinery apart to see how it works. Courts usually examine the legality of reverse engineering in the context of patents, asking whether the reverse engineering of an article or machine infringes on a patent. Copyright protects literary and artistic works such as books, drawings, and musical compositions. Software has normally been protected by copyright. However, patents have grown in popularity recently, a trend that causes complications of its own, as Neville Holmes explored in “The Evitability of Software Patents” (Computer, Mar. 2000, pp. 30-34). Therefore, for software, the courts have had to examine the legality of reverse engineering in the context of copyright.
In the early 1980s, copyright became the default form for protecting computer programs because such programs can be read. Copyright also covers object code even though such code cannot be read per se. Many countries have enacted specific legislation that extends copyright protection to computer programs as literary works. This practice has been adopted in two recent international treaties:
• the 1993 agreement on Trade-Related Intellectual Property Rights, and
• the 1996 World Intellectual Property Organization Copyright Treaty.
Simply stated, copyright protects a program’s owners from the unauthorized reproduction and adaptation of that program. A reproduction is an exact copy, while an adaptation is a copy that does not duplicate the program exactly, such as an altered version of the original or its translation into another programming language.
Limits and exceptions
Because the mere act of running a program in a computer creates a copy of it in memory, and this act technically infringes on the owner’s copyright, exceptions to copyright law have been established so that such acts are not considered violations. Likewise, users usually have permission to make backup copies. The intermediate copying of a program while performing its disassembly has raised debate, however. In the seminal US case from the past decade, 1992’s Sega v. Accolade, the Ninth Circuit Court ruled that the fair use doctrine permitted the reverse engineering of Sega’s code to write compatible games. Specifically, the court ruled that fair use allowed the intermediate copying of the program into memory, and onto printouts and disks, while the reverse engineer strove to access the machine’s interface and write programs for the Sega machine. Since then, it has been widely held that the US courts permit the reverse engineering of software for interoperability purposes.
In 1991, the European Union carefully drafted software copyright legislation for adoption by member countries. Specifically, Article 6 of the 1991 EU Directive on the legal protection of computer programs provides two exceptions that permit reverse engineering of computer programs:
• to let a product interoperate with another application or platform, or
• to determine the source of a bug in some third-party code.
These exceptions apply, however, only if the copyright owner has not made available in a reasonable amount of time the information that those doing the reverse engineering seek. In 2000, Australian copyright legislation incorporated these exceptions for reverse engineering, then added one for computer security testing, along the lines of that in the US’s 1998 Digital Millennium Copyright Act.
The US Congress drafted the DMCA to deal with copyright in the digital world, enacting legislation to implement Article 11 of the WIPO Copyright Treaty. This article requires participating parties to effectively prevent circumvention of the technological measures authors use to preserve copyright. The parties must also outlaw circumvention of the technological protection measures that copyright owners use to prevent unauthorized copying. When it became apparent that this law would prevent encryption research and the security testing of computer programs and networks, the US added the security testing exemption to its legislation. Unfortunately, the wording of this exception has caused problems for bona fide researchers in the computer security area.
Sticking points
Many in the computer security community believe that the wording of the security-testing exception makes it very difficult to use, given that findings uncovered through reverse engineering cannot be made available to others. For example, if a developer analyzes virus code and determines how to circumvent its techniques, that information cannot be shared. Even academic researchers have been affected: Edward Felten’s team, which sought to publish experimental results derived from reverse engineering digital watermarks on music, received legal warnings from the Secure Digital Music Initiative (SDMI) not to publish those results. Some believe that by acting in this way, the SDMI prohibits free speech.
More aggressively still, Sony sued Connectix last year for reverse engineering and trademark dilution of Sony’s games when Connectix made a PlayStation emulator for the Mac environment. Emulators, popular and widely used since the 1960s, would seem safe from lawsuits 40 years later—and indeed this proved to be the case. Citing fair use, the Ninth Circuit Court held in favor of reverse engineering software to allow the running of different software products on different hardware platforms. The court explained that Connectix’s intermediate copying of Sony’s BIOS was a fair use for the purpose of gaining access to the unprotected elements of Sony’s software. If that software had been protected by a patent, however, the result might have been different.
SOCIAL AND PROFESSIONAL ISSUES
The reverse engineering of computer software gives rise to many social and professional issues, most of which stem from the intellectual property system used to protect software. This system has generated considerable controversy, such as that which arose at a recent Santa Clara University symposium. During the one-day symposium, Donald Chisum wondered why, if we place such heavy restrictions on reverse engineering software, it’s okay to buy a TV, pull it apart, and reverse engineer it. On the other hand, Michael Lehmann questioned why we should limit the reverse engineering of software when the reverse engineering of any other literature has always been permitted. In reality, software is not a literary work per se, and therefore has become the first technology protected by both copyright and patents. This dual IP protection scheme has created some jarring inconsistencies that have led to confusion and sometimes paralyzing litigation.
The emerging legalities of reverse engineering affect the profession by threatening or appearing to threaten customers and some researchers, causing an uncertainty that can lead to less innovation in the reverse engineering community. Computing professionals must educate the legal community about the technology itself, and they must be proactive in commenting on proposed legislation that affects software in general. Much of the thinking behind the DMCA, for example, derived from the entertainment industry’s influence, not the computing community’s.
The legal community must in turn educate the computing community in the meaning of the different intellectual property systems. It can clarify, for example, that copyright does not apply to every single byte of an object file but protects only parts of the program, and certainly does not protect its ideas.
Unless we develop better communication between the legal and computing professions, we will continue to suffer the consequences of legislation such as the DMCA. By making the distribution of research results more difficult, such legislation stifles the sharing of ideas, causing an uncertainty that can lead to less innovation and fewer benefits to society.
Cristina Cifuentes is a senior staff engineer at Sun Microsystems Laboratories. Contact her at Cristina.Cifuentes@sun.com.
Editor: Neville Holmes, School of Computing, University of Tasmania, Locked Bag 1-359, Launceston 7250; neville.holmes@utas.edu.au
State of the Art
American and European researchers have developed several good high-level tools for reverse engineering. Tools such as Rigi (http://www.rigi.csc.uvic.ca), PBS (http://swag.uwaterloo.ca/pbs/), and GUPRO (http://www.gupro.de/) aid in program understanding and software architecture recovery. Other tools, such as SHriMP (http://www.csr.uvic.ca/shrimpviews), significantly contribute to understanding a large piece of software through different visualization techniques.
Most tools focus on one aspect of reverse engineering. They may specialize in parsing code well, producing different types of graph views, or producing architecture diagrams in UML. Unlike program transformation tools such as compilers, which build an intermediate representation in memory and apply transformations to that representation internally, reverse engineering tools tend to cooperate with each other to support different parts of the reverse engineering process.
The Graph eXchange Language (http://www.gupro.de/GXL/) has simplified reverse-engineering tool interoperability. GXL, based on XML, resulted from an international collaboration between researchers in academia and industry. An extensible language, it supports any graph-based data format. For example, you can describe a control flow graph or an abstract syntax tree in GXL.
A key difficulty in using reverse engineering tools arises from their need to support a variety of languages or be capable of extension to support another language. Although building them has often proven to be a daunting task, tools have been successfully designed to convert Cobol code to C, and to translate C code to C++ or Java code.
Some low-level reverse engineering tools have also been successful, including interactive commercial disassemblers such as IDA Pro (http://www.datarescue.com/idabase/ida.htm) and Sourcer (http://www.v-com.com/product/devsou1.html), which provide good quality assembly code for a variety of machines.
During the Y2K crisis, a few companies provided decompilation services for Cobol binaries because many large organizations have vast legacy applications written in that language. Java decompilers have also been written, and more easily than some other decompilers, because the Java program’s binary format is not machine code but rather an intermediate representation called Java bytecodes. Otherwise, most decompilation is done manually: The engineer decompiles assembly code mentally and annotates the representation with its high-level equivalent.
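The sidebar’s remark that a control flow graph can be described in GXL can be sketched as follows. This is an illustrative example only: the element and attribute names (`gxl`, `graph`, `node`, `edge`, `attr`, `string`, `edgemode`) reflect my understanding of the GXL format and have not been validated against its DTD, and the block names and labels are hypothetical.

```python
import xml.etree.ElementTree as ET


def cfg_to_gxl(graph_id, blocks, edges):
    """Serialize a control flow graph as a GXL-style XML string.

    blocks maps a basic-block id to a human-readable label;
    edges is a list of (source id, target id) pairs.
    """
    root = ET.Element("gxl")
    graph = ET.SubElement(root, "graph", {"id": graph_id, "edgemode": "directed"})
    for block_id, label in blocks.items():
        node = ET.SubElement(graph, "node", {"id": block_id})
        # Each node carries its label as a named, typed attribute.
        attr = ET.SubElement(node, "attr", {"name": "label"})
        ET.SubElement(attr, "string").text = label
    for source, target in edges:
        ET.SubElement(graph, "edge", {"from": source, "to": target})
    return ET.tostring(root, encoding="unicode")


# Hypothetical control flow graph of a simple if/else statement.
blocks = {
    "entry": "if (x > 0)",
    "then": "y = 1",
    "else": "y = -1",
    "exit": "return y",
}
edges = [("entry", "then"), ("entry", "else"), ("then", "exit"), ("else", "exit")]

gxl_text = cfg_to_gxl("cfg_example", blocks, edges)
print(gxl_text)
```

Because the result is ordinary XML, one tool could emit such a file from its parser and another could read it to draw or analyze the graph, which is the kind of cooperation between tools the sidebar describes.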
Computer, December 2001