Architectural Solution to Object-Oriented Programming

Tan Yiyu¹, Anthony S. Fong², and Yang Xiaojian¹

¹ College of Information Science and Engineering, Nanjing University of Technology, NO.5 Xinmofan Road, Nanjing, China
² Department of Electronic Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong

Abstract. Object-oriented programming has become a major trend in the development of large-scale software systems. However, classic von Neumann machines have certain limitations for object-oriented computing, notably in system security and execution overhead. To address these limitations, architectural support for object-oriented programming has been introduced. This paper describes an architectural solution to object-oriented programming in a Java processor named jHISC, in which a new object representation model is mapped directly into hardware and object-oriented programming features are implemented by controlling the related fields in the object context. Moreover, the object representation model is designed so that object information can be accessed in parallel, which speeds up object-oriented operations. Compared with PicoJava II, JOP, the JDK 1.5.0_05 interpreter, and the HotSpot JIT compiler, jHISC executes Java programs substantially faster.

Keywords: Java, Java processor, object-oriented programming.

1 Introduction

Ever since the introduction of the computer, hardware has become increasingly smaller, faster, and cheaper, whereas software has become larger, slower, and more expensive to build and maintain. In particular, with the rapid growth of networks and the Internet, there is high demand for reliable and easily maintainable application programs in a wide range of domains. Object-oriented programming (OOP) has become firmly established as the methodology of choice for developing new systems because of its reusability, maintainability, flexibility, and modularity. OOP achieves these advantages through data encapsulation, information hiding, inheritance, and polymorphism, which improve software quality and reduce development cost, especially when amortized over several iterations. Object-oriented programming has thus become a major trend in the development of large-scale software systems.

L. Choi, Y. Paek, and S. Cho (Eds.): ACSAC 2007, LNCS 4697, pp. 387–398, 2007. © Springer-Verlag Berlin Heidelberg 2007

On classic von Neumann machines, object-oriented programming is supported either through compilation or through a virtual machine. In the compilation approach, an application written in an object-oriented language such as C++ is compiled into native instructions, and a process is created to execute the program. Different applications run in their own address spaces and are isolated from one another by the virtual memory system. Security protection is normally implemented with page or segment tables, which hold access-right information. In the virtual machine approach, a virtual machine runs on top of the operating system and executes object-oriented applications through software emulation. During execution, the virtual machine performs all object operations, such as object creation, object communication, dynamic object linking, and class loading.

However, classic von Neumann machines have certain limitations for object-oriented computing. A word in memory can be treated as either an instruction or a datum, because no specific semantics are associated with its contents, and this makes the system easy to attack. To improve security, most object-oriented languages introduce data-type and access-right checks when data are accessed. However, von Neumann machines provide no instructions for manipulating objects; they simply retrieve object representations from raw data. Therefore, even when a secure object-oriented system is used, a machine-code programmer can access data directly and misinterpret or corrupt the secure objects. For example, a virus can read data directly with load/store instructions, cast them into memory addresses, and corrupt the host system by manipulating the data stored at those addresses.
In addition, because the hardware does not support the data-type and access-right checks provided by object-oriented languages, these checks impose considerable overhead. In the compilation approach, extra code is inserted to perform the checks, which slows down program execution. In the virtual machine approach, the checks can be performed by the virtual machine, but the two software layers (the virtual machine and the operating system) introduce substantial overhead, and executing object-oriented operations through software emulation further slows execution.

To address these limitations, architectural support for object-oriented programming has been introduced, in which object operations are carried out directly by an object-oriented processor. Objects are managed by an object-oriented operating system using the protection features offered by the processor, so object manipulation becomes more direct and secure. This paper introduces architectural support for object-oriented programming in a Java processor.

The rest of this paper is structured as follows. Previous work on object-oriented processors is discussed in Section 2. The object representation model is described in Section 3. The implementation of object-oriented programming features is presented in Section 4. Performance estimation results are given in Section 5, and conclusions are drawn in Section 6.

2 Related Work

Many earlier machines provided architectural support for object-oriented programming. The Intel iAPX432 was the first commercial object-oriented architecture [1]; it provided hardware support for data hiding, methods, inheritance, late binding, and access protection. Despite these advanced features, it was sometimes 2 to 23 times slower than an 8086 [2]. One reason was architectural: it lacked local data registers and a data cache, and its fault-tolerant, asynchronous bus/memory interface caused 25% to 40% of the access time to be spent in wait states. Another major reason was object orientation itself, especially procedure calls and returns [3]. On the iAPX432, when an object is accessed, a capability specifier selects an access descriptor, which contains access-rights information and indices used to retrieve the object descriptor from the object tables. The object descriptor contains the base address and length of the referenced object [4]. Object-oriented operations are therefore expensive, because they must maintain and traverse complex addressing information obtained through table lookups; for example, a procedure call on the iAPX432 references memory 40 times and consumes 724 clock cycles in total [3].

To improve architectural support for object orientation, several techniques have been proposed. The SOAR architecture [5], a RISC-based design targeting the Smalltalk language, used register sets to hold data and cached the destination addresses of objects to reduce table lookups during object-oriented operations. It also tagged words to distinguish integers from pointers in support of generational garbage collection.
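The two-level lookup described above for the iAPX432 can be modelled in a few lines. The sketch below is a simplified illustration, not the iAPX432's real descriptor formats: the class names, the single read right, and the flat word-addressed memory are assumptions. It counts descriptor fetches to show that every object access pays for two table lookups before the data itself is read.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of iAPX432-style capability addressing (illustrative assumptions only).
final class ObjectDescriptor {
    final int base;   // base address of the object
    final int length; // object length in words

    ObjectDescriptor(int base, int length) {
        this.base = base;
        this.length = length;
    }
}

final class AccessDescriptor {
    final boolean canRead; // access-rights information (only a read right is modelled)
    final int objectIndex; // index of the object descriptor in the object table

    AccessDescriptor(boolean canRead, int objectIndex) {
        this.canRead = canRead;
        this.objectIndex = objectIndex;
    }
}

final class CapabilityMachine {
    final int[] memory = new int[1024];
    final List<AccessDescriptor> accessSegment = new ArrayList<>(); // indexed by capability specifier
    final List<ObjectDescriptor> objectTable = new ArrayList<>();
    int descriptorFetches = 0; // table lookups performed so far

    int read(int capabilitySpecifier, int offset) {
        AccessDescriptor ad = accessSegment.get(capabilitySpecifier); // lookup 1: access descriptor
        descriptorFetches++;
        if (!ad.canRead) {
            throw new SecurityException("no read right");
        }
        ObjectDescriptor od = objectTable.get(ad.objectIndex);          // lookup 2: object descriptor
        descriptorFetches++;
        if (offset < 0 || offset >= od.length) {
            throw new IndexOutOfBoundsException("offset " + offset + " outside object");
        }
        return memory[od.base + offset];                                // finally, the data itself
    }
}
```

Caching these descriptors, as SOAR and later designs did with object addresses, is what removes the repeated fetches from the common path.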
The Caltech Object Machine (COM) [6], aimed at late-binding object-oriented languages, provided hardware method lookup and kept addressing information in an associative context cache to speed up object-oriented operations. In addition, an instruction translation look-aside buffer translated a message name into a method address. The MUSHROOM architecture [7] adopted several earlier techniques, such as tagged memory, parallel tag checking, and register windows. It also provided an object-based virtual memory system to support garbage collection and used a novel object cache to hold the real addresses of objects. The REKURSIV processor [12] offered an object-oriented memory management unit that swapped objects in and out of memory as needed, because objects were represented directly and mapped onto persistent storage.

Many object-oriented processors target a particular object-oriented language. PicoJava II, developed by Sun Microsystems, targeted the Java language and performed object-oriented operations mainly through software traps or microcode [8]. Anthony Fong proposed the HISC architecture [13], which extends a typical computer architecture to support object-oriented programming at the hardware level by introducing 128-bit operand descriptors that describe both object references and variables. This paper introduces architectural support for object-oriented programming in a Java processor named jHISC, in which objects are represented and mapped directly into hardware, and the object representation model allows object information to be accessed in parallel to speed up object-oriented operations.
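Translating a message name into a method address, as the COM's look-aside buffer does, amounts to a method cache placed in front of a slower search up the class hierarchy. The following is a software analogue only: a HashMap stands in for the associative hardware, classes are keyed by name (assumed unique), and all class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Software analogue of a method lookup cache (illustrative; not COM's actual hardware).
final class ClassInfo {
    final String name;                                    // assumed unique
    final ClassInfo superclass;                           // null for the root class
    final Map<String, Integer> methods = new HashMap<>(); // selector -> method address

    ClassInfo(String name, ClassInfo superclass) {
        this.name = name;
        this.superclass = superclass;
    }
}

final class MethodCache {
    private final Map<String, Integer> entries = new HashMap<>();
    int misses = 0; // number of slow-path hierarchy searches

    int lookup(ClassInfo receiverClass, String selector) {
        String key = receiverClass.name + "#" + selector;
        Integer cached = entries.get(key);
        if (cached != null) {
            return cached; // hit: no hierarchy search needed
        }
        misses++;
        for (ClassInfo c = receiverClass; c != null; c = c.superclass) {
            Integer address = c.methods.get(selector);
            if (address != null) {
                entries.put(key, address); // remember the translation for next time
                return address;
            }
        }
        throw new IllegalStateException("message not understood: " + selector);
    }
}
```

A second send of the same message to the same class is answered from the cache without touching the hierarchy.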

3 Object Representation Model

In an object-oriented programming system, an object consists of many fields. The object representation model is critical because it strongly affects how fast objects can be accessed. In general, the object header should be as small as possible to minimize storage overhead while still containing sufficient information about the object, and the system should be able to locate object fields quickly through an object reference. When a field inside an object is accessed, the base reference of the object must first be obtained; then the field offset and other information, such as access rights and field type, are needed. Only after all security and data-type checks have passed can the field be accessed. Whether these steps are performed serially or in parallel depends on the object representation model. For example, in stack-based implementations of the Java virtual machine such as JDK 1.0 and JDK 1.1, they are performed serially, which slows down the execution of Java programs. In jHISC, three kinds of context, namely instance, class, and method contexts, are mapped into the hardware architecture to represent different objects. The object context structures and their relations are shown in Fig. 1.

Fig. 1. Different object structures and their relations

Besides the object header (OH), an instance context includes an Instance Header (IH) and an Instance Data Space (IDS); a class context consists of a Class Header (CH), a Class Operand Descriptor Table (CODT), a Class Property Descriptor Table (CPDT), and a Class Data Space (CDS); and a method context contains a Method Header (MH), a Method Code Space (MCS), and a Local Variable Frame (LVF) for local variable storage. When an instance context represents an array, it contains the array data, located below the Instance Header. Inside the class context, the CODT and CPDT store the class operand descriptors and class property descriptors, respectively. Different kinds of objects are distinguished by the object header, whose format is shown in Fig. 2.
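For orientation, the composition of the three contexts can be summarized as plain Java types. This is a sketch only: jHISC lays these components out contiguously in memory rather than as heap objects, and the array types, header sizes, and class names below are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the jHISC object contexts in Fig. 1.
abstract class ObjectContext {
    final int[] objectHeader = new int[3]; // OH; word layout per Fig. 2
}

final class ClassContext extends ObjectContext {
    final int[] classHeader = new int[1];         // CH (contents not modelled)
    final List<Integer> codt = new ArrayList<>(); // class operand descriptors: resources the class accesses
    final List<Integer> cpdt = new ArrayList<>(); // class property descriptors: properties the class owns
    final List<Integer> cds = new ArrayList<>();  // CDS: variable values and direct addresses
}

final class InstanceContext extends ObjectContext {
    final int[] instanceHeader = new int[1]; // IH (contents not modelled)
    final int[] dataSpace;                   // IDS, or the array data for an array instance
    final ClassContext affiliatedClass;      // the class that the OH Class field links to

    InstanceContext(ClassContext affiliatedClass, int dataWords) {
        this.affiliatedClass = affiliatedClass;
        this.dataSpace = new int[dataWords];
    }
}

final class MethodContext extends ObjectContext {
    final int[] methodHeader = new int[1]; // MH (contents not modelled)
    final byte[] code;                     // MCS: bytecode instructions
    final int[] localVariableFrame;        // LVF: local variable storage

    MethodContext(byte[] code, int localWords) {
        this.code = code.clone();
        this.localVariableFrame = new int[localWords];
    }
}
```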

Object Header (OH)
Word 0: ObjType [31:28] | ArrayType [27:25] | Lock [24] | GC Info [23:20] | DSSize [19:0]
Word 1: Class [31:0]
Word 2: ArraySize [31:0]

Fig. 2. Format of an object header
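The first header word in Fig. 2 packs five fields into 32 bits, so each field can be extracted with a shift and a mask. The sketch below follows the bit positions in Fig. 2; the class name is hypothetical, and because the text does not give the numeric codes for ObjType or ArrayType, any values passed in are arbitrary placeholders.

```java
// Hypothetical helper for word 0 of the object header (bit positions from Fig. 2).
final class ObjectHeaderWord {
    private ObjectHeaderWord() {}

    static int objType(int w)      { return (w >>> 28) & 0xF; }       // bits 31:28
    static int arrayType(int w)    { return (w >>> 25) & 0x7; }       // bits 27:25
    static boolean locked(int w)   { return ((w >>> 24) & 0x1) != 0; } // bit 24
    static int gcInfo(int w)       { return (w >>> 20) & 0xF; }       // bits 23:20
    static int dsSize(int w)       { return w & 0xFFFFF; }            // bits 19:0

    // Packs the five fields, rejecting values wider than their bit fields.
    static int pack(int objType, int arrayType, boolean locked, int gcInfo, int dsSize) {
        if ((objType & ~0xF) != 0 || (arrayType & ~0x7) != 0
                || (gcInfo & ~0xF) != 0 || (dsSize & ~0xFFFFF) != 0) {
            throw new IllegalArgumentException("field value too wide");
        }
        return (objType << 28) | (arrayType << 25) | ((locked ? 1 : 0) << 24)
                | (gcInfo << 20) | dsSize;
    }
}
```

Because DSSize has 20 bits, a data space described by one header can be at most 2^20 - 1 = 1,048,575 units long; the text does not state whether the unit is bytes or words. The Class and ArraySize fields each occupy a full 32-bit word.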

In the object header, the field ObjType stores the object type, such as method, instance, class, or array; DSSize specifies the size of the related data space, such as the IDS or CDS; GC Info stores information for hardware-based real-time garbage collection; Class holds a direct reference address linking an instance object to its affiliated class; and Lock is used for multithreading. When the object is an array, ArraySize and ArrayType define the number and type of its elements, respectively. Each object has a unique object context, and once the object is resolved, a reference always points to the base address of its object header. Within an object context, all components are stored contiguously, each at a constant address offset from the object header, so some components can be accessed in parallel to reduce access overhead. When an object is accessed, the related operand descriptor is read from the operand descriptor table to verify whether the object has been resolved; the object header is then accessed through the direct address pointer stored in the CDS of the current class. Along with the object access, bound-control checks such as access permission, boundary, and data type are carried out by hardware. Moreover, both class variables and instance variables are stored in their related data spaces, so they are accessed directly through their references rather than through an intermediate object handle as in Sun’s JDK 1.0 and 1.1.

3.1 Descriptor Format

In jHISC, a 32-bit operand descriptor stores information about a variable or reference. Its uniform format, shown in Fig. 3, consists of the Address Field, Type Field, Static Flag, Access Modifier, Read-Only Flag, and Resolved Flag. The Address Field provides the byte offset that locates the data in the corresponding data space. The Access Modifier is used for security control, for example public, private, and protected.
The Type Field stores the type of the described data; eight types are defined, covering both primitive and reference types. The Static Flag indicates where the data are stored: values of non-static fields are stored in the Instance Data Space (IDS), whereas values of static fields are stored in the Class Data Space (CDS). The Read-Only Flag indicates whether the target can be written. The Resolved Flag indicates whether the reference has been resolved; if not, the system traps to an operating system routine for dynamic reference resolution.

Fig. 3. Operand descriptor format

Two kinds of operand descriptors are defined: class operand descriptors, which describe the resources accessed by the class, and class property descriptors, which describe the properties owned by the class. Normally, a class operand descriptor contains the Address Field, Type Field, and Resolved Flag, whereas a class property descriptor contains all fields except the Resolved Flag.
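Taken together, the descriptor fields determine the steps of an operand access: trap if the reference is unresolved, check the type, select the IDS or CDS from the Static Flag, bounds-check the Address Field, and reject writes to read-only targets. The sketch below performs these steps one after another, whereas jHISC carries out the checks in hardware along with the access. It is illustrative only: the Fig. 3 bit widths are not reproduced, the list of eight types is an assumption, the Access Modifier check is omitted, the fields of both descriptor kinds are merged into one type, and a Java exception stands in for the trap to the operating system's resolution routine. All names are hypothetical.

```java
import java.nio.ByteBuffer;

// Assumed set of eight types; the actual Type Field encoding is not given in the text.
enum OperandType { BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, REFERENCE }

// Hypothetical operand descriptor holding the Fig. 3 fields (Access Modifier omitted).
final class OperandDescriptor {
    final int address;      // Address Field: byte offset into the IDS or CDS
    final OperandType type; // Type Field
    final boolean isStatic; // Static Flag: true -> CDS, false -> IDS
    final boolean readOnly; // Read-Only Flag
    boolean resolved;       // Resolved Flag, set once dynamic resolution has run

    OperandDescriptor(int address, OperandType type, boolean isStatic, boolean readOnly, boolean resolved) {
        this.address = address;
        this.type = type;
        this.isStatic = isStatic;
        this.readOnly = readOnly;
        this.resolved = resolved;
    }
}

// Stand-in for the trap into the operating system's resolution routine.
final class UnresolvedReferenceTrap extends RuntimeException {
    UnresolvedReferenceTrap() { super("unresolved reference"); }
}

final class OperandAccess {
    // Checks shared by reads and writes of an int operand; returns the data space to use.
    private static ByteBuffer check(OperandDescriptor d, ByteBuffer ids, ByteBuffer cds) {
        if (!d.resolved) throw new UnresolvedReferenceTrap();
        if (d.type != OperandType.INT) throw new IllegalStateException("type mismatch: " + d.type);
        ByteBuffer space = d.isStatic ? cds : ids;
        if (d.address < 0 || d.address + Integer.BYTES > space.limit()) {
            throw new IndexOutOfBoundsException("address " + d.address);
        }
        return space;
    }

    static int readInt(OperandDescriptor d, ByteBuffer ids, ByteBuffer cds) {
        return check(d, ids, cds).getInt(d.address);
    }

    static void writeInt(OperandDescriptor d, ByteBuffer ids, ByteBuffer cds, int value) {
        ByteBuffer space = check(d, ids, cds);
        if (d.readOnly) throw new IllegalStateException("read-only operand");
        space.putInt(d.address, value);
    }
}
```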

4 Implementation of Object-Oriented Programming Features

Object-oriented programming has four key features that distinguish it from other programming paradigms, namely data abstraction, encapsulation, inheritance, and polymorphism. In jHISC, these features are implemented by mapping the object representation model into hardware and controlling the corresponding fields in the object context, such as the CPDT, CODT, and CDS. This section discusses how each key feature is implemented.

4.1 Data Abstraction and Encapsulation

Data abstraction denotes the essential characteristics of an object that distinguish it from other objects. Data encapsulation hides the implementation details of an object inside the class definition while presenting a well-defined interface to the outside world through the class's methods. Data encapsulation also provides an additional access-control mechanism to ensure that data are accessed legally and safely. In jHISC, data are described by operand descriptors, and their values or reference addresses are stored in the related data space (the IDS, CDS, MCS, or array data). In a class object, the properties and the accessed resources are described by class property descriptors and class operand descriptors, respectively, and the references to other objects or variable values are stored in the class data space (CDS). For an instance object, the instance data are stored directly in the instance data space (IDS). For an array object, the element values are stored in the array data area. For a method object, the bytecode instructions are stored in the method code space (MCS). The Access Modifier field in an object's operand descriptors defines its access rights. Before a program accesses an object, it must pass the bound-control and data-type checks, so unauthorized or malicious accesses are prohibited.
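The Access Modifier check that guards each access can be illustrated with Java's own visibility rules. This is a simplified sketch under stated assumptions: the four-value modifier encoding and the class model are hypothetical, and the protected rule ignores the extra receiver-type restriction that the Java language applies to protected instance members.

```java
// Hypothetical encoding of the Access Modifier field.
enum AccessModifier { PUBLIC, PROTECTED, PACKAGE, PRIVATE }

// Minimal class model for the check: package name plus superclass link.
final class ClassRef {
    final String name;
    final String pkg;
    final ClassRef superclass; // null for the root class

    ClassRef(String name, String pkg, ClassRef superclass) {
        this.name = name;
        this.pkg = pkg;
        this.superclass = superclass;
    }

    // True if 'other' is a proper ancestor of this class.
    boolean isSubclassOf(ClassRef other) {
        for (ClassRef c = this.superclass; c != null; c = c.superclass) {
            if (c == other) return true;
        }
        return false;
    }
}

final class AccessCheck {
    // Returns true if code in 'accessor' may use a member declared in 'owner' with modifier m.
    static boolean permitted(AccessModifier m, ClassRef accessor, ClassRef owner) {
        switch (m) {
            case PUBLIC:
                return true;
            case PRIVATE:
                return accessor == owner;
            case PACKAGE:
                return accessor.pkg.equals(owner.pkg);
            case PROTECTED:
                return accessor.pkg.equals(owner.pkg) || accessor.isSubclassOf(owner);
            default:
                throw new AssertionError(m);
        }
    }
}
```

A check that returns false corresponds to a prohibited access; in jHISC such checks are carried out by hardware along with the access itself.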

4.2 Inheritance

Inheritance allows a subclass to share the properties of its superclass, providing a mechanism for code sharing and reuse that reduces programming effort. A subclass may select which properties of its superclass to inherit. It may also extend its superclass by adding new properties and selectively overriding existing ones, which allows the subclass to be specialized and the superclass to be generalized [9][10]. In jHISC, the variables and methods of a class are described by property descriptors residing in the CPDT of the class context. Inheritance is implemented by appending the properties inherited from the superclass to the CPDT of the subclass context, and their related addresses to its CDS. In the example shown in Fig. 4, the class ParentClass contains four integer variables, a, b, c, and d, and two methods, Method_A() and Method_B(). The class ChildClass extends ParentClass. The corresponding object context structures are shown in Fig. 5.

class ParentClass {
    public int a, b, c;
    public static int d;
    public void Method_A() { }
    public void Method_B() { }
}

class ChildClass extends ParentClass {
    public void Method_C() { }
    // ...
}

Fig. 4. A Java example of inheritance

In Fig. 5, all the methods and variables of ParentClass are inherited by the subclass ChildClass. The method code spaces of the inherited methods are shared by the two classes; only Method_C() is created specifically for ChildClass. For the static variable d, a direct address stored in the CDS of ChildClass points to the location where d is stored. Two special cases arise in inheritance: method overloading and method overriding. Method overloading allows two methods to have the same name but different signatures. In jHISC, an overloaded method is treated as a new method added to the subclass, so its property descriptor and direct address are appended to the end of the CPDT and CDS, respectively, in the subclass's object context. In method overriding, a subclass redefines methods or variables inherited from its superclass. In jHISC, the overridden method keeps its descriptor in the same position in the superclass context, but the corresponding descriptor in the subclass context is replaced by a new one. When a variable is redeclared with a different type in the subclass, the redeclared variable is treated as a new variable in the subclass.

Class ParentClass: CH | CPDT: Word a, Word b, Word c, Static Word d, Method_Ref Method_A(), Method_Ref Method_B() | CDS: word, direct address, direct address
Class ChildClass: CH | CPDT: Word a, Word b, Word c, Inherited Static Word d, Method_Ref Method_A(), Method_Ref Method_B(), Method_Ref Method_C() | CDS: four direct addresses
Method code spaces: Method_A() MCS and Method_B() MCS (shared by both classes), Method_C() MCS (ChildClass only)

Fig. 5. Object context and relations in inheritance
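The CPDT construction described for Figs. 4 and 5 resembles building a dispatch table: copy the superclass entries in order, replace entries that the subclass overrides in the same slot, and append everything else. The sketch below assumes that an entry is matched by its name (with the signature, for methods) together with its type, so an overloaded method or a variable redeclared with a different type does not match and is appended. For simplicity it keeps one address per entry in place of the aligned CDS slot, even though in jHISC the values of the instance variables a, b, and c live in the IDS; all class names and addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// One CPDT entry with the address kept in the matching CDS slot (simplified).
final class PropertyEntry {
    final String key;  // field name, or method name with signature, e.g. "Method_A()"
    final String type; // e.g. "int", "static int", "method"
    final int address; // direct address: shared MCS for methods, storage location for statics

    PropertyEntry(String key, String type, int address) {
        this.key = key;
        this.type = type;
        this.address = address;
    }
}

final class ClassLayout {
    final List<PropertyEntry> cpdt = new ArrayList<>();

    // Builds a subclass layout from its superclass and the members the subclass declares.
    static ClassLayout derive(ClassLayout superclass, List<PropertyEntry> declared) {
        ClassLayout sub = new ClassLayout();
        sub.cpdt.addAll(superclass.cpdt); // inherited entries keep their positions and addresses
        for (PropertyEntry p : declared) {
            int slot = indexOf(sub.cpdt, p);
            if (slot >= 0) {
                sub.cpdt.set(slot, p);    // overriding: same position, new descriptor
            } else {
                sub.cpdt.add(p);          // new, overloaded, or retyped member: appended
            }
        }
        return sub;
    }

    private static int indexOf(List<PropertyEntry> table, PropertyEntry p) {
        for (int i = 0; i < table.size(); i++) {
            PropertyEntry e = table.get(i);
            if (e.key.equals(p.key) && e.type.equals(p.type)) return i;
        }
        return -1;
    }
}
```

Deriving ChildClass from ParentClass with only Method_C() declared yields seven entries whose Method_A() and Method_B() addresses are the superclass's, mirroring the shared method code spaces in Fig. 5; an override changes only the subclass's copy of the slot.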

4.3 Polymorphism

Polymorphism allows many types to be treated as if they were one type, so that a single piece of code works equally on all of them. It is supported by dynamic binding: an object is bound to the appropriate method at run time. In the Java program shown in Fig. 6, the class Shape establishes a common interface for all classes derived from it. The derived classes override its definitions to provide unique behavior for each specific type of shape, so the method draw() is defined in the classes Shape, Circle, and Square, each with its own implementation. In the class Shapes, the polymorphic method draw() is invoked. In jHISC, polymorphism is realized by the resolution process; the corresponding object contexts and their relations are shown in Fig. 7. When the method main() is invoked, the instances a and b are created with the keyword new and their actual types (Square and Circle, respectively), but their references are upcast to Shape. When the invocation a.draw() is executed, the instance a and the field reference Shape.draw() are needed. Because this is the first access to the field reference Shape.draw(), the system traps to the dynamic resolution routine. From the instance a, the resolution process determines that the reference Shape.draw() for instance a is bound to the method Square.draw(). The resolution process then sets up the related descriptors, the field reference Shape.draw(), and the direct address pointing to the object header of class Square. Invocation then resumes with the resolved references, and the system executes the instructions of the invoked method.

When the second invocation, b.draw(), is executed, the system uses the same field reference Shape.draw() and the instance b. Because the operand descriptor is already resolved, no resolution is needed for Shape.draw(). However, during invocation, the access check finds that the class of instance b is Circle, which differs from the class currently bound to Shape.draw(), namely Square. This causes a trap to the resolution routine, which checks whether Circle is a subclass of Shape. If it is, the direct address for the field reference Shape.draw() is set to point to the object header of class Circle, resolution finishes, and execution resumes with the resolved reference. If it is not, an error occurs and the exception handler is invoked.

class Shape {
    public void draw() { }
}

class Square extends Shape {
    public void draw() {
        //
