
IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 7, NO. 1, JANUARY-MARCH 2014

A Bare-Metal and Asymmetric Partitioning Approach to Client Virtualization

Yuezhi Zhou, Member, IEEE, Yaoxue Zhang, Hao Liu, Naixue Xiong, Member, IEEE, and Athanasios V. Vasilakos, Senior Member, IEEE

Abstract—Advancements in cloud computing enable the easy deployment of numerous services. However, the analysis of cloud service access platforms from a client perspective shows that maintaining and managing clients remain a challenge for end users. In this paper, we present the design, implementation, and evaluation of an asymmetric virtual machine monitor (AVMM), which is an asymmetric partitioning-based bare-metal approach that achieves near-native performance while supporting a new out-of-operating system mechanism for value-added services. To achieve these goals, AVMM divides underlying platforms into two asymmetric partitions: a user partition and a service partition. The user partition runs a commodity user OS, which is assigned most of the underlying resources, maintaining end-user experience. The service partition runs a specialized OS, which consumes only the resources needed for its tasks and provides enhanced features to the user OS. AVMM considerably reduces virtualization overhead through two approaches: 1) Peripheral devices, such as graphics equipment, are assigned to be monopolized by a single user OS. 2) Efficient resource management mechanisms are leveraged to alleviate the complicated resource sharing of existing virtualization technologies. We implement a prototype that supports Windows and Linux systems. Experimental results show that AVMM is a feasible and efficient approach to client virtualization.

Index Terms—Virtual machine monitor, virtual machine, client virtualization, desktop virtualization, asymmetric partitioning

1 INTRODUCTION

WITH the advent of cloud computing [1], numerous services can be easily developed and deployed in data centers [2], [3]. These diverse services can be accessed anywhere from a variety of clients, such as personal computers (PCs), tablet computers, and mobile phones, reducing the effort required to obtain such services. However, the analysis of service access and support platforms from a client perspective indicates that maintaining and managing service operating environments on clients remain a challenge for end users. In a typical enterprise scenario, the annual cost of managing a traditional PC can be up to five times the cost of deploying it [4].

• Y. Zhou and H. Liu are with the Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Room 3-529, FIT Building, Beijing 100084, P.R. China. E-mail: [email protected], [email protected].
• Y. Zhang is with the School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China, and with the Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Room 3-529, FIT Building, Beijing 100084, P.R. China. E-mail: [email protected].
• N. Xiong is with the School of Information Technology, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013, P.R. China, and with the School of Computer Science, Colorado Technical University, 4435 North Chestnut Street, Colorado Springs, CO 80907. E-mail: [email protected].
• A.V. Vasilakos is with the Department of Computer and Telecommunications Engineering, University of Western Macedonia, Kozani 50100, Greece. E-mail: [email protected].

Manuscript received 13 May 2011; revised 14 Oct. 2012; accepted 4 Nov. 2012; published online 16 Nov. 2012. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TSC-2011-05-0044. Digital Object Identifier no. 10.1109/TSC.2012.32.

This situation worsens with increasing user demands for productivity and flexibility. A general approach to easing the burden of maintaining and managing personal clients is to accomplish tasks such as patch management, antivirus updates, and software application monitoring with the use of management and security tools (e.g., Symantec Ghost [5], BMC BladeLogic Client Automation [6], Norton AntiVirus [7]). These tools relieve users of complicated management work. Nevertheless, the effectiveness of these tools is limited because they are installed and executed within an operating system (OS). This dependence presents the following problems: 1) these tools fail when an OS component that they rely on is compromised; 2) they can themselves be compromised by malware; and 3) they cannot effectively detect malware that has the ability to remain hidden from users (e.g., rootkits [8]). To improve the effectiveness of management and security, researchers have proposed several approaches [9], [10], [11], in which virtual machine (VM) technologies are used to install and run management software independent of a guest OS [12], [13]. However, these approaches cannot be widely deployed because of the large overhead imposed by a virtual machine monitor (VMM). The virtualization technologies commonly employed in clients can be classified into two categories:

• Client-hosted desktop virtualization exploits Type II VMMs, such as VMware Workstation [12] and Virtual PC [14], which enable a client to run multiple VMs with the help of a host OS. The use of VMs can enable a seamless and simple experience for end users across different hardware platforms. Given the dependence on the host OS and VMM layer, however, such client virtualization causes considerable overhead, resulting in diminished user experience.

• Server-centric desktop virtualization approaches centralize an entire personal computing environment in data centers by creating and executing a virtual desktop in a virtual machine. Recently introduced products include Microsoft VDI [15], Citrix XenDesktop [16], and VMware View [17]. These virtual desktops enable users' access outside a data center, with a remote display protocol over multiple types of network connections. Because video frames have to be transferred over networks to end users, however, applications that require large network bandwidth (e.g., graphics applications) cannot be efficiently supported. Moreover, the overhead incurred from the aforementioned technology categories is larger than that of a native client.

In this paper, we present an asymmetric VMM (AVMM), which is a bare-metal (Type I) VMM with an asymmetric partitioning strategy that achieves near-native performance while facilitating easy support of value-added features, such as client management. Two types of asymmetric partitions are used: a user partition and a service partition. The user partition runs a single commodity OS (user OS), and the service partition runs a specialized OS (service OS) for value-added features. Given that end users seldom need to run multiple OSs simultaneously, AVMM supports the running of only a single user OS at a time. This feature improves virtualization performance through two approaches: 1) Given that only a single user OS is running, AVMM assigns some input/output (I/O) resources, such as graphics and audio devices, to be monopolized by the user OS. 2) The user OS, service OS, and VMM compete for underlying platform resources; thus, AVMM enables more efficient resource management and sharing mechanisms than do existing virtualization technologies. For example, AVMM divides memory into fixed parts and statically assigns them to the user OS, service OS, and VMM. Therefore, memory resource sharing is alleviated and the virtualization overhead is reduced. By contrast, existing VM technologies feature varying numbers of user OSs, thereby necessitating the dynamic sharing of memory among user OSs and VMMs. This process requires more complicated resource management and sharing mechanisms. AVMM creates a dedicated service partition that runs a specialized service OS, in which more complex and out-of-operating system (out-of-OS) service modules are implemented.

In this paper, we focus specifically on the design and implementation of AVMM. We implement a preliminary prototype that supports Windows and Linux (run one at a time) on an Intel x86 machine, which is capable of the newly developed hardware-assisted virtualization technology, i.e., Intel Virtualization Technology (Intel VT) [19]. The contributions of this paper are as follows:

First, AVMM provides an asymmetric partitioning-based bare-metal VMM to clients. Through the monopolized allocation of partial I/O devices and efficient resource management, AVMM alleviates resource sharing between the user OS and the service OS, thereby substantially reducing virtualization overhead.

Second, AVMM employs a service partition that runs a customized OS to provide value-added features independent of the user OS. We demonstrate the benefit of this service partition in experiments that involve a centric and remote virtual disk feature. New value-added features can be implemented with minimal effort (e.g., about 500 lines of code for the virtual disk feature). The service OS also provides value-added features that can significantly improve user experience.

Finally, we implement a preliminary prototype based on the asymmetric partitioning-based bare-metal concept, which leverages recent hardware-assisted virtualization technologies to simplify implementation. Our prototype supports Windows and Linux in the user partition without the need for modifications. We evaluate AVMM with micro- and application-level benchmarks and compare it with two representative client virtualization approaches, i.e., VMware [12] and Xen [13]. Results show that AVMM achieves a performance comparable to that of a regular PC and better than that of existing virtualization approaches. New value-added features can also be easily implemented in AVMM.

The remainder of the paper is organized as follows: Section 2 presents our main design goals and principles. Section 3 outlines the general approaches of AVMM. Section 4 describes the detailed design and implementation of AVMM. Section 5 presents the experimental results obtained with the micro- and application-level benchmarks. Section 6 reviews related work, and Section 7 concludes the paper.

1. This term was proposed by Goldberg [18] to classify virtualization approaches. Type I indicates that a VMM runs directly on top of the underlying hardware, whereas in Type II, a VMM runs on top of a host operating system.

2 DESIGN GOALS AND PRINCIPLES

The design goals of AVMM are as follows:

• Maximum near-native performance. Performance overhead is always the core challenge in virtualization because of the indirection layer introduced by VMMs. In server virtualization, overhead can be mitigated by consolidating workloads on underutilized servers. By contrast, in client virtualization, overhead is easily observed, thereby substantially diminishing user experience. To provide a comparable end-user experience, AVMM tries to maximize performance and prevent virtualization overhead from being perceived by end users.

• Full compatibility with commodity OSs. Modifying OS kernels or replacing device drivers with specialized ones may achieve better performance, similar to that realized in para-virtualization [20]. However, injecting code into the existing boot loaders or runtime kernel modules of widely accepted commercial OSs (e.g., Windows XP/7) is difficult. As an OS evolves, more money is required to support it. Moreover, testing OS modifications is a laborious task that may potentially introduce new bugs. Therefore, as in full virtualization [20], full compatibility should be preserved while similar or better performance is obtained.

• Facilitated provision of out-of-OS value-added services. Users eagerly anticipate new value-added features that bring more benefits to client virtualization, such as low cost, ease of system administration, and enhanced security. Enterprises and end users are most concerned about management and security. For example, system administrators are expected to have the capability to scan for viruses, as well as to efficiently monitor and manage machines. Today's typical within-OS, agent-based management mechanisms are easily subverted by end users or malware; AVMM addresses this problem by providing out-of-OS mechanisms that add enhanced management and security features.

To achieve these design goals, we adopt the following design principles:

1. Hardware-assisted virtualization. To obtain full compatibility while providing higher performance, AVMM employs the most recent hardware-assisted virtualization technologies. Hardware-assisted virtualization eliminates the need to emulate privileged instructions; it offers compatibility similar to that of full virtualization approaches while simplifying VMM implementation. Thus, overall performance improves.

2. Asymmetric and dedicated service partitioning. The underlying client platform is partitioned into two asymmetric partitions (i.e., virtual machines): the user partition, which runs a commodity OS (user OS), and the service partition, which runs a specialized OS (service OS) that is devoted to the service, control, or management functions of the system. The service partition facilitates easy and rapid development and deployment of new value-added service functions and modules. For example, traditional management functions that are implemented in the user OS as software agents (e.g., asset auditing, monitoring, and intrusion detection) can now be implemented in the service OS independent of the user OS (i.e., in an out-of-OS manner). This approach reduces system disruptions from end users or deviant software modules. The service partition can also function as a device driver VM, as in Xen 2.0 [21], driving I/O resources on behalf of the user OS and AVMM. Examples of such tasks include supporting diverse disk and network devices by leveraging existing device drivers.

3. Single user OS and partial direct I/O device access. Platform resources may be monopolized or shared among partitions. To preserve an unmodified user experience, only one user partition runs a commodity OS for end users at a time. This way, most platform resources, such as most of the central processing unit (CPU) or memory resources, can be assigned to the user partition while the service partition runs with the minimum resources. Given that the virtualization of I/O devices is the primary cause of low application performance in client virtualization, we use a direct-access device method to realize maximum I/O access performance. A set of platform devices (e.g., graphics or audio devices) that are sensitive to user experience are assigned to and monopolized by the user OS; thus, these devices can be directly accessed by the user OS without any AVMM intervention. This approach improves the efficiency and performance of I/O operations for the user OS.

Fig. 1. Asymmetric partitioning.

3 GENERAL APPROACHES

In this section, we present an overview of AVMM and the general approaches that fall under it, including the asymmetric partitioning strategy.

3.1 Asymmetric Partitioning

AVMM is based on the recent hardware-assisted virtualization technologies introduced by mainstream vendors, such as Intel VT [19] and AMD-V [22]. With this type of hardware virtualization, the instruction execution, access to privileged CPU registers, and I/O port access of a VM can be selectively trapped into the VMM by configuration. Thus, underlying platform resources, such as certain I/O devices, can be directly assigned to and accessed by a VM, as required, without any VMM intervention.

Fig. 1 shows the asymmetric partitioning structure of AVMM. As previously stated, the platform resources of a client are divided into a user partition and a service partition. In accordance with factors such as performance, reliability, and security, the underlying platform resources of a client are dedicated to one of the partitions or shared between them as required. To ensure maximum access performance of peripheral devices that are important to user experience, we can exclusively assign these resources to the user partition for direct access. Access to dedicated direct-access devices is passed through AVMM without any intervention. Certain resources can also be assigned to the service OS for direct access. However, some critical platform resources that are crucial to system control and management, such as the programmable interval timer (PIT), programmable interrupt controller (PIC), and advanced programmable interrupt controller (APIC), must be virtualized, shared among partitions, and accessed via AVMM. CPU and memory resources must also be shared among partitions.


Fig. 2. Overview of AVMM architecture.

Loading AVMM and the service OS before the user OS is loaded necessitates the modification of basic input/output system (BIOS) functions. These mechanisms are nonetheless transparent to the loading of the user OS. That is, the user OS can boot up normally without any modifications to its loading or boot-up programs.

3.2 Overview of AVMM

The general architecture of AVMM is illustrated in Fig. 2. AVMM runs directly on the hardware platform and provides a full virtualization-like interface to the user OS; that is, no modification is made to the user OS kernel. As with the main interface of other VMMs, that of AVMM can be classified into three components: CPU, memory, and I/O devices. In what follows, we discuss each subsystem and its general design within AVMM. Although our implementation is specific to an x86 machine with Intel VT, it can be readily applied to other platforms and hardware virtualization technologies with no or only minor modifications.

3.2.1 CPU

In AVMM, the CPU is virtualized through the architectural extensions of modern CPUs, such as Intel VT or AMD-V. Intel VT enables a more straightforward and robust CPU virtualization than does software-based virtualization. The user or service OS can be executed directly, without the simulation or emulation of privileged instructions required in software-based full virtualization and without the modifications required in para-virtualization. The virtual CPU module in AVMM provides the abstraction of a processor to the partitions, as well as the management of the virtual processor and the associated virtualization events. This module saves the corresponding physical processor's state and resumes it accordingly when execution switches between partitions on VM Exit or VM Entry. To obtain full control of the platform, AVMM must intercept or trap certain special instructions or events that are critical to system control or management by configuring the hardware-enabled data structure, i.e., the virtual machine control structure (VMCS) in Intel VT. These instructions and events, which may lead to a VM Exit into AVMM, are configured as follows:

• Instructions whose execution changes the CPU state or status must be trapped and/or virtualized to avoid affecting the normal execution of another partition. These instructions include CPUID, HLT, PAUSE, INVD, and INVLPG.

• Access to privileged processor state (e.g., MOV CRx (CR0, CR3, and CR4), MOV DRx, RDMSR, WRMSR) should be intercepted. Read/write (R/W) functions for the involved control or status registers (such as CR0, CR4, and the time stamp counter) may need to be shadowed; R/W access to such registers does not cause a VM Exit but returns the shadowed values.

• Certain exceptions and faults, such as page faults, should be intercepted on VM Exit. Accordingly, the corresponding virtualized exceptions and faults are injected into the related partition on VM Entry.

• External interrupts are all intercepted on VM Exit, and the corresponding virtualized interrupts are injected on VM Entry. The external interrupt vector cannot be configured for selective interception; thus, every interrupt is first handled by AVMM and then injected into one partition accordingly.

The physical CPU is shared among AVMM, the user OS, and the service OS. To guarantee the performance of the user OS, most of the CPU time is assigned to the user OS. CPU time assignment is implemented by the CPU scheduler in AVMM.
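To illustrate how such an interception policy is typically organized, the following C sketch shows a VM-exit dispatch loop of the kind a bare-metal monitor might use: intercepted instructions, privileged-state accesses, exceptions, and external interrupts each go to a dedicated handler, and everything else runs natively in the owning partition. The exit-reason codes, the vcpu structure, and the handler names are illustrative placeholders, not the actual AVMM code or the Intel VT-x encoding.

/* Hypothetical VM-exit dispatcher: routes the exits that the monitor
 * chose to intercept (CPUID, CR/MSR access, exceptions, external
 * interrupts) to handlers, while everything else runs natively in
 * the user partition.                                               */
#include <stdint.h>
#include <stdio.h>

enum exit_reason {                 /* illustrative codes, not VT-x values */
    EXIT_CPUID, EXIT_HLT, EXIT_CR_ACCESS, EXIT_MSR_ACCESS,
    EXIT_EXCEPTION, EXIT_EXTERNAL_INTERRUPT, EXIT_IO_INSTRUCTION
};

struct vcpu {
    int      partition;            /* 0 = user partition, 1 = service partition */
    uint64_t shadow_cr0, shadow_cr4;   /* shadowed control-register values       */
    uint32_t pending_vector;       /* interrupt to inject on the next VM entry   */
};

static void emulate_cpuid(struct vcpu *v)      { (void)v; /* filter feature bits  */ }
static void emulate_cr_access(struct vcpu *v)  { (void)v; /* update shadow CR0/4  */ }
static void emulate_msr_access(struct vcpu *v) { (void)v; /* R/W shadowed MSRs    */ }
static void reflect_exception(struct vcpu *v)  { (void)v; /* re-inject into guest */ }

static void route_interrupt(struct vcpu *v, uint32_t vec)
{
    /* Every external interrupt is taken by the monitor first and then
     * injected into the partition that owns the interrupting device.  */
    v->pending_vector = vec;
}

static void handle_vm_exit(struct vcpu *v, enum exit_reason why, uint32_t vector)
{
    switch (why) {
    case EXIT_CPUID:              emulate_cpuid(v);           break;
    case EXIT_CR_ACCESS:          emulate_cr_access(v);       break;
    case EXIT_MSR_ACCESS:         emulate_msr_access(v);      break;
    case EXIT_EXCEPTION:          reflect_exception(v);       break;
    case EXIT_EXTERNAL_INTERRUPT: route_interrupt(v, vector); break;
    case EXIT_HLT:                /* yield the CPU to the other partition */ break;
    case EXIT_IO_INSTRUCTION:     /* only virtual/disguised ports trap    */ break;
    }
    /* ...restore guest state and resume the partition (VM entry). */
}

int main(void)
{
    struct vcpu user = { .partition = 0 };
    handle_vm_exit(&user, EXIT_EXTERNAL_INTERRUPT, 0x30);
    printf("pending vector after exit: 0x%x\n", user.pending_vector);
    return 0;
}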

3.2.2 Memory

Given the fixed number and functions of the partitions in AVMM, the main memory is subdivided into several contiguous areas that are allocated to the partitions. Each partition sees a 0-origin memory. Memory accesses may be trapped or checked as required by AVMM to ensure security or to enable virtual memory-mapped I/O (MMIO) functions. The user partition is allocated a contiguous region of physical memory, starting from the absolute zero location up to its maximum allowable value. This virtual = real mapping eliminates the performance overhead associated with address relocation and paging. Direct memory access (DMA) operations can also be executed directly without memory translation; thus, I/O access performance improves. The service partition and AVMM are also each allocated a contiguous region of physical memory, but the allocation begins from a fixed offset. Such allocation can also reduce the performance overhead incurred from memory paging and DMA operations, with only a fixed memory offset to apply. Fig. 3 illustrates the allocation of memory in an x86 client with a 32-bit processor. The physical memory is subdivided into several parts that are used by the user OS, the service OS, AVMM, and shared memory. The shared memory is a range of memory that is used to share data between the user OS, the service OS, and AVMM. The MMIO and flash blocks are not real physical memory, but are mapped into the 4-GB memory space only for I/O operations and system BIOS functions.

Fig. 3. Example of physical memory allocation.
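As a concrete illustration of the static allocation in Fig. 3, the sketch below hard-codes a partition memory map for a hypothetical 1-GB client and shows that guest-physical to host-physical translation reduces to adding a constant offset (zero for the user OS). The region sizes, type names, and function names are assumptions for illustration and do not reflect the prototype's actual layout.

/* Illustrative static memory map for a 1-GB client: the user partition
 * gets a 1:1 (virtual = real) mapping from address 0, while the service
 * partition and the monitor live at fixed offsets, so guest-physical to
 * host-physical translation is a constant-offset addition.             */
#include <stdint.h>
#include <stdio.h>

enum owner { USER_OS, SERVICE_OS, AVMM, SHARED };

struct mem_region {
    enum owner who;
    uint64_t   host_base;    /* where the region really starts */
    uint64_t   size;         /* bytes                          */
};

/* Example split of 1 GB (sizes are assumptions, not the prototype's). */
static const struct mem_region layout[] = {
    { USER_OS,    0x00000000ULL, 768ULL << 20 },  /* 768 MB, starts at 0   */
    { SERVICE_OS, 0x30000000ULL, 192ULL << 20 },  /* 192 MB, fixed offset  */
    { AVMM,       0x3C000000ULL,  48ULL << 20 },
    { SHARED,     0x3F000000ULL,  16ULL << 20 },  /* user/service/VMM data */
};

/* Guest-physical to host-physical: each partition sees a 0-origin view. */
static uint64_t gpa_to_hpa(enum owner who, uint64_t gpa)
{
    for (unsigned i = 0; i < sizeof layout / sizeof layout[0]; i++)
        if (layout[i].who == who && gpa < layout[i].size)
            return layout[i].host_base + gpa;   /* offset is 0 for the user OS */
    return ~0ULL;                               /* out of range: fault */
}

int main(void)
{
    printf("user    gpa 0x1000 -> hpa 0x%llx\n",
           (unsigned long long)gpa_to_hpa(USER_OS, 0x1000));
    printf("service gpa 0x1000 -> hpa 0x%llx\n",
           (unsigned long long)gpa_to_hpa(SERVICE_OS, 0x1000));
    return 0;
}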


To prevent a guest OS (i.e., the user or service OS) from performing illegal or insecure memory accesses (e.g., accessing memory not owned by the guest OS), AVMM uses a shadow page table (PT) mechanism to translate the guest's virtual address into a physical memory address instead of allowing the guest OS to directly translate its virtual address. The shadow PT can also be used to hide MMIO resources from the user or service OS, as well as to virtualize the MMIO operations of certain peripheral component interconnect (PCI) devices (e.g., network or disk devices). This mechanism is comprehensively discussed later in the paper.

3.2.3 I/O Devices

The virtualization of other devices is more complicated than that of the CPU and memory. The other devices in an AVMM client are virtualized and shared among partitions or exclusively assigned to one partition, as desired. From the perspective of a guest OS, I/O devices can be classified into three specific categories.

Virtual devices. Platform resources that are critical to system control and management (such as the PIT, PIC, and APIC) must be virtualized by AVMM. To avoid kernel modification of the guest OS, AVMM fully virtualizes these devices, which can be accessed by the user or service OS with existing physical device drivers. The virtual device models and physical drivers of these critical devices are implemented only in AVMM and can be accessed through standard I/O interfaces by the guest OSs.

Direct-access devices. These devices in an AVMM client are exclusively assigned to one partition and are monopolized by it. To improve the performance experienced by the end user, especially the access performance of I/O devices, most devices other than the above-mentioned critical ones can be assigned to and directly accessed by the user OS. The I/O port and MMIO operations of direct-access devices are not trapped into AVMM, thereby avoiding AVMM overhead. DMA is not virtualized, and DMA operations are performed directly to improve performance. The user OS can perform DMA normally because of its virtual = real memory mapping; the service OS, however, has to adjust a DMA operation, although it only needs to add the fixed offset to the memory address originally allocated for the operation.

Disguised devices. Unlike the device model of a virtual device, that of a disguised device is divided into two parts: one in AVMM and another in the service OS. The component of the device model located in AVMM traps the corresponding port I/O operations and passes them to the component located in the service OS. This trapping and passing procedure completes the virtual device operations with the existing physical device drivers in the service OS. Disguised devices present two advantages. First, they make it easy to support diverse devices by leveraging the existing device drivers in the service OS, thereby simplifying the construction of AVMM and improving system reliability. Second, the two main sources of system weakness and vulnerability (the disk and network devices) are disguised, enabling enhanced features to be provided for the user OS. For example, the split device driver in the service OS can facilitate the development of value-added applications, such as soft devices [23] and Parallax [24]. This development enhances system reliability and security.


The disguised device may or may not have a corresponding physical resource in the underlying platform. For example, the disguised network device in a user partition may have a corresponding device in the AVMM client. However, the disguised disk device may have a corresponding disk located in another machine instead of in the local machine, as implemented in our prototype. These corresponding devices, if present, are also hidden from the user OS, as well as assigned to and accessed only by the service OS. From the perspective of the service OS, these corresponding devices are direct-access types. As previously described, a set of value-added filters or monitor drivers can be developed and deployed above the corresponding physical driver in the service OS, thereby improving system management or security. This issue is not the focus of the paper and will, therefore, not be discussed further.
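The split of a disguised device between AVMM and the service OS can be pictured with the following hedged C sketch: a trapped port access is packaged into a request record in shared memory, and the service partition later completes it with its own driver. The request layout, ring size, and function names are hypothetical; the paper does not describe the prototype's interface at this level of detail.

/* Sketch of a disguised device: AVMM traps the guest's port I/O and
 * forwards it through a shared-memory ring to the service OS, which
 * completes it with its ordinary device driver.  Names, the request
 * layout, and the ring size are illustrative only.                   */
#include <stdint.h>
#include <stdio.h>

struct io_request {
    uint16_t port;        /* trapped I/O port (e.g., IDE command block) */
    uint8_t  is_write;
    uint8_t  size;        /* 1, 2, or 4 bytes                           */
    uint32_t value;       /* data written, or result of a read          */
    uint8_t  done;        /* set by the service OS when completed       */
};

#define RING_SLOTS 16
static struct io_request ring[RING_SLOTS];  /* stands in for the shared page */
static unsigned head, tail;

/* AVMM side: runs in the VM-exit handler for a trapped port access. */
static void avmm_forward_port_io(uint16_t port, int is_write, uint32_t value)
{
    struct io_request *r = &ring[head++ % RING_SLOTS];
    *r = (struct io_request){ .port = port, .is_write = (uint8_t)is_write,
                              .size = 4, .value = value, .done = 0 };
    /* In the real system AVMM would now inject a virtual interrupt into
     * the service partition and schedule it promptly (Section 4.1).     */
}

/* Service-OS side: drains the ring and talks to the physical device.   */
static void service_os_poll(void)
{
    while (tail != head) {
        struct io_request *r = &ring[tail++ % RING_SLOTS];
        /* A real handler would issue the request to the native driver,
         * e.g., translate an IDE sector request into a remote NBD read. */
        if (!r->is_write)
            r->value = 0xdeadbeef;      /* pretend data came back */
        r->done = 1;
    }
}

int main(void)
{
    avmm_forward_port_io(0x1F0, 0, 0);  /* guest reads the IDE data port */
    service_os_poll();
    printf("request completed: done=%u value=0x%x\n",
           (unsigned)ring[0].done, ring[0].value);
    return 0;
}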

4 IMPLEMENTATION DETAILS

Several implementation details concerning our design are discussed in this section.

4.1 Dynamic Proportional-Share CPU Scheduling

The platform PIT is virtualized by AVMM. Therefore, the sharing and scheduling of the CPU can be implemented by assigning the timer's slices to the user or service OS. To obtain user OS performance similar to that without virtualization, we assign most of the CPU cycles (e.g., 90 percent) to the user OS and only a small proportion (e.g., 10 percent) to the service OS. The service OS can set the parameters specifying the proportions of CPU cycles assigned to the guest OSs by calling into AVMM.

The limitation of this fixed proportional-share scheduling is its lack of flexibility. For example, the idle time of one guest OS cannot be allocated to another. Moreover, if a critical task that is to be carried out by the service OS cannot be executed in a timely manner, overall performance diminishes. To address this problem, we extend the fixed scheduling to a dynamic proportional-share mechanism by applying two heuristics designed to improve efficiency:

1. Identifying idle states and remising cycles. If the user OS is idle, AVMM runs the service OS. If the service OS is idle, AVMM schedules CPU time to the user OS. If both are idle, AVMM enters an idle state. If a guest OS becomes active again, AVMM allows it to regain the remised cycles.

2. Detecting busy states and seizing cycles. AVMM detects the busy or potentially busy state of the service OS in which a critical task runs. If AVMM finds the service OS busy, it allows the service OS to obtain more CPU cycles. For example, when the user OS issues a disguised hard disk operation, which is trapped into AVMM and completed by the service OS, AVMM allows the service OS to run in the next time slice, thereby finishing the request in a timely manner.
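A minimal sketch of the two heuristics, assuming a tick-driven scheduler in which the virtualized PIT hands one slice at a time to a partition, is given below. The 90/10 split, the state flags, and the function names are illustrative choices, not the prototype's implementation.

/* Tick-driven proportional-share scheduler with the two heuristics:
 * remise cycles when a partition is idle, and let a busy service OS
 * (e.g., one completing a disguised-disk request) run immediately.
 * The 90/10 split and all names are illustrative.                   */
#include <stdio.h>

enum part { USER = 0, SERVICE = 1 };

struct sched {
    int share[2];     /* nominal shares out of 10 ticks, e.g., {9, 1} */
    int used[2];      /* ticks consumed in the current 10-tick period */
    int idle[2];      /* partition reported itself idle (HLT)         */
    int service_busy; /* service OS has a pending critical request    */
};

static enum part pick_next(struct sched *s)
{
    /* Heuristic 2: a busy service OS seizes the very next slice.     */
    if (s->service_busy)                      return SERVICE;
    /* Heuristic 1: remise slices of an idle partition to the other.  */
    if (s->idle[USER] && !s->idle[SERVICE])   return SERVICE;
    if (s->idle[SERVICE] && !s->idle[USER])   return USER;
    /* Otherwise honour the fixed proportional shares.                */
    if (s->used[USER]    < s->share[USER])    return USER;
    if (s->used[SERVICE] < s->share[SERVICE]) return SERVICE;
    s->used[USER] = s->used[SERVICE] = 0;     /* new accounting period */
    return USER;
}

int main(void)
{
    struct sched s = { .share = { 9, 1 } };
    for (int tick = 0; tick < 10; tick++) {
        if (tick == 4) s.service_busy = 1;    /* disk request arrives  */
        enum part p = pick_next(&s);
        s.used[p]++;
        if (p == SERVICE) s.service_busy = 0; /* request handled       */
        printf("tick %d -> %s\n", tick, p == USER ? "user OS" : "service OS");
    }
    return 0;
}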

4.2 Shadow Page Table Mechanism

To retain ultimate control of memory resources and to protect access from and between partitions, AVMM virtualizes memory access. Two approaches to virtualizing hardware memory in an x86 machine are commonly used: the shadow PT and direct PT mechanisms. The shadow PT mechanism is used in VMware ESX [25] and needs no modifications to the guest OS. The direct PT mechanism, however, needs to modify the guest OS, as in Xen 1.0 [26]. Recalling the goal of full compatibility, we use the shadow PT mechanism to virtualize memory access.

In this approach, each guest OS freely allocates and maintains its PT structure (PTS), including components such as the page directory (PD), PTs, and pages, as it would without virtualization. However, when a guest OS loads the CR3 register with the base address of its guest PD, this operation is trapped and the value is loaded into a virtual CR3 that is provided and maintained by AVMM. The real CR3 is loaded with the base address of the corresponding shadow PD that is created and maintained by AVMM. The shadow PTS is used by the processor and is kept consistent with the guest PTS by AVMM.

The shadow memory module in AVMM implements the shadow PT algorithm. The fixed, 1:1 virtual-to-physical allocation strategy in AVMM does not dynamically allocate and reclaim physical memory, but only validates and redirects the memory accesses of partitions; thus, the implementation of the shadow memory module is simplified. That is, the guest physical address installed in the virtual CR3, as well as the addresses in the guest PD and PT entries used by the user OS, is directly mapped onto the real physical address; the corresponding addresses of the service OS are mapped onto physical addresses with only a fixed offset.

Initially, the shadow PTS is created with all its entries marked invalid using the present (P) flag bit in the entries. When a guest OS presents a virtual address for use, the shadow PTS is consulted instead, resulting in a page fault. This page fault is trapped into AVMM, which copies the corresponding entries from the guest PTS and then fills the shadow PTS. In an x86 machine, however, the processor automatically sets the accessed (A) bit and dirty (D) bit in the PD and PT entries. Ensuring consistency between the guest and shadow PTS necessitates propagating the flag bits set in the shadow PTS into the guest PTS. As described above, the shadow PTS is initialized as not present. Thus, when a page is first accessed by the guest OS, a page fault is generated. In handling this page fault, AVMM can set the P bit in the shadow PTS and the A bit in the corresponding guest PTS. The D bit is obtained by setting all entries in the shadow PTS as read-only until the D bit is set in the corresponding guest PTS. When a guest OS attempts to write to a page that is read-only in the shadow PTS but whose corresponding guest entry has the R/W bit set, a page fault occurs. AVMM then takes over control and sets the R/W bit in the shadow PTS and the D bit of the corresponding entries in the guest PTS.

The guest OS is allowed to freely modify the guest PTS, leading to inconsistencies with the shadow PTS. Resolving these inconsistencies necessitates trapping the memory access-related events issued by the guest OS and transferring system control from the guest OS to AVMM. Subsequently, the shadow memory module takes control, analyzes the source of the event, and accordingly resolves the inconsistency. The events initiated by the guest OS that require trapping into AVMM can be classified into the following groups:

1. Page fault. A page fault may be generated by a guest OS, for example, when the requested guest page is not present in the guest PTS. In this case, the fault should be sent to the guest OS for handling. As previously stated, a page fault may also be generated by an inconsistency between the shadow and guest PTS, for example, when the shadow PD or PT entries are marked not present or when their R/W bits are not consistent with the bits in the guest entries. In this case, AVMM should update the shadow PTS in accordance with the guest PTS by allocating new PT pages and/or updating the relevant flag bits in the PD or PT entries. The page-faulted instruction is then re-executed. In addition, to intercept the MMIO operations of certain PCI devices or PCI configuration transactions that require virtualization from the guest OS, AVMM sets the MMIO memory pages as not present. Thus, a page fault is generated when a guest OS accesses the MMIO pages. Such page faults should be passed onto and handled by the MMIO handler module in AVMM.

2. Translation lookaside buffer (TLB) operation. A TLB operation (e.g., INVLPG) issued by the guest OS must also be trapped and handled by AVMM because of possible inconsistencies between the guest and shadow PTS. After AVMM takes control, the shadow PTS is modified to emulate the desired effect of the INVLPG: the relevant shadow PD or PT entries are set as not present, and the INVLPG is then executed with the faulting address to remove the invalid physical TLB entries. If all entries in a shadow PT are not present, the shadow PT is deallocated and the parent PT entry is set as not present. Subsequently, CR3 is reloaded with its current value to flush the physical TLB.

3. Address space switch. When a guest OS attempts to load from or store into CR3, or to initiate a task switch, the address space changes, resulting in the invalidation of the entire TLB. Therefore, AVMM must take control and invalidate the entire shadow PTS to emulate the desired effect.

The description above indicates that translating a guest virtual address incurs a very high cost because of the population of the shadow PTS and its repopulation when the address space is switched back. Fortunately, the recently introduced hardware extended page table (EPT) mechanism of Intel [27] can be used to simplify the implementation of the shadow page mechanism. Furthermore, by tagging each address space with a virtual processor identifier, multiple shadow PTs can be maintained for a guest OS. Therefore, the cost of discarding shadow PTs and the corresponding TLB entries during an address space switch, and of repopulating them upon switching back to the original address space, is avoided.
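The page-fault side of the shadow mechanism can be condensed into the following C sketch, assuming standard x86 PTE flag bits: shadow entries are populated lazily on the first access (propagating the A bit to the guest entry), and they are kept read-only until the first guest write so that the D bit can be recorded. The data layout and helper names are illustrative, not the prototype's code.

/* Lazy shadow-PT maintenance on a page fault: not-present shadow entries
 * are filled from the guest entry (setting the guest A bit), and writes
 * to read-only shadow entries set the guest D bit before granting R/W.
 * Flag values follow x86 PTE bits; everything else is illustrative.     */
#include <stdint.h>
#include <stdio.h>

#define PTE_P   0x01u   /* present  */
#define PTE_RW  0x02u   /* writable */
#define PTE_A   0x20u   /* accessed */
#define PTE_D   0x40u   /* dirty    */

/* One guest PT entry and its shadow counterpart (frame numbers elided). */
struct pte_pair { uint32_t guest; uint32_t shadow; };

/* Returns 0 if the fault was absorbed by the monitor, 1 if it must be
 * reflected to the guest OS (the page really is absent in the guest PT). */
static int shadow_page_fault(struct pte_pair *e, int is_write)
{
    if (!(e->guest & PTE_P))
        return 1;                          /* genuine guest fault          */

    if (!(e->shadow & PTE_P)) {            /* first touch: populate shadow */
        e->shadow |= PTE_P;                /* map, but read-only for now   */
        e->guest  |= PTE_A;                /* propagate the accessed bit   */
    }
    if (is_write && !(e->shadow & PTE_RW)) {
        if (!(e->guest & PTE_RW))
            return 1;                      /* guest forbids the write too  */
        e->shadow |= PTE_RW;               /* now allow writes...          */
        e->guest  |= PTE_D;                /* ...and record dirtiness      */
    }
    return 0;                              /* re-execute faulting access   */
}

int main(void)
{
    struct pte_pair e = { .guest = PTE_P | PTE_RW, .shadow = 0 };
    shadow_page_fault(&e, 0);              /* read fault: populate, set A  */
    shadow_page_fault(&e, 1);              /* write fault: set R/W and D   */
    printf("guest flags: A=%d D=%d\n", !!(e.guest & PTE_A), !!(e.guest & PTE_D));
    return 0;
}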


4.3 Resource Discovery and Allocation

AVMM can assign an I/O device to one partition for exclusive use. That is, the assigned device is not used or shared by other partitions. In AVMM, we use a resource hiding technique to assign and hide resources for the guest OSs. If underlying platform resources are allocated to one guest OS, they can be discovered, configured, and used in the standard manner. However, these resources must be hidden from the other partition to prevent discovery and use by it.

Memory is allocated and hidden from a guest OS by intercepting access to the advanced configuration and power interface (ACPI) tables [28]. When the guest OS loader is initiated, it discovers system memory resources through the INT 15H, E820H call of the BIOS (in an EFI BIOS, the corresponding boot service call is GetMemoryMap()) to query the system address map. AVMM hooks INT 15H to hide certain ranges of physical system memory. The memory address ranges allocated to AVMM and the service OS are labeled reserved by ACPI (AddressRangeReserved) and, therefore, must not be used by the user OS. Similarly, when the service OS discovers the memory resources, the address ranges of AVMM and the user OS are returned as ACPI reserved.

After the BIOS relinquishes system control to the guest OS, the guest OS discovers and configures its PCI device resources by accessing their configuration spaces. By intercepting access to the PCI configuration spaces, the devices that are not assigned to a partition can be hidden from that partition. AVMM traps the 0xCF8/0xCFC port I/O and PCI MMIO configuration transactions to assign or hide the PCI devices. When a guest OS accesses the PCI configuration space of a device through the 0xCF8 and 0xCFC I/O ports or the MMIO memory (PCI Express defines an enhanced configuration access mechanism that enables an OS to access configuration registers via a memory-mapped address space [29]), such access is trapped into AVMM. If a device is to be assigned to the accessing guest OS, AVMM returns the real configuration space; otherwise, it returns null values to hide the device from the guest OS. If a device is hidden from a guest OS, the corresponding MMIO resources are also hidden by the shadow memory module. If a device is virtualized or disguised by AVMM, the configuration space read operation returns the virtual configuration space of the device. AVMM must emulate and virtualize the PCI base address registers, as well as other registers, for access by the guest OS.
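The memory-hiding step can be illustrated as a filter over the E820 map returned to the guest loader: any range owned by AVMM or by the other partition is reported with the ACPI reserved type. The entry layout follows the E820 convention, but the specific address ranges and function names below are assumptions for illustration.

/* Hiding memory from the user OS by rewriting the E820 map that its
 * loader obtains via INT 15H/E820H: ranges owned by AVMM or by the
 * service partition are reported as AddressRangeReserved.  The split
 * addresses below are examples only.                                 */
#include <stdint.h>
#include <stdio.h>

#define E820_RAM      1u    /* AddressRangeMemory   */
#define E820_RESERVED 2u    /* AddressRangeReserved */

struct e820_entry { uint64_t base, length; uint32_t type; };

/* Ranges that must never appear as usable RAM to the user OS.        */
static const struct e820_entry hidden[] = {
    { 0x30000000ULL, 0x10000000ULL, 0 },   /* service OS + AVMM + shared */
};

static void filter_e820_for_user_os(struct e820_entry *map, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        for (unsigned h = 0; h < sizeof hidden / sizeof hidden[0]; h++) {
            uint64_t end  = map[i].base + map[i].length;
            uint64_t hend = hidden[h].base + hidden[h].length;
            /* For simplicity, mark any overlapping entry reserved; a real
             * implementation would split the entry at the boundaries.    */
            if (map[i].base < hend && hidden[h].base < end)
                map[i].type = E820_RESERVED;
        }
}

int main(void)
{
    struct e820_entry map[] = {
        { 0x00000000ULL, 0x30000000ULL, E820_RAM },   /* user OS memory */
        { 0x30000000ULL, 0x10000000ULL, E820_RAM },   /* to be hidden   */
    };
    filter_e820_for_user_os(map, 2);
    for (unsigned i = 0; i < 2; i++)
        printf("base=0x%09llx len=0x%09llx type=%u\n",
               (unsigned long long)map[i].base,
               (unsigned long long)map[i].length, map[i].type);
    return 0;
}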

5 IMPLEMENTATION AND EVALUATION

We implement AVMM on an Intel x86 machine. The user OS can be Windows XP Professional SP2 or a desktop Linux with kernel 2.6.21. The service OS is a simplified and customized Linux with kernel 2.6.13. AVMM is implemented with approximately 33,000 lines of C code and 2,000 lines of assembly code.

To demonstrate the value-added benefits of AVMM, we virtualize the integrated drive electronics (IDE) device as a disguised device for the user OS. The disguised IDE operations are trapped into AVMM and passed onto the service OS for reading/writing from/to a remote server's hard disk via the network block device (NBD) protocol [30]. We also develop a set of management tools on the basis of the centric storage of these remote disk images on a server and deploy a pilot system to demonstrate the advantages. A client can also benefit from other value-added features, such as management and security, if these features are implemented in the service OS.

We evaluate the performance against a variety of micro- and application-level benchmarks. The experimental setup and the results are presented in the succeeding sections. We are primarily interested in the following issues: 1) the extent to which AVMM can benefit from the bare-metal mechanism; 2) the extent to which AVMM can reduce overhead through efficient resource management and monopolized I/O devices; and 3) the extent to which value-added services can improve user experience.

5.1 Experimental Setup

The AVMM client machine is configured as an Intel Dual Core E6300 1.86-GHz machine, with 1-GB DDR2 667-MHz RAM, a 1-Gbps Intel network card, and an Intel G965 Express Chipset Family graphics card. Its hard disk is virtualized as an NBD device for reading/writing from/to an NBD server, which is configured as an Intel Xeon Quad Core 1.6-GHz machine, with 4-GB DDR2 667-MHz RAM, a single Hitachi 160-GB, 15,000-rpm SATA hard disk, and a 1-Gbps onboard network card. The AVMM machine and the NBD server are connected with a TP-Link TL-SG1048 1-Gbps Ethernet switch.

We compare the performance of the AVMM client with the following: 1) a regular PC, which has the same hardware configuration as the AVMM client but with a local 250-GB Seagate 7,200-rpm SATA hard disk; 2) VMware, a VM with 256-MB memory (the optimal size recommended by VMware, which we also confirm by testing the performance of the web browsing application with different memory sizes) and a 20-GB hard disk, running on VMware Workstation 6.5 hosted by Windows XP Professional SP2 with an NTFS 3.1 file system on the same hardware as the regular PC; and 3) Xen 3.1 [13], which supports Hardware VM (HVM) and is configured with 256-MB memory for the user OS and openSUSE 11.1 [31] (kernel 2.6.27) as the Domain0 OS (with 722-MB memory). The AVMM client, regular PC, VMware, and Xen 3.1 all run Windows XP Professional SP2 with NTFS 3.1 or a desktop Linux (kernel version 2.6.21) with an ext3 file system to test their performance in Windows or Linux. The AVMM service partition runs a customized Linux (kernel version 2.6.21) and an NBD client (version 2.9.9). The NBD server runs Red Hat Enterprise Linux Server 5.1 (kernel version 2.6.18) with an ext3 file system and an NBD server (version 2.9.9).

5.2 CPU

We first evaluate the CPU performance of AVMM using PassMark PerformanceTest 7.0 [32] in Windows, which contains a set of test suites, including CPU, memory, and graphics. The CPU suite contains eight tests: Integer Math (IM), Floating-point Math (FPM), Find Prime Numbers (FPN), Multimedia Instructions (MI), Compression (COM), Encryption (ENC), Physics (PHY), and String Sorting (SS). We compare the CPU performance of AVMM against a regular PC without virtualization (native), VMware, and Xen. All the experimental results in this paper are the average of five to 10 trials; the standard deviation of these trials is at most 5.8 percent and is, therefore, omitted. The results (see Fig. 4) indicate that VMware, Xen, and AVMM all achieve a CPU performance very close to that of the native client, with no more than 8 percent overhead. This small overhead is attributed to the fact that the tests are dominated by user-level executions and do not involve excessive virtualization work. AVMM achieves a performance that is 1 to 2 percent better than that of VMware in IM, FPM, PHY, and SS and 2 to 7 percent better than that of Xen in all cases.

Fig. 4. PassMark PerformanceTest CPU benchmarks.

We also test CPU performance in Linux using LMbench 3.0 [33]. The CPU subset of LMbench consists of 34 arithmetic microbenchmarks, which can be divided into two groups: a simple test group with arithmetic operations such as add/mul, and a complex test group with arithmetic operations such as div/mod. A shorter completion time indicates better performance. The performance of VMware, Xen, and AVMM under the simple test group is very close to that of the native client, although the performance levels cannot be sufficiently differentiated in terms of precision. However, AVMM shows a performance similar to that of the native client and a better performance than that of VMware in the complex test group (see Fig. 5). VMware incurs an overhead of 10 to 15 percent over the native client in eight tests: integer div (ID), integer mod (IO), int64 div (ID64), int64 mod (IM64), float div (FD), double div (DD), float bogomflops (FB), and double bogomflops (DB). In the integer mod parallelism (IMP) test, VMware needs three times as long as the native client to complete the operation. AVMM incurs an overhead of less than 4 percent over the native client in all eight tests. Xen has a performance comparable to that of AVMM in most cases, but shows an obvious slowdown compared with AVMM in the ID64 and IM64 tests.

Fig. 5. LMbench CPU benchmarks.

The CPU performance evaluation shows that, with a bare-metal VMM mechanism, AVMM achieves a performance comparable to or better than that of VMware and Xen.

5.3 Memory

We then evaluate the memory performance of AVMM and compare it with that of the native client, VMware, and Xen, using the PerformanceTest 7.0 Memory Standard Suite. This suite includes five subtests: allocate small block, write, read cached, read uncached, and large RAM. Fig. 6 shows the test results. In the first four subtests, VMware, Xen, and AVMM show a performance similar to that of the native client, except that Xen exhibits a sharp performance drop in the read cached test. In the large RAM test, VMware, Xen, and AVMM suffer from a sharp slowdown compared with the native client. This test allocates large amounts of memory and then reads them; it can exhaust memory, thereby causing paging. Given that paging involves considerable virtualization work and physical memory relocation, VMware, Xen, and AVMM all suffer significantly. Despite this drawback, AVMM achieves a performance that is much better than that of VMware and Xen. This result shows that, with the static memory allocation strategy and shadow PT mechanism, AVMM outperforms existing VM approaches when excessive paging is encountered.

Fig. 6. Memory benchmarks.

To confirm this conclusion, we run LMbench 3.0 in Linux to measure page fault time. This benchmark measures how fast a file page can be faulted in. We run the benchmark for five trials, and the averages are as follows: native client, 1.33 ms; VMware, 8.53 ms; Xen, 9.07 ms; and AVMM, 4.69 ms. The results are shown in the last cluster of bars in Fig. 6, which indicates that AVMM achieves a performance nearly two times as good as that of VMware and Xen in the page fault test.

To obtain a more precise metric of memory performance, we run the R/W benchmark of the PerformanceTest Memory Advanced Suite. This suite enables the determination of the exact speed of the memory R/W process. It also measures the average memory access time when requesting different block sizes. Fig. 7 illustrates the results of memory reading.

Fig. 7. PassMark PerformanceTest memory reading.

The memory reading speeds of AVMM and VMware remain almost the same as that of the native client until the block size increases to 0.5 MB, whereas Xen shows a slight slowdown compared with the other three cases. As the block size continues to increase, performance in all four cases diminishes. AVMM and VMware exhibit highly similar curves in terms of performance reduction, but both are slower than the native client. Xen shows a considerably larger performance drop than do the other three cases. The result of memory writing is similar to that of memory reading and is therefore omitted. The memory tests show that in most cases, AVMM achieves a memory performance close to that of VMware and Xen. However, when a system runs applications that consume very large amounts of RAM or when memory is exhausted, AVMM significantly outperforms VMware and Xen because of its static memory allocation and shadow PT mechanism.

5.4 Graphics

In this section, we study direct-access I/O device performance in AVMM. As an example, we evaluate graphics performance. We first run the PassMark PerformanceTest 7.0 3D Graphics Suite in Windows to measure 3D graphics performance. For this test, we also use the Microsoft Direct3D 9.0 graphics library. Three test scenarios are provided: simple, medium, and complex. The simple test scenario includes 20 bouncing balls, and the medium and complex tests comprise tens of seconds of video of flying planes. The simple test employs drawing features that include texturing and specular lighting. The medium test employs features that include stencil buffer processing, alpha blending, lighting, fogging, multitexture, and mip-mapping. The complex test provides a variety of features, including Vertex Shader 2.0 and Pixel Shader 2.0 effects, stencil buffer processing, alpha blending, lighting, fogging, multitexture, and mip-mapping. These three tests cover most of the common drawing features, thereby enabling a general performance comparison. Given that Xen does not support 3D, we cannot present its results for the 3D performance comparison. We instead compare AVMM with Xen in terms of 2D performance using a real-world application benchmark. Each of the three tests is run at two resolutions: low resolution (800 × 600) and high resolution (1,024 × 768). We evaluate graphics in terms of frames per second (FPS) [34], a standard evaluation metric for 3D graphics.


Fig. 8. PassMark PerformanceTest 3D graphics benchmarks—average FPS and low resolution.

Fig. 8 shows the performance of the native client, VMware, and AVMM under low resolution. VMware exhibits a 50 percent slowdown over the native client in the simple and medium tests. This overhead stems primarily from the virtualization work involved with the graphics card and DMA. The graphics card and DMA are direct-access resources in AVMM and can be directly accessed by the user OS. Therefore, AVMM does not incur graphics-related virtualization overhead, except for interrupt injection. AVMM exhibits a performance drop of only 15 percent over the native client in the simple test and 12 percent in the medium test. We attribute the 12 to 15 percent reduction to CPU and memory overhead and the related interrupt injections. In the complex test, VMware and AVMM show a performance that is very close to that of the native client (about 6 FPS). We speculate that at such a low frame rate, the benchmark does not provide sufficient precision. The performance of AVMM, VMware, and the native client under high resolution is similar to that under low resolution, with only a 10 to 30 percent drop in all cases. Thus, we do not present the results here. The aforementioned findings show that AVMM significantly outperforms VMware in terms of graphics performance when the Direct3D graphics library is used.

To examine graphics performance under real-world workloads, we also perform evaluations using several computer games. Computer games are driven by the graphics card, making them effective graphics benchmarks. We choose two popular computer games as the standards: Counter Strike 1.5 (CS) [35] and Quake 3 [36]. These games are first-person shooter computer games. CS can support software rendering, OpenGL acceleration, and Direct3D acceleration. In our experiment, we configure the game so that it uses Direct3D acceleration because we find that the performance of CS is the best under this configuration. Quake 3 supports only OpenGL acceleration and requires an OpenGL-compliant graphics accelerator to run [37]. Therefore, Quake 3 is configured so that it uses OpenGL acceleration. These configurations enable us to evaluate 3D graphics performance with the two types of 3D acceleration technologies. We evaluate graphics by replaying the same game demo and counting the average FPS using a benchmark utility called Fraps [38]. For CS, we record a 72-second demo and replay it. For Quake 3, we replay its self-contained demo DEM0001 [36] (68 s). We run all these demos on the regular PC, AVMM, and VMware under low and high resolution to compare performance.

Fig. 9. Computer game benchmarks—average FPS.

Fig. 9 shows the results for the two game demos. For CS, VMware and AVMM show FPS rates that are lower than that of the native client under low resolution; they exhibit a slowdown of 15 and 5 percent, respectively. The gap among the three approaches expands more substantially under high resolution. VMware exhibits a 37 percent slowdown, whereas AVMM shows a slowdown of 19 percent. These results show that AVMM outperforms VMware when the Direct3D graphics library is used. VMware also performs poorly in the Quake 3 test with only 15 to 25 FPS. By contrast, AVMM and the native client exhibit FPS rates greater than 70 FPS. We attribute this discrepancy to the fact that VMware Workstation 6.5 currently only works well with applications that use Direct3D 9.0 accelerated graphics and cannot fully support OpenGL acceleration [12]. In Quake 3, AVMM has almost the same FPS rate as the native client, showing nearly zero overhead. For CS, AVMM shows a 15 to 19 percent slowdown over the native client, indicating that AVMM performs more efficiently with OpenGL than with Direct3D. We attribute this difference in performance to the marshalling technique that OpenGL uses to buffer system call commands. Through this technique, OpenGL can transfer a number of system call commands as it switches from user mode to kernel mode. By contrast, Direct3D can transfer only a single system call command per mode switch operation [39]. In this case, OpenGL has considerably fewer mode switch operations than does Direct3D. Thus, AVMM performs similarly to the native client because of the lower overhead incurred during mode switching. Direct3D 10.0 and its later versions also implement marshalling [39], but these Direct3D versions are included only in Windows Vista and later versions. The accelerated 3D graphics technique of VMware Workstation 6.5 supports only Direct3D 9.0 in a Windows XP guest OS [12]. Therefore, we cannot test Direct3D 10.0 and its later versions in VMware. Because VMware's accelerated 3D graphics requires a Windows XP guest, we also cannot evaluate graphics performance in Linux. Nevertheless, we believe that our experiments in Windows (covering both the OpenGL and Direct3D graphics libraries) sufficiently show that AVMM achieves a performance close to that of the native client and significantly outperforms VMware.

Fig. 10. Web browsing latency.

5.5 Application Performance

We also examine how AVMM affects the performance of real-world applications. These application-level benchmarks measure the overall performance of AVMM across its different components. We compare AVMM performance with that of the regular PC, VMware, and Xen. We use web browsing and video playback as the applications because they are commonly performed tasks in a desktop environment. Web browsing performance is measured with Microsoft IE 6.0, using the web text page load test provided by the iBench benchmark suite 5.0 [40]. This benchmark consists of a sequence of 30 different webpages, each containing a mixture of text and graphics. To compare the performance across the other virtualization systems, we set the window resolution to 800 × 600. Video playback performance is measured using Windows Media Player 9.0, which plays a 21-second (320 × 240 pixels at 24 FPS) video clip displayed at 800 × 600 resolution. For both applications, we use a packet monitor to capture network traffic and measure performance using slow-motion benchmarking [41], which enables the quantification of performance in a noninvasive manner.

For web browsing, we examine average page download latency under a network speed of 100 Mbps. As shown in Fig. 10, AVMM has an average latency of 220 ms, indicating that its performance is similar to that of the native client (200 ms). VMware and Xen exhibit larger latencies of 310 ms and 610 ms, respectively. This result indicates that AVMM can reduce web browsing latency by 29 and 64 percent compared with VMware and Xen, respectively. In the microbenchmarks, AVMM outperforms existing VM approaches in only some cases. The result for web browsing latency, however, shows that AVMM achieves considerably better overall application performance through the integration of the asymmetric partitioning strategy with efficient resource management and sharing mechanisms.

For video playback, we use the video playback quality defined in slow-motion benchmarking as the performance metric. The results are shown in Fig. 11. We normalize the video playback quality on the native client as 100 percent and compare it with the quality of AVMM and the other VM approaches. AVMM exhibits 89 percent of the performance of the native client; this performance is also better than that of VMware (70 percent) and Xen (64 percent), indicating that AVMM can enhance video playback quality by at least 20 percent. Such enhancement substantially improves end-user experience. Considering that video playback involves only 2D graphics instructions, the result also shows that AVMM achieves a much better 2D graphics performance than do VMware and Xen.

Fig. 11. Video playback quality.

The two experiments demonstrate the robust performance of AVMM across different types of popular applications. They show that AVMM can achieve a performance better than that of existing virtualization approaches, especially for applications that require heavy I/O operations.

5.6 Value-Added Service: Remote Virtual Disk

We then demonstrate the benefits of the service OS through a remote and fast virtual hard disk feature implemented in the service OS. As previously described, the contents of a remote virtual disk are stored in an image file on the NBD server. This remote virtual disk feature can be used to centralize user computing environments, including OSs, applications, and data. Thus, maintenance and management can be performed centrally, and the associated effort is reduced accordingly. This benefit is demonstrated later in the paper.

To verify the feasibility of this centralization, we first evaluate disk performance in AVMM and compare it with that of the native client, VMware, and Xen, using a disk benchmark called Iometer [42]. Fig. 12 shows the disk access throughput in four scenarios: sequential and random R/W accesses with different request sizes.

Fig. 12. Disk access throughput.

For random disk access, the R/W throughput of AVMM increases considerably with request size and is, on average, at least three times higher than that of the native client. This result is attributed to the NBD server having a faster hard disk (15,000 rpm) than the native client (7,200 rpm). Seek time dominates random disk access, so a faster disk substantially reduces access time. Another reason is that the NBD server has more memory (4 GB) than the native client (1 GB) and can therefore maintain a larger cache. This result confirms that remote disk operations do not diminish disk performance if the server is more powerful than the client, as is often the case in real-world scenarios.

For sequential disk access, the sequential R/W throughput of the native client increases with request size and saturates at approximately 53 MB/s for request sizes larger than 16 KB. For most request sizes, VMware and Xen exhibit a larger slowdown than the native client does. In most cases, AVMM achieves a throughput twice as high as that of VMware and Xen, and it outperforms the native client when the request size in sequential reading is larger than 32 KB. This result is attributed to the NBD server prefetching more effectively at large request sizes during sequential reading. The same effect explains why Xen outperforms the native client at read sizes larger than 128 KB: in this case, the host OS of Xen can perform additional prefetching similar to that of AVMM.
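As a rough illustration of the kind of measurement Iometer performs, the sketch below times sequential and random reads of a file at varying request sizes. It is a simplified stand-in, not the benchmark used in our evaluation: the test file path is a placeholder you must create beforehand, and unlike Iometer the sketch does not bypass the OS page cache, so absolute numbers will differ from a real run.

```python
import os, time, random

def read_throughput(path, req_size, total_bytes=64 << 20, sequential=True):
    """Return read throughput in MB/s for the given request size and access pattern."""
    fd = os.open(path, os.O_RDONLY)
    try:
        file_size = os.fstat(fd).st_size
        n_reqs = min(total_bytes, file_size) // req_size
        offsets = [i * req_size for i in range(n_reqs)]
        if not sequential:
            random.shuffle(offsets)          # random access pattern
        start = time.perf_counter()
        for off in offsets:
            os.pread(fd, req_size, off)      # one read per request
        elapsed = time.perf_counter() - start
        return (n_reqs * req_size) / (1 << 20) / elapsed
    finally:
        os.close(fd)

if __name__ == "__main__":
    test_file = "/tmp/disk_test.img"         # hypothetical pre-created test file
    for kb in (4, 16, 32, 128):
        seq = read_throughput(test_file, kb << 10, sequential=True)
        rnd = read_throughput(test_file, kb << 10, sequential=False)
        print(f"{kb:>4} KB  sequential {seq:7.1f} MB/s   random {rnd:7.1f} MB/s")
```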

5.7 Value-Added Service: Pilot System

We show the added value of a pilot system that is based on AVMM and the remote virtual disk feature. The pilot system is deployed in a computer laboratory at a university. In this scenario, students load and run computing environments by accessing virtual disk images on a central server with the help of AVMM and the service OS. We have developed a set of management tools based on the remote virtual disk feature, including virtual disk image creation, user access control, virtual disk image updating, backup, and recovery. With these tools, laboratory administrators can create virtual disk images on the central server that contain the appropriate computing environments for different types of experiments. In each class, students select a virtual disk image from the server and then load and run it through AVMM and the remote virtual disk feature of the service OS.

To allow different users to access and share the same remote virtual disk image on the central server, we use the copy-on-write (COW) feature of the NBD server. That is, when a user writes to a shared virtual disk image, the data are written to a per-user COW copy instead of to the original image. To resolve the discrepancies that arise from different students using the system, the administrator only needs to clear, manually or automatically, the corresponding COW copy of a remote virtual disk image. A minimal sketch of this COW redirection is given after this subsection.

This pilot system has been running for three months. It has operated stably most of the time and has delivered at least the following benefits:

Reduced system maintenance and management time. Administrators previously spent at least half a day, and on average a full day, per week clearing every machine, even with the help of automatic tools, to fix problems caused by user faults or malicious attacks. AVMM reduces system cleaning and upgrading time to 30 minutes per week because these operations are performed centrally on the virtual disk images.

Improved system availability and usability. Before AVMM was deployed, weekly system maintenance took four to eight hours, and students could not use the laboratory during the maintenance day. After AVMM deployment, the laboratory can be used daily without weekly service interruptions.

Improved system security. Since the pilot system was deployed, fewer virus or worm attacks have been reported, because system updates can be applied easily and in a timely manner by updating only the central images on the server. Even when errors occur, the pilot system rapidly resumes operation.
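The block-level copy-on-write idea described above can be illustrated with a short sketch. This is not the NBD server's actual implementation; it is a hypothetical in-memory model in which each user gets a private overlay, writes land in the overlay, reads fall back to the shared base image, and clearing the overlay restores the pristine environment.

```python
# Illustrative sketch of block-level copy-on-write (COW) over a shared base
# image. Hypothetical model, not the NBD server's real code: the base image is
# read-only and shared; each user's writes go to a private in-memory overlay.

BLOCK_SIZE = 4096

class CowDisk:
    def __init__(self, base_image_path):
        self.base = open(base_image_path, "rb")   # shared, read-only image
        self.overlay = {}                          # block number -> written data

    def read_block(self, block_no):
        if block_no in self.overlay:               # user has modified this block
            return self.overlay[block_no]
        self.base.seek(block_no * BLOCK_SIZE)      # otherwise read the base image
        return self.base.read(BLOCK_SIZE)

    def write_block(self, block_no, data):
        assert len(data) == BLOCK_SIZE
        self.overlay[block_no] = data              # never touch the base image

    def reset(self):
        self.overlay.clear()                       # "clearing the COW copy":
                                                   # the disk reverts to the base image
```

Resetting a student's machine therefore amounts to discarding one overlay, which is why the weekly cleanup described above shrinks to a single centralized operation.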

6 RELATED WORK

Most existing VMMs, such as VMware Workstation [12], Xen [13], and Virtual PC [14], attempt to simultaneously run several VMs over a single physical hardware platform. AVMM differs from these in that it divides the underlying resources into two asymmetric partitions and runs only a single commodity OS for end users at a time.

Recently introduced server-centric desktop virtualization products, such as Microsoft VDI [15], Citrix XenDesktop [16], and VMware View [17], virtualize entire desktops in data centers. The virtualization technology behind these approaches is a server-side VMM that runs each desktop in a VM. Because a desktop running in a data center is accessed remotely, user experience depends heavily on the network connection. AVMM can mitigate this dependence by running the virtual desktop locally with client resources.

AVMM adopts the full virtualization [20] concept to run a commodity OS without modification and to easily support multiple OSs. In contrast to client-hosted VMMs (Type II), which use a host OS to virtualize resources, AVMM is bare-metal (Type I) and has complete control over resource management. AVMM can assign a set of devices exclusively for direct access by the user OS, thereby reducing virtualization overhead and improving system performance, especially I/O performance. Although para-virtualization [20] may achieve better performance than full virtualization, it requires modifying the commodity OS. This is very difficult for closed-source Windows OSs, limiting its use on client desktops. Hardware-assisted virtualization has been introduced into Xen 3.0 and later versions to run a commercial OS without modification as an HVM guest. AVMM also adopts the recently introduced hardware virtualization support, but in a different way.

The dedicated service partition concept is similar to the isolated device domain (IDD) used in Xen 2.0 [21]. The virtual device drivers in Xen 2.0 use a special driver interface that communicates with the IDD. In contrast, the virtual or disguised device driver in the user OS of AVMM can be the regular driver and does not require changes. Furthermore, the service OS in AVMM supports value-added services in addition to functioning as an I/O VM. Our idea is similar to Intel LVMM [43], but we implement our concept on a new platform with a new service OS, which enables AVMM to offer more distinctive and detailed techniques.

Several direct I/O access methods have been proposed, but they require specialized hardware that enables direct guest access [44], [45]; such hardware support is unavailable in commodity I/O devices. Other direct-access methods have been implemented in Xen [46] and KVM [47] to enable a guest OS to directly access commodity devices by leveraging the newly emerged hardware IOMMU support [48], [49]. AVMM implements direct I/O access differently in that it does not rely on any special hardware, and it does so in an asymmetric partitioning-based bare-metal scenario. Direct I/O device access may also cause sharing, isolation, and live migration problems. Recently developed hardware IOMMU features [48], [49] and related implementation approaches [50], [51] have addressed these issues, and these techniques can also be exploited in AVMM to improve its security and usability.

AVMM exploits proportional-share resource scheduling [52], [53] in CPU management. However, most existing algorithms assign resources with fixed weights, which is unsuitable for our system. To address this drawback, we adopt two simple heuristics specific to AVMM that improve flexibility while simplifying implementation (a generic sketch of proportional-share scheduling is given at the end of this section).

Shadow PT is a common memory virtualization technique employed in numerous virtualization approaches, including the VMware ESX server [25] and Xen HVM [54]. We extend the shadow PT mechanism by trapping the corresponding operations through page faults so that it can handle memory-mapped I/O for device discovery, allocation, and virtualization.

The core idea of asymmetric partitioning was introduced in our paper published at an INFOCOM 2011 workshop [55]. Owing to space limitations, that paper does not comprehensively discuss the fundamental idea and principles of asymmetric partitioning. In this work, we present the basic rationale, discuss several techniques in depth, and carry out more extensive experiments to show the performance improvement achieved by AVMM.
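For readers unfamiliar with proportional-share CPU scheduling, the sketch below shows a classic stride scheduler dividing CPU time between a user partition and a service partition according to fixed weights. It is a generic illustration of the technique cited in [52], [53], not AVMM's scheduler: the partition names and weights are made up, and AVMM's two heuristics for adjusting shares are not modeled.

```python
# Generic stride scheduler: each runnable entity receives CPU time in
# proportion to its weight. Illustrative only; AVMM's adaptive heuristics
# for the user/service partitions are not modeled here.

STRIDE1 = 1 << 20                        # large constant used to compute strides

class Entity:
    def __init__(self, name, weight):
        self.name = name
        self.stride = STRIDE1 // weight  # smaller stride => runs more often
        self.pass_value = 0              # virtual time consumed so far

def schedule(entities, time_slices):
    """For each slice, run the entity with the smallest pass value."""
    order = []
    for _ in range(time_slices):
        current = min(entities, key=lambda e: e.pass_value)
        current.pass_value += current.stride
        order.append(current.name)
    return order

if __name__ == "__main__":
    # Hypothetical weights: the user partition gets ~4x the CPU of the service partition.
    user = Entity("user OS", weight=8)
    service = Entity("service OS", weight=2)
    timeline = schedule([user, service], time_slices=10)
    print(timeline)                      # roughly 8 "user OS" slices and 2 "service OS" slices
```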

7 CONCLUSION

We develop AVMM, a bare-metal client-side VMM, for desktop virtualization. In addition to the performance benefits of the bare-metal mechanism, AVMM realizes near-native performance by assigning a set of peripheral devices (e.g., graphics) for exclusive use by a single user OS and by leveraging more efficient resource management and sharing mechanisms among the user OS, the service OS, and AVMM. New enhanced features (e.g., management and security) can also be easily implemented in the service OS. We implement a preliminary prototype based on an Intel VT-enabled platform and evaluate AVMM with micro- and application-level benchmarks. The results show that AVMM achieves performance comparable to that of a native client and better than that of existing VM approaches. Users can substantially benefit from the enhanced features implemented in the service OS of AVMM.


Our future research directions include extending memory management to support dynamic memory partitioning, for example, through a ballooning technique [25]. We will also explore the effects of dynamic CPU scheduling on the performance of the user and service OSs. In addition, we intend to implement more enhanced features and deploy AVMM in real-world scenarios.

ACKNOWLEDGMENTS

This work was supported by the National High Technology Research and Development Program of China (grant no. 2011AA01A203), the National Natural Science Foundation of China (grant no. 60903029), and the Intel Corp. The authors would like to thank the EFI PSI team of the platform software innovation division at the Intel Shanghai Lab, especially Ming Wu, Hua Zhou, and Ju Lu, for their helpful comments and fruitful discussions.

REFERENCES

[1] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A View of Cloud Computing,” Comm. ACM, vol. 53, pp. 50-58, Apr. 2010.
[2] W. Zhou, G. Pierre, and C. Chi, “CloudTPS: Scalable Transactions for Web Applications in the Cloud,” IEEE Trans. Services Computing, vol. 5, no. 4, pp. 525-539, Oct.-Dec. 2012.
[3] H. Ma, F. Bastani, I. Yen, and H. Mei, “QoS-Driven Service Composition with Reconfigurable Services,” IEEE Trans. Services Computing, vol. 6, no. 1, pp. 20-34, 2013.
[4] M. Rose, B. O’Donnell, and R. Perry, “Understanding the Business Value of Centralized Virtual Desktops,” white paper, IDC, Nov. 2009.
[5] Symantec, “Symantec Ghost Solution Suite 2.5: Powerful, Versatile, and Efficient PC and Lifecycle Management,” Data Sheet: Endpoint Management, 2010.
[6] “BMC BladeLogic Client Automation and Intel Core vPro Processors,” white paper, BMC Software, Inc., 2013.
[7] Symantec, “Norton AntiVirus,” http://us.norton.com/antivirus/, 2011.
[8] Z. Wang, X. Jiang, W. Cui, and P. Ning, “Countering Kernel Rootkits with Lightweight Hook Protection,” Proc. 16th ACM Conf. Computer and Comm. Security, pp. 545-554, 2009.
[9] A. Dinaburg, P. Royal, M. Sharif, and W. Lee, “Ether: Malware Analysis via Hardware Virtualization Extensions,” Proc. 15th ACM Conf. Computer and Comm. Security, pp. 51-62, 2008.
[10] X. Jiang, X. Wang, and D. Xu, “Stealthy Malware Detection through VMM-Based ‘Out-of-the-Box’ Semantic View Reconstruction,” Proc. 14th ACM Conf. Computer and Comm. Security, pp. 128-138, 2007.
[11] A.M. Azab, P. Ning, E.C. Sezer, and X. Zhang, “HIMA: A Hypervisor-Based Integrity Measurement Agent,” Proc. Ann. Computer Security Applications Conf., pp. 461-470, 2009.
[12] VMware, “Workstation User’s Manual: Workstation 6.5,” http://www.vmware.com/pdf/ws65_manual.pdf, 2009.
[13] Xen Project, “Xen 3.1.4,” http://www.xen.org/download/index_3.1.4.html, 2013.
[14] Microsoft, “Windows Virtual PC,” http://www.microsoft.com/windows/virtual-pc/default.aspx, 2011.
[15] “Desktop Virtualization Strategy: Choosing the Right Solution for Your Needs,” white paper, Microsoft Corp., 2008.
[16] Citrix, “XenDesktop 5,” http://www.citrix.com/virtualization/desktop/xendesktop.html, 2011.
[17] VMware, “VMware View 4.5,” http://www.vmware.com/products/view/resources.html, 2011.
[18] R.P. Goldberg, “Architecture of Virtual Machines,” Proc. Workshop Virtual Computer Systems, pp. 74-112, 1973.
[19] R. Uhlig, G. Neiger, D. Rodgers, A. Santoni, F. Martins, A. Anderson, S. Bennett, A. Kagi, F. Leung, and L. Smith, “Intel Virtualization Technology,” IEEE Computer, vol. 38, no. 5, pp. 48-56, May 2005.
[20] “Understanding Full Virtualization, Paravirtualization, and Hardware Assist,” white paper, VMware Inc., Sept. 2007.
[21] K. Fraser, S. Hand, I. Pratt, A. Warfield, R. Neugebauer, and M. Williamson, “Safe Hardware Access with the Xen Virtual Machine Monitor,” Proc. First Workshop Operating System and Architectural Support for the On-Demand IT Infrastructure (OASIS ’04), Oct. 2004.
[22] “AMD-V Nested Paging,” white paper, AMD Inc., July 2008.
[23] A. Warfield, S. Hand, K. Fraser, and T. Deegan, “Facilitating the Development of Soft Devices,” Proc. USENIX Ann. Technical Conf., pp. 378-382, Apr. 2005.
[24] D.T. Meyer, G. Aggarwal, B. Cully, G. Lefebvre, M.J. Feeley, N.C. Hutchinson, and A. Warfield, “Parallax: Virtual Disks for Virtual Machines,” Proc. Third ACM SIGOPS/EuroSys European Conf. Computer Systems (EuroSys ’08), pp. 41-54, 2008.
[25] C.A. Waldspurger, “Memory Resource Management in VMware ESX Server,” ACM SIGOPS Operating Systems Rev., vol. 36, pp. 181-194, 2002.
[26] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the Art of Virtualization,” Proc. 19th ACM Symp. Operating Systems Principles (SOSP ’03), pp. 164-177, 2003.
[27] Intel, “Intel 64 and IA-32 Architectures Software Developer’s Manual: Vol. 3B: System Programming Guide, Part 2,” http://download.intel.com/products/processor/manual/253669.pdf, Sept. 2009.
[28] “Advanced Configuration and Power Interface Specification, Revision 4.0a,” Hewlett-Packard Corp., Intel Corp., Microsoft Corp., Phoenix Technologies Ltd., Toshiba Corp., Apr. 2010.
[29] “PCI Firmware Specification, Revision 3.0,” PCI-SIG, June 2005.
[30] “Network Block Device,” http://nbd.sourceforge.net/, 2013.
[31] “openSUSE 11.1,” http://software.opensuse.org/111/en, 2013.
[32] “PerformanceTest—PC Benchmark Software,” http://www.passmark.com/products/pt.htm, 2011.
[33] LMbench, http://www.bitmover.com/lmbench/get_lmbench.html, 2013.
[34] D. Overclock, “How to Benchmark a Videocard,” http://www.motherboards.org/articles/guides/1278_1.html, 2009.
[35] “Counter-Strike,” http://www.counter-strike.com/, 2011.
[36] “Quake III Arena,” http://www.idsoftware.com/games/quake/quake3-arena/, 2011.
[37] “Quake III Arena,” Wikipedia, http://en.wikipedia.org/wiki/Quake_III_Arena#Graphics, 2011.
[38] Fraps, “Real-Time Video Capture & Benchmarking,” http://www.fraps.com/, 2011.
[39] “Microsoft Direct3D,” Wikipedia, http://en.wikipedia.org/wiki/Microsoft_Direct3D, 2011.
[40] “I-Bench,” ftp://ftp.pcmag.com/benchmarks/i-bench/, 2011.
[41] J. Nieh, S.J. Yang, and N. Novik, “Measuring Thin-Client Performance Using Slow-Motion Benchmarking,” ACM Trans. Computer Systems, vol. 21, no. 1, pp. 87-115, 2001.
[42] Iometer, “Introduction,” http://www.iometer.org/, 2013.
[43] A. Chobotaro, E. Eduri, S. Garg, L. Janz, C. Klotz, M. Ramachandran, R. Rappoport, N. Smith, J. Stanley, and M. Wood, “New Client Virtualization Usage Models Using Intel Virtualization Technology,” Intel Technology J., vol. 10, no. 3, pp. 205-216, Aug. 2006.
[44] J. Liu, W. Huang, B. Abali, and D.K. Panda, “High Performance VMM-Bypass I/O in Virtual Machines,” Proc. USENIX Ann. Technical Conf., pp. 29-42, 2006.
[45] H. Raj and K. Schwan, “High Performance and Scalable I/O Virtualization via Self-Virtualized Devices,” Proc. 16th Int’l Symp. High Performance Distributed Computing (HPDC ’07), pp. 179-188, 2007.
[46] M. Ben-Yehuda, J. Mason, O. Krieger, J. Xenidis, L.V. Doorn, A. Mallick, J. Nakajima, and E. Wahlig, “Utilizing IOMMUs for Virtualization in Linux and Xen,” Proc. Ottawa Linux Symp. (OLS ’06), pp. 29-42, 2006.
[47] B.-A. Yassour, M. Ben-Yehuda, and O. Wasserman, “Direct Device Assignment for Untrusted Fully-Virtualized Virtual Machines,” Research Report H-0263, IBM, 2008.
[48] D. Abramson, J. Jackson, S. Muthrasanallur, G. Neiger, G. Regnier, R. Sankaran, I. Schoinas, R. Uhlig, B. Vembu, and J. Wiegert, “Intel Virtualization Technology for Directed I/O,” Intel Technology J., vol. 10, no. 3, pp. 179-191, Aug. 2006.


[49] “AMD I/O Virtualization Technology (IOMMU) Specification,” white paper, AMD Inc., Feb. 2009.
[50] L. Xia, J. Lange, P. Dinda, and C. Bae, “Investigating Virtual Passthrough I/O on Commodity Devices,” SIGOPS Operating Systems Rev., vol. 43, no. 3, pp. 83-94, 2009.
[51] A. Kadav and M.M. Swift, “Live Migration of Direct-Access Devices,” SIGOPS Operating Systems Rev., vol. 43, no. 3, pp. 95-104, 2009.
[52] G. Banga, P. Druschel, and J.C. Mogul, “Resource Containers: A New Facility for Resource Management in Server Systems,” Proc. Third Symp. Operating Systems Design and Implementation, pp. 1-11, Feb. 1999.
[53] A. Chandra and P. Shenoy, “Hierarchical Scheduling for Symmetric Multiprocessors,” IEEE Trans. Parallel and Distributed Systems, vol. 19, no. 3, pp. 418-431, Mar. 2008.
[54] I. Pratt, D. Magenheimer, H. Blanchard, J. Xenidis, J. Nakajima, and A. Liguori, “The Ongoing Evolution of Xen,” Proc. Linux Symp., vol. 2, pp. 255-266, July 2006.
[55] Y. Zhou, Y. Zhang, H. Liu, and N. Xiong, “AVMM: Virtualize Network Client with a Bare-Metal and Asymmetric Partitioning Approach,” Proc. IEEE INFOCOM Workshop Cloud Computing, pp. 653-658, Apr. 2011.

Yuezhi Zhou received the PhD degree in computer science and technology from Tsinghua University, China, in 2004, and he is currently an associate professor in the same department. He worked as a visiting scientist in the Department of Computer Science at Carnegie Mellon University in 2005. His research interests include cloud computing, distributed systems, and mobile devices and systems. He has published more than 60 technical papers in international journals and conferences. He received the IEEE Best Paper Award at the IEEE AINA International Conference in 2007. He is a member of the IEEE and the ACM.

Yaoxue Zhang received the PhD degree in computer networking from Tohoku University, Japan, in 1989. He was a visiting professor at the Massachusetts Institute of Technology (MIT) and the University of Aizu in 1995 and 1998, respectively. Currently, he is a professor in the Department of Computer Science and Technology, Tsinghua University, and president of Central South University, China. He serves as an editorial board member of four international journals. His major research areas include computer networking, operating systems, and ubiquitous/pervasive computing. He has led many national and international research, international collaboration, and industrialization projects. He has published more than 180 technical papers in international journals and conferences, as well as nine monographs and textbooks. He won the National Award for Scientific and Technological Progress (second place) in 1998 and 2001 and the National Award for Technological Invention (second place) in 2004, as well as five provincial or ministerial awards. In 2005, he won the Foundation for Scientific and Technological Progress Prize from the Holeung Ho Lee Foundation. He has filed 16 Chinese patents and three American patents.


Hao Liu received the BS degree in software engineering from the Beijing University of Aeronautics and Astronautics, China, in 2009. He is currently working toward the PhD degree in the Department of Computer Science and Technology, Tsinghua University, China. His research interests include distributed computing, high-speed transport protocols, and mobile cloud computing with smartphones.

Naixue Xiong received two PhD degrees, one in software engineering from Wuhan University and one in dependable networks from the Japan Advanced Institute of Science and Technology. He is currently a professor in the School of Computer Science, Colorado Technical University. His research interests include security and dependability, cloud computing, network architecture, and optimization theory. He has published approximately 100 journal papers. Some of his works were published in the IEEE Journal on Selected Areas of Communication, various IEEE/ACM journals, ACM Sigcomm Workshop, IEEE INFOCOM, and IPDPS. He has been a general chair, program chair, publicity chair, PC member, and OC member of more than 100 international conferences and a reviewer of about 100 international journals, including the IEEE Journal on Selected Areas of Communication; the IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics; the IEEE Transactions on Communications; the IEEE Transactions on Mobile Computing; and the IEEE Transactions on Parallel and Distributed Systems. He is serving as an associate editor or editorial board member for more than 10 international journals (including Information Science) and as a guest editor for more than 10 international journals, including the Sensor Journal, ACM Wireless Networks, and ACM/Springer Mobile Networks and Applications. He received the Best Paper Award at the 10th IEEE International Conference on High Performance Computing and Communications (HPCC ’08) and the Best Student Paper Award at the 28th North American Fuzzy Information Processing Society Annual Conference (NAFIPS ’09). He is a member of the IEEE, the IEEE ISATC, the IEEE TCPP, and the IEEE TCSC, as well as a chair of “Trusted Cloud Computing” in the IEEE Computational Intelligence Society.

Athanasios V. Vasilakos is currently a professor in the Department of Computer and Telecommunications Engineering, University of Western Macedonia, Greece, and a visiting professor at the National Technical University of Athens (NTUA), Greece. He has authored or coauthored more than 200 technical papers in major international journals and conferences. He is author/coauthor of five books and 20 book chapters in the areas of communications. He has served as general chair and technical program committee chair for many international conferences. He has served or is serving as an editor and/or guest editor for many technical journals, such as the IEEE Transactions on Network and Services Management; IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics; IEEE Transactions on Information Technology in Biomedicine; the May 2009, January 2011, and March 2011 special issues of the IEEE Journal on Selected Areas of Communication; the IEEE Communications Magazine; ACM Wireless Networks; and ACM/Springer Mobile Networks and Applications. He is the founding editor-in-chief of the International Journal of Adaptive and Autonomous Communications Systems and the International Journal of Arts and Technology. He is chairman of the SIB Computing of the European Alliances for Innovation. He is a senior member of the IEEE.
