TrustBox: A Security Architecture for Preventing Data Breaches Matthias Schmidt, Sascha Fahl, Roland Schwarzkopf, Bernd Freisleben Department of Mathematics and Computer Science, University of Marburg Hans-Meerwein-Str. 3, D-35032 Marburg, Germany {schmidtm, fahl, rschwarzkopf, freisleb}@informatik.uni-marburg.de
Abstract— In this paper, a novel approach to prevent accidental or deliberate data breaches is presented. The proposed approach provides platform, network and offline security. Data is categorized as sensitive or insensitive, and the corresponding applications are isolated by using virtualization technology. Data theft or accidental loss is prevented by encrypting virtual hard disks and by introducing a multi-lane network architecture. If no connection to a corporate network is available, an offline mode handles data transfer and encryption. Authentication is managed by applying a biometric feature vector in association with a smart card setup. The approach increases security without disrupting the everyday work routines of users. An implementation based on VirtualBox and JavaCard is presented. A performance evaluation of the critical components is provided.
I. INTRODUCTION
In recent years, data breaches have become a severe problem for companies and governments around the world. The reasons for data breaches are manifold [26], [25], [10], [23], [4]. They cause enormous damage [15], and thus adequate protection mechanisms need to be installed to prevent them. There is a large number of commercial hardware and software products offering encryption, access control and logging of sensitive data [13]. For example, TrueCrypt is a popular open source toolkit for data encryption, and there are USB sticks offering the same features at the device level. Normally, such solutions require the user’s interaction and therefore interrupt his or her accustomed workflow by requiring the following steps: The data to be protected has to be selected (1), and it gets encrypted (2) by entering and memorizing a password for encryption and decryption (3). For later access, decryption has to be performed (4). This clearly requires additional effort from the user’s perspective and demonstrates that usability is important for any security infrastructure [9]. A proper data breach prevention solution that simultaneously allows offline work must be able to handle the following types of data losses:
• Transfer data to an untrusted location either on the local hard disk or on a remote machine.
• Transfer data to an untrusted removable device like a USB stick or an external hard drive.
• Copy and paste sensitive data to other applications, emails or instant messaging applications.
To prevent accidental or deliberate loss of sensitive data effectively, a novel approach called TrustBox using adequate
system platform and network security concepts is introduced in this paper. Accessing sensitive data is only permitted from within a platform-independent virtual machine (VM) that is hardened by a set of security policies. It will be called the trusted virtual machine in the remainder of this paper. Examples of applications running in this VM are the Microsoft Office suite and an email client. Other applications like browsers or instant messengers can be used to (un)deliberately leak data. Furthermore, due to the direct Internet access, these applications are particularly dangerous since potential security vulnerabilities can be misused by malicious individuals. As a consequence, they must not be permitted to access sensitive data and thus are executed in a separate virtual machine called untrusted virtual machine. The sensitive data is stored on a secured back end device like a Storage Area Network (SAN) and accessed from within the trusted VM through an encrypted and authenticated connection that currently is an IPSec network tunnel. By preventing the connection of hardware such as external hard disks, data transfer is restricted to well defined secure channels. Furthermore, it is not possible to copy and paste data from the trusted VM to the untrusted VM, but this is possible the other way round. Apart from preventing data breaches while the user is connected to a corporate network (either directly or over a secure, remote connection), TrustBox also provides data breach protection if a user is not connected to a network at all, but still needs access to sensitive data. By introducing a special ticketing service combined with biometrical authentication, smart card technologies and hard disk encryption, our approach supports this offline usage scenario without endangering the data. 
Thus, TrustBox implements the following features for data protection without interrupting a user’s usual workflow: • Sensitive data is not leaked accidentally or deliberately by using multiple virtual machines to separate sensitive data from potentially dangerous applications. All communication channels, e.g. network or copy and paste, are subject to adequate enforcement mechanisms. • Access to sensitive data is granted by a trustworthy authentication scheme. Depending on the usage scenario, this is either done by a local enforcement entity or by using a smart card with biometric authentication. • The daily workflow of a user is not disturbed by the provided security mechanisms.
An implementation based on the VirtualBox virtualization technology and JavaCard is presented. A performance evaluation of the critical components is provided. The paper is organized as follows. Section II presents the proposed approach. Section III discusses implementation issues. Section IV shows results of performance measurements. Section V reviews related work. Section VI concludes the paper and outlines areas for future work.
II. THE TRUSTBOX ARCHITECTURE
In this section, the threat model, the architecture and the components of TrustBox as shown in Figure 1 are presented. The combination of existing technologies together with the implemented enforcement policies and the offline mode represents a novel way to deal with data breaches.
[Figure 1, diagram not reproduced. Labels recoverable from the diagram: Host OS (Windows); trusted VM and untrusted VM, each with a Guest OS (applications A1, A2, A3; VM1, VM2); C&P; IPSec; sensitive data network storage; insensitive data network storage; quarantine network storage; policy enforcement; firewall; antivirus; dpi; clearing instance; domain controller; Internet; no connection; no network.]
Fig. 1. Architectural view on the platform and networking aspects
A. Threat Model and Assumptions
The proposed approach is based on the standard assumptions of most other virtualization security architectures [5], [6]. The Virtual Machine Monitor (or hypervisor) is part of the trusted computing base. Since we focus on the prevention of data breaches, the paper does not deal with attacks against the Virtual Machine Monitor. We are not using any special host operating system (i.e. a trusted Linux), but instead a commodity one. The required security level is reached by preventing direct Internet access, stopping unnecessary services and applying restrictive security policies. These restrictions are possible, because the user does not work with the host operating system directly.
The user has a central role in our threat model, because his or her everyday workflow might put sensitive data at risk. By using encrypted virtual hard disks, enforcing data storage on a server, preventing the use of portable storage media and checking outgoing email at the clearing instance, the user is prevented from putting the data at risk. Additionally, the use of a ticketing system with an audit of checked out files might stop malicious users from trying to steal data.
Another threat to sensitive data is malware that infects a computer via the Internet. By executing applications accessing the Internet inside the untrusted VM, data leakages caused by malware can be prevented.
B. Platform Security
The applied platform model divides user applications into two classes. The first class represents all applications that are allowed to access sensitive data, such as word processors, email clients or spreadsheet applications. The second class contains potentially dangerous applications, such as web browsers. Additionally, data is classified as sensitive or insensitive. Two different virtual machines are rolled out to the physical host. The trusted VM operates on sensitive data, while the untrusted VM works on insensitive data only. Our definition of sensitive data does not conflict with existing access control models, nor does it replace them. The user is only allowed to store data on the VM’s associated network storage, but not on the VM’s local hard disk. Since it might be necessary to reclassify insensitive data as sensitive data, our approach also includes a quarantine storage. This storage performs antivirus scanning to prevent malware from infecting applications inside the trusted VM and eventually accessing sensitive data. User applications are not running on the host operating system, but in their designated VM. The VirtualBox seamless mode completely hides the presence of VMs, so application separation is invisible to the user. The presented platform security approach therefore allows applications accessing sensitive and insensitive data to run concurrently.
1) User-visible Restrictions: Security measures need to be non-disruptive, to prevent a user from trying to circumvent them [9]. Thus, user-visible restrictions have to be limited to a minimum. An action that is often responsible for leaked data is copy and paste. Since it is one of the most useful operations, forbidding it is not an option. By utilizing VM technology, TrustBox can monitor and permit copy and paste based on its direction of movement.
It is permitted to copy data between applications inside a VM and from the untrusted to the trusted VM, but not the other way around. Bellovin stated in [1] that applications exist that cannot be strictly separated. Examples are a mail client and a web browser, i.e. the user can click on HTTP links in e-mails and on mailto links in a web browser. TrustBox solves this problem by using special URL handlers in both VMs that pass those URLs to the appropriate VM.
2) Virtual Hard Disk Encryption and Key Management: Most data breach prevention solutions only work well if a user’s machine is physically connected to a company network. TrustBox tries to avoid this limitation by locally storing the required VMs, but encrypting the VMs’ hard disks on the fly. The encryption algorithm used is the industry standard AES-128, which was implemented directly into the low level functions of the VM software. For key management, a smart card serves as a secure container for storing the symmetric encryption key. An authorized end user is authenticated by verifying a biometric feature vector manageable by a smart card [18]. In this case, the human fingerprint with its good false acceptance and false rejection rates [27] has been chosen.
3) Screen Captures: Screen capturing is one of the common methods to circumvent copy restrictions put on sensitive data. Using the Print Screen key is the most straightforward
approach. Other options are the Snipping Tool (built in by default in Windows Vista) or one of the many third party applications. Considering our architecture, there are three sources of screenshots: the trusted VM, the untrusted VM, and the host OS itself. While screenshots made in the trusted VM can contain sensitive information, they cannot leave the trusted VM, neither as a file nor via copy and paste. Screenshots made in the untrusted VM cannot contain sensitive information by definition. The critical location is the host OS itself, because screenshots made on the host OS can contain windows from the trusted VM and therefore sensitive data. While there is no way that such screenshots could leave the physical machine under normal conditions (no network connections, no removable devices, copy and paste to the untrusted VM not permitted), once they are stored on the hard disk they might be accessed using another operating system. Currently, this is an open problem [28].
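The direction-based copy and paste rule and the URL hand-off between the two VMs (Section II-B.1) can be illustrated with a minimal sketch. It is not taken from the TrustBox implementation: the names `Zone`, `copy_paste_allowed` and `route_url` are hypothetical, and the mapping of mailto links to the trusted VM simply assumes that the mail client runs there and the browser runs in the untrusted VM.

```python
from enum import Enum


class Zone(Enum):
    """Hypothetical labels for the two virtual machines."""
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


def copy_paste_allowed(source: Zone, target: Zone) -> bool:
    """Return True if clipboard data may flow from source to target.

    Data may move within one VM and from the untrusted VM into the
    trusted VM, but never from the trusted VM to the untrusted VM.
    """
    if source is target:
        return True
    return source is Zone.UNTRUSTED and target is Zone.TRUSTED


def route_url(url: str) -> Zone:
    """Pick the VM that should open a URL (assumed mapping).

    mailto links are handed to the mail client in the trusted VM;
    every other scheme is handed to the browser in the untrusted VM.
    """
    scheme = url.split(":", 1)[0].lower()
    return Zone.TRUSTED if scheme == "mailto" else Zone.UNTRUSTED
```

A clipboard hook in the VM software would call `copy_paste_allowed` before forwarding data, and the URL handlers installed in each guest would call `route_url` to decide whether to open a link locally or pass it to the other VM.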
III. IMPLEMENTATION ISSUES
We have implemented TrustBox based on Microsoft Windows as the host and guest operating system since it is the most widely deployed end user operating system. VirtualBox Open Source Edition is used as the platform virtualization technology.
A. Virtual Hard Disk Encryption
Encrypting virtual hard disks is achieved by modifying the disk’s low level storage driver at the HDD core level. Read and write operations were extended by encryption and decryption functionality using the OpenSSL library. The described symmetric key management was implemented with the help of an Open Operating System smart card (namely Sun’s JavaCard framework [21]) acting as a secure token device. User authentication is done with the help of a biometric authentication scheme offered by the JavaCard: the user’s fingerprint.
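The idea of extending the read and write operations of a virtual disk with transparent encryption can be sketched as follows. This is an illustration under loud assumptions, not the TrustBox code: the paper uses AES-128 via OpenSSL inside VirtualBox, whereas this sketch uses only the Python standard library and substitutes an HMAC-SHA256 keystream as a stand-in cipher. The class `EncryptedDisk`, the 512-byte sector size and the way the key is passed in are hypothetical; in TrustBox the key would be released by the smart card after fingerprint verification.

```python
import hashlib
import hmac

SECTOR_SIZE = 512  # bytes; assumed sector size for this sketch


def _keystream(key: bytes, sector: int, length: int) -> bytes:
    """Derive a per-sector keystream with HMAC-SHA256 in counter mode.

    Stand-in for AES-128; NOT the cipher used by TrustBox.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = sector.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


class EncryptedDisk:
    """Toy virtual disk whose read/write hooks encrypt transparently."""

    def __init__(self, key: bytes, sectors: int) -> None:
        self._key = key
        self._sectors = sectors
        self._store = bytearray(sectors * SECTOR_SIZE)  # ciphertext at rest

    def _offset(self, index: int) -> int:
        if not 0 <= index < self._sectors:
            raise IndexError("sector index out of range")
        return index * SECTOR_SIZE

    def write_sector(self, index: int, plaintext: bytes) -> None:
        """Encrypt one sector on its way to the backing store."""
        if len(plaintext) != SECTOR_SIZE:
            raise ValueError("plaintext must be exactly one sector")
        start = self._offset(index)
        stream = _keystream(self._key, index, SECTOR_SIZE)
        self._store[start:start + SECTOR_SIZE] = bytes(
            p ^ s for p, s in zip(plaintext, stream)
        )

    def read_sector(self, index: int) -> bytes:
        """Decrypt one sector on its way back to the guest."""
        start = self._offset(index)
        stream = _keystream(self._key, index, SECTOR_SIZE)
        cipher = self._store[start:start + SECTOR_SIZE]
        return bytes(c ^ s for c, s in zip(cipher, stream))

    def raw_sector(self, index: int) -> bytes:
        """What someone reading the disk image directly would see."""
        start = self._offset(index)
        return bytes(self._store[start:start + SECTOR_SIZE])
```

The guest only ever calls `read_sector` and `write_sector`, so encryption stays invisible to applications, while `raw_sector` shows that the image on the host contains ciphertext only. Reusing one keystream per sector across rewrites, as this sketch does, is not safe for a real disk; production disk encryption uses tweakable block cipher modes such as XTS-AES.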
C. Network Security
To prevent data leaks over the network, a special network security solution has been developed. Figure 1 shows the fundamental network setup. The trusted VM handling sensitive data is neither connected to the physical host nor to the untrusted VM. To access the protected data, an IPSec line between the trusted VM and the sensitive data network storage is established. To transfer data from the trusted VM to an untrusted network (e.g. the Internet), all the traffic is sent to the policy enforcement instance. This policy enforcement instance includes a deep packet inspection module to control the traffic in and out of a corporate network. In addition, a semi-automatic clearing instance is installed to review any data to be sent out of the corporate network. Additionally, the untrusted VM is isolated from the physical host and contains explicitly insensitive data only, thus preventing untrusted applications from accessing sensitive data. The untrusted VM’s Internet access is also filtered through the policy enforcement instance to allow the deep packet inspection instance to check outgoing data. Finally, the host operating system is only connected to the corporation’s domain controller for automated installation of security updates and administrative tasks, but not to any of the network storages.
D. Offline Security
Encrypted VM hard disks and a strong authentication scheme are a good base for supporting offline scenarios. While TrustBox usually stores data remotely, the offline scenario requires the virtual machine’s hard disk to be treated as a secure data storage temporarily. An Offline Treatment Ticket (OTT) has to be filed, which results in the requested files being checked out from the secure network storage and afterwards synced back and securely deleted from the local hard disk.
Encrypting virtual machine hard disks and choosing a strong authentication scheme both support offline scenarios in the case a user does not have access to the company’s network.
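The life cycle of an Offline Treatment Ticket described above (file a ticket, check files out, sync them back and delete the local copies) can be pictured with a small data-structure sketch. It is a hypothetical model, not the authors' web application: the names `OfflineTreatmentTicket` and `TicketRegister` are invented here, a ticket is assumed to carry the user, the requested files and a timeframe, and the register is an in-memory stand-in for whatever record the real service keeps.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class OfflineTreatmentTicket:
    """Hypothetical record of one offline checkout."""
    user: str
    files: List[str]
    valid_from: datetime
    valid_until: datetime
    synced_back: bool = False  # set once files are returned and wiped locally

    def permits_offline_use(self, now: datetime) -> bool:
        """True while the ticket is open and inside its timeframe."""
        return (not self.synced_back
                and self.valid_from <= now <= self.valid_until)


class TicketRegister:
    """In-memory stand-in for the record of checked-out files."""

    def __init__(self) -> None:
        self._tickets: List[OfflineTreatmentTicket] = []

    def check_out(self, ticket: OfflineTreatmentTicket) -> None:
        """Record a ticket after basic sanity checks."""
        if not ticket.files:
            raise ValueError("a ticket must name at least one file")
        if ticket.valid_until <= ticket.valid_from:
            raise ValueError("timeframe must end after it starts")
        self._tickets.append(ticket)

    def sync_back(self, ticket: OfflineTreatmentTicket) -> None:
        """Mark files as returned to network storage and deleted locally."""
        ticket.synced_back = True

    def files_on_local_disk(self) -> List[str]:
        """All files still checked out, including those of overdue tickets."""
        return sorted(
            name
            for ticket in self._tickets
            if not ticket.synced_back
            for name in ticket.files
        )
```

Keeping overdue tickets in `files_on_local_disk` reflects the auditing purpose of the ticket: the system should know at any time which sensitive files reside on encrypted local storage, whether or not the timeframe has expired.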
B. Offline Security
The Offline Treatment Ticket service was implemented as a Java Server Pages based web application. Files are checked out by the user via his or her browser over HTTPS after being authenticated against an Active Directory user database. The following steps realize a file checkout:
1) An OTT containing the authorized user, the requested file(s) and the timeframe is sent by the user.
2) The OTT is stored in an accounting database.
3) The Windows Active Directory service is instructed to set up the local working environment: create a local, restricted user, set up local policies and sync the home directory with the help of a Windows Batch script.
4) The requested files/directories are pushed to a local directory within the home directory.
5) A Windows Batch script is activated that is executed the next time the user logs in to the company’s network and revokes the last two steps.
IV. EXPERIMENTAL RESULTS
In this section, we present the results of measurements investigating several aspects of our proposal. There are two aspects that are important for TrustBox: (1) the overhead introduced by hard disk encryption of the virtual machine and (2) the startup time of trusted and untrusted applications. The experiments were conducted on a typical desktop machine with an Intel Pentium IV 3 GHz processor, 2 GB of RAM and a 200 GB hard disk drive. The operating system installed on the host and within the virtual machines was Windows XP SP3. The VMM was VirtualBox 3.0.10 OSE with our patches, and both VMs had one virtual CPU and 512 MB of RAM. A number of measurements were performed to determine the encryption overhead using IoMeter [14]. The results are shown in Table I. We measured the data rate (in MB/s and I/O operations per second) as well as the data throughput with an encrypted and an unencrypted VM.
TABLE I
RESULTS OF THE IO BENCHMARK

                          Encrypted   Unencrypted
Throughput  I/O Ops           660.7        1162.4
            MBps (r/w)         41.3          72.6
Rate        I/O Ops          1870.1        4303.0
            MBps (r/w)          0.9           2.2
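To make the size of the encryption overhead explicit, the relative slowdown implied by the figures in Table I can be computed directly. The short sketch below is only arithmetic on the published numbers; the dictionary keys and the helper `slowdown_percent` are names chosen here for illustration.

```python
# (encrypted, unencrypted) pairs copied from Table I
TABLE_I = {
    "throughput_iops": (660.7, 1162.4),
    "throughput_mbps": (41.3, 72.6),
    "rate_iops": (1870.1, 4303.0),
    "rate_mbps": (0.9, 2.2),
}


def slowdown_percent(encrypted: float, unencrypted: float) -> float:
    """Percentage by which the encrypted VM falls short of the unencrypted one."""
    return (1.0 - encrypted / unencrypted) * 100.0


overhead = {
    name: round(slowdown_percent(enc, plain), 1)
    for name, (enc, plain) in TABLE_I.items()
}

for name, percent in overhead.items():
    print(f"{name}: {percent}% lower with encryption")
```

With these inputs, the encrypted VM reaches roughly 43 percent less throughput than the unencrypted VM, and between 56 and 60 percent less in the rate measurements.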
At first sight, the results seem disappointing, especially for the data throughput case. Since the user is connected to a remote storage most of the time, data throughput is not as critical as it seems to be. Only in the offline case might the user notice a delay when operating on large files, compared to a native workstation. A more sophisticated solution than OpenSSL, like the optimized Assembler version of the XTS-AES algorithm developed by TrueCrypt, could be applied to improve performance by 30-140% [24]. To sum up, the overhead is acceptable if the gain in security due to the full disk encryption is considered.
As stated in Section II, all applications are started inside a virtual machine, thus they have slightly longer startup times than applications started on the physical machine. This is because we cannot simply start a process on the local operating system; a remote invocation originating from the host operating system (precisely the user clicking on an icon) into the appropriate virtual machine is needed. Therefore, the icon is linked to a small piece of software that opens a network connection into the virtual machine to start the application. To quantify the overhead, we started a typical application, here Microsoft Excel, locally and virtualized in TrustBox. To achieve a robust mean, 100 trials were performed. The average starting time for local execution is 671 milliseconds and the average time for virtualized execution is 1104 milliseconds. Thus, the difference is about 0.4 seconds, which is negligible considering the gained security of executing applications inside the virtual machine.
V. RELATED WORK
Cryptzone [3] offers some commercial products dealing with data security.
This includes a wide range of encryption products, such as USB stick, hard disk and file encryption. In contrast to TrustBox, the Cryptzone offerings mostly deal with keeping sensitive data away from unapproved persons, but they do not deal with data accidentally stored at wrong places. Furthermore, TrustBox enforces a strict separation of sensitive and insensitive data.
Microsoft Application Virtualization [11] (AppV) allows applications to be deployed in real-time to a client from an application server. It decouples applications from the operating system and enables them to run as network services. Since the focus of AppV is usability and resource consolidation and not security, it is not a direct competitor as it does not address file encryption, removable media or secure network access.
Yu et al. [28] have presented the Display-Only File Server (DOFS) to prevent insiders from stealing sensitive data. Their approach is based on a client-server solution where all content is located at the DOFS server. Access is only granted through a
special DOFS client that opens an encrypted and authenticated connection to the server. The applications needed to perform operations on the data are also run on the server and displayed on the client using a VNC connection. This solution requires high bandwidth network access, offers worse performance when deployed over the Internet and does not support the offline scenario.
Qubes [17] implements a Security by Isolation approach. To do this, Qubes utilizes virtualization technology to isolate programs from each other and to sandbox many system-level components, like the networking or storage subsystems, so that their compromise does not affect the integrity of the rest of the system. While TrustBox focuses on data breach prevention, Qubes focuses on so-called AppVMs, i.e. specialized VMs for several types of applications (personal, work, shopping etc.). As a consequence, Qubes does not support an offline mode for local file storage including policy enforcement when the local user is not connected to a corporate network. Furthermore, the Qubes architecture does not include a clearing instance to check and reclassify sensitive data.
An approach for secure document workflows based on trusted virtual domains has been presented by Gasmi et al. [7]. The authors describe an enterprise digital rights management (ERM) system that allows isolated execution environments spanning virtual entities to be established. In contrast to TrustBox, their approach is a prototypical implementation based on the L4 microkernel. Furthermore, splitting up a document into multiple parts based on different access permissions, as proposed in their paper, and being able to remerge the document after editing is neither reliably possible nor implemented in today’s systems.
The NetTop system [16] is based on the VMware virtual machine monitor. It uses several virtual machines containing the user’s operating system and virtual machines containing encryption and filtering services.
The base operating system is a trusted Linux while VMware is used for hosting the guest operating system. The system can provide multiple security levels based on multiple virtual machines. The issues of using a trusted Linux distribution have already been discussed in Section II-A. All data is saved on the local, encrypted hard disk, contrary to TrustBox where data storage in a safe remote location is enforced by system policies transparent to the user. The only exception to this policy is the offline mode, where the user has no network connection. In this case, the user has to file a ticket to receive a copy of the data, thus the system has a record of all data being on (encrypted) local storage.
Borders et al. [2] have presented an approach to protect confidential data on personal computers with Storage Capsules. A Storage Capsule is analogous to encrypted file containers from the user’s perspective. While the introduced Storage Capsules provide a convenient way to safely store data on a possibly compromised system, they do not deal with the problems stated in this paper. One of our goals is to introduce security while maintaining usability. Their approach is based on the secure mode, meaning that the system is not interruptible. This could lead to various problems with newer applications, such as Web 2.0 portals, and it is not possible to work with sensitive and non-sensitive data at the same time. Thus, it
might be necessary to switch back and forth between secure and insecure mode, creating additional overhead due to VM operations and restoring the previous state. Finally, Borders et al. consider the user as trustworthy, which is contrary to our assumption.
TrustBox is based on virtual machine technology as part of the trusted computing base. Ormandy [22] presented an empirical study about the security of various virtual machine monitors. He concludes that none of the tested environments withstands his testing procedure and that further security research is needed. While this research states that virtual machines are not a security panacea, they are less complex in terms of code and have a decent security history [19], [20]. While the Linux 2.6 kernel had 38 vulnerability reports in 2009, Xen only had a single one. Since our approach is based on VirtualBox and not on Xen, these numbers cannot directly be used for comparison purposes. To the best of our knowledge, a dedicated study on VirtualBox security does not exist.
Griffin et al. [8] have presented the concept of Trusted Virtual Domains (TVD). A TVD contains a mutually-trusted computing base (a TPM chip), a virtual environment and execution entities that represent applications. While their approach uses TPM, it does not tackle all the real-life problems covered in this paper, e.g. copy & paste, email submission etc.
VI. CONCLUSIONS
In this paper, a novel approach to prevent data breaches has been presented. The approach provides platform, network and offline security. To protect sensitive data from being leaked, platform virtualization techniques have been used. This enables fine-grained control over various actions including file transfer and copy and paste. Extracting sensitive data from a stopped or crashed virtual machine is made impossible due to the introduction of full hard disk encryption. A dual-laned network architecture strictly separates trusted from untrusted data channels.
Access to an outside network is granted by a special clearing instance. To facilitate offline usage scenarios, several techniques including secure authentication with biometric fingerprints and smart cards have been implemented. The combination of the presented techniques decreases the likelihood of accidental or deliberate data breaches without harming the usability of the system.
There are several areas of future work. Currently, OpenSSL is utilized for hard disk encryption, which could be replaced with a faster solution (like the XTS-AES implementation of TrueCrypt). Furthermore, dealing with screenshot prevention is an important topic and still an open problem. Technologies like the Protected Media Path (PMP) [12] deployed in Windows Vista to protect multimedia data could lead to promising results.
VII. ACKNOWLEDGEMENTS
This work is partly supported by the German Ministry of Education and Research (BMBF) (D-Grid and HPC Initiative). The authors want to thank Katharina Haselhorst for providing help with some implementation details.
REFERENCES
[1] S. M. Bellovin. Virtual machines, virtual security? Communications of the ACM, 49(10):104, 2006.
[2] K. Borders, E. V. Weele, B. Lau, and A. Prakash. Protecting Confidential Data on Personal Computers with Storage Capsules. In Proceedings of the 18th USENIX Security Symposium, 2009.
[3] Cryptzone. Cryptzone. http://www.cryptzone.com, October 2009.
[4] T. N. Dunn. MoD data on 1m is missing. Sun, 10 October 2008.
[5] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh. Terra: a Virtual Machine-based Platform for Trusted Computing. ACM SIGOPS Operating Systems Review, 37(5):193–206, Jan 2003.
[6] T. Garfinkel and M. Rosenblum. A Virtual Machine Introspection Based Architecture for Intrusion Detection. Proceedings of the 2003 Network and Distributed System Security Symposium, pages 191–206, Jan 2003.
[7] Y. Gasmi, A.-R. Sadeghi, P. Stewin, M. Unger, M. Winandy, R. Husseiki, and C. Stüble. Flexible and Secure Enterprise Rights Management based on Trusted Virtual Domains. Proceedings of the 3rd ACM Workshop on Scalable Trusted Computing, pages 71–80, Jan 2008.
[8] J. L. Griffin, T. Jaeger, R. Perez, R. Sailer, L. V. Doorn, and R. Cáceres. Trusted Virtual Domains: Toward Secure Distributed Services. In Proc. of the First Workshop on Hot Topics in System Dependability (Hotdep05). IEEE Press, 2005.
[9] P. Gutmann and I. Grigg. Security Usability. IEEE Security and Privacy, 3:56–58, Aug 2005.
[10] C. Mellor. Nuclear weapons data leak from Los Alamos. Techworld, 26 October 2006.
[11] Microsoft. Microsoft Application Virtualization 4.5. http://www.microsoft.com/systemcenter/ appv/default.mspx, October 2009.
[12] Microsoft Developers. Protected Media Path. http://msdn.microsoft.com/en-us/library/aa3768462010.
[13] R. Mogull. Top Five Steps to Prevent Data Loss and Information Leaks. Gartner Research Publication, 12 July 2006.
[14] Open Source Development Labs. Iometer I/O Subsystem Benchmark. http://www.iometer.org/, October 2009.
[15] Ponemon Institute LLC. 2008 Annual Study: Cost of a Data Breach. http://www.encryptionreports.com/costofdatabreach.html, February 2009.
[16] R. Meushaw and D. Simard. NetTop. Commercial Technology in High Assurance Applications. In Tech Trend Notes. Preview of Tomorrows Information Technologies, volume 9, pages 1–9, Fall 2000.
[17] J. Rutkowska and R. Wojtczuk. Qubes OS Architecture. http://qubesos.org/files/doc/arch-spec-0.3.pdf, 2010.
[18] R. Sanchez-Reillo. Securing Information and Operations in a Smart Card through Biometrics. IEEE Aerospace and Electronic Systems Magazine, 16:3–6, April 2001.
[19] Secunia. Vulnerability Report: Linux Kernel 2.6.x. http://secunia.com/advisories/product/2719, 2009.
[20] Secunia. Vulnerability Report: Xen 3.x. http://secunia.com/advisories/product/15863, 2009.
[21] Sun Microsystems. Runtime Environment, Java Card Platform, Version 3.0 Connected Edition. Technical report, Sun Microsystems, March 2008.
[22] T. Ormandy. An Empirical Study into the Security Exposure to Hosts of Hostile Virtualized Environments. In Proceedings of CanSecWest Applied Security Conference, 2007.
[23] Transport Security Administration (TSA). Employee Data Security Incident. http://www.tsa.gov/press/happenings/050407 statement.shtm, March 2007.
[24] TrueCrypt Developers. TrueCrypt changelog for version 5.1. http://www.truecrypt.org/docs/?s=version-history, 2010.
[25] J. Vijayan. Classified U.S. military info, corporate data available over P2P. Computerworld, 25 July 2007.
[26] J. Vijayan. Classified data on president’s helicopter leaked via P2P, found on Iranian computer. Computerworld, 02 March 2009.
[27] A. C. Weaver. Biometric Authentication. Computer, 39:96–97, 2006.
[28] Y. Yu and T.-c. Chiueh. Display-Only File Server: A Solution against Information Theft due to Insider Attack. Proceedings of the 4th ACM workshop on Digital Rights Management, pages 31–39, Jan 2004.