Code Offloading on Android with RMI and AspectJ (COARA)
(Hebrew title, translated: Android, runtime code offloading using Java and aspects)
Research Proposal by Nir Hauser
Supervised by Roy Friedman
October 23, 2012
Abstract

Smartphones suffer from limited computational capabilities and battery life. One way to mitigate these problems is code offloading: executing certain methods on a remote server. Our proposal introduces COARA, a middleware for code offloading on Android that leverages RMI and aspect-oriented programming. We aim to improve on previous approaches by using only open source libraries and industry standard Java extensions rather than manipulating bytecode. COARA requires minimal changes to application source code. However, it gives the application developer the option of providing metadata in order to improve performance, as well as alternative implementations of methods executed on a remote server.

Research has shown that minimizing the size of the application state that is sent over the wire vastly improves performance by allowing more methods to be offloaded. We propose introducing the principle of lazy loading with virtual proxies to mobile offloading. This allows offloaded methods to execute without requiring the full state of the application to be transferred. If the method accesses an object that has not been transferred, COARA will automatically retrieve the object from the mobile device. We will run COARA with real Android applications on a modern smartphone to demonstrate that our novel state transfer techniques improve performance and battery life.
1. Introduction

Mobile computing is quickly becoming the predominant computing and communication platform. Smartphones have become ubiquitous, while mobile applications are increasing in complexity. However, smartphone computational capabilities are nowhere near those of desktops and laptops. Smartphones also suffer from limited battery life, as hardware energy consumption continues to grow.

This challenge has been addressed in the past using the client-server model, where the developer is responsible for writing code that issues requests to an API on a server in order to minimize computation on the mobile device. The downside of this approach is that the application developer must decide which computations should be sent to a remote location and when. Rather than focusing on the application, the developer must invest resources in network communications development.

Another approach, as seen in CloneCloud [1], has been to migrate the VM on which the application is running in order to limit the impact on the developer. The downside of this approach is the overhead required to migrate the entire VM from the mobile device to the server.

The MAUI [2] architecture combines these two approaches by providing fine-grained code offload while minimizing the changes required to applications. MAUI modifies the .NET bytecode, which makes source code debugging more difficult.

Our proposed approach is similar to MAUI in that it provides fine-grained code offload with minimal code changes. Our middleware would run on the Dalvik VM and use an open source lightweight RMI-like library to offload the application bytecode and the state of the application. Rather than manipulate the Java bytecode ourselves, we plan to leverage AspectJ to perform the offload. We believe that the use of open source libraries and industry standard Java extensions is preferable to custom bytecode manipulation.

MAUI has shown that one of the primary challenges in code offloading is minimizing the size of the application state that needs to be offloaded. We will use novel techniques that do not require the entire state of the application to be sent to the server before the computation can begin.
2. Related Work

FarGo [11] is a system that enables distributed applications to maintain a separation between application logic and the layout of components on the underlying physical system. FarGo allows components known as complets to be relocated based on events at runtime. In order to accomplish this, FarGo extends Java and leverages Java RMI to communicate between JVMs. While FarGo is able to decouple application logic from layout logic, the system is tightly coupled with the application itself. FarGo requires that existing applications be rewritten to adhere to the FarGo programming model.

Early attempts at offloading, such as Spectra [3] and Chroma [4], focused on partitioning the application into modules and calculating the optimal offloading strategy. These partitioning schemes require significant modification of the original application.

The MAUI [2] architecture provides fine-grained code offload while minimizing the changes required to applications. It shows that minimizing the size of the application state that is sent over the wire vastly improves performance by allowing more methods to be offloaded. MAUI addresses this problem by sending only the difference in application state on subsequent method invocations. However, MAUI serializes the application state using XML, which is many times slower than binary serialization and results in data that is many times as large [9]. A limitation of MAUI is that it requires the entire server version of the application to be on the server beforehand. In addition, MAUI modifies the .NET bytecode and creates two separate code bases, one for the mobile device and one for the server. This makes debugging more difficult and turns the offloading mechanism into a “black box” for the application developer.

CloneCloud's [1] approach has been to migrate the VM on which the application is running. It migrates the Dalvik VM from an Android phone to a backend server. Unlike other approaches, CloneCloud does not require intervention on the part of the developer, since offloading happens at the OS level. It also has the ability to offload native method calls. The downside of this approach is the overhead required to migrate the entire VM from the mobile device to the server. In addition, the server must already be running a hardware simulator with the same configuration as the mobile device, further complicating matters.

ThinkAir [5], like MAUI, provides method-level code offloading but focuses more on scalability issues and parallel execution of offloaded tasks. It creates virtual machines (VMs) in the cloud and can allocate multiple VMs to handle a single application. However, the authors do not discuss how they merge the application states from multiple VMs back into the original application.

The Cuckoo [6] framework leverages the existing activity/service model in Android. This model separates the computation-intensive parts of the application from the rest of the application, which allows Cuckoo to offload the well-defined computation-intensive portions. It requires the developer to write offloadable methods twice: once for local computations and once for remote computations.
This allows flexibility but may lead to unnecessary code duplication.

COCA [7] uses AspectJ to offload Android applications to the cloud. COCA is a prototype that accomplishes a subset of what we are setting out to do. COCA finds that the overhead incurred by AspectJ is minimal. However, COCA only offloads pure functions, which means that the state of the application is never transferred. It establishes that our idea is possible, but does not tackle the hard problem of state transfer, which MAUI demonstrated was one of the key challenges of offloading. COCA also requires that the entire program be sent to the cloud before offloading can begin, rather than sending only the class files that are necessary.

Calling the Cloud [8] is a middleware platform that can automatically distribute different layers of an application between the phone and the server, and optimize a variety of objective functions. However, it requires the application to be partitioned into several inter-connected software modules using the R-OSGi module management tool. The authors expect that application developers could take up to a month to adapt their applications to work with their solution.

COMET [12] is a runtime system that performs offloading by leveraging Distributed Shared Memory (DSM). DSM provides a virtual shared memory space that is accessible by threads on different machines without any work on the part of the developer. The advantage of this approach is that fine-grained parallel algorithms can be offloaded to multiple machines, resulting in improved performance. The downside of DSM is that developers may be unaware that a simple memory access can result in a network call. This could lead developers to unknowingly write inefficient applications that do not scale. Recent attempts at distributed memory, such as IBM's X10 programming language [13], have forced the developer to be conscious of any remote calls in order to avoid this problem. For this reason, COARA requires the application developer to annotate offloadable methods. While COARA will not support offloading fine-grained parallel algorithms, it is possible to explore offloading coarse-grained parallel algorithms where each machine operates on a copy of shared memory but is oblivious to changes on other machines. An example would be performing facial recognition on multiple images in parallel.
3. Research Plan

3.1 Overview

Our goal is to develop a middleware that automatically offloads method invocations from an Android application to the cloud. We aim to minimize the impact on the application developer so that resources can be invested in developing a quality application, not the hard work of offloading. While other solutions have involved modifying VMs or hacking bytecode, our solution would use contemporary technology, open source libraries, and industry standard Java extensions to achieve our goals.

Our solution will be simple in its design and available as an open source project. This allows the developer to use our solution out of the box with minimal configuration. However, it also gives the developer the flexibility to easily modify the middleware and contribute to the architecture. Finally, we plan to benchmark our solution's performance in order to show its viability.
3.2 Background

3.2.1 Android

Android is a Linux-based operating system designed primarily for touchscreen mobile devices. Android applications are mostly developed in the Java language using the Android Software Development Kit and compiled to Java bytecode. They are then converted from Java Virtual Machine-compatible .class files to Dalvik-compatible .dex (Dalvik Executable) files that run on the Dalvik Virtual Machine. While not all Java bytecode can be translated to .dex, many Java applications are compatible out of the box or require only minor adjustments. This means we have the entire Java open source community at our disposal.

3.2.2 Aspect Oriented Programming (AOP) and AspectJ

AOP is a programming paradigm that promotes modularity by allowing the separation of cross-cutting concerns. Concerns such as logging or security are often found across modules. Rather than scattering the code that addresses these concerns throughout the application, we handle them in special classes called aspects. These aspects can alter behavior in the base code. This is done by applying advice. An advice contains code that should be executed at a given joinpoint, or location in the base code. The advice uses a query known as a pointcut to identify which joinpoints should be intercepted by the advice. AspectJ is a widely used aspect-oriented extension for the Java programming language. It is open source and is integrated into Eclipse.

AspectJ is a natural fit for our architecture. Our primary cross-cutting concern is offloading the invocation of certain methods. Other solutions such as MAUI have modified the bytecode of applications to enable offloading. With AspectJ we simply create an OffloadingAspect that handles offloading. We identify which method invocations we want to handle with the appropriate pointcut. The advice within OffloadingAspect performs the actual offloading. COCA has provided a simple prototype to show that this is indeed an efficient solution. We have been able to apply the AspectJ compiler to our base code and create Java bytecode that runs successfully on the Dalvik VM.

3.2.3 Java Remote Method Invocation (RMI)

Java RMI is a Java API that performs the object-oriented equivalent of remote procedure calls (RPC). RMI provides the ability to make calls from one JVM to another. Upon a remote method invocation, all relevant Java objects are automatically serialized and sent to the remote JVM. This allows us to send the state of the application from the Android phone to the cloud using native Java binary serialization. MAUI [2] has shown that minimizing the size of the application state is critical. MAUI serializes to XML; however, native binary serialization has been shown to be many times faster than XML and reduces the size of the serialized data [9].

Since Dalvik does not support Java RMI, we will be leveraging an open source lightweight implementation of RMI called lipermi. Our prototype has successfully used a modified version of lipermi as part of our solution. COCA [7] did not attempt to tackle the issue of application state transfer. We have been able to address it with RMI.
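The serialization step that RMI performs on the application state can be illustrated with plain Java binary serialization. This is only a minimal sketch using the standard java.io classes; it does not use lipermi or any networking, and the names GameState and StateCodec are hypothetical and introduced purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical application state; class and field names are illustrative only.
class GameState implements Serializable {
    private static final long serialVersionUID = 1L;
    final int[] board;
    final String playerName;

    GameState(int[] board, String playerName) {
        this.board = board;
        this.playerName = playerName;
    }
}

// Hypothetical helper that turns state into bytes and back, standing in for
// what an RMI-like library does before and after sending data over the wire.
class StateCodec {
    static byte[] toBytes(Serializable state) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        GameState sent = new GameState(new int[64], "white");
        byte[] wire = toBytes(sent);                       // what would travel to the server
        GameState received = (GameState) fromBytes(wire);  // a copy rebuilt on the other side
        System.out.println(wire.length + " bytes, player=" + received.playerName);
    }
}
```

The received object is a separate copy of the sent one, which is why changes made on the server must be transferred back before they become visible on the phone.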
3.2.4 Java Annotations

A Java annotation is metadata that can be added to Java source code. We allow the developer to specify which methods are candidates for offloading with the use of annotations. The developer simply tags a method with the annotation @RemotableMethod, and an optimization engine decides at runtime whether or not to offload the method.
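As an illustration, an annotation such as @RemotableMethod could be declared and detected as sketched below. Only the annotation name comes from this proposal; the retention and target settings, the ChessEngine class, and the reflection-based check are assumptions made for the example. In COARA the annotated methods would be matched by an AspectJ pointcut rather than by reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Marks a method as a candidate for offloading.
// Assumption: runtime retention, so the marker can be inspected reflectively.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RemotableMethod {}

// Hypothetical application class with one offloadable and one local method.
class ChessEngine {
    @RemotableMethod
    int nextMove(int boardHash) {
        return Math.floorMod(boardHash, 64); // placeholder for an expensive search
    }

    String describe() {
        return "local-only helper";
    }
}

class AnnotationDemo {
    // Returns true if the named method of the given class carries @RemotableMethod.
    static boolean isOffloadCandidate(Class<?> type, String methodName) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return m.isAnnotationPresent(RemotableMethod.class);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("nextMove: " + isOffloadCandidate(ChessEngine.class, "nextMove"));
        System.out.println("describe: " + isOffloadCandidate(ChessEngine.class, "describe"));
    }
}
```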
3.3 Goals

Our research goals are to minimize the amount of application state transfer, allow alternative method implementations for offloaded methods, enable tasks to be split up across multiple servers and run in parallel, and leverage static analysis techniques to optimize state transfer.

3.3.1 State Transfer

MAUI [2] has shown that minimizing the size of the application state that is sent over the wire vastly improves performance by allowing more methods to be offloaded. Oftentimes in a Java application, the global state may be very large, yet only a small part of it is needed by offloaded methods. For example, consider a chess game where a user plays against a computer opponent. The method nextMove() might be offloaded to the cloud to improve performance. This application also stores a history of every past move. Assuming that nextMove() does not take history into consideration when deciding the next move, it is unnecessary to send the history to the cloud. In such an example, a naïve serialization engine might automatically include the global history in the state transfer.

We propose introducing the principle of lazy loading with virtual proxies to mobile offloading. This allows offloaded methods to execute without requiring the full state of the application to be transferred. If the method accesses an object that has not been transferred, COARA will automatically retrieve the object from the mobile device.

An alternative solution would be an eager loading scheme. In this strategy, we would also send a partial application state upon invocation of an offloaded method. However, rather than waiting for the server to request additional objects, we eagerly send the remaining application state asynchronously. This allows the remote machine to get a “head start” on the computation without waiting for the full application state to be transferred. If the method attempts to access an object that has not yet been transferred, it will block and wait for the required object to arrive. The downside of this approach is that we may consume battery power to transfer unneeded information. It is possible to utilize both lazy and eager strategies in the same application.

3.3.2 Alternative Method Implementation

Application fidelity has been defined as “the degree to which the results produced by an application match the highest quality results that would be produced given ample computational resources” [14]. As an example, consider face recognition software. COARA can run a simple algorithm if the method is executing locally, or a computationally expensive algorithm that provides higher quality results if it is executing on a remote server. This increases application fidelity.
COARA will enable application developers to designate alternative method implementations to be used when a method is executed on a remote server. We will implement this feature using minimally invasive Java annotations.

3.3.3 Executing Tasks in Parallel on the Cloud

COARA will support the execution of coarse-grained parallel algorithms on multiple servers in the cloud. Since such algorithms are coarse-grained, their input can be split up across multiple servers, with each server working independently of the others. As an example, consider a face recognition algorithm that operates on multiple images in parallel. The application developer can annotate the method parameter that contains an array of images to alert COARA that it can be split up. When the method execution is offloaded to the cloud, COARA will execute the method on several different servers, each processing a separate section of the image array. This improves performance by taking advantage of the parallel nature of the task. COARA will adopt a strategy to merge the application states on the servers in order to return a single state to the mobile device.

It must be noted that this will not work for fine-grained parallel algorithms where data is shared between threads. This is because COARA will not support a Distributed Shared Memory (DSM) scheme. Threads running on one server will not be aware of changes on other machines until the computation has completed.

3.3.4 Static Analysis
We will leverage static analysis techniques to optimize state transfer. Tools such as Soot [10] allow us to analyze program call graphs. For a given method call, we can determine whether it always, sometimes, or never accesses a given class. If it is always accessed, we always send it with the initial state transfer. If it is sometimes accessed, we transfer it in a lazy way. If it is never accessed, we never send it. Static analysis is done at compile-time and therefore does not affect runtime performance.
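The mapping from analysis result to transfer decision described above can be sketched as a small lookup. In this sketch the access frequencies are supplied by hand; in practice they would come from a call graph analysis with a tool such as Soot, whose API is not shown here. The names AccessFrequency, TransferPolicy, and TransferPlanner are hypothetical.

```java
// How often an offloaded method accesses a given class, as determined by static analysis.
enum AccessFrequency { ALWAYS, SOMETIMES, NEVER }

// What the middleware does with objects of that class when the method is offloaded.
enum TransferPolicy { SEND_WITH_INITIAL_STATE, LOAD_LAZILY, DO_NOT_SEND }

class TransferPlanner {
    // Applies the three rules from the text: always -> send up front,
    // sometimes -> transfer lazily, never -> do not send.
    static TransferPolicy policyFor(AccessFrequency frequency) {
        switch (frequency) {
            case ALWAYS:
                return TransferPolicy.SEND_WITH_INITIAL_STATE;
            case SOMETIMES:
                return TransferPolicy.LOAD_LAZILY;
            case NEVER:
                return TransferPolicy.DO_NOT_SEND;
            default:
                throw new IllegalArgumentException("Unknown frequency: " + frequency);
        }
    }

    public static void main(String[] args) {
        // Hand-written example results for the chess scenario; not produced by a real analysis.
        System.out.println("Board       -> " + policyFor(AccessFrequency.ALWAYS));
        System.out.println("OpeningBook -> " + policyFor(AccessFrequency.SOMETIMES));
        System.out.println("MoveHistory -> " + policyFor(AccessFrequency.NEVER));
    }
}
```

Because the decision is a pure function of the compile-time analysis result, it could be computed once per offloadable method and stored, so no analysis cost is paid at runtime.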
3.4 Performance Evaluation

To evaluate COARA, we will experiment with several contemporary phone models and server platforms. The types of open source Android applications that we expect to benefit from COARA include face recognition, speech recognition, chess games, and music identification. If possible, we will obtain applications used by other projects and compare our results to theirs.

We will run the applications with and without COARA, and show the difference in performance and battery consumption. We will compare offloading over Wi-Fi, 3G, and 4G LTE. We will also analyze how much state is being transferred and demonstrate that our strategies are effective in minimizing state transfers. Finally, we will show that by offloading we can increase fidelity, meaning that we can run more powerful algorithms that provide higher quality results without a loss in performance. We will describe what annotations were added to the original source code, and what other changes to the source code were made, if any.
4. Preliminary Results

We have developed a working prototype of COARA to demonstrate the feasibility of these ideas. The prototype is a middleware that can be applied to an Android application and allows certain methods to be offloaded to a server. This can be done without modifying the source code of our test application other than applying annotations. We designate methods that are candidates for offloading as @RemotableMethod. We designate classes that are candidates for lazy loading as @Lazy. We have built a mock optimizer that can be hard-coded to decide when method invocations should be offloaded.

4.1 Pi Calculator

In order to test COARA, we have written a simple Android application that computes Pi to 100,000 digits. We observed an 805% increase in performance by offloading a method from a Samsung Galaxy S2 smartphone to a MacBook Pro with a 2.53 GHz Intel Core 2 processor. The method that was offloaded accessed and modified the global state. We verified that the global state showed the correct data on the remote server, and that changes on the server were reflected on the smartphone.

We also verified that our lazy loading technique worked as expected. When the offloaded method did not access objects whose class was marked with @Lazy, the objects were not transferred as part of the application state. However, in cases where those objects were accessed, they were provided to the server on demand. In future calls, those objects were automatically sent with the original transfer of application state.
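The lazy loading behavior verified above follows the virtual proxy pattern, which can be sketched in plain Java as follows. This is not the prototype's code: LazyRef, ChessState, and the fetcher callback are hypothetical names, and the fetcher stands in for the network request that would retrieve the object from the mobile device. The sketch uses java.util.function.Supplier and therefore assumes a Java 8 or later runtime.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Virtual proxy: stands in for an object that has not been transferred yet
// and fetches it only on first access.
class LazyRef<T> {
    private final Supplier<T> fetcher; // hypothetical: would call back to the phone
    private T value;
    private boolean loaded;

    LazyRef(Supplier<T> fetcher) {
        this.fetcher = fetcher;
    }

    synchronized T get() {
        if (!loaded) {
            value = fetcher.get(); // on-demand transfer happens here
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}

// Hypothetical application state: the board is sent eagerly, the history lazily.
class ChessState {
    final int[] board;
    final LazyRef<List<String>> history;

    ChessState(int[] board, LazyRef<List<String>> history) {
        this.board = board;
        this.history = history;
    }
}

class LazyLoadingDemo {
    // Offloaded method that never touches the history, so it is never fetched.
    static int nextMove(ChessState state) {
        return state.board[0]; // placeholder for the real move search
    }

    // Offloaded method that does need the history and triggers the fetch.
    static int movesPlayed(ChessState state) {
        return state.history.get().size();
    }

    public static void main(String[] args) {
        LazyRef<List<String>> history = new LazyRef<List<String>>(() -> {
            System.out.println("fetching history from device");
            return Arrays.asList("e4", "e5");
        });
        ChessState state = new ChessState(new int[] {7}, history);

        nextMove(state);
        System.out.println("loaded after nextMove: " + history.isLoaded());
        System.out.println("moves played: " + movesPlayed(state));
        System.out.println("loaded after movesPlayed: " + history.isLoaded());
    }
}
```

Once fetched, the value is cached in the proxy, so repeated accesses within the same invocation do not trigger further transfers.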
4.2 0xbench

0xbench is an open source Android benchmarking tool. We observed a 24.2x speedup with COARA for 0xbench's Linpack benchmark. Only minor modifications were made to the original source code in addition to annotations.

4.3 Pocket Chess for Android

Pocket Chess for Android is an open source Android chess game. We observed a 5x to 10x speedup with COARA for the computation of the opponent's next move. Without offloading, the opponent spent a noticeable amount of time thinking for each move. With offloading, the opponent's moves appeared to be instantaneous. Only annotations were added to the original source code.
5. References

[1] Byung-Gon Chun, Sunghwan Ihm, Petros Maniatis, Mayur Naik, and Ashwin Patti. 2011. CloneCloud: elastic execution between mobile device and cloud. In Proceedings of the sixth conference on Computer systems (EuroSys '11). ACM, New York, NY, USA, 301-314. DOI=10.1145/1966445.1966473 http://doi.acm.org/10.1145/1966445.1966473
[2] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: making smartphones last longer with code offload. In Proceedings of the 8th international conference on Mobile systems, applications, and services (MobiSys '10). ACM, New York, NY, USA, 49-62. DOI=10.1145/1814433.1814441 http://doi.acm.org/10.1145/1814433.1814441
[3] Jason Flinn, Dushyanth Narayanan, and M. Satyanarayanan. 2001. Self-Tuned Remote Execution for Pervasive Computing. In Proceedings of the Eighth Workshop on Hot Topics in Operating Systems (HOTOS '01). IEEE Computer Society, Washington, DC, USA, 61-.
[4] Rajesh Krishna Balan, Mahadev Satyanarayanan, So Young Park, and Tadashi Okoshi. 2003. Tactics-based remote execution for mobile computing. In Proceedings of the 1st international conference on Mobile systems, applications and services (MobiSys '03). ACM, New York, NY, USA, 273-286. DOI=10.1145/1066116.1066125 http://doi.acm.org/10.1145/1066116.1066125
[5] S. Kosta, A. Aucinas, P. Hui, R. Mortier, and X. Zhang. Thinkair: Dynamic resource allocation and parallel execution in the cloud for mobile code offloading. In IEEE Infocom, 2012.
[6] R. Kemp, N. Palmer, T. Kielmann, and H. Bal. Cuckoo: a Computation Offloading Framework for Smartphones. In MobiCASE '10: Proceedings of The Second International Conference on Mobile Computing, Applications, and Services, pp. 62-81, 2010.
[7] H. Y. Chen, Y. H. Lin, and C. M. Cheng. 2012. COCA: Computation offload to clouds using AOP. In 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), May 2012, 466-473.
[8] Ioana Giurgiu, Oriana Riva, Dejan Juric, Ivan Krivulev, and Gustavo Alonso. 2009. Calling the cloud: enabling mobile phones as interfaces to cloud applications. In Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware (Middleware'09), Jean M. Bacon and Brian F. Cooper (Eds.). Springer-Verlag, Berlin, Heidelberg, 83-102.
[9] Marjan Hericko, Matjaz B. Juric, Ivan Rozman, Simon Beloglavec, and Ales Zivkovic. 2003. Object serialization analysis and comparison in Java and .NET. SIGPLAN Not. 38, 8 (August 2003), 44-54. DOI=10.1145/944579.944589 http://doi.acm.org/10.1145/944579.944589
[10] http://www.sable.mcgill.ca/soot/
[11] Ophir Holder, Israel Ben-Shaul, and Hovav Gazit. 1999. Dynamic layout of distributed applications in FarGo. In Proceedings of the 21st international conference on Software engineering (ICSE '99). ACM, New York, NY, USA, 163-173. DOI=10.1145/302405.302462 http://doi.acm.org/10.1145/302405.302462
[12] Mark S. Gordon, D. Anoushe Jamshidi, Scott Mahlke, Z. Morley Mao and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In Proceedings of OSDI 2012.
[13] Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. 2005. X10: an object-oriented approach to non-uniform cluster computing. SIGPLAN Not. 40, 10 (October 2005), 519-538. DOI=10.1145/1103845.1094852 http://doi.acm.org/10.1145/1103845.1094852
[14] J. Flinn. 2012. Cyber Foraging: Bridging Mobile and Cloud Computing. Lectures on Mobile and Pervasive Computing, 2012.