2015 IEEE 39th Annual International Computers, Software & Applications Conference
The impact of virtual machines on embedded systems
Anderson L. Sartor, Arthur F. Lorenzon and Antonio C. S. Beck
Institute of Informatics, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
{alsartor, aflorenzon, caco}@inf.ufrgs.br
Android processor architectures (i.e., ARM, MIPS and x86) and whether this impact is the same for all of them. We show that assessing the variation of the impact as we change the target processor or the way the application is written is of extreme importance if one wants to develop a low-energy application, and must be considered at early design stages.
Abstract— Embedded systems are becoming increasingly complex and, due to their tight energy requirements, all the available resources must be used in the best possible way. However, Android, the most used software platform for embedded systems, features a virtual machine to run applications. Even though it ensures flexibility, so the application can execute on different underlying architectures without the need for recompilation, it burdens the system because of the introduction of an extra software layer. Considering this scenario, through the development of an extension of the Android QEMU emulator and a specific benchmark set, this work evaluates the significance of the virtual machine by comparing applications written in Java and in native language. We show that, given a fixed energy budget, a different number of applications can be executed depending on the way they were implemented. We also demonstrate that this difference varies according to the processor, by executing the applications on all officially supported Android architectures (Intel x86, ARM, and MIPS). Therefore, even though the virtual machine provides total transparency to software developers, they must be aware of it and of the underlying target microarchitecture at early design stages in order to build a low-energy application.
To achieve this, we extended the Android QEMU emulator and developed a benchmark set that comprises applications written in Java and applications with methods written in native code, which are called through the Java Native Interface (JNI). Furthermore, we assess the energy consumption and the performance of these applications on four representative processors in order to highlight the differences between the ISAs (Instruction Set Architectures) and microarchitectures, investigating the particularities of general purpose and embedded processors. The remainder of this work is organized as follows. The Dalvik Virtual Machine and the Java Native Interface are introduced in Section II. Section III discusses related work, and Section IV presents the QEMU extension that was developed. Section V discusses the experimental results, and Section VI concludes this work.
Keywords—Virtual machine; Android; Dalvik; JNI;
I. INTRODUCTION
II. DALVIK VIRTUAL MACHINE AND JAVA NATIVE INTERFACE
Embedded systems operate in constrained environments in which memory, storage, power supply and processing capability are limited. As they become more complex and gain functionality, a number of applications must run concurrently in an environment that is not optimized for performance, but rather for energy consumption, which must be kept as low as possible to maintain an acceptable battery life [1]. Android, the world’s most popular mobile platform [2] [3], is a Linux-based software platform that uses a virtual machine, Dalvik, to execute the bytecodes from Java applications. Android is mainly used in smartphones and tablets, and it is also a promising candidate for other consumer electronics products, such as smart TVs, set-top boxes, and smart watches.
The Dalvik VM (Virtual Machine) is a register-based architecture [4], as opposed to the JVM (Java Virtual Machine), which is stack-based. Dalvik was designed to run in low-memory environments and to allow multiple instances of the VM, so every application runs on its own instance, which provides security, isolation, and effective memory management. A register-based architecture needs fewer VM instructions to implement high-level code, even though this comes at the cost of increased instruction size, code size and memory fetches [5]. With larger instructions, register-based architectures take more time to execute each instruction than stack-based architectures. However, the product of the time per instruction and the number of executed instructions is usually smaller in register-based architectures than in stack-based ones, which means that the former will take less time to execute an application [5].
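The trade-off above can be written as total time = executed instructions × average time per instruction. The sketch below uses purely hypothetical numbers (they are not measurements from [5] or from this work) to show how a register-based VM can finish sooner even though each of its instructions is slower.

```python
def total_time_ns(instructions: int, ns_per_instruction: float) -> float:
    """Execution time as (executed VM instructions) x (average time per instruction)."""
    return instructions * ns_per_instruction


# Hypothetical figures chosen only for illustration:
# the register-based VM runs fewer, but individually slower, instructions.
stack_time = total_time_ns(instructions=1_000_000, ns_per_instruction=10.0)
register_time = total_time_ns(instructions=600_000, ns_per_instruction=14.0)

print(f"stack-based:    {stack_time / 1e6:.1f} ms")
print(f"register-based: {register_time / 1e6:.1f} ms")
```

With these assumed values the register-based variant wins because the reduction in instruction count outweighs the higher per-instruction cost.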
By using a virtual machine, the application’s portability is greatly increased, since the code can be executed on several processor architectures without the need for recompilation, following the idea of “write once, run everywhere”. However, this flexibility comes at the cost of an additional layer between the code and the processor, which introduces an overhead in the application’s execution.
Through JNI, Java code can interact with C or C++ by calling methods implemented in native code, so one may reuse legacy code and, in some situations, increase the application’s performance. On the other hand, JNI compromises the application’s portability and security, because the native code needs to be compiled for each target architecture, since it does not run on the Dalvik VM anymore. In addition, JNI creates overhead because of the context switches, which involve copying operands in memory from one side to the other (from the Java to the native side or vice versa). In this work, we will refer to applications implemented in Java and native code (through JNI) as JNI applications, in order to distinguish the three main types of Android applications: pure Java (Java applications), Java with native code (JNI applications) and those purely written in native code (native applications).
Therefore, this work evaluates the impact of the virtual machine for several applications on all officially supported
0730-3157/15 $31.00 © 2015 IEEE DOI 10.1109/COMPSAC.2015.90
Android applications. [13] and [14] presented the energy consumption of some Android applications, but made no comparison between Java and JNI applications. In addition, some works only considered algorithms that are extremely simple when compared to popular embedded systems’ applications, as in the case of sorting algorithms. Our work differs from previous research in several aspects. None of the previous works assesses Android CPU architectures other than ARM. In addition, we not only evaluate the performance variation of implementing an Android application using native code through JNI, but also measure the impact of this modification on the energy consumption and on the number of executed instructions and, more importantly, the impact of the Dalvik VM on the ARM, MIPS and x86 architectures and whether it is the same for all of them.
III. RELATED WORK
Batyuk et al. [6] have evaluated sorting algorithms written in Java, JNI and pure native code by measuring the differences in performance in two environments: an Android ARM emulator and an x86 Linux PC with Sun’s JRE (Java Runtime Environment). Their results showed a maximum speedup of 30 times for native C applications when compared to their Java versions running on Dalvik, and a maximum speedup of 10 times when using JNI. Lin et al. [7] have compared Java and JNI by benchmarking 12 applications on a real Android device. According to their experiments, the JNI version is, on average, 34.2% faster than the Java version running on Dalvik. [8] and [9] have also shown that JNI is faster with five types of Android applications. Son et al. [10] have developed the JNI version of the NyARToolKit, an augmented reality engine, and obtained a speedup of 1.8 times compared to the Java version. Acosta and Almeida [11] analyzed the performance of the Paralldroid framework, presented in [12], which automatically generates native C, OpenCL or Renderscript code based on annotations in the Java source code. Their test applications comprise image-processing benchmarks. However, when compared to manually implemented native C or Renderscript code, the automatically generated code always loses in performance (a slowdown of up to three times).
IV. QEMU EXTENSION
In order to collect the required data for our analysis, we have extended the Android Emulator’s QEMU to profile Android applications. This modification was needed because, to the best of our knowledge, there is no tool capable of profiling an Android application for all supported architectures (i.e., ARM, MIPS, and x86). Traceview [18] only profiles code that is executed by the Dalvik VM (it is not able to profile native code) and it has serious limitations regarding the amount of data it can trace; PowerTutor [15] can only estimate the energy consumption for a few specific devices with ARM processors, using a generic model for all the others. Other approaches, such as the ones in [19] [20] [21] [22], propose application profiling or energy consumption estimation, but they fail to cover all architectures, and some are not able to estimate the energy consumption of a specific application, only that of the system as a whole. As this extension was implemented in QEMU, it is completely transparent to the Android system. A selective trace mechanism was also implemented, so it is possible to trace only the information that is relevant to the user (e.g., a specific process). It is available for all supported Android architectures, and both native and Dalvik code can be traced. A tool with a graphical interface was developed to import all the collected data in order to estimate how much time and energy each application will consume based on the average cost of each instruction. The initial implementation of this tool is described in more detail in [23]; it was improved for use in this work.
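The idea of selective tracing can be illustrated with a minimal sketch that keeps only the records of one process and counts them per instruction class. The record layout (`pid`, `kind`) and the class names are hypothetical and only serve the illustration; they do not describe the actual trace format of the QEMU extension.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class TraceRecord(NamedTuple):
    pid: int   # process that executed the instruction (hypothetical field)
    kind: str  # coarse instruction class such as "alu" or "load" (hypothetical field)


def count_instructions(trace: Iterable[TraceRecord], target_pid: int) -> Counter:
    """Keep only the records of the process of interest and count them per class."""
    return Counter(rec.kind for rec in trace if rec.pid == target_pid)


# Tiny made-up trace with two interleaved processes.
trace = [
    TraceRecord(101, "alu"), TraceRecord(101, "load"),
    TraceRecord(202, "alu"), TraceRecord(101, "alu"),
    TraceRecord(202, "store"),
]
counts = count_instructions(trace, target_pid=101)
print(counts)
```

Per-class counts of this kind are what an estimator based on the average cost of each instruction would consume.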
Wilke et al. [13] have evaluated the energy consumption of three emailing and three browsing applications for Android, observing that advertisements can substantially increase the energy consumption. Pathak et al. [14] have proposed a fine-grained energy profiler for smartphone applications (Android and Windows Mobile) and evaluated the energy consumption of six popular applications, reaching similar conclusions. PowerTutor [15] is a power estimation system that uses the model generated by PowerBooter [16] for online power estimation. PowerBooter models the components that are significant for power dissipation in the system: CPU, LCD display, GPS, Wi-Fi, 3G, and audio interfaces. The authors evaluated six Android applications, showing that the power dissipation varies depending on the running application. Shye et al. [17] proposed an experiment where Android G1 mobile phones with a logger were given to users in order to trace their activity and characterize the power consumption. They presented a regression-based model built from the data collected by their logger. As a case study, they slowly reduced the screen brightness and CPU frequency when the screen was active for a long time, which resulted in total system energy savings of 10.6% with minimal impact on the user experience.
V. EXPERIMENTS
To the best of our knowledge, there is no set of benchmarks written in Java with corresponding versions that have their hot spots (most time-consuming methods) implemented in native code, using JNI to call these native methods from the Java side. Therefore, we developed these applications based on a subset of SPECjvm2008, which was originally developed for measuring the performance of the Java Runtime Environment. This resulted in a set of 14 benchmarks (seven applications with two versions each: Java and JNI). For the JNI versions, the hot spots chosen were methods that executed for more than 10% of the application’s total execution time. For some benchmarks, just one method is responsible for more than 90% of this total; while for others, up to five methods fit in the aforementioned
The first six works discussed in this section focus only on the performance variation of using native or Java code in
respectively. In this case, the Java version executed significantly more instructions than the JNI version on all architectures. Comparing these three virtual machines, the one that presented the lowest overhead was the ARM Dalvik because, proportionally, it executed fewer instructions than the other two virtual machines.
constraint. The converted methods were always from the application itself, and not methods from external libraries. The Android Virtual Device setup used was: Android 4.0.3, 512MB RAM, on ARM, x86, and MIPS CPUs.
A. Java vs. JNI
Fig. 1 presents the average number of executed instructions and the standard deviation for each benchmark on the ARM, x86, and MIPS architectures. Each benchmark was executed three times, which was sufficient given the small standard deviation (less than 1%). As can be observed, MIPS executed more instructions than ARM because its ISA comprises simpler instructions (more RISC oriented), while x86, in general, executed fewer instructions because of its CISC nature.
Now, let us consider a benchmark that executed fewer instructions in its Java version than in its JNI version: MPEG Audio, with ratios of 0.66, 0.47 and 0.73 for the same architectures as before (ARM, x86 and MIPS). In this case, the VM that presented the lowest overhead was the x86 Dalvik. This shows that the x86 Dalvik was able to optimize the code’s execution better than the two other VMs. Furthermore, executing the JNI version of this application is more expensive than executing its counterpart implemented in Java only. Hence, the use of JNI is not advisable in this specific case. Let us now address the Compress application. This is the most significant case regarding the overhead of the Dalvik VM: the MIPS and ARM Dalviks present 135.69% and 125.33% higher overhead than the x86 version, respectively.
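The comparison rule used in these examples (for a given application, the architecture with the lowest Java/JNI ratio has the lowest VM impact) can be sketched as follows. The ratios are the ones quoted in the text for Scimark Sparse and MPEG Audio; the function name is only illustrative.

```python
# Java/JNI executed-instruction ratios quoted in the text.
ratios = {
    "Scimark Sparse": {"ARM": 8.72, "x86": 10.80, "MIPS": 13.57},
    "MPEG Audio": {"ARM": 0.66, "x86": 0.47, "MIPS": 0.73},
}


def lowest_vm_impact(per_arch: dict) -> str:
    """Return the architecture whose Dalvik shows the lowest Java/JNI ratio."""
    return min(per_arch, key=per_arch.get)


best = {bench: lowest_vm_impact(per_arch) for bench, per_arch in ratios.items()}
for bench, arch in best.items():
    print(f"{bench}: lowest VM impact on {arch}")
```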
Fig. 2 presents the Java/JNI ratio, which is the ratio between the number of instructions executed by the Java-only application and by the application with JNI calls (comprising native and Java code, and the JNI communication). As one can observe, the decision to develop some parts of the application in native code depends on the type of application that is being developed. For example, the Scimark Sparse and FFT benchmarks have one method that accounts for 80-90% of the total execution time, and this method performs calculations with arrays and matrices. Such applications are very suitable for JNI use: the designer must reprogram only one method, and there are almost no context switches between the Dalvik and native code. On the other hand, MPEG Audio is quite the opposite: it has five methods that take only about 10% of the execution time each. When methods are simple or called many times, there are cases with slowdowns because the faster execution of native code does not amortize the costs of context switching. Therefore, the usage of JNI can be beneficial or not depending on several factors, such as the overhead of accessing Java attributes and calling a JNI method through a Java method, the number of times that the method is called, and the complexity of the method.
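A minimal sketch of the Java/JNI ratio, together with a simple break-even model for the context-switch cost, is given below. The cost model (a fixed transition overhead per call plus the native work) and every number in it are assumptions for illustration only, not measured values.

```python
def java_jni_ratio(java_instructions: float, jni_instructions: float) -> float:
    """Ratio > 1: the Java-only version executes more instructions; < 1: the JNI one does."""
    return java_instructions / jni_instructions


def jni_cost(calls: int, native_work_per_call: float, overhead_per_call: float) -> float:
    """Assumed model: each call pays a fixed JNI transition cost plus the native work."""
    return calls * (native_work_per_call + overhead_per_call)


# One long-running method: the transition cost is paid once and is amortized.
heavy = java_jni_ratio(
    java_instructions=9_000_000,
    jni_instructions=jni_cost(calls=1, native_work_per_call=1_000_000, overhead_per_call=500),
)

# A short method called many times: the transition cost dominates.
light = java_jni_ratio(
    java_instructions=100_000 * 40,
    jni_instructions=jni_cost(calls=100_000, native_work_per_call=10, overhead_per_call=500),
)

print(f"heavy method ratio: {heavy:.2f}")
print(f"light method ratio: {light:.2f}")
```

Under these assumptions the first case favors JNI (ratio above 1) and the second favors the Java-only version (ratio below 1), which mirrors the contrast between Scimark Sparse and MPEG Audio.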
In most cases, the Dalvik implemented for the x86 is better than its counterparts, with only two exceptions: Sparse and LU, on which the ARM version performs better. The MIPS Dalvik has the highest overhead, so it is not as optimized as the ARM and x86 Dalviks. Finally, let us compare the variation of the impact (averaged over all applications) for each pair of architectures (ARM-x86, ARM-MIPS, and x86-MIPS). The ARM and MIPS versions have 39.54% and 65.18% more impact than the x86 Dalvik, respectively, while the MIPS Dalvik has 24.97% higher overhead than the ARM Dalvik.
C. Performance and energy consumption estimation
In this section, we have chosen the ARM and the x86 architectures in order to evaluate the energy consumption and performance, considering different processor organizations to reflect both embedded and general purpose systems. On the embedded system side, we consider the ARM Cortex-A8, Cortex-A9, and the Intel Atom, while the Intel Sandybridge Core i7 is used to represent a general purpose system. Data on processor power dissipation, frequency and Cycles per Instruction (CPI), presented in Table I, were obtained from [24] [25]. In these studies, the authors used internal hardware counters of the processors and external power meters to make the measurements, which supports the accuracy of the experiments. The
B. The impact of the Dalvik virtual machine
In order to evaluate the impact of the Dalvik VM for the chosen architectures, let us analyze Fig. 2 again, but now comparing the Java/JNI ratio between different architectures. The lowest Java/JNI ratio among all architectures for a given application represents the lowest impact of the virtual machine on the application’s execution. This will be further explained with two examples. First, let us consider Scimark Sparse: it has a ratio of 8.72, 10.80 and 13.57 for ARM, x86, and MIPS,
[Fig. 1: bar chart. Y-axis: executed instructions, logarithmic scale from 1.E+09 to 1.E+13. X-axis: Java and JNI versions of Scimark Sparse, Scimark FFT, MPEG Audio, Scimark SOR, Scimark MonteCarlo, Scimark LU and Compress. Series: ARM, x86, MIPS.]
Fig. 1. Number of executed instructions between different architectures
[Fig. 2: bar chart. Y-axis: Java/JNI executed instructions ratio, from 0 to 16. X-axis: Scimark Sparse, Scimark FFT, MPEG Audio, Scimark SOR, Scimark MonteCarlo, Scimark LU and Compress. Series: ARM, x86, MIPS. Data labels as extracted, without bar assignment: 13.57, 13.38, 10.80, 9.86, 8.94, 10.26, 8.72, 5.34, 6.07, 4.83, 7.12, 5.91, 0.47, 0.66, 0.73, 0.07, 0.17, 0.17, 0.02, 0.04, 0.04.]
Fig. 2. Java/JNI executed instructions ratio between different architectures
savings of choosing the best version (Java or JNI) of a given application, e.g., the Java version of MPEG Audio instead of its counterpart implemented with JNI; or the opposite for the Scimark LU.
MIPS architecture is not considered in this section because there is no updated information available, such as cycles per instruction and power dissipation. Main memory read/write power dissipations were obtained from CACTI 6.5, with the following configuration: 512MB, 8 banks, block size of 64 bytes and 45nm technology. This results in an access time of 8.26ns, an energy consumption of 2.66nJ for a read and 2.56nJ for a write, and a leakage power of 109.613mW. Given the small difference between reads and writes, we considered the average of these two values (2.61nJ) as the energy consumption of each memory access, regardless of whether it was a read or a write. Fig. 3 and Fig. 4 present the performance and the energy consumption of the Cortex-A8, Cortex-A9, Intel Atom and Intel i7 based on the aforementioned data. These figures show the estimation for each Java or JNI benchmark and the geometric mean of all applications.
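A minimal sketch of how such an estimate could be computed from the data above is shown next. It assumes that time is instructions × CPI / frequency, that processor energy is power × time, and that the main memory access rate is the fraction of instructions that reach main memory. This model is an assumption for illustration, not necessarily the exact formula implemented in the profiling tool, and the instruction count in the example is hypothetical. The processor parameters are the Cortex-A9 values as read from Table I.

```python
READ_NJ, WRITE_NJ = 2.66, 2.56          # CACTI 6.5 values quoted in the text
ACCESS_NJ = (READ_NJ + WRITE_NJ) / 2    # average energy per memory access (2.61 nJ)
LEAKAGE_W = 0.109613                    # main memory leakage power (109.613 mW)


def estimate(instructions, cpi, freq_ghz, cpu_power_w, mem_access_rate):
    """Return (time in s, energy in J) under the assumed model described above."""
    time_s = instructions * cpi / (freq_ghz * 1e9)
    cpu_j = cpu_power_w * time_s
    mem_dynamic_j = instructions * mem_access_rate * ACCESS_NJ * 1e-9
    mem_leakage_j = LEAKAGE_W * time_s
    return time_s, cpu_j + mem_dynamic_j + mem_leakage_j


# Cortex-A9 parameters as read from Table I; 1e10 instructions is a made-up workload.
time_s, energy_j = estimate(
    instructions=1e10, cpi=2.17, freq_ghz=1.0, cpu_power_w=1.2, mem_access_rate=0.01
)
print(f"time: {time_s:.1f} s, energy: {energy_j:.2f} J")
```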
These figures also show how the performance and energy consumption can vary depending on the organization of the same processor architecture (i.e., processors that implement the same ISA). Moreover, one can compare processors that support different ISAs: the ARM to the x86. As can be observed, the A9 presented lower performance than the Atom; however, the former is much more energy efficient. Considering these four processors, the one that, on average, consumes the least energy to execute the selected applications is the A9. Not surprisingly, it is one of the most used processors in the embedded market. This also highlights the differences between embedded systems’ processors (Cortex-A8, Cortex-A9 and Intel Atom) and the general purpose one (Intel i7) in terms of performance and energy consumption. The Intel i7 was the one that consumed the most energy to execute all the tested applications, and it was the one with the best performance.
Comparing these results, the A8/A9 ratio is 2.58 for the execution time and 1.88 for the energy consumption. This means that, for the selected applications, the A9 executes the applications about 2.6 times faster than the A8 and consumes almost half the energy. The Atom/i7 ratio is 2.05 for the execution time and 0.48 for the energy consumption. Therefore, even though the i7 doubles the performance, it consumes roughly twice as much energy as the Atom.
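The A8/A9 execution time ratio can be checked with a short calculation, assuming that both Cortex processors execute the same ARM instruction stream (so the instruction count cancels out) and that time per instruction is CPI / frequency. The parameters are the values as read from Table I. The energy line below covers the processor only; the reported 1.88 also accounts for main memory, which this sketch ignores.

```python
def seconds_per_instruction(cpi: float, freq_ghz: float) -> float:
    """Average time per instruction under the assumed model CPI / frequency."""
    return cpi / (freq_ghz * 1e9)


# Values as read from Table I.
a8 = seconds_per_instruction(cpi=3.36, freq_ghz=0.6)
a9 = seconds_per_instruction(cpi=2.17, freq_ghz=1.0)

time_ratio = a8 / a9                          # A8/A9 execution time ratio
cpu_energy_ratio = (0.85 * a8) / (1.2 * a9)   # processor-only energy ratio

print(f"A8/A9 time ratio: {time_ratio:.2f}")
print(f"A8/A9 processor-only energy ratio: {cpu_energy_ratio:.2f}")
```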
Considering the Cortex-A9 and the Intel Atom, one can note that, even though the x86 Dalvik presented a lower overhead and executed the applications faster than the ARM Dalvik, the Atom consumed more energy to execute the tested benchmarks. However, if one compares the Cortex-A8 with the Atom, the latter consumes less energy in most benchmarks, besides having better performance and a lower VM overhead. This highlights the importance of the chosen ISA and, most importantly, shows that the microarchitecture also influences the efficiency of the virtual machine.
It is important to note that both Fig. 3 and Fig. 4 use a logarithmic scale and that the energy and performance impact of choosing between Java and JNI can exceed the cost of fully executing several other applications. For instance, the Scimark Sparse can be executed only with the energy consumption
D. Energy budget
Embedded systems’ users want to execute the maximum number of applications without the device running out of battery. So, let us consider four devices that have identical hardware but different processors (Cortex-A8, Cortex-A9, Intel Atom and Intel i7), each with 3kJ of energy available for the processor and memory. Then, we will examine the set of applications that can execute with this amount of energy, considering three different situations: all the applications must be written in Java only, all the applications must use JNI and,
TABLE I. ARM AND X86 POWER AND CYCLES INFORMATION
                                  ARM Cortex-A8   ARM Cortex-A9   Intel Atom   Intel Sandybridge i7
Frequency (GHz)                   0.6             1               1.66         3.4
Processor power dissipation (W)   0.85            1.2             5.5          24
CPI                               3.36            2.17            1.59         0.68
Main memory access rate           ~1%             ~1%             ~0.5%        ~2%
[Fig. 3: bar chart. Y-axis: Exec. Time (s), logarithmic scale from 0.1 to 100,000.0. X-axis: Java and JNI versions of Scimark Sparse, Scimark FFT, MPEG Audio, Scimark SOR, Scimark MonteCarlo, Scimark LU and Compress, plus the Mean. Series: ARM Cortex-A8, ARM Cortex-A9, Intel Atom, Sandybridge i7.]
Fig. 3. Estimated execution time between different processor's organizations
[Fig. 4: bar chart. Y-axis: Estim. energy (J), logarithmic scale from 1 to 100,000. X-axis: Java and JNI versions of Scimark Sparse, Scimark FFT, MPEG Audio, Scimark SOR, Scimark MonteCarlo, Scimark LU and Compress, plus the Mean. Series: ARM Cortex-A8, ARM Cortex-A9, Intel Atom, Sandybridge i7.]
Fig. 4. Estimated energy consumption between different processor's organizations
finally, the best solution between Java and JNI (i.e., the ones that consume less energy). This is presented in Table II.
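The three situations can be sketched as a small selection problem. The per-application energies below are hypothetical, and the greedy cheapest-first choice (which maximizes the number of applications that fit) is an assumption; the text does not state how the executed subset was chosen.

```python
BUDGET_J = 3000.0  # 3 kJ available for processor and memory


def pick_best(java_j: dict, jni_j: dict) -> dict:
    """Per application, keep the version with the lower energy as (version, joules)."""
    best = {}
    for name in java_j:
        if java_j[name] <= jni_j[name]:
            best[name] = ("Java", java_j[name])
        else:
            best[name] = ("JNI", jni_j[name])
    return best


def fit_in_budget(energy_j: dict, budget_j: float = BUDGET_J):
    """Greedy choice, cheapest first; returns (chosen applications, energy spent)."""
    chosen, spent = [], 0.0
    for name, joules in sorted(energy_j.items(), key=lambda item: item[1]):
        if spent + joules > budget_j:
            break
        chosen.append(name)
        spent += joules
    return chosen, spent


# Hypothetical per-application energies in joules.
java_j = {"app_a": 400.0, "app_b": 2500.0, "app_c": 300.0}
jni_j = {"app_a": 100.0, "app_b": 900.0, "app_c": 2500.0}

java_only = fit_in_budget(java_j)
jni_only = fit_in_budget(jni_j)
best = pick_best(java_j, jni_j)
best_mix = fit_in_budget({name: joules for name, (_, joules) in best.items()})

print("Java only:", java_only)
print("JNI only: ", jni_only)
print("Best mix: ", best_mix)
```

With these made-up values, only two applications fit in either single-implementation scenario, while all three fit when the cheaper version of each is chosen.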
Considering Java applications only, it can be observed that the Cortex-A9 and the Intel Atom are able to execute all seven applications within the energy budget. The i7, on the other hand, is able to execute only five applications. The Cortex-A8 can execute all but the Scimark MonteCarlo. Now, considering the subset of JNI applications only, even though none of the processors is able to execute all the JNI applications, the Cortex-A9 is able to execute one more application than the others (Compress). Choosing the best solution between Java and JNI applications allows the processors to execute more applications within the aforementioned energy budget and highlights the importance of investigating the efficiency of virtual machines. In this scenario, all the processors but the i7 are able to execute the seven applications, as can be observed in Table II. The most significant case, when it comes to choosing the best application between Java and JNI, is the comparison of the Core i7 with the Cortex-A9. While the Core i7 is not capable of executing all applications, the Cortex-A9 is able to execute all of them while consuming less than half of the energy budget.
VI. CONCLUSIONS AND FUTURE WORK
Through this study, it was possible to analyze the impact of the Dalvik virtual machine on all officially supported Android CPU architectures. The experiments show that the overhead of virtual machines on different architectures varies considerably for most of the applications. As one could observe, the x86 Dalvik is the one that presents the lowest impact on the execution of an application, followed by the ARM Dalvik. The MIPS version is the one that performs the worst. Despite the lower VM overhead on x86, the Intel Atom consumed more energy to execute the tested applications than the Cortex-A9, and less energy than the Cortex-A8. Most importantly, we demonstrated that the target architecture (which includes architecture and organization) and the choice of the most appropriate implementation strongly influence performance and energy consumption.
As future work, we will consider two more types of applications: those that use Native Activities, which are Android activities written in native code; and Renderscript applications. We also intend to build a framework to automatically investigate which is the best way to develop a given application to meet performance and energy requirements.
TABLE II. PROCESSORS COMPARISON WITH AN ENERGY BUDGET OF 3KJ

Java applications only
Processor        Executed benchmarks   Total energy (J)
Cortex-A8        x x x x x x           2,642.60
Cortex-A9        x x x x x x x         2,047.40
Intel Atom       x x x x x x x         2,677.61
Sandybridge i7   x x x x x             2,592.32

JNI applications only
Processor        Executed benchmarks   Total energy (J)
Cortex-A8        + + + + +             1,361.34
Cortex-A9        + + + + + +           2,709.45
Intel Atom       + + + + +             1,298.81
Sandybridge i7   + + + + +             2,770.70

Java and JNI (best solution)
Processor        Executed benchmarks   Total energy (J)
Cortex-A8        + + x + x + x         2,811.38
Cortex-A9        + + x + x + x         1,492.76
Intel Atom       + + x + x + x         1,946.03
Sandybridge i7   + + x + + x           2,227.01

Legend:
Benchmark: 1-Scimark Sparse, 2-Scimark FFT, 3-MPEG Audio, 4-Scimark SOR, 5-Scimark MonteCarlo, 6-Scimark LU, 7-Compress
Executed version: Java: ‘x’, JNI: ‘+’
Each mark denotes one benchmark executed within the budget. Marks are listed in benchmark order; benchmarks that did not fit in the budget have no mark.

REFERENCES
[1] A. C. S. Beck, C. A. L. Lisboa and L. Carro, Adaptable Embedded Systems, Springer, 2012.
[2] IDC, "Android and iOS combine for 91.1% of the worldwide smartphone OS market in 4Q12 and 87.6% for the year," 2013. [Online]. Available: http://www.idc.com/getdoc.jsp?containerId=prUS23946013. [Accessed May 2013].
[3] N. Mawston, "Android captures record 85% share of global smartphone shipments in Q2 2014," 30 July 2014. [Online]. Available: http://www.strategyanalytics.com/default.aspx?mod=reportabstractviewer&a0=9921. [Accessed August 2014].
[4] Android, "Bytecode for the Dalvik VM," [Online]. Available: http://source.android.com/tech/dalvik/dalvik-bytecode.html. [Accessed May 2013].
[5] D. Ehringer, "The Dalvik virtual machine architecture," Tech. report, March 2012.
[6] L. Batyuk, A.-D. Schmidt, H.-G. Schmidt, A. Camtepe and S. Albayrak, "Developing and benchmarking native Linux applications on Android," Mobile Wireless Middleware, Operating Systems, and Applications, vol. 7, pp. 381-392, 2009.
[7] C.-M. Lin, J.-H. Lin, C.-R. Dow and C.-M. Wen, "Benchmark Dalvik and native code for Android system," Innovations in Bio-inspired Computing and Applications (IBICA), International Conference on, pp. 320-323, 2011.
[8] S. Lee and J. W. Jeon, "Evaluating performance of Android platform using native C for embedded systems," Control Automation and Systems (ICCAS), International Conference on, pp. 1160-1163, 2010.
[9] J. K. Lee and J. Y. Lee, "Android programming techniques for improving performance," Awareness Science and Technology (iCAST), International Conference on, pp. 386-389, 2011.
[10] K.-C. Son and J.-Y. Lee, "The method of android application speed up by using NDK," Awareness Science and Technology (iCAST), International Conference on, pp. 382-385, 2011.
[11] A. Acosta and F. Almeida, "Performance Analysis of Paralldroid Generated Programs," International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 60-67, 2014.
[12] A. Acosta and F. Almeida, "Towards a Unified Heterogeneous Development Model in Android," Euro-Par 2013: Parallel Processing Workshops, vol. 8374, pp. 238-248, 2013.
[13] C. Wilke, C. Piechnick, S. Richly, G. Püschel, S. Götz and U. Aßmann, "Comparing mobile applications' energy consumption," ACM Symposium on Applied Computing (SAC), pp. 1177-1179, 2013.
[14] A. Pathak, Y. C. Hu and M. Zhang, "Where is the energy spent inside my app?: fine grained energy accounting on smartphones with Eprof," ACM European Conference on Computer Systems (EuroSys), pp. 29-42, 2012.
[15] M. Gordon, L. Zhang, B. Tiwana, R. Dick, Z. M. Mao and L. Yang, "PowerTutor: A Power Monitor for Android-Based Mobile Platforms," [Online]. Available: http://ziyang.eecs.umich.edu/projects/powertutor/. [Accessed December 2014].
[16] L. Zhang, B. Tiwana, R. Dick, Z. Qian, Z. Mao, Z. Wang and L. Yang, "Accurate online power estimation and automatic battery behavior based power model generation for smartphones," Hardware/Software Codesign and System Synthesis (CODES+ISSS), IEEE/ACM/IFIP International Conference on, pp. 105-114, 2010.
[17] A. Shye, B. Scholbrock and G. Memik, "Into the wild: studying real user activity patterns to guide power optimizations for mobile architectures," IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 168-178, 2009.
[18] Android, "Profiling with Traceview and dmtracedump," [Online]. Available: http://developer.android.com/tools/debugging/debuggingtracing. [Accessed July 2014].
[19] M. Cho, S. J. Hwang, H. J. Lee, M. Kim and S. W. Kim, "AndroScope for detailed performance study of the android platform and its applications," International Conference on Consumer Electronics (ICCE), pp. 408-409, 2012.
[20] MARSSx86, "Micro-ARchitectural and system simulator for x86-based systems," [Online]. Available: http://marss86.org/~marss86/index.php/Home. [Accessed July 2014].
[21] M. Dong and L. Zhong, "Self-constructive high-rate system energy modeling for battery-powered mobile systems," International Conference on Mobile Systems (MobiSys), pp. 335-348, 2011.
[22] Qualcomm developers, "Trepn Profiler," [Online]. Available: https://developer.qualcomm.com/mobile-development/increase-appperformance/trepn-profiler. [Accessed July 2014].
[23] A. L. Sartor, U. B. Correa and A. C. S. Beck, "AndroProf: A Profiling Tool for the Android Platform," Brazilian Symposium on Computing Systems Engineering (SBESC), pp. 23-28, 2013.
[24] E. Blem, J. Menon and K. Sankaralingam, "A detailed analysis of the contemporary ARM and x86 architectures," UW-Madison Technical Report, 2013.
[25] E. Blem, J. Menon and K. Sankaralingam, "Power struggles: revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures," High Performance Computer Architecture (HPCA), International Symposium on, pp. 1-12, 2013.