An Exploratory Study on Patterns in Dynamic Memory Allocations
Alexandre Beletti Ferreira¹, Rivalino Matias Jr.², and Vinícius Fonseca Maciel²
¹ School of Computer Engineering, Federal University of Pará, Tucuruí, Brazil
² School of Computer Science, Federal University of Uberlândia, Uberlândia, Brazil
6th Brazilian Symposium on Computing Systems Engineering
Nov 2016, João Pessoa, PB
Introduction
Dynamic Memory Allocations
• Dynamic memory allocation (DMA) is omnipresent.
• Real applications allocate and deallocate memory blocks of constant and varying sizes, millions of times.
  – In [1], the authors observed that a stock-market software performed millions of memory allocation requests within 60 seconds.
• RQ: Do real applications share common dynamic memory allocation patterns?
[1] T. B. Ferreira, R. Matias Jr., A. Macedo, and L. B. Araujo, “An experimental study on memory allocators in multicore and multithreaded applications,” in Proc. of the 12th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Gwangju, Korea, 2011, pp. 92-98.
Previous Study
• In [2], the authors characterized the DMA behavior of real-world applications.
  – How frequent are DMAs? What is the distribution of allocated sizes? What is the average retention time of allocations?
• Seven real applications from different categories were studied.
  – Desktop: web browsers, graphic editors, audio and video players, numerical computing environment, integrated development environment (IDE).
  – Servers: DB server and web server.
[2] D. Costa and R. Matias Jr., “Characterization of dynamic memory allocations in real-world applications: an experimental study,” in Proc. of the IEEE 23rd International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Atlanta, GA, USA, 2015, pp. 93-101.
Motivation
• In [2], only one application was investigated per category.
  – Poor generalizability.
  – The authors did not consider other applications from the same category.
• Our study aims to assess the external validity of the results discussed in [2].
  – Desktop: web browsers, graphic editors, audio and video players, numerical computing environment.
  – Three new applications were investigated for each category.
Goal
• Characterize the DMA behavior of the new applications and compare their allocation patterns, both intra- and inter-category, with the patterns observed in [2].
Methodology
Method
• We used a memory allocator wrapper to collect the memory usage data:
  – It intercepts the calls to allocation (malloc, calloc, realloc) and deallocation (free) routines.
  – It redirects all calls to the system’s memory allocator (e.g., glibc).
  – It works only for applications that use the standard dynamic allocation and deallocation routines.
DATA COLLECTED PER ALLOCATION REQUEST
Data                    Description
Size (in bytes)         Allocation size.
Operation type          Type of allocation routine: malloc (a), realloc, and calloc.
Time (in nanoseconds)   Allocation request time.
Address                 Address of the allocated memory block.
(a) The C++ operator new calls malloc internally.

DATA COLLECTED PER DEALLOCATION REQUEST
Data                    Description
Time (in nanoseconds)   Deallocation request time.
Address                 Address of the memory block to be deallocated.
Material
• Our goal was to evaluate the external validity of the results obtained for the desktop applications investigated in [2].
  – We chose 12 new applications (3 applications per category).
• The criteria used to select the applications were the same as those used in the previous work [2]:
  – Be mainly written in C/C++.
  – Run under Linux.
  – Use glibc.
  – Allow their operations to be automated for the exercised workload.
INVESTIGATED APPLICATIONS PER CATEGORY
Category                          Application  Version     Lifespan (years)  Mainly written in
Web Browser                       Elinks       0.12pre6    15                C, C++
Web Browser                       Uzbl         2012.05.14  7                 C
Web Browser                       Midori       0.4.3       9                 C
Graphic Editor                    Gimp         2.8.14      20                C
Graphic Editor                    MyPaint      1.1.0       11                C++, Python
Graphic Editor                    Krita        3.0         11                C, C++
Audio & Video Player              Gmerlin      1.2.0       12                C
Audio & Video Player              Mplayer      1.0.9       16                C
Audio & Video Player              SMPlayer     14.9.0      10                C, C++
Numerical Computing Environment   Matlab©      R2015a      32                C, C++, Fortran
Numerical Computing Environment   R Project    3.2.2       23                C, C++, Fortran
Numerical Computing Environment   Sage         7.1         11                C++, Python/Cython
Design of Experiments
Experiments
• We conducted fifteen experiments, one per application and workload.
  – The audio and video players were the same three programs, each exercised with both an audio and a video workload.
• We replicated each experiment 31 times.
  – The first replication was discarded to avoid the influence of warm-up effects.
• Our experimental approach aims at reducing the influence of experimental errors.
  – Analyses were based on the average and median statistics calculated over the trace samples collected from the 30 remaining replications per experiment.
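The replication scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize_replications` and the sample counts are hypothetical and are not taken from the study's tooling or data.

```python
from statistics import mean, median


def summarize_replications(samples, warmup=1):
    """Drop the first `warmup` replications, then return (mean, median) of the rest."""
    kept = samples[warmup:]
    if not kept:
        raise ValueError("no replications left after discarding warm-up runs")
    return mean(kept), median(kept)


# Hypothetical allocation counts for 31 replications of one experiment.
# The first value mimics a warm-up run that behaves differently from the rest.
counts = [1500] + [1000 + (i % 3) for i in range(30)]
avg, med = summarize_replications(counts)
print(avg, med)
```

Reporting both statistics is useful because the median is less sensitive than the average to an occasional outlier replication.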
Workload (Web Browsers)
• We accessed the Google website, searched for a keyword, and then accessed the first link listed.
• The keywords were:
  – facebook, youtube, yahoo, and baidu (in this order).
  – The first link shown was the official website for each keyword.
• These websites were selected according to the Alexa ranking (July 2016).
  – The top 500 sites on the web: http://www.alexa.com/topsites/
Workload (Graphic Editors)
• We loaded and displayed a JPG file.
• The image properties are:
  – Resolution: 1920 x 1080 pixels.
  – Size: 265.9 kilobytes.
Workload (Audio Players)
• We set the audio players to play an MP3 file.
• Track: “1-1 Symphony Nº 5 Allegro Con Brio” from the “The Very Best of Beethoven” album.
• The track properties are:
  – Bit rate: 256 kbps.
  – Duration: 5 min 34 sec.
  – Size: 10.7 megabytes.
Workload (Video Players)
• We set the video players to play an OGG file.
• High-definition video obtained from:
  – http://www.bigbuckbunny.org/
• The video properties are:
  – Frame rate: 24 fps.
  – Resolution: 1280 x 720 pixels.
  – Duration: 9 min 56 sec.
  – Size: 196.9 megabytes.
Workload (Numerical Computing)
• We executed a program that performed the Gauss-Seidel method.
• Input: a scaling tridiagonal matrix of order 50.
• This method is used to solve linear systems, which is a typical use of numerical computing applications.
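For readers unfamiliar with the workload, the following is a minimal Python sketch of a Gauss-Seidel solver applied to a tridiagonal system of order 50. It is not the program used in the experiments: the matrix entries (4 on the diagonal, -1 off the diagonal), the right-hand side, the tolerance, and the helper names are all assumptions chosen so that the iteration converges quickly.

```python
def gauss_seidel(a, b, tol=1e-10, max_iter=10_000):
    """Iteratively solve a x = b, where `a` is a square list-of-lists matrix."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(n):
            # x already holds updated values for j < i and
            # previous-sweep values for j > i.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / a[i][i]
            largest_change = max(largest_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if largest_change < tol:
            return x
    raise RuntimeError("Gauss-Seidel did not converge")


def tridiagonal(n, diag=4.0, off=-1.0):
    """Build a hypothetical diagonally dominant tridiagonal matrix of order n."""
    return [[diag if i == j else off if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


N = 50
A = tridiagonal(N)
B = [1.0] * N
X = gauss_seidel(A, B)

# Largest absolute component of the residual A x - b.
residual = max(abs(sum(A[i][j] * X[j] for j in range(N)) - B[i]) for i in range(N))
print(residual)
```

Diagonal dominance is a sufficient condition for Gauss-Seidel to converge, which is why the sketch uses it for its made-up matrix.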
Test Bed
• The test bed computer was restarted before each experiment run to avoid possible interference between subsequent experiments.
Resource      Description
CPU           Intel Core i7-3537U @ 2.00GHz
Memory        8 GB RAM
System        Linux OS, kernel 4.2.0-36-generic SMP
Distribution  Ubuntu 15.10
Architecture  64 bits
Results and Discussion
Allocation Requests and Memory Allocated
• Do real applications share common dynamic memory allocation patterns?
• We analyzed the number of allocation requests and the amount of memory allocated.
• For both quantities, no consistent pattern was observed either within or across application categories.
  – This substantial diversity corroborates the findings reported in [2].
DYNAMIC ALLOCATED MEMORY PER APPLICATION
Application        Number of Allocations (a)  Allocated Memory (megabyte) (a)  Allocation Size (byte) (b)
Elinks             1,145,332                  220                              49
Uzbl               3,581,006                  839                              158
Midori             3,995,208                  836                              139
Gimp               2,110,367                  255                              16
MyPaint            457,986                    232                              32
Krita              4,119,021                  1,105                            42
Gmerlin (Audio)    309,676                    59                               32
Mplayer (Audio)    1,305,688                  152                              27
SMPlayer (Audio)   439,586                    309                              41
Gmerlin (Video)    352,710                    80                               32
Mplayer (Video)    1,997,262                  239                              32
SMPlayer (Video)   801,747                    1,521                            56
Matlab©            2,455,521                  858                              38
R Project          60,468                     68                               396
Sage               107,981                    99                               384
(a) Average. (b) Median.
Allocation Requests and Memory Allocated
• We looked for possible correlations among the three variables below.
  – Two-tailed tests with a significance level (α) of 5%.
  – The correlation between the number of allocations and the amount of memory allocated was statistically significant at the 0.05 level (ρ = 0.75, p-value = 0.001).
  – The other two associations showed very weak correlations.
SPEARMAN’S CORRELATION COEFFICIENTS
                        Number of Allocations  Allocated Memory  Allocation Size
Number of Allocations   1
Allocated Memory        0.75                   1
Allocation Size         -0.11                  0.14              1
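As an illustration of how such a coefficient is obtained, the sketch below computes Spearman's rank correlation in plain Python (rank both variables, then take the Pearson correlation of the ranks). The two data lists are copied from the per-application table of allocation counts and allocated memory. The function names are hypothetical, and this is not claimed to be the tool the authors used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Average number of allocations and allocated memory (MB) for the 15 experiments,
# in the order they appear in the per-application table.
allocations = [1145332, 3581006, 3995208, 2110367, 457986, 4119021, 309676,
               1305688, 439586, 352710, 1997262, 801747, 2455521, 60468, 107981]
memory_mb = [220, 839, 836, 255, 232, 1105, 59, 152, 309, 80, 239, 1521, 858, 68, 99]

rho = spearman(allocations, memory_mb)
print(round(rho, 2))  # 0.75
```

Because it works on ranks, Spearman's coefficient captures monotonic association and is not dominated by a single application with a very large allocation volume.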
Allocation Sizes
• Eleven out of fifteen applications (73.33%) showed median allocation sizes smaller than 128 bytes.
  – Predominance of small DMA sizes.
  – A similar pattern (75%) was reported in [2].
• To understand the previous results, we analyzed the size distributions using box-and-whisker plots for each investigated application per category.
  – The bottom and top of each box show the 25th and 75th percentiles, the horizontal mark represents the median, and the whiskers extend to the 10th and 90th percentiles.
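The five values drawn in each box-and-whisker plot can be computed as sketched below. This is a minimal example, assuming Python 3.8+ for `statistics.quantiles`. The interpolation method (`inclusive`) and the toy trace are assumptions, since the slides do not say how the percentiles were computed.

```python
from statistics import quantiles


def box_summary(sizes):
    """Return the 10th, 25th, 50th, 75th and 90th percentiles of `sizes`."""
    # With n=100 there are 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(sizes, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in (10, 25, 50, 75, 90)}


# Hypothetical trace: one allocation request for each size from 1 to 101 bytes.
summary = box_summary(list(range(1, 102)))
print(summary)
```

Using the 10th and 90th percentiles for the whiskers, rather than the minimum and maximum, keeps a few very large allocations from stretching the plot.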
Fig. 2. (a) to (e) Allocation sizes per category; (f) percentage of the ten most allocated sizes per application.
Allocation Sizes
• In this study, we found that 85% of the ten most allocated sizes were smaller than or equal to 128 bytes.
  – In [2], 78.33% of the most frequently allocated sizes for desktop applications were smaller than or equal to 128 bytes.
Fig. 1. Cumulative percentage of the ten most allocated sizes for all applications investigated.
Allocation Routines
• We analyzed the source code of the applications written fully or partially in C++.
• In some cases, the calls must originate from external code linked to the application (e.g., MyPaint).
PERCENTAGE OF CALL SITES PER ALLOCATION ROUTINE
Application  new   malloc  realloc  calloc
Elinks       5     34      12       49
MyPaint      0     90      0        10
Krita        98    1.3     0.1      0.6
SMPlayer     90.7  8.2     0.5      0.6
Sage         16.2  64.2    12.8     6.8
USAGE PERCENTAGE OF ALLOCATION ROUTINES
Application        malloc  realloc  calloc
Elinks             59.82   36.71    3.47
Uzbl               39.77   55.85    4.39
Midori             41.78   53.73    4.49
Gimp               41.31   7.03     51.66
MyPaint            65.03   10.41    24.57
Krita              85.54   8.75     5.7
Gmerlin (audio)    84.11   8.78     7.11
Mplayer (audio)    58.57   30.06    11.37
SMPlayer (audio)   82.53   3.79     13.67
Gmerlin (video)    83.37   9.54     7.1
Mplayer (video)    58.65   32.41    8.94
SMPlayer (video)   77.56   2.38     20.07
Matlab             81.28   17.51    1.21
R Project          97.17   2.26     0.57
Sage               67.42   6.33     26.25
Allocation Retention Time (ART)
• The retention time of an allocated memory block is the time interval between its allocation and deallocation.
  – The pattern of retention time influences the degree of the application’s heap memory fragmentation [3].
• ART was classified into three classes:
  – Short duration (ART < 100 milliseconds).
  – Medium duration (100 milliseconds ≤ ART ≤ 1 second).
  – Long duration (ART > 1 second).
[3] P. R. Wilson, M. S. Johnstone, M. Neely, and D. Boles, “Dynamic storage allocation: a survey and critical review,” in Proc. of the International Workshop on Memory Management (IWMM), London, UK, 1995, pp. 1-116.
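The three ART classes can be derived from a trace by pairing each deallocation with the most recent allocation at the same address. The Python sketch below shows one way to do this. The event format `(kind, time_ns, address)`, the function names, and the decision to ignore blocks that are never freed are assumptions made for illustration. The handling of realloc is also left out.

```python
NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def classify_art(art_ns):
    """Map a retention time in nanoseconds to the short/medium/long classes."""
    if art_ns < 100 * NS_PER_MS:
        return "short"
    if art_ns <= NS_PER_S:
        return "medium"
    return "long"


def retention_classes(events):
    """Count ART classes from time-ordered (kind, time_ns, address) events."""
    live = {}  # address -> allocation time of the block currently at that address
    counts = {"short": 0, "medium": 0, "long": 0}
    for kind, time_ns, address in events:
        if kind == "alloc":
            live[address] = time_ns
        elif kind == "free" and address in live:
            counts[classify_art(time_ns - live.pop(address))] += 1
    return counts  # blocks that are never freed are not counted


# Hypothetical trace; address 0x10 is reused after its first block is freed.
trace = [
    ("alloc", 0, 0x10),
    ("alloc", 0, 0x20),
    ("alloc", 0, 0x30),
    ("free", 50 * NS_PER_MS, 0x10),      # 50 ms  -> short
    ("alloc", 60 * NS_PER_MS, 0x10),
    ("free", 500 * NS_PER_MS, 0x20),     # 500 ms -> medium
    ("free", 1060 * NS_PER_MS, 0x10),    # exactly 1 s -> medium
    ("free", 2 * NS_PER_S, 0x30),        # 2 s -> long
]
result = retention_classes(trace)
print(result)
```

Keying the live blocks by address works because an allocator cannot hand out the same address twice without an intervening deallocation.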
• Average for all applications:
  – Short: 85.06%
  – Medium: 8.6%
  – Long: 6.4%
• In [2], the average for all applications:
  – Short: 71.6%
  – Medium: 6.34%
  – Long: 22.06%
Fig. 3. Percentages of allocation retention time categories per application.
Allocation and Deallocation Path
• Closely related to the retention time is the analysis of the allocation and deallocation behavior over time.
• We divided the total time of the experiment into 1000 time intervals.
  – For each interval, we counted the number of allocations and deallocations performed in that interval.
  – We subtracted the number of deallocations from the number of allocations and added the difference to the total accumulated over the previous intervals.
  – This yields the total number of non-deallocated blocks from the start of the experiment up to each time interval.
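The procedure above amounts to a running sum of per-interval differences. A minimal Python sketch follows. The event format `(kind, time_ns)`, the function name, and the rule that an event at the very last timestamp falls into the final interval are assumptions, not details taken from the study.

```python
def allocation_path(events, intervals=1000):
    """Return the number of live (non-deallocated) blocks at the end of each interval.

    `events` is a list of (kind, time_ns) pairs with kind "alloc" or "free".
    """
    start = min(t for _, t in events)
    end = max(t for _, t in events)
    width = (end - start) / intervals or 1  # fall back to 1 if all times are equal
    delta = [0] * intervals
    for kind, t in events:
        # The last timestamp would map to index `intervals`, so clamp it.
        idx = min(int((t - start) / width), intervals - 1)
        delta[idx] += 1 if kind == "alloc" else -1
    path, live = [], 0
    for d in delta:
        live += d  # allocations minus deallocations, accumulated over time
        path.append(live)
    return path


# Hypothetical trace split into 4 intervals instead of 1000 for readability.
events = [("alloc", 0), ("alloc", 1), ("alloc", 2), ("free", 2),
          ("alloc", 5), ("free", 8)]
path = allocation_path(events, intervals=4)
print(path)  # [2, 2, 3, 2]
```

A path that keeps rising over the intervals indicates blocks that accumulate without being released, which is the growth pattern the study looks for.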
Fig. 4. Allocation and deallocation paths per application.
Threats to Validity
Threats to Validity (Internal Validity)
• Our results were based on the average/median of multiple replications of each experiment.
• The instrumentation used nanosecond precision; however, the real accuracy is limited by the quality of the hardware timer.
• The instrumentation was not able to distinguish between malloc and new.
  – Allocations made through mmap or sbrk/brk were not captured in this study.
Threats to Validity (External Validity)
• The web browsers Lynx, Elinks, Uzbl, and Midori are not as popular as Mozilla Firefox and Google Chrome.
  – Midori is the closest one to the most popular applications.
• Retention times are influenced by hardware factors.
  – This influence may not be substantial, since the results in [2], which used a different machine setup, corroborate our results.
• The study was desktop driven.
• Experiments were based on only one type of workload.
Conclusion
• The experimental evidence showed that different real-world applications share patterns with respect to DMA.
• Some of these patterns were observed more consistently in applications of the same category.
  – Correlation with the workload type.
Our key findings include:
• Allocation sizes ≤ 128 bytes were predominant (85% of the ten most allocated sizes).
• Short-duration allocations (< 100 milliseconds) corresponded to 85.06% of all allocations analyzed.
• An increasing growth pattern of non-deallocated memory blocks was observed in 11 out of 15 (73.33%) applications.
Thank You!
Vinícius Fonseca Maciel
http://hpdcs.facom.ufu.br/
Backup Slides
Index
1. Consequences
Consequences
• Memory allocator algorithms can be optimized.
• New dynamic data structures can be proposed.
• Analytical studies in memory management can use the patterns found to create more accurate methods.
• Experimental studies can use more realistic synthetic workloads based on the discovered patterns.