Dynamic Memory Allocations

An Exploratory Study on Patterns in Dynamic Memory Allocations

Alexandre Beletti Ferreira¹, Rivalino Matias Jr.², and Vinícius Fonseca Maciel²

¹ School of Computer Engineering, Federal University of Pará, Tucuruí, Brazil

² School of Computer Science, Federal University of Uberlândia, Uberlândia, Brazil

6th Brazilian Symposium on Computing Systems Engineering

Nov 2016, João Pessoa, PB

Introduction

Dynamic Memory Allocations
• Dynamic memory allocation (DMA) is omnipresent.
• Real applications allocate and deallocate memory blocks of constant and varying sizes millions of times.
  – In [1], the authors observed that a stock-market application performed millions of memory allocation requests within 60 seconds.

• RQ: Do real applications share common dynamic memory allocation patterns?

[1] T. B. Ferreira, R. Matias Jr., A. Macedo, and L. B. Araujo, “An experimental study on memory allocators in multicore and multithreaded applications,” in Proc. of the 12th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Gwangju, Korea, 2011, pp. 92-98.

Introduction

Previous Study
• In [2], the authors characterized the DMA of real-world applications.
  – How frequent are DMAs? What is the distribution of allocation sizes? What is the average allocation retention time?

• Seven real applications from different categories were studied.
  – Desktop: Web browsers, graphic editors, audio & video players, numerical computing environment, integrated development environment (IDE).
  – Servers: DB server and Web server.

[2] D. Costa and R. Matias Jr., “Characterization of dynamic memory allocations in real-world applications: an experimental study,” in Proc. of the IEEE 23rd International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Atlanta, GA, USA, 2015, pp. 93-101.

Introduction

Motivation
• In [2], only one application was investigated per category.
  – Poor generalizability.
  – The authors did not consider other applications from the same category.
• Our study aims to assess the external validity of the results discussed in [2].
  – Desktop: Web browsers, graphic editors, audio & video players, numerical computing environment.
  – Three new applications for each investigated category.

Introduction

Goal

• Characterize the DMA behavior of the new applications and compare their allocation patterns, both intra- and intercategory, to the patterns observed in [2].

Methodology

Method
• We used a memory allocator wrapper to collect the memory usage data:
  – Intercepts calls to the allocation (malloc, calloc, realloc) and deallocation (free) routines.
  – Redirects all calls to the system’s memory allocator (e.g., glibc).
  – Works only for applications that use the standard dynamic allocation and deallocation routines.

Methodology

DATA COLLECTED PER ALLOCATION REQUEST

| Data | Description |
|---|---|
| Size (bytes) | Allocation size. |
| Operation type | Type of allocation routine: malloc (a), realloc, or calloc. |
| Time (nanoseconds) | Allocation request time. |
| Address | Address of the allocated memory block. |

(a) The C++ operator new calls malloc internally.

DATA COLLECTED PER DEALLOCATION REQUEST

| Data | Description |
|---|---|
| Time (nanoseconds) | Deallocation request time. |
| Address | Address of the memory block to be deallocated. |
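To make the collected fields concrete, here is a minimal C sketch of a per-request trace record and of wrapper functions that log each request and forward it to the system allocator. It is a simplification under stated assumptions, not the study's wrapper: the names (`trace_record`, `traced_malloc`, and so on) are hypothetical, records go to a fixed in-memory buffer, and the wrappers are called explicitly instead of transparently intercepting the application's calls. Timestamps use C11 `timespec_get`, whose nanosecond resolution, as in the study, says nothing about the real accuracy of the hardware timer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical record layout mirroring the fields listed above (not the authors' format). */
typedef enum { OP_MALLOC, OP_CALLOC, OP_REALLOC, OP_FREE } op_type;

typedef struct {
    op_type   op;       /* allocation routine, or OP_FREE for a deallocation */
    size_t    size;     /* requested size in bytes (0 for free) */
    uint64_t  time_ns;  /* request time in nanoseconds */
    uintptr_t addr;     /* address returned by the allocator, or passed to free */
} trace_record;

#define TRACE_CAP 4096
static trace_record trace_buf[TRACE_CAP];
static size_t trace_len = 0;

/* Wall-clock time in ns; nominal resolution only, accuracy depends on the hardware timer. */
static uint64_t now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Appends one record; silently drops records once the buffer is full (sketch only). */
static void trace_log(op_type op, size_t size, void *addr) {
    if (trace_len < TRACE_CAP)
        trace_buf[trace_len++] = (trace_record){op, size, now_ns(), (uintptr_t)addr};
}

void *traced_malloc(size_t size) {
    void *p = malloc(size);                 /* forward to the system allocator */
    trace_log(OP_MALLOC, size, p);
    return p;
}

void *traced_calloc(size_t nmemb, size_t size) {
    void *p = calloc(nmemb, size);
    trace_log(OP_CALLOC, nmemb * size, p);  /* total bytes requested; overflow ignored here */
    return p;
}

void *traced_realloc(void *old, size_t size) {
    void *p = realloc(old, size);
    trace_log(OP_REALLOC, size, p);
    return p;
}

void traced_free(void *p) {
    trace_log(OP_FREE, 0, p);               /* log before the block is released */
    free(p);
}
```

A real tracer would typically flush these records to a file, and a wrapper that truly interposes on malloc must avoid calling allocating functions itself, or it recurses into its own wrapper.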

Methodology

Material
• Our goal was to evaluate the external validity of the results obtained for the desktop applications investigated in [2].
  – We chose 12 new applications (3 applications per category).
• The criteria used to select the applications were the same as in the previous work [2]:
  – Be mainly written in C/C++.
  – Run under Linux.
  – Use glibc.
  – Allow automating their operations regarding the exercised workload.

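Methodology

INVESTIGATED APPLICATIONS PER CATEGORY

| Category | Application | Version | Lifespan (years) | Mainly written in |
|---|---|---|---|---|
| Web Browser | Elinks | 0.12pre6 | 15 | C, C++ |
| Web Browser | Uzbl | 2012.05.14 | 7 | C |
| Web Browser | Midori | 0.4.3 | 9 | C |
| Graphic Editor | Gimp | 2.8.14 | 20 | C |
| Graphic Editor | MyPaint | 1.1.0 | 11 | C++, Python |
| Graphic Editor | Krita | 3.0 | 11 | C, C++ |
| Audio & Video Player | Gmerlin | 1.2.0 | 12 | C |
| Audio & Video Player | Mplayer | 1.0.9 | 16 | C |
| Audio & Video Player | SMPlayer | 14.9.0 | 10 | C, C++ |
| Numerical Computing Environment | Matlab© | R2015a | 32 | C, C++, Fortran |
| Numerical Computing Environment | R Project | 3.2.2 | 23 | C, C++, Fortran |
| Numerical Computing Environment | Sage | 7.1 | 11 | C++, Python/Cython |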

Design of Experiments

Experiments
• We conducted fifteen experiments, one per application.
  – The audio and video workloads were run as separate experiments, although they used the same three player programs.
• We replicated each experiment 31 times.
  – The first replication was discarded to avoid the influence of warm-up effects.
• Our experimental approach aimed at reducing the influence of experimental errors.
  – Analyses were based on the average and median statistics calculated over the trace samples collected from 30 replications per experiment.
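The averaging procedure above can be sketched in C. The sketch assumes one scalar metric per replication (for instance, the number of allocation requests), drops the first `warmup` runs, and returns the mean and median of the remaining ones; the function name and the `MAX_REPS` capacity are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_REPS 64  /* hypothetical upper bound on kept replications */

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Summarizes one metric across replications, skipping the first `warmup` runs.
   Returns 0 on success, -1 if nothing is left or too many runs are kept. */
int summarize_replications(const double *runs, size_t n, size_t warmup,
                           double *mean, double *median) {
    if (warmup >= n || n - warmup > MAX_REPS) return -1;
    size_t m = n - warmup;
    double kept[MAX_REPS];
    memcpy(kept, runs + warmup, m * sizeof kept[0]);  /* discard warm-up runs */

    double sum = 0.0;
    for (size_t i = 0; i < m; i++) sum += kept[i];
    *mean = sum / (double)m;

    qsort(kept, m, sizeof kept[0], cmp_double);
    *median = (m % 2) ? kept[m / 2] : (kept[m / 2 - 1] + kept[m / 2]) / 2.0;
    return 0;
}
```

With the design described above, the call would be `summarize_replications(runs, 31, 1, &mean, &median)`, so both statistics are computed over 30 replications.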

Design of Experiments

Workload (Web Browsers)
• We accessed the Google website, searched for a keyword, and then accessed the first link listed.
• The keywords were:
  – facebook, youtube, yahoo, and baidu (in this order).
  – The first link shown was the keyword’s official website.
• These websites were selected according to the Alexa Ranking (July 2016).
  – The top 500 sites on the web: http://www.alexa.com/topsites/

Design of Experiments

Workload (Graphic Editors)
• We loaded and displayed a JPG file.
• The image properties are:
  – Resolution: 1920 x 1080 pixels.
  – Size: 265.9 kilobytes.

Design of Experiments

Workload (Audio Players)
• We set the audio players to play an MP3 file.
• Track: “1-1 Symphony Nº 5 Allegro Con Brio” from the album “The Very Best of Beethoven”.
• The track properties are:
  – Bit rate: 256 Kbps.
  – Duration: 5 min 34 sec.
  – Size: 10.7 megabytes.

Design of Experiments

Workload (Video Players)
• We set the video players to play an OGG file.
• High-definition video obtained from:
  – http://www.bigbuckbunny.org/
• The video properties are:
  – Frame rate: 24 fps.
  – Resolution: 1280 x 720 pixels.
  – Duration: 9 min 56 sec.
  – Size: 196.9 megabytes.

Design of Experiments

Workload (Numerical Computing)
• We executed a program that performs the Gauss-Seidel method.
  – Input: scaled tridiagonal matrix of order 50.
• This method is used to solve linear systems, which is a typical use of numerical computing applications.
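For reference, a minimal C sketch of the Gauss-Seidel iteration on a tridiagonal system of order 50 follows. It is not the program used in the experiments: the matrix entries (4 on the diagonal, -1 off the diagonal), the right-hand side, the tolerance, and the function name are assumptions, chosen so that the system is strictly diagonally dominant (which guarantees convergence) and has the all-ones vector as its exact solution.

```c
#include <assert.h>
#include <stddef.h>

/* Solves sub[i]*x[i-1] + diag[i]*x[i] + sup[i]*x[i+1] = b[i] in place in x
   (sub[0] and sup[n-1] are ignored). Returns the number of sweeps performed,
   or -1 if the largest update is still >= tol after max_iter sweeps. */
int gauss_seidel_tridiag(size_t n, const double *sub, const double *diag,
                         const double *sup, const double *b, double *x,
                         double tol, int max_iter) {
    for (int it = 1; it <= max_iter; it++) {
        double max_delta = 0.0;
        for (size_t i = 0; i < n; i++) {
            double s = b[i];
            if (i > 0)     s -= sub[i] * x[i - 1];  /* value already updated in this sweep */
            if (i + 1 < n) s -= sup[i] * x[i + 1];  /* value from the previous sweep */
            double xi = s / diag[i];
            double delta = xi > x[i] ? xi - x[i] : x[i] - xi;
            if (delta > max_delta) max_delta = delta;
            x[i] = xi;
        }
        if (max_delta < tol) return it;
    }
    return -1;
}
```

Each sweep reuses the components already updated in the same sweep, which is what distinguishes Gauss-Seidel from the Jacobi method.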

Design of Experiments

Test Bed
• The test bed computer was restarted before each experiment run in order to avoid possible interference between subsequent experiments.

| Resource | Description |
|---|---|
| CPU | Intel Core i7-3537U @ 2.00GHz |
| Memory | 8 GB RAM |
| System | Linux OS kernel 4.2.0-36-generic SMP |
| Distribution | Ubuntu 15.10 |
| Architecture | 64-bit |

Results and Discussion

Allocation Requests and Memory Allocated
• Do real applications share common dynamic memory allocation patterns?
• We analyzed the number of allocation requests and the amount of memory allocated.
• For both quantities, no consistent pattern was observed, either intra- or inter-application category.
  – This substantial diversity corroborates the findings in [2].

Results and Discussion

DYNAMICALLY ALLOCATED MEMORY PER APPLICATION

| Application | Number of Allocations (average) | Allocated Memory (MB, average) | Allocation Size (bytes, median) |
|---|---|---|---|
| Elinks | 1,145,332 | 220 | 49 |
| Uzbl | 3,581,006 | 839 | 158 |
| Midori | 3,995,208 | 836 | 139 |
| Gimp | 2,110,367 | 255 | 16 |
| MyPaint | 457,986 | 232 | 32 |
| Krita | 4,119,021 | 1,105 | 42 |
| Gmerlin (Audio) | 309,676 | 59 | 32 |
| Mplayer (Audio) | 1,305,688 | 152 | 27 |
| SMPlayer (Audio) | 439,586 | 309 | 41 |
| Gmerlin (Video) | 352,710 | 80 | 32 |
| Mplayer (Video) | 1,997,262 | 239 | 32 |
| SMPlayer (Video) | 801,747 | 1,521 | 56 |

Results and Discussion

DYNAMICALLY ALLOCATED MEMORY PER APPLICATION (CONTINUED)

| Application | Number of Allocations (average) | Allocated Memory (MB, average) | Allocation Size (bytes, median) |
|---|---|---|---|
| Matlab© | 2,455,521 | 858 | 38 |
| R Project | 60,468 | 68 | 396 |
| Sage | 107,981 | 99 | 384 |

Results and Discussion

Allocation Requests and Memory Allocated
• We looked for possible correlations among the three variables below.
  – Two-tailed tests with a significance level (α) of 5%.
  – The correlation between the number of allocations and the amount of memory allocated was statistically significant at the 0.05 level (ρ = 0.75, p-value = 0.001).
  – The other two associations showed very weak correlations.

SPEARMAN’S CORRELATION COEFFICIENTS

| | Number of Allocations | Allocated Memory | Allocation Size |
|---|---|---|---|
| Number of Allocations | 1 | | |
| Allocated Memory | 0.75 | 1 | |
| Allocation Size | -0.11 | 0.14 | 1 |
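Spearman’s coefficient is Pearson’s correlation computed on ranks. The minimal C sketch below assigns tied values the average of their ranks and computes only ρ, not the two-tailed p-value used for the significance test. The function names and the `MAX_N` capacity are hypothetical, and a small Newton-iteration square root keeps the example free of a libm link dependency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_N 64  /* hypothetical capacity */

typedef struct { double value; size_t index; } ranked;

static int cmp_ranked(const void *a, const void *b) {
    double x = ((const ranked *)a)->value, y = ((const ranked *)b)->value;
    return (x > y) - (x < y);
}

/* 1-based ranks; tied values get the average of the ranks they span. */
static void average_ranks(const double *v, size_t n, double *rank) {
    ranked tmp[MAX_N];
    for (size_t i = 0; i < n; i++) { tmp[i].value = v[i]; tmp[i].index = i; }
    qsort(tmp, n, sizeof tmp[0], cmp_ranked);
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j + 1 < n && tmp[j + 1].value == tmp[i].value) j++;
        double avg = (double)(i + j) / 2.0 + 1.0;  /* mean of ranks i+1 .. j+1 */
        for (size_t k = i; k <= j; k++) rank[tmp[k].index] = avg;
        i = j + 1;
    }
}

/* Newton's method for sqrt(v), v > 0; starts above the root and decreases toward it. */
static double sqrt_newton(double v) {
    double r = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 100; i++) r = 0.5 * (r + v / r);
    return r;
}

/* Spearman's rho = Pearson correlation of the two rank vectors (2 <= n <= MAX_N). */
double spearman_rho(const double *x, const double *y, size_t n) {
    assert(n >= 2 && n <= MAX_N);
    double rx[MAX_N], ry[MAX_N];
    average_ranks(x, n, rx);
    average_ranks(y, n, ry);
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) { mx += rx[i]; my += ry[i]; }
    mx /= (double)n;
    my /= (double)n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = rx[i] - mx, dy = ry[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;  /* undefined for constant input; 0 by convention here */
    return sxy / sqrt_newton(sxx * syy);
}
```

Fed the Number of Allocations and Allocated Memory columns of the two per-application tables (15 pairs, no ties), this sketch returns 0.75, the coefficient reported above.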

Results and Discussion

Allocation Sizes
• Eleven out of fifteen applications (73.33%) showed median allocation sizes smaller than 128 bytes.
  – Predominance of small-sized DMAs.
  – A similar pattern (75%) was reported in [2].
• To understand these results, we analyzed the distribution of allocation sizes using box-and-whisker plots for each investigated application per category.
  – The bottom and top of each box show the 25th and 75th percentiles (quartiles), the median is represented by the horizontal mark, and the whiskers extend to the 10th and 90th percentiles.
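The box-plot statistics and the 73.33% share can be reproduced with two small helpers. The sketch below assumes linear interpolation between order statistics for percentiles (a common convention; the study does not state which one it used) and counts values strictly below a threshold. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Percentile p (0 <= p <= 1) of an ascending-sorted array, interpolating linearly
   between the two nearest order statistics. */
double percentile_sorted(const double *sorted, size_t n, double p) {
    assert(n > 0 && p >= 0.0 && p <= 1.0);
    double pos = p * (double)(n - 1);
    size_t lo = (size_t)pos;
    if (lo + 1 >= n) return sorted[n - 1];
    double frac = pos - (double)lo;
    return sorted[lo] + frac * (sorted[lo + 1] - sorted[lo]);
}

/* Fraction of values strictly below `threshold`. */
double share_below(const double *v, size_t n, double threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] < threshold) count++;
    return (double)count / (double)n;
}
```

For example, the fifteen median allocation sizes in the per-application tables give `share_below(medians, 15, 128.0)` = 11/15 ≈ 0.7333.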

Results and Discussion

Fig. 2. (a) to (e) Allocation sizes per category; (f) percentage of the ten most allocated sizes per application.

Results and Discussion

Allocation Sizes
• In this study, we found that 85% of the ten most allocated sizes were smaller than or equal to 128 bytes.
  – In [2], 78.33% of the most recurrent allocated sizes for desktop applications were smaller than or equal to 128 bytes.

Fig. 1. Cumulative percentage of the ten most allocated sizes for all applications investigated.
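Finding an application’s ten most allocated sizes amounts to building a frequency table of request sizes and keeping the k most frequent entries. The minimal C sketch below uses hypothetical names, a fixed capacity for distinct sizes, and breaks frequency ties by the smaller size (an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DISTINCT 1024  /* hypothetical capacity for distinct sizes */

typedef struct { size_t size; size_t count; } size_freq;

static int cmp_size(const void *a, const void *b) {
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Most frequent first; equal counts ordered by the smaller size. */
static int cmp_count_desc(const void *a, const void *b) {
    const size_freq *x = a, *y = b;
    if (x->count != y->count) return (x->count < y->count) - (x->count > y->count);
    return (x->size > y->size) - (x->size < y->size);
}

/* Sorts `sizes` in place, then writes up to k most frequent sizes to `top`.
   Returns the number of entries written. */
size_t top_k_sizes(size_t *sizes, size_t n, size_t k, size_freq *top) {
    size_freq freq[MAX_DISTINCT];
    size_t distinct = 0;
    qsort(sizes, n, sizeof sizes[0], cmp_size);
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j < n && sizes[j] == sizes[i]) j++;   /* run of equal sizes */
        if (distinct == MAX_DISTINCT) break;          /* sketch: ignore sizes beyond capacity */
        freq[distinct++] = (size_freq){sizes[i], j - i};
        i = j;
    }
    qsort(freq, distinct, sizeof freq[0], cmp_count_desc);
    size_t m = distinct < k ? distinct : k;
    for (size_t i = 0; i < m; i++) top[i] = freq[i];
    return m;
}
```

Calling it with k = 10 on one application’s trace and counting how many returned sizes are at most 128 bytes gives the per-application input to the share summarized above.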

Results and Discussion

Allocation Routines
• We analyzed the source code of the applications written fully or partially in C++.
• In some cases (e.g., MyPaint), the allocation calls must come from external code linked to the application.

PERCENTAGE OF CALL SITES PER ALLOCATION ROUTINE

| Application | new (%) | malloc (%) | realloc (%) | calloc (%) |
|---|---|---|---|---|
| Elinks | 5 | 34 | 12 | 49 |
| MyPaint | 0 | 90 | 0 | 10 |
| Krita | 98 | 1.3 | 0.1 | 0.6 |
| SMPlayer | 90.7 | 8.2 | 0.5 | 0.6 |
| Sage | 16.2 | 64.2 | 12.8 | 6.8 |

Results and Discussion

USAGE PERCENTAGE OF ALLOCATION ROUTINES

| Application | malloc (%) | realloc (%) | calloc (%) |
|---|---|---|---|
| Elinks | 59.82 | 36.71 | 3.47 |
| Uzbl | 39.77 | 55.85 | 4.39 |
| Midori | 41.78 | 53.73 | 4.49 |
| Gimp | 41.31 | 7.03 | 51.66 |
| MyPaint | 65.03 | 10.41 | 24.57 |
| Krita | 85.54 | 8.75 | 5.7 |
| Gmerlin (audio) | 84.11 | 8.78 | 7.11 |
| Mplayer (audio) | 58.57 | 30.06 | 11.37 |
| SMPlayer (audio) | 82.53 | 3.79 | 13.67 |
| Gmerlin (video) | 83.37 | 9.54 | 7.1 |
| Mplayer (video) | 58.65 | 32.41 | 8.94 |
| SMPlayer (video) | 77.56 | 2.38 | 20.07 |
| Matlab | 81.28 | 17.51 | 1.21 |
| R Project | 97.17 | 2.26 | 0.57 |
| Sage | 67.42 | 6.33 | 26.25 |
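The usage percentages are the shares of each routine among all traced allocation requests. A minimal C sketch with hypothetical names follows. Because the wrapper cannot tell new from malloc, calls to new would be counted as malloc here, and free events are excluded from the denominator (an assumption consistent with each table row summing to about 100%).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation codes, one per traced request. */
typedef enum { OP_MALLOC, OP_CALLOC, OP_REALLOC, OP_FREE } op_type;

typedef struct { double malloc_pct, realloc_pct, calloc_pct; } routine_usage;

/* Share (in %) of each allocation routine among allocation requests; free is ignored. */
routine_usage usage_percentages(const op_type *ops, size_t n) {
    size_t m = 0, r = 0, c = 0;
    for (size_t i = 0; i < n; i++) {
        switch (ops[i]) {
        case OP_MALLOC:  m++; break;
        case OP_REALLOC: r++; break;
        case OP_CALLOC:  c++; break;
        default:         break;  /* OP_FREE is not an allocation request */
        }
    }
    routine_usage u = {0.0, 0.0, 0.0};
    size_t total = m + r + c;
    if (total > 0) {
        u.malloc_pct  = 100.0 * (double)m / (double)total;
        u.realloc_pct = 100.0 * (double)r / (double)total;
        u.calloc_pct  = 100.0 * (double)c / (double)total;
    }
    return u;
}
```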

Results and Discussion

Allocation Retention Time (ART)
• The retention time of an allocated memory block is the time interval between its allocation and its deallocation.
  – The retention time pattern influences the degree of the application’s heap memory fragmentation [3].
• ART was classified into three classes:
  – Short duration (ART < 100 milliseconds).
  – Medium duration (100 milliseconds ≤ ART ≤ 1 second).
  – Long duration (ART > 1 second).

[3] P. R. Wilson, M. S. Johnstone, M. Neely, and D. Boles, “Dynamic storage allocation: a survey and critical review,” in Proc. of the International Workshop on Memory Management (IWMM), London, UK, 1995, pp. 1-116.
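The three ART classes translate directly into thresholds on the difference between deallocation and allocation timestamps. The C sketch below assumes each deallocation has already been matched to its allocation (for example, by block address) and leaves aside blocks that are never freed; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the three retention-time classes defined above. */
typedef enum { ART_SHORT, ART_MEDIUM, ART_LONG } art_class;

#define NS_PER_MS 1000000ULL
#define NS_PER_S  1000000000ULL

/* Classifies one block from its matched allocation and deallocation times (ns). */
art_class classify_art(uint64_t alloc_ns, uint64_t free_ns) {
    assert(free_ns >= alloc_ns);
    uint64_t art = free_ns - alloc_ns;
    if (art < 100 * NS_PER_MS) return ART_SHORT;   /* ART < 100 ms */
    if (art <= NS_PER_S)       return ART_MEDIUM;  /* 100 ms <= ART <= 1 s */
    return ART_LONG;                               /* ART > 1 s */
}

/* Tallies n matched blocks into counts[ART_SHORT], counts[ART_MEDIUM], counts[ART_LONG]. */
void art_histogram(const uint64_t *alloc_ns, const uint64_t *free_ns, size_t n,
                   size_t counts[3]) {
    counts[ART_SHORT] = counts[ART_MEDIUM] = counts[ART_LONG] = 0;
    for (size_t i = 0; i < n; i++)
        counts[classify_art(alloc_ns[i], free_ns[i])]++;
}
```

Dividing each tally by n gives per-application class percentages of the kind shown in Fig. 3.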

Results and Discussion

• On average, for all applications:
  – Short: 85.06%
  – Medium: 8.6%
  – Long: 6.4%
• In [2], the average for all applications was:
  – Short: 71.6%
  – Medium: 6.34%
  – Long: 22.06%

Fig. 3. Percentages of allocation retention time categories per application.

Results and Discussion

Allocation and Deallocation Path
• Closely related to the retention time is the analysis of the allocation and deallocation behavior over time.
• We divided the total time of each experiment into 1000 time intervals.
  – For each interval, we counted the number of allocations and deallocations performed in that interval.
  – We subtracted the deallocations from the allocations and added the difference to the total computed for the previous intervals.
  – This yields the total number of non-deallocated blocks from the start of the experiment up to each time interval.
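The procedure above can be sketched in C: events are bucketed into 1000 equal time intervals, and a running sum of allocations minus deallocations gives the number of blocks still allocated at the end of each interval. The event layout, the names, and placing an event that occurs exactly at the end of the experiment into the last interval are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_INTERVALS 1000

/* Hypothetical trace event: timestamp in ns and whether it allocates (1) or frees (0). */
typedef struct { uint64_t time_ns; int is_alloc; } dma_event;

/* path[k] = number of non-deallocated blocks from start_ns to the end of interval k.
   Note: (t - start) * N_INTERVALS overflows only for spans of roughly 200+ days. */
void alloc_dealloc_path(const dma_event *ev, size_t n, uint64_t start_ns,
                        uint64_t end_ns, long long path[N_INTERVALS]) {
    assert(end_ns > start_ns);
    long long allocs[N_INTERVALS] = {0}, frees[N_INTERVALS] = {0};
    uint64_t span = end_ns - start_ns;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].time_ns < start_ns || ev[i].time_ns > end_ns) continue;  /* outside window */
        uint64_t k = (ev[i].time_ns - start_ns) * N_INTERVALS / span;
        if (k >= N_INTERVALS) k = N_INTERVALS - 1;  /* event exactly at end_ns */
        if (ev[i].is_alloc) allocs[k]++; else frees[k]++;
    }
    long long running = 0;
    for (size_t k = 0; k < N_INTERVALS; k++) {
        running += allocs[k] - frees[k];  /* add this interval's difference to the total */
        path[k] = running;
    }
}
```

A steadily rising `path` corresponds to the growth pattern of non-deallocated blocks plotted in Fig. 4.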

Results and Discussion

Fig. 4. Allocation and deallocation paths per application.

Results and Discussion

Fig. 4. Allocation and deallocation paths per application.

Results and Discussion

Fig. 4. Allocation and deallocation paths per application.

Threats to Validity

Threats to Validity (Internal Validity)
• Our results were based on the average/median of multiple replications of each experiment.
• The instrumentation recorded time with nanosecond resolution; however, the actual accuracy is limited by the quality of the hardware timer.
• The wrapper was not able to distinguish between malloc and new.
  – Allocations made directly through mmap or sbrk/brk were not captured in this study.

Threats to Validity

Threats to Validity (External Validity)
• The web browsers Lynx, Elinks, Uzbl, and Midori are not as popular as Mozilla Firefox and Google Chrome.
  – Midori is the closest to the most popular applications.
• Retention times are influenced by hardware factors.
  – This influence may not be substantial, since the results in [2], obtained on a different machine setup, corroborate our results.
• The study was desktop-driven.
• The experiments were based on only one type of workload.

Conclusion

Conclusion
• The experimental evidence showed that different real-world applications share patterns with respect to DMA.
• Some of these patterns were observed more consistently in applications of the same category.
  – This suggests a correlation with the workload type.

Conclusion

Our key findings include…
• Allocation sizes ≤ 128 bytes were predominant (85% of the ten most allocated sizes).
• Short-duration allocations (< 100 milliseconds) corresponded to 85.06% of all allocations analyzed.
• A growing number of non-deallocated memory blocks over time was observed in 11 out of 15 (73.33%) applications.

Thank You!
Vinícius Fonseca Maciel
http://hpdcs.facom.ufu.br/

Backup Slides

Backup Slides

Index
1. Consequences

Consequences
• Memory allocator algorithms can be optimized.
• New dynamic data structures can be proposed.
• Analytical studies in memory management can benefit from the patterns found, leading to more accurate methods.
• Experimental studies can use more realistic synthetic workloads based on the discovered patterns.
