Memory Management for High-Performance Applications - MavDISK

4 downloads 14411 Views 485KB Size Report
4. UNIVERSITY OF MASSACHUSETTS, AMHERST • Department of Computer Science. Memory Management 101. ▫ Memory allocators: ▫ Accept requests for ...
High-Performance Applications

Memory Management for High-Performance Applications



 

Emery Berger Univ. of Massachusetts, Amherst

Web servers, search engines, scientific codes C or C++ Run on one or a cluster of server boxes

software compiler



Needs support at every level

runtime system operating system hardware

UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science

UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science

New Applications, Old Memory Managers

Memory Management 101

Applications and hardware have changed   



Multiprocessors now commonplace Object-oriented, multithreaded Increased pressure on memory manager (malloc, free)

Memory allocators: 





The Heap: 

UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science

 





Multiprocessors now commonplace Object-oriented, multithreaded Increased pressure on memory manager (malloc, free)



But memory managers have not kept up 

4

Current Memory Managers Limit Scalability

Applications and hardware have changed 

The pool of unused memory

UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science

3

New Applications, Old Memory Managers (cont) 

Accept requests for memory and return virtual address to a block of memory of the requested size Accept requests to free memory, which allows virtual memory to be reused

As we add processors, program slows down Caused by heap contention

Runtime Performance

Speedup



2

14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Ideal Actual

1

2

3

4

5

6

7

8

9

10

11

12

13

14

Number of Processors

Inadequate support for modern applications

Larson server benchmark on 14-processor Sun UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science

5

UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science

6

1

The Problem 

Implementing Memory Managers

Current memory managers inadequate for high-performance applications on modern architectures 

Memory managers must be



 

Limit scalability & application performance

Heavily-optimized C code



   ⇒

UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science

Real Code: DLmalloc 2.7.2





Allocator-induced false sharing

UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science



Solution: provably scalable memory manager



Extended memory manager for servers 

8

Heap Layers framework Contention, false sharing, space

Hoard Reap 10

Multiple Heap Allocator: Pure Private Heaps 

Reduce contention but cause other problems: 

Problems with current memory managers

UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science

9

One heap per processor:



11

Key:

malloc



free

puts memory on its local heap

STL, Cilk, ad hoc

= in use, processor 0 = free, on heap 1

gets memory from its local heap



Multiple heaps [Larson 98, Gloger 99]

P-fold or even unbounded increase in space





Impractical



Building memory managers



Concurrent single heap [Bigler et al. 85, Johnson 91, Iyengar 92] 





Previous work for multiprocessors 

Hard to write, reuse, or extend

This Talk

Problems with General-Purpose Memory Managers 

Hand-unrolled loops Macros Monolithic functions

UNIVERSITY OF M ASSACHUSETTS, A MHERST • Department of Computer Science

7

#define chunksize(p) ((p)->size & ~(SIZE_BITS)) #define next_chunk(p) ((mchunkptr)( ((char*)(p)) + ((p)->size & ~PREV_INUSE) )) #define prev_chunk(p) ((mchunkptr)( ((char*)(p)) - ((p)->prev_size) )) #define chunk_at_offset(p, s) ((mchunkptr)(((char*)(p)) + (s))) #define inuse(p)\ ((((mchunkptr)(((char*)(p))+((p)->size & ~PREV_INUSE)))->size) & PREV_INUSE) #define set_inuse(p)\ ((mchunkptr)(((char*)(p)) + ((p)->size & ~PREV_INUSE)))->size |= PREV_INUSE #define clear_inuse(p)\ ((mchunkptr)(((char*)(p)) + ((p)->size & ~PREV_INUSE)))->size &= ~(PREV_INUSE) #define inuse_bit_at_offset(p, s)\ (((mchunkptr)(((char*)(p)) + (s)))->size & PREV_INUSE) #define set_inuse_bit_at_offset(p, s)\ (((mchunkptr)(((char*)(p)) + (s)))->size |= PREV_INUSE) #define MALLOC_ZERO(charp, nbytes) \ do { \ INTERNAL_SIZE_T* mzp = (INTERNAL_SIZE_T*)(charp); \ CHUNK_SIZE_T mctmp = (nbytes)/sizeof(INTERNAL_SIZE_T); \ long mcn; \ if (mctmp < 8) mcn = 0; else { mcn = (mctmp-1)/8; mctmp %= 8; } \ switch (mctmp) { \ case 0: for(;;) { *mzp++ = 0; \ case 7: *mzp++ = 0; \ case 6: *mzp++ = 0; \ case 5: *mzp++ = 0; \ case 4: *mzp++ = 0; \ case 3: *mzp++ = 0; \ case 2: *mzp++ = 0; \ case 1: *mzp++ = 0; if(mcn