International Journal of Research in Computer and Communication Technology, Vol 2, Issue 10, October- 2013

ISSN (Online) 2278- 5841 ISSN (Print) 2320- 5156

Detection of Precise C/C++ Memory Leakage by Diagnosing Heap Dumps Using Interprocedural Flow Analysis Statistics

S. Poornima
SR Engineering College, Warangal, India

Dr. C.V. Guru Rao
Professor, Dept. of CSE, SR Engineering College, Warangal, India

S.P. Anandaraj
Sr. Asst. Prof., Dept. of CSE, SR Engineering College, Warangal, India

Abstract-A memory leak is a time-consuming bug often created by C++ developers, and detecting memory leaks is often tedious. Things get worse if the code was written by someone else or if the code base is very large. The most difficult coding bugs, such as memory corruption, reading uninitialized memory, and using freed memory, are challenging to recognize and fix because of the delay and non-determinism between the error and its observed effect. Detecting memory leaks is challenging because real-world applications are built on multiple layers of software frameworks, making it difficult for a developer to know whether observed references to objects are legitimate or the cause of a leak. Our aim is to build a fast and feature-rich C heap analyzer that helps the user find memory leaks and reduce memory consumption. Using the heap analyzer, productive heap dumps with hundreds of millions of objects can be analyzed and the retained sizes of objects can be calculated quickly [2]. The analyzer also identifies what prevents the garbage collector from collecting objects and runs a report that automatically extracts heap leak suspects. The heap analyzer helps users find possible heap leak areas in various C/C++ applications through its context flow analysis and heap dump analysis. Our approach identifies not just leaking candidates and their structure, but also provides aggregate information about the access path to the leaks.

Keywords: Heap Dumps, Allocation/Deallocation, Memory Leakage, Dynamic Linkage, Heap linking and execution, Static and dynamic analysis

I. INTRODUCTION Memory leaks are challenging to identify and debug for several reasons. First, the observed failure may be far removed from the error that caused it, requiring the use of heap analysis tools that examine the state of the reachability graph when a failure occurred [4]. Second, real-world applications usually make heavy use of several layers of frameworks whose implementation details are unknown to the developers debugging

www.ijrcct.org


encountered memory leaks. Often, these developers cannot distinguish whether an observed reference chain is legitimate (such as when objects are kept in a cache in anticipation of future uses), or represents a leak. Third, the sheer size of the heap (large-scale server applications can easily contain tens of millions of objects) makes manual inspection of even a small subset of objects difficult or impossible.

Existing diagnosis tools are either online or offline. Online tools monitor either the state of the heap or accesses to objects in it, or both. They analyze changes in the heap over time to detect leak candidates, which are "stale" objects that have not been accessed for some time. Online tools are not widely used in production environments, in part because their overhead can make them too expensive, but also because the need to debug memory leaks often occurs unexpectedly after an upgrade or change to a framework component, and often when developers believe their code has been sufficiently tested.

Offline tools use heap snapshots, often obtained post-mortem when the system runs out of memory. These tools find leak candidates by analyzing the relationships, types, and sizes of objects and reference chains. Most existing heuristics, however, are based solely on the amount of memory an object retains and ignore structural information. Where structural information is taken into account, it often relies on prior knowledge of the application and its libraries [10].

Languages with explicit memory management require the programmer to manually deallocate memory blocks that are no longer needed by the program, and memory leaks are a common problem in code written in such languages. This paper mainly addresses memory leaks diagnosed via heap dumps, and their consequences. Heap areas are defined by objects,


arrays and classes [7]. During program execution, the garbage collector allocates areas of storage in the heap. An object continues to be active while a reference to it exists somewhere in the active state; the object is then reachable. When an object ceases to be referenced from the active state, it becomes garbage and can be reclaimed for reuse [11]. When this reclamation occurs, the garbage collector must process a possible finalizer and also ensure that any internal memory resources associated with the object are returned to the pool of such resources. A heap dump is a snapshot of the memory of a process at a certain point in time. The snapshot contains information about the C++ objects and classes in the allocated heap, with their various types of data, at the moment the dump is triggered [2] [6]. Because it captures a single moment, a heap dump does not contain information about when and where (in which method) an object was allocated in the program.
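The notion of reachability described above can be illustrated with a small marking pass over an object graph. The following is only a minimal sketch under stated assumptions: the `object` type, the fixed-size object table, and the names `mark` and `count_garbage` are hypothetical and are not part of the analyzer described in this paper.

```c
#include <stddef.h>

#define MAX_OBJECTS 8
#define MAX_REFS 4

/* Hypothetical heap object: stores the indices of the objects it references. */
typedef struct {
    int refs[MAX_REFS];   /* indices into the object table, -1 means no reference */
    int marked;           /* set to 1 once the object is found to be reachable */
} object;

/* Mark every object reachable from index i (depth-first traversal). */
static void mark(object *heap, int i)
{
    if (i < 0 || i >= MAX_OBJECTS || heap[i].marked)
        return;
    heap[i].marked = 1;
    for (int r = 0; r < MAX_REFS; r++)
        mark(heap, heap[i].refs[r]);
}

/* Return the number of objects that are not reachable from any root. */
int count_garbage(object *heap, const int *roots, size_t nroots)
{
    for (int i = 0; i < MAX_OBJECTS; i++)
        heap[i].marked = 0;
    for (size_t k = 0; k < nroots; k++)
        mark(heap, roots[k]);

    int garbage = 0;
    for (int i = 0; i < MAX_OBJECTS; i++)
        if (!heap[i].marked)
            garbage++;
    return garbage;
}
```

Objects left unmarked correspond to garbage in the sense used above. A leak in a garbage-collected setting is the opposite case: an object that is still marked because some reference chain reaches it, although the program no longer uses it.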

A. Necessity of Heap Dumps:

• If a system crashes sporadically with an OutOfMemoryError, analyzing an automatically written heap dump can be a very easy way to find the root cause of the problem.
• A heap dump helps in analyzing the memory footprint of the user application: it shows which structures are the biggest, reveals redundant data structures, finds space wasted in unused collections, and more.
• Most leak detection techniques depend on analyzing object activity, such as allocation and garbage collection, which is complex to implement using heap dumps.
• Heap dump analysis helps the user find the reason why too many garbage objects are produced during a certain operation.

The above criteria define the requirements for heap dump allocation in memory.

II. MEMORY ANALYZER

A user program is submitted for execution and loaded into memory. When the program is loaded into memory, it is organized into three areas of memory, called segments: the text segment, the stack segment, and the heap segment [2]. The text segment holds the machine-language instructions that are executed, including all user-defined and system functions that make up the program. The text segment is also called the code segment, because it contains the program's compiled code. The executable program generated by the compiler (gcc) is laid out in memory as shown in Fig. 1.

[Figure: memory layout from high address to low address: stack, heap, uninitialized data (initialized to zero by exec), initialized data and text (read from program file by exec).]

Fig.1 Memory Organization of a Typical Program

Fig. 1 shows the memory organization of a typical program, which consists of the following:

• Code: the executable or binary code resides in the code segment.
• Data: the data segment is partitioned into the initialized data segment, which holds all global, static, and constant data, and the uninitialized data segment (BSS), which holds uninitialized variables.
• Heap: during program execution, the calloc and malloc functions allocate memory at run time from a structure called the heap; whenever the heap has to grow, calloc and malloc are used again.
• Stack: the stack stores the local variables defined in the program and is used by functions for passing arguments along with the return address, that is, the address of the instruction to be executed after the function call is over.
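The segments listed above can be observed from inside a program by printing one address from each data area. This is a minimal sketch, assuming a hosted C environment; the variable names and the function `print_layout` are hypothetical, and the actual addresses and their relative order depend on the platform, compiler, and address-space randomization.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* initialized data segment */
int uninitialized_global;      /* uninitialized data (BSS), zero-filled at program start */

/* Print one address from each data area.
 * Returns 0 on success, -1 if the heap allocation fails. */
int print_layout(FILE *out)
{
    int local = 0;                            /* stack */
    int *dynamic = malloc(sizeof *dynamic);   /* heap */

    if (dynamic == NULL)
        return -1;

    fprintf(out, "initialized data   %p\n", (void *) &initialized_global);
    fprintf(out, "uninitialized data %p\n", (void *) &uninitialized_global);
    fprintf(out, "heap               %p\n", (void *) dynamic);
    fprintf(out, "stack              %p\n", (void *) &local);

    free(dynamic);
    return 0;
}
```

Calling `print_layout(stdout)` from `main` prints four addresses; on many systems the stack address is the highest and the two data-segment addresses are the lowest, matching Fig. 1, but this is not guaranteed.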


At run time, the stack and the heap occupy opposite ends of the process's virtual address space. The stack grows automatically up to a limit set by the kernel, which can be adjusted with setrlimit(RLIMIT_STACK…). The heap grows when the brk() or sbrk() system call is invoked, which maps more pages of memory into the process's virtual address space. The implementation of both the stack and the heap is usually left to the runtime or the operating system. Games and other large, performance-critical applications often implement their own memory allocation: they take a large chunk of memory from the heap and manage it internally, to avoid depending on the operating system for memory allocation.

A. Criteria for Creating and Accessing Heap Dumps

The heap contains a linked list of used and free blocks. New allocations on the heap (by new or malloc) are satisfied by creating a suitable block from one of the free blocks [14]. This requires updating the list of blocks on the heap, so every allocation and deallocation updates the heap structure. The meta-information about the blocks is also stored on the heap, often in a small area just in front of every block, and a heap dump therefore contains this metadata before each block. When structuring a heap, the following considerations apply:
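The description of a heap as a linked list of used and free blocks, with metadata stored just in front of each block, can be sketched as a toy first-fit allocator. This is an illustrative sketch only, not the allocator used by any real C library: the names `block_header`, `toy_alloc`, and `toy_free` are hypothetical, a fixed static arena stands in for memory obtained from the operating system, and C11 is assumed for `_Alignas` and `max_align_t`.

```c
#include <stddef.h>

#define ARENA_SIZE 4096

/* Metadata stored immediately in front of every block. */
typedef struct block_header {
    size_t size;                 /* payload size in bytes */
    int used;                    /* 1 if allocated, 0 if free */
    struct block_header *next;   /* next block in address order */
} block_header;

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static block_header *head = NULL;

/* Round n up to a multiple of the strictest fundamental alignment. */
static size_t align_up(size_t n)
{
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

void *toy_alloc(size_t n)
{
    size_t hdr = align_up(sizeof(block_header));

    if (head == NULL) {          /* first call: the arena is one big free block */
        head = (block_header *) arena;
        head->size = ARENA_SIZE - hdr;
        head->used = 0;
        head->next = NULL;
    }
    if (n > ARENA_SIZE)
        return NULL;
    n = align_up(n ? n : 1);

    for (block_header *b = head; b != NULL; b = b->next) {
        if (b->used || b->size < n)
            continue;
        /* Split the block if the remainder can hold a header and a payload. */
        if (b->size >= n + hdr + align_up(1)) {
            block_header *rest = (block_header *) ((unsigned char *) b + hdr + n);
            rest->size = b->size - n - hdr;
            rest->used = 0;
            rest->next = b->next;
            b->size = n;
            b->next = rest;
        }
        b->used = 1;
        return (unsigned char *) b + hdr;   /* payload starts after the header */
    }
    return NULL;                 /* no free block is large enough */
}

void toy_free(void *p)
{
    if (p == NULL)
        return;
    size_t hdr = align_up(sizeof(block_header));
    block_header *b = (block_header *) ((unsigned char *) p - hdr);
    b->used = 0;
    /* Merge with the following block if it is also free. */
    if (b->next != NULL && !b->next->used) {
        b->size += hdr + b->next->size;
        b->next = b->next->next;
    }
}
```

Because the header sits directly before the payload, a tool that walks a dump of `arena` from its start can recover every block, its size, and whether it is in use, which is the kind of metadata a heap dump exposes. The sketch merges only with the following block on free; a fuller allocator would also merge with the preceding one.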

• The heap size is set when the application starts, but it can grow as space is required: the allocator requests more memory from the OS.
• The heap is stored in the computer's main memory.
• Heap variables must be destroyed manually and never fall out of scope. The data is freed with delete, delete[], or free.
• Allocation on the heap is slower than allocation on the stack.
• The heap is used on demand to allocate blocks of data for use by the program.
• The heap can become fragmented when many blocks are allocated and deallocated.
• The heap is recommended when the amount of data needed at run time is not known in advance.
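The last consideration above, using the heap when the amount of data is only known at run time, can be sketched as follows. This is a minimal example under the assumption that the caller owns the returned block; the function name `make_squares` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return a heap array holding 0*0, 1*1, ..., (n-1)*(n-1).
 * The size n is only known at run time, so the array cannot be a
 * fixed-size stack variable. Returns NULL if n is 0 or allocation fails.
 * The caller must release the array with free(). */
long *make_squares(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(long))
        return NULL;                        /* reject empty or overflowing sizes */

    long *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
        a[i] = (long) (i * i);
    return a;
}
```

The matching `free` call is the caller's responsibility; omitting it produces exactly the kind of leak discussed in this paper.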

For example, consider the following code, which uses static, stack, and heap storage:

    #include <stdlib.h>

    int foo(int z);

    int x;                    /* static storage */

    int main(void)
    {
        int y;                /* dynamic stack storage */
        char *str;

        str = malloc(100);    /* allocates 100 bytes of dynamic heap storage */
        y = foo(23);
        free(str);            /* deallocates 100 bytes of dynamic heap storage */
        return 0;
    }

    int foo(int z)
    {
        char ch[100];         /* ch is dynamic stack storage */

        if (z == 40)
            foo(9);
        return 3;             /* z and ch are deallocated from the stack and
                                 3 is pushed on top of the stack */
    }


Fig.2 Representation of Compiling and Linking of heap dumps During Program Execution

In most C applications, system performance depends considerably on memory consumption. Memory leaks are among the most common memory problems and are responsible for degraded performance. When a C/C++ application runs with a garbage collector (GC), memory leaks should in principle not occur, because the GC cleans up unused objects that are no longer referenced. However, objects that are no longer used but are still referenced are not removed by the GC, and this gives rise to memory leaks. Apart from memory leaks, other memory problems arise from memory fragmentation, object invocation, and tuning. Such problems can cause the application to terminate with an out-of-memory error. Consider the following example, in which 540 bytes are allocated for the variable str, a large chunk of memory:

    char *str = (char *) malloc(540);
    return;

In the above code, the character pointer str is declared and memory is allocated for it, but the memory is never freed. This kind of coding often leads to memory leaks, and if it occurs frequently the application runs out of memory and terminates prematurely, which is called a crash.

B. Heap Dump Memory Analysis

1) Mismatched Allocation/Deallocation

A mismatched allocation/deallocation error occurs when memory is released with a deallocation function that does not correspond to the function used to allocate it.

    char *s = (char *) malloc(5);
    delete s;

This error can be avoided by using the matching deallocation: in C++, memory allocated with new[] is freed with delete[], and memory allocated with malloc is freed with free.

2) Missing Allocation

Freeing memory that has already been freed is called a missing allocation error, also known as a repeated free error or double free error.

    char *pstr = (char *) malloc(20);
    free(pstr);
    free(pstr);    /* results in missing allocation */

3) Uninitialized Memory Access

Reading a variable or memory location before it has been initialized is called an uninitialized memory access.

    char *str = (char *) malloc(512);
    char d = str[0];      /* uninitialized read of str[0] */

    void val()
    {
        int p;
        int q = p * 4;    /* uninitialized read of variable p */
    }
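For contrast with the two uninitialized reads above, the following minimal sketch shows initialized counterparts. The function names `alloc_zeroed` and `scaled_default` are hypothetical; the sketch relies only on the standard guarantee that `calloc` returns memory with all bytes set to zero.

```c
#include <stdlib.h>

/* Allocate n bytes that are guaranteed to read as zero.
 * Unlike malloc, calloc clears the block before returning it.
 * Returns NULL on failure; the caller must free() the result. */
char *alloc_zeroed(size_t n)
{
    return calloc(n, 1);
}

/* Counterpart of val() above with p initialized before it is read. */
int scaled_default(void)
{
    int p = 0;        /* explicit initial value */
    int q = p * 4;    /* well-defined read of p */
    return q;
}
```

Reading `alloc_zeroed(512)[0]` yields 0 on every run, whereas the value read from a block returned by `malloc` is indeterminate.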

This error can be avoided by always initializing variables before they are used.

4) Cross Stack Access

This error occurs when a thread accesses the stack memory of a different thread.

    void main()
    {
        int *x;
        CreateThread(., thread #1, .);
        CreateThread(., thread #2, .);
    }

    Thread #1
    {
        int y[1024];
        x = y;
        y[0] = 1;
    }

    Thread #2
    {
        *x = 2;    /* stack crossed */
    }

Cross-stack accesses can be avoided by sharing data through global variables instead of through pointers into another thread's stack.

III. MEMORY LEAKS

In computer science, a memory leak occurs when a program consumes memory during execution and is unable to release it. A memory leak can normally be detected and analyzed by programmers with access to the source code. By increasing memory consumption, a memory leak can reduce the performance of the computer. In the worst case, so much memory becomes allocated that parts of the system or its resources stop working properly, the application fails, or the system slows down due to thrashing [15]. Memory leaks arise when dynamically allocated memory becomes unreachable. Garbage collectors provide a solution to this problem and can be integrated into programming languages as a built-in feature.
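The definition of a leak as allocated memory that is never released can be made concrete with a small bookkeeping wrapper around malloc and free. This is a hedged sketch, not the tcmalloc heap checker used in the experiments below and not the authors' analyzer: the names `tracked_malloc`, `tracked_free`, and `leaked_bytes` and the fixed-size table are assumptions made for illustration.

```c
#include <stdlib.h>

#define MAX_LIVE 1024

/* Table of blocks that have been allocated and not yet freed. */
static void *live_ptr[MAX_LIVE];
static size_t live_size[MAX_LIVE];
static size_t live_count = 0;

/* Allocate n bytes and record the block in the table.
 * Returns NULL if the table is full or malloc fails. */
void *tracked_malloc(size_t n)
{
    if (live_count == MAX_LIVE)
        return NULL;
    void *p = malloc(n);
    if (p != NULL) {
        live_ptr[live_count] = p;
        live_size[live_count] = n;
        live_count++;
    }
    return p;
}

/* Free a recorded block. Returns 0 on success and -1 if p is not a live
 * block, which is how a free without a matching allocation shows up. */
int tracked_free(void *p)
{
    if (p == NULL)
        return 0;                 /* free(NULL) is a no-op */
    for (size_t i = 0; i < live_count; i++) {
        if (live_ptr[i] == p) {
            free(p);
            live_ptr[i] = live_ptr[live_count - 1];    /* fill the gap */
            live_size[i] = live_size[live_count - 1];
            live_count--;
            return 0;
        }
    }
    return -1;
}

/* Total size of all blocks that are still allocated. */
size_t leaked_bytes(void)
{
    size_t total = 0;
    for (size_t i = 0; i < live_count; i++)
        total += live_size[i];
    return total;
}
```

If `leaked_bytes()` is non-zero when the program is about to exit, the remaining table entries are leak candidates. A real heap checker additionally records the call stack of each allocation so that the allocation site can be reported.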


The following example C code demonstrates a memory leak caused by losing the pointer to the allocated memory.

    #include <stdlib.h>

    int main(void)
    {
        /* this is an infinite loop calling the malloc function, which
         * allocates the memory but without saving the address of the
         * allocated place */
        while (malloc(50));
        /* malloc will return NULL sooner or later, due to lack of memory */
        return 0;
        /* the allocated memory is freed by the operating system itself
         * after the program exits */
    }

The memory allocation function malloc() is called repeatedly in a loop without its return value being saved, and it eventually fails when no more memory is available to the user program. Because the addresses of the allocated blocks are not stored, the previously allocated blocks cannot be freed. Note that some operating systems delay the actual allocation of memory until it is used.

IV. EXPERIMENTATION

The goal is to thoroughly review the performance of user programs in the programming environment. To obtain sensible results, the garbage collector is compiled with its internal debugging mechanisms, and evaluation flags are set to instruct the collector to generate statistics about the program. With this procedure the results are realistic and the required statistics are produced. The compiler (GCC version 4.1.0) is configured as follows:

    > home/gcc-repo/configure --prefix=home/gccprefix --enable-languages=c,c++,java

Searching for memory leaks with tcmalloc is very simple: the program is linked with this library and run as in the following example:

    # HEAPCHECK=normal ./your-program

To compile and execute programs with a version of GCJ that uses collector as the sole garbage collector, we run the following commands.


    # LD_PRELOAD=/usr/lib/libtcmalloc.so.0.0.0 HEAPCHECK=normal ./your-program

When the program is linked and executed, it generates a report about the memory leaks identified. The LD_PRELOAD environment variable makes the dynamic linker load the named library into the user program. It must be set before the user program is executed. For the purpose of experimentation, the program is linked and executed under various criteria. During the program's execution, the linked library uses the following environment variables:

• HEAP_CHECK_REPORT: true or false (default: true); determines whether the report is printed.
• HEAP_CHECK_STRICT_CHECK: true or false (default: true); selects the function used for checking, SameHeap or NoLeaks.
• HEAP_CHECK_IDENTIFY_LEAKS: true or false (default: false); reports the addresses of leaked objects.
• HEAP_CHECK_TEST_POINTER_ALIGNMENT: true or false (default: false); identifies memory leaks due to non-aligned pointers.
• PPROF_PATH: specifies the path to the pprof utility.
• HEAP_CHECK_DUMP_DIRECTORY: specifies the directory in which temporary files are created.

The sample output of the user program is shown in Fig. 3, which lists the heap memory leaks that occurred in the given program together with the data objects related to them.

Fig.3 Sample Generated Report for Identification of Heap memory Leak in user Program

The HEAPCHECK environment variable sets the level of checks applied during execution. It can take one of four values: minimal, normal, strict, and draconian, ordered from the simplest to the strictest; the strictest levels can slow program execution. There are also two additional modes: as-is, in which the user specifies which checks should be executed, and local, in which checks are performed only for code that is explicitly marked for checking (by adding calls to Google Performance Tools (GPT) functions to the source code). When a memory leak is found (as in our example above), the library terminates the program and prints the call stack of the functions that led to the leak. In our example, the memory leak is in the main function, at line 106 of the file testhashes.cpp. The heap checker automatically prints basic leak information with stack traces of the leaked objects' allocation sites, as well as a pprof command line that can be used to visualize the call graph involved in these allocations. The latter can be much more useful for a human to see where and why the leaks happened, especially if the leaks are numerous.

V. CONCLUSION

A novel memory leak detection algorithm based on Boolean flag criteria is presented. Performance and scalability reach the expected levels by defining Boolean flags for each function so that precise statistics are generated. The results show that the system generates realistic statistical information about the heap memory leaks identified, and that the approach can be used for on-the-fly detection of memory leaks and memory corruption during production runs. Moreover, a new methodology is presented that uses analysis of proper memory usage behaviour for heap memory leak detection. This work can be extended by comparison with other tools and by investigating more debugging problems in real-world applications that contain well-documented bugs. Future work will focus on optimizations to reduce the run-time overhead.

ACKNOWLEDGEMENT

The authors would like to express their sincere thanks to the Department of Science and Technology (New Delhi) for financial support to carry out this work under project grant No. SR/WOS-A/ET24/2012. They also express their sincere gratitude to the Management and Principal of SR Engineering College for their support and encouragement in carrying out the research work.

REFERENCES

[1]. D. L. Heine and M. S. Lam. A practical flow-sensitive and context-sensitive C and C++ memory leak detector. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI), pages 168–181, Jun 2003.
[2]. P. Zhou, F. Qin, W. Liu, Y. Zhou, and J. Torrellas. iWatcher: Efficient architecture support for software debugging. In Proceedings of the 31st International Symposium on Computer Architecture (ISCA), pages 224–237, Jun 2004.
[3]. C++0x/C++11 Support in GCC. http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt04ch11.html (ref. 19th May 2012).
[4]. David Drysdale. High-Quality Software Engineering. Lulu.com, 2007. Mel Gorman. Understanding the Linux Virtual Memory Manager. Prentice Hall, 2004.
[5]. D.R. Chase, M. Wegman, and F. Zadeck. Analysis of pointers and structures. In SIGPLAN Conf. on Prog. Lang. Design and Impl., pages 296–310, 1990.
[6]. N.D. Jones and S.S. Muchnick. Flow analysis and optimization of Lisp-like structures. In S.S. Muchnick and N.D. Jones, editors, Program Flow Analysis: Theory and Applications, chapter 4. Prentice-Hall, Englewood Cliffs, NJ, 1981.
[7]. Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. Operating System Concepts. Wiley, 2008.
[8]. Y. Xie and A. Chou. Path sensitive analysis using boolean satisfiability. Technical report, Stanford University, Nov. 2002.
[9]. D. Evans. Static detection of dynamic memory errors. In Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, 1996.
[10]. Y. Xie and A. Aiken. Scalable error detection using boolean satisfiability. In Proceedings of the 32nd Annual Symposium on Principles of Programming Languages, Jan. 2005.
[11]. D. Liang and M. Harrold. Efficient computation of parameterized pointer information for interprocedural analysis. In Proceedings of the 8th Static Analysis Symposium, 2001.
[12]. T. Xie, S. Thummalapenta, D. Lo, and C. Liu. Data Mining for Software Engineering. IEEE Computer, Vol. 42(8):35–42, Aug 2009.
[13]. J. Engelfriet and G. Rozenberg. Graph grammars based on node rewriting: an introduction to NLC graph grammars. In Graph Grammars and Their Application to Computer Science: 4th Intl. Workshop, pages 12–23, 1991.
[14]. N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In PLDI '07, pages 89–100, 2007.
[15]. M. Jump and K. McKinley. Cork: dynamic memory leak detection for garbage-collected languages. In POPL '07, pages 31–38, 2007.
[16]. S. Cherem, L. Princehouse, and R. Rugina. Practical memory leak detection using guarded value-flow analysis. In PLDI '07, pages 480–491, 2007.
