On Debugging of Multi-threaded Embedded Software and Call Stack ...


Computer Technology and Application 3 (2012) 175-180

DAVID PUBLISHING

On Debugging of Multi-threaded Embedded Software and Call Stack Unwinding Methods

Ilya A. Mukhin1, Zoya V. Puschina1, Oleg A. Strikov1, Sang Bae Lee1, Mikhail P. Levin1, Tian Feng2 and Jaehoon Jeong3

1. Samsung Research Center, Moscow 127018, Russia
2. Samsung Electronics (China) R&D Center (SCRC), Nanjing 210008, China
3. Samsung Advanced Institute of Technology, Yongin-si 446-712, South Korea

Received: November 15, 2011 / Accepted: November 28, 2011 / Published: February 25, 2012.

Abstract: Multi-core processor platforms are now widely used even in embedded devices. Debugging multi-threaded embedded software is harder than debugging on desktop platforms because of the limitations of embedded platforms: their resources are sufficient to run only a pre-defined set of applications, not to debug them. Most known debugging solutions for parallel applications are intended for desktops or high-performance computers rather than embedded systems. In addition, most debugging solutions give no information on system-wide application behavior. Thread Visualizer is a tool intended to solve these problems and help developers debug multi-threaded embedded applications. It was developed at the Samsung Research Center in Moscow and the Samsung Advanced Institute of Technology. Thread Visualizer supports ARM-based platforms and Linux OS.

Key words: Debug, multi-thread, embedded, stack, unwinding.

1. Introduction Nowadays multi-core processor platforms are widely used even in embedded devices. Software complexity for these platforms is rising dramatically. The software becomes more complicated and multi-threaded. In some modern embedded applications, hundreds of threads are created and run simultaneously. The complexity of debugging of such applications rise, because different threads run on different cores, share resources, face with synchronization problems, race conditions, etc. Another problem is embedded platforms’ limitations. Embedded resources are enough to perform only pre-defined set of applications, but not for debugging. Available CPU resources are about 1-5%, available RAM is about several megabytes. Corresponding author: Ilya A. Mukhin, M.S., research field: system programming. E-mail: [email protected].

Most known debugging solutions for parallel applications are intended for desktops or high-performance computers rather than embedded systems. Also, most of them give no information on system-wide application behavior. The Thread Visualizer tool is intended to solve these problems and help developers debug multi-threaded embedded applications. It was developed at the Samsung Research Center in Moscow and the Samsung Advanced Institute of Technology. Thread Visualizer supports ARM-based platforms and Linux OS. It visualizes the hierarchy between the main process and threads and the synchronization dependencies, identifies every thread uniquely, including the full backtrace from the thread creation call, and offers other useful features. This substantially simplifies debugging of complex multi-threaded applications on embedded systems.


2. Thread Visualizer

Thread Visualizer is a tool for debugging multi-threaded embedded applications. It supports ARM-based platforms and Linux OS. It has a target-host architecture, which allows it to overcome the resource limitations of embedded platforms. A lightweight target part collects data that describes application behavior and sends it to the host over a network connection. All heavyweight operations, such as data storage, analysis and visualization, run on the host.

To collect data, Thread Visualizer uses the System-Wide Analyzer of Performance (SWAP) engine [1]. SWAP is a profiler and performance analyzer for embedded applications, also developed at the Samsung Research Center in Moscow and the Samsung Advanced Institute of Technology. It is based on the kprobes technique [2] and provides dynamic instrumentation of kernel and user-space functions. SWAP does not require modification or recompilation of the application's source code. Using the SWAP engine, Thread Visualizer instruments the necessary functions and collects data on each instrumented function call: the function name, the Process IDentifier (PID)/Thread IDentifier (TID) of the process, the number of the CPU on which the function was executed, the time stamp of the call, the function arguments, etc. The collected data and the executed binary files are then processed, and the results are visualized.

Thread Visualizer visualizes the hierarchy between the main process and threads and the synchronization dependencies, and provides unique thread identifying, source code mapping, a timing view, statistics and other features. With Thread Visualizer, a developer can examine the system-wide behavior of an application rather than only perform a number of specific operations on parallel threads, as conventional debuggers allow. The developer can see the main process, threads and synchronization objects of the application and the relations between them, such as parent-child relations between processes and threads and synchronization dependencies.
Through unique thread identifying, which combines the numerical identifier generally used in Linux with the thread function name and the full backtrace from the thread creation point, Thread Visualizer shows where and how every thread was created, including source code mapping. In addition, the timing view visualizes a time line with the execution segments of instrumented functions for every thread, and statistics on calls of instrumented functions are provided for every thread. Some modern embedded applications create hundreds of threads and synchronization objects, and Thread Visualizer is especially useful for analyzing such applications. Thread Visualizer's thread hierarchy, synchronization dependencies, thread identifying and source code mapping visualization are shown in Fig. 1. The timing view is shown in Fig. 2.
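As an illustration of the per-call data listed in this section, the following C sketch shows one way such a record could be laid out and serialized before being sent to the host. The structure, its field names, the MAX_ARGS limit, the text format and the function format_event are all assumptions made for this example; the source does not describe SWAP's actual data format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ARGS 4  /* assumption: only the first few arguments are recorded */

/* Hypothetical record describing one call of an instrumented function. */
struct call_event {
    const char *func_name;     /* symbolic name of the instrumented function */
    int         pid;           /* Process IDentifier */
    int         tid;           /* Thread IDentifier */
    int         cpu;           /* number of the CPU that executed the call */
    uint64_t    timestamp_ns;  /* time stamp of the call */
    uintptr_t   args[MAX_ARGS];/* raw argument values */
};

/* Writes the event as one text line into buf.
 * Returns the number of characters the full line needs (snprintf semantics). */
int format_event(const struct call_event *ev, char *buf, size_t len)
{
    assert(ev != NULL && buf != NULL);
    return snprintf(buf, len, "%llu cpu%d %d/%d %s",
                    (unsigned long long)ev->timestamp_ns,
                    ev->cpu, ev->pid, ev->tid, ev->func_name);
}
```

A compact fixed-size record of this kind keeps the work on the target small, which matches the target-host split described above: the target only fills in and ships records, and all interpretation happens on the host.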

Fig. 1 Thread Visualizer’s thread hierarchy, synchronization dependencies and unique thread identifying.

Fig. 2 Thread Visualizer’s timing view.


The thread identifying feature of Thread Visualizer, the development barriers related to stack unwinding limitations, and the proposed solution are described in detail below.

3. Thread Identifying

3.1 Standard Linux Thread Identifying

In Linux, every thread has a unique numerical identifier: Process IDentifier (PID)/Thread IDentifier (TID). But such identification is not informative, because it gives no clue about the source code of a particular thread.

3.2 Thread Identifying in Intel Thread Profiler

To make thread identifying more transparent, Intel Thread Profiler [3] includes in the thread identifier, together with PID/TID, the name of the function that is started on thread creation (the thread function name). For example, the pthread_create POSIX API [4] accepts the address of the thread function as its third argument; see the thread creation source code example in Fig. 3. But an application can create many threads with the same thread function. In this case, the thread identifier does not contain enough information to identify the thread. The next section considers a solution to this problem.

3.3 Unique Thread Identifying in Thread Visualizer

To identify threads uniquely, Thread Visualizer includes in the thread identifier, together with PID/TID and the thread function name, the full call backtrace from the thread creation point. To see what a call backtrace is, consider the C code example shown in Fig. 4. In this example, the full call backtrace from the thread creation point is the chain of function symbolic names func2, func1 and main. Including the full call backtrace from the thread creation point in the thread identifier gives full and unique information about every created thread. Thread Visualizer uses the SWAP engine to collect stack snapshots and register values. It then unwinds the stack snapshots to restore the full call backtraces from the thread creation points.
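The identifier described in Section 3.3 can be pictured as a small data structure. The sketch below is only an illustration: the structure thread_ident, the MAX_DEPTH limit, the function same_creation_site and the thread function name used in the usage note are hypothetical and are not taken from Thread Visualizer. It shows how threads that share a thread function can still be told apart by their creation backtraces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 16  /* assumption: deeper backtraces are not stored */

/* Hypothetical thread identifier: PID/TID, thread function name and
 * the backtrace from the thread creation point (innermost caller first). */
struct thread_ident {
    int         pid;
    int         tid;
    const char *thread_func;
    const char *backtrace[MAX_DEPTH];
    int         depth;  /* number of valid entries in backtrace */
};

/* Returns true when both threads run the same thread function and were
 * created through the same chain of calls. PID/TID are ignored on purpose:
 * they distinguish thread instances, not creation sites. */
bool same_creation_site(const struct thread_ident *a,
                        const struct thread_ident *b)
{
    assert(a != NULL && b != NULL);
    if (strcmp(a->thread_func, b->thread_func) != 0 || a->depth != b->depth)
        return false;
    for (int i = 0; i < a->depth; i++) {
        if (strcmp(a->backtrace[i], b->backtrace[i]) != 0)
            return false;
    }
    return true;
}
```

For instance, two threads that both run a function named worker (a made-up name) compare as different creation sites if one was created through func2, func1, main and the other through a different call chain, which is exactly the case that the thread function name alone cannot resolve.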


void* thread_function (void* arg)
{
    ...
}

void father_function (...)
{
    ...
    int        thread_id; //result of pthread_create (0 on success)
    pthread_t  p_thread;  //thread structure
    thread_id = pthread_create(&p_thread,
        NULL, thread_function, NULL);
    ...
}

Fig. 3 Thread creation source code example.

Known stack unwinding methods, their limitations and Thread Visualizer’s method are described below.

4. Stack Unwinding Methods

Well-known stack unwinding methods are based on the use of a frame pointer register [5] and of debug information on the stack frame layout stored in the binary file. But such methods have limitations, which are described below. To overcome them, a new method is proposed in Thread Visualizer.

4.1 Method Based on Frame Pointer Register Using

Let us consider the method based on the use of the frame pointer register. It is the simplest and best-known method of call stack unwinding. To use this method, the application should be built by the gcc/g++ compiler with the -fno-omit-frame-pointer option. Otherwise, any level of code optimization (options -O ... -O3) omits the use of the frame pointer. The values of the frame pointer register are stored in the stack frames during application execution. The following ARM assembly instructions store the frame pointer and return address values on the stack and restore them:

    push   {fp, lr}
    pop    {fp, pc}

Here, fp is the frame pointer register, lr is the link register (it stores the return address of the called function), and pc is the program counter register, to which the return address of the called function is restored. The list of


consecutive values of return addresses is a backtrace. Fig. 5 shows an example of the stack of the executing process for the code in Fig. 4 at the thread creation point, with the stored frame pointer values. Fig. 6 shows an example of stack unwinding code for a stack with stored frame pointer values. The limitation of this method is that it stores an extra register value (the frame pointer) in every stack frame, which decreases application performance. This makes the method unusable on embedded platforms. Other drawbacks of the method are: in some

void func2()
{
    ...
    pthread_create(...);
}

void func1()
{
    func2();
}

int main()
{
    func1();
    return 0;
}

Fig. 4 C code example.

Fig. 5 Example of the stack of executing process with stored frame pointer values.

#include
void backtrace(void* fp)
{
    if (fp  stack_start_address  || fp
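The frame-pointer method of Section 4.1 can be sketched in C on a simulated stack. This is not the unwinding code from the paper's figure, which is truncated above; it is a minimal illustration under stated assumptions. The stack is modelled as an array of words, fp is an index into it, and each frame is assumed to store the caller's fp at stack[fp] and the return address at stack[fp + 1]. Real frame layouts depend on the compiler and ABI, and the function name unwind_fp is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walks a chain of saved frame pointers in a simulated stack snapshot.
 *
 * Assumed layout (illustrative only): stack[fp] holds the caller's frame
 * pointer as an index, stack[fp + 1] holds the return address.
 * The stack grows down, so a caller's frame must sit at a higher index.
 *
 * Stores up to max return addresses in out and returns how many were stored. */
size_t unwind_fp(const uintptr_t *stack, size_t words, size_t fp,
                 uintptr_t *out, size_t max)
{
    assert(stack != NULL && out != NULL);
    size_t n = 0;
    /* Stop when the frame record would fall outside the snapshot. */
    while (n < max && words >= 2 && fp <= words - 2) {
        out[n++] = stack[fp + 1];          /* return address of this frame */
        size_t caller_fp = (size_t)stack[fp];
        if (caller_fp <= fp)               /* end of chain or corrupt link */
            break;
        fp = caller_fp;
    }
    return n;
}
```

The bounds check plays the same role as the comparison of fp against the stack limits in the figure above, and the requirement that each link moves toward the stack base guards against looping on a corrupted chain. With a snapshot such as { 3, 0x1000, 0, 6, 0x2000, 0, 0, 0x3000 } and a starting fp of 0, the walk visits three frames, which corresponds to a func2, func1, main chain like the one in Fig. 4.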
