Using Program Profilers for Reusable Component Optimization and Indexing

William B. Frakes, Morgan Pittkin, Reghu Anguswamy

Software Reuse Lab, Computer Science, Virginia Tech, Falls Church, VA 22043, USA, +1-703-538-3789
Computer Science, Virginia Tech, Falls Church, VA 22043, USA, +1-703-538-3789
ISASE, Sterling, VA, USA

[email protected]  [email protected]  [email protected]

ABSTRACT
This paper describes an exploratory study of how software profilers can be used to help design, evaluate, and index reusable components. Testing of a program using a sampling profiler, gprof, showed that longer execution times are needed to obtain reliable measures of execution speed. We discuss how to use profiling data for indexing reusable components.

Categories and Subject Descriptors
D.2.13 [Reusable Software]: Reusable libraries

General Terms
Measurement, Documentation, Experimentation, Performance, Design.

Keywords
Gprof profiler, component optimization, dynamic analysis, software reuse.

1. INTRODUCTION
One aspect of reuse design is component optimization for execution speed. There is no agreed-upon definition of a reusable component [14]; we use the term here in the general sense of a program intended for use in multiple contexts. Experience has shown that if a reusable component is much slower than an equivalent one-use component, people will avoid using it. A rule of thumb from discussions at the WISR reuse workshops [13] estimates 25% slower as the point where reuse will be avoided. In some domains, such as real-time systems, execution speed may be critical and speed penalties of less than 25% may prevent reuse.

This raises several important questions.

1. How should a reusable component be optimized for execution speed?

2. How does a potential reuser determine if a given component meets execution speed requirements?

3. What sort of indexing information regarding execution speed should be available for a component?

There are many measurements of software components that may be useful in determining their potential reusability [5]. In this paper, we examine measures of program execution speed derived from software profilers.

All measurements have errors, and the errors can be of two types: bias and noise. Bias causes a consistent mistake in one direction from the true value of the measure; for example, a scale that always weighs one pound too heavy, or a line counting tool with an off-by-one error that always under-reports the true value by one line, giving a 100-line program a measured value of 99 lines. Noise is random; that is, errors have an equal probability of occurring above or below the true value. In this study we focus on software profiler measurement error from noise.

As with any measuring device, there are limits on the accuracy of software profilers. Measuring very small execution times can give relatively large measurement errors, for example. Results can also vary widely if measurements are done on a heavily loaded system. For a clear view of execution times it is usually necessary to use averages derived from multiple measurements collected on dedicated or lightly loaded machines. It is known that multiple executions of a profiler on the same program with the same input can give different answers; that is, the measurement reliability of profilers is less than 1. In this paper, we show how to quantify the variability of profiler measurements and use this information for component design, testing, and indexing.

2. HOW TO OPTIMIZE COMPONENTS
In this section, we discuss question one: how to optimize a component for execution speed. It has long been known that the execution times for parts of programs follow a Pareto distribution [7]. That is, a small part of the code of a program will typically account for most of its execution time. As a consequence, careful measurement of execution times for program subsections is necessary for correct optimization. Bentley suggests the following method [3].

1. Build the system right, using good software engineering techniques.

2. Measure the system for speed, memory usage, etc.

3. Then, if needed, optimize carefully.

4. Document the optimization along with the pre-optimized code.

Developers use many techniques for optimizing components to run faster. But without some kind of guidance, an engineer may spend time optimizing a function that accounts for only a small part of the overall execution time of a component, resulting in no discernible improvement. On the other hand, small changes, like reading a large file from disk in batches instead of line by line, can result in significant performance gains. The problem is figuring out where to make these optimizations: it is almost impossible to tell just by looking at the code where the most time is being spent.
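As an illustration of the batched-read point above, the following sketch (our own example; the line-counting task is hypothetical) contrasts reading a file line by line with pulling it into memory in large chunks.

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Line-by-line: one getline call per line; simple, but more per-line overhead.
long countLinesByLine(const char* path) {
    std::ifstream in(path);
    std::string line;
    long n = 0;
    while (std::getline(in, line)) ++n;
    return n;
}

// Batched: read the file in 1 MiB chunks, then scan each chunk in memory.
long countLinesBatched(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(1 << 20);
    long n = 0;
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) || in.gcount() > 0) {
        for (std::streamsize i = 0; i < in.gcount(); ++i)
            if (buf[i] == '\n') ++n;
    }
    return n;
}

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: countlines <file>\n"; return 1; }
    std::cout << "line-by-line: " << countLinesByLine(argv[1]) << "\n";
    std::cout << "batched:      " << countLinesBatched(argv[1]) << "\n";
    return 0;
}

Whether a change like this is worth making, though, depends on whether I/O is actually where the time goes, and that is hard to judge by inspection alone.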

A better way to find these ‘hot spots’ in the code is to use dynamic software measurement tools called profilers. Profilers monitor the execution of a program as it is running, keeping track of which functions call which others and the amount of time spent in each. Armed with this information, the developer can see where the program is spending most of its time, and focus on optimizing those sections of code. In some cases a change as small as a single line of code can improve execution speed enormously.

2.1 Types of Profilers
There are several techniques profilers can use to monitor and record program execution. Call graphs, statistical sampling, custom logging code, and call-level execution timing are some of the most common. In the sections below, we explain how each works. Later, we look more closely at the statistical sampling technique and analyze the accuracy of its results as used in the profiling tool gprof.

2.1.1 Call-graphs
A call-graph shows which functions called which other functions, and how often, during execution of a program. For most profilers, including gprof, this is an exact measurement. Given the same input for a deterministic program, this result should be the same every time. This can be useful for understanding the execution flow, especially for a developer who is beginning work on software that other people wrote. But it is not very useful for optimization, because it doesn't give any information on how much time each function took to complete its task.
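To make this concrete, a call-graph profiler only needs to count caller/callee pairs. The sketch below is our own illustration, not gprof's internal data structure; it shows why the counts are exact and repeatable for deterministic input.

#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Edge counts of a call graph: (caller, callee) -> number of calls.
using CallGraph = std::map<std::pair<std::string, std::string>, long>;

void record(CallGraph& g, const std::string& caller, const std::string& callee) {
    ++g[{caller, callee}];   // an exact count, so a deterministic run gives the same graph every time
}

int main() {
    CallGraph g;
    record(g, "main", "sort");
    record(g, "sort", "swap");
    record(g, "sort", "swap");
    for (const auto& e : g)
        std::printf("%s -> %s : %ld call(s)\n",
                    e.first.first.c_str(), e.first.second.c_str(), e.second);
    return 0;   // note: no timing information is captured at all
}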

2.1.2 Call-level Monitors
Another technique is to have a monitor that hooks into the program and records every function call, and how long it takes the function to run. This gives more exact timing information, but there are several problems. First, it requires compiling extra code into the program and can be difficult to implement. Second, it adds overhead to the execution, slowing down the system being measured and possibly skewing the results. And third, because it gathers and records so much information, especially when used on a complex software system, using this technique cuts down the time that a program can run before the data becomes unmanageable. As an example, a large software system can easily contain tens of thousands of functions, and perform thousands of function calls every couple of seconds. Using this technique, a tool would record the run time for each and every call the system makes, as well as the information about who was calling whom. If the system was being monitored over the course of hours, or even weeks, it would be difficult to manage that amount of profiling data.
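A minimal sketch of such a call-level monitor follows. It is our own illustration (real tools usually insert the hooks automatically at compile time rather than by hand), and it shows why the recorded data grows with every single call.

#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// One record per call: which function ran and how long it took.
struct CallRecord { std::string name; double seconds; };
static std::vector<CallRecord> g_records;   // grows by one entry per monitored call

// RAII wrapper: times the enclosing function and appends a record when it returns.
struct CallTimer {
    std::string name;
    std::chrono::steady_clock::time_point start;
    explicit CallTimer(std::string n)
        : name(std::move(n)), start(std::chrono::steady_clock::now()) {}
    ~CallTimer() {
        std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
        g_records.push_back({name, d.count()});
    }
};

void work() {
    CallTimer t("work");   // every call to work() adds another record
    // ... body of the function ...
}

int main() {
    for (int i = 0; i < 1000; ++i) work();
    std::printf("%zu call records collected\n", g_records.size());
    return 0;
}

A long-running system making thousands of calls per second would accumulate records at the same rate, which is exactly the data-management problem described above.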

2.1.3 Custom Profiling Code
As another option, when developers want to focus on profiling small, specific areas of a software system, they can write custom monitoring code. In the example below, the developers have built custom monitoring code into the program.

#include "logging.h"

void sort(int* array, int size) {
    log.begin("sort");
    // ... function does its work
    log.end("sort");
}

For each function they want to monitor, they include the logging header file and use the functions provided to record function start and end times. This can give accurate information about run time in small parts of the code. Used sparingly, it requires little computational overhead and creates small log files. But it is time-consuming to implement and maintain, and it does add some overhead to the program's execution.
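The example above assumes a logging.h header whose contents are not shown. One plausible minimal implementation, assuming a global logger object named log whose begin/end calls accumulate per-section elapsed time, might look like this:

// logging.h -- hypothetical sketch of the interface used in the example above
#ifndef LOGGING_H
#define LOGGING_H
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

class Log {
public:
    void begin(const std::string& section) {
        starts_[section] = std::chrono::steady_clock::now();
    }
    void end(const std::string& section) {
        std::chrono::duration<double> d =
            std::chrono::steady_clock::now() - starts_[section];
        totals_[section] += d.count();          // accumulate elapsed seconds
    }
    ~Log() {                                    // dump totals when the program exits
        for (const auto& kv : totals_)
            std::fprintf(stderr, "%s: %.4f s\n", kv.first.c_str(), kv.second);
    }
private:
    std::map<std::string, std::chrono::steady_clock::time_point> starts_;
    std::map<std::string, double> totals_;
};

static Log log;   // the global object referred to as "log" in the example above
#endif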

2.1.4 Statistical Sampling
Profilers using statistical sampling check what the program is doing at fixed intervals, say every hundredth of a second [1]. After the program has ended, the profiler counts how often each part of the program was found active and estimates the total time spent in each function. This approach to profiling has been criticized [8][10][11], and improvements to this type of profiler have been proposed [9]. While this paper focuses on C and C++, statistical sampling profiling has also been used in other languages such as Java [12]. To counter the sampling error, we can note that if the sampling period is small compared to the run times of the functions being examined, the statistical error will be reduced and we can rely more on the results obtained.

Another problem is estimating the average run time of each function. If a function is called 10 times, and the total run time is measured as 100 seconds, then the function is estimated to take 10 seconds each time it is called. That makes sense, but what if the run time varies depending on who calls the function? This technique can work in some cases, but the possibility of this kind of error must be accounted for. An engineer might use the profiler to get an idea of which functions are taking longest, and then use another technique, such as inserting custom profiling code, to get more detailed information. In the next section we describe in detail the usage and characteristics of a tool that uses the statistical sampling technique, gprof.
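To make the sampling arithmetic concrete, the sketch below (our own illustration, not gprof's implementation) converts raw sample counts into estimated total and per-call times using a 0.01-second interval; the function names and counts are made up.

#include <cstdio>
#include <map>
#include <string>

int main() {
    const double interval = 0.01;   // seconds per sample
    // Hypothetical counts of how often the program counter was observed in each function.
    std::map<std::string, long> samples = {{"bubblesort", 318}, {"mean", 2}, {"stddev", 3}};
    std::map<std::string, long> calls   = {{"bubblesort", 1},   {"mean", 1}, {"stddev", 1}};

    for (const auto& kv : samples) {
        double total   = kv.second * interval;     // estimated total time in the function
        double perCall = total / calls[kv.first];  // average, assuming all calls cost the same
        std::printf("%-10s  total %.2f s  per call %.2f s\n",
                    kv.first.c_str(), total, perCall);
    }
    return 0;
}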

2.2 Gprof
Gprof is a popular cross-platform profiling tool that runs on many versions of Unix and Linux. Gprof uses call-graph generation and statistical sampling techniques [4][6]. When code is compiled with the gprof option, the compiler automatically inserts the code needed to record and output the profiling information. When the program runs, this extra code collects the data for the gprof profile, and when the program exits, the code writes that information to a file called gmon.out. The gprof program then reads the gmon.out file, matches the information with symbols in the executable file, calculates the estimated values for the profile, and outputs the profile to the console.

One of the first lines in the profile gives the sampling interval. For all of our tests, the sampling interval was 0.01 seconds; that is, every 0.01 seconds the gprof profiling code checks what the program is doing and records it.

Gprof provides a summary report. The first section is the flat profile, which gives individual statistics on each of the functions monitored by gprof. The figure of interest is self seconds: the amount of time spent in the function as estimated by gprof, not including time spent in subroutines (child functions) or time spent waiting for things like CPU time, memory access, or I/O. The second section is the call graph, which lists which monitored functions called which other monitored functions, how many times they called them, and estimates of the amount of time spent in the child functions. Estimates for time spent in child functions can vary widely.

3. TESTING
Our test program generates a list of random numbers, sorts them using bubblesort, then calculates the mean and standard deviation of the numbers. We ran the test program on input sizes ranging from 2,500 numbers up to 150,000. The data was generated using the C++ random number generator, but because the generator was seeded with the same value, the generated data was the same each time. We calculated the mean run time of the functions, the standard deviation, and the standard deviation as a percentage of the mean. We ran the program on a PC running Ubuntu Linux, with an Intel Core2 Duo processor running at 2.66 GHz and 2 GB of RAM.

One of the unknown variables for software that is being deployed at customer sites is the environment in which it will run. Sometimes a server system will be given a powerful, dedicated machine with little or nothing else to do but run that server. Other times, it may be sharing a machine with several other programs, all competing for limited resources like processing time, disk access, and network bandwidth. Gprof does not take these things into account. The profiling data is gathered only while the program is running, not while it is waiting for a disk access to finish or for its turn on the CPU. This is both a strength and a weakness. The strength is that if we simply want to analyze how fast the code is, we do not need to worry about whether another program is using the machine and skewing the results. Ultimately, however, the most important measure of performance is how fast the component runs when it is deployed on a customer's machine. If that machine has many processes competing for disk access, disk access for your program will be slower, and if the code to read and write disk data is written poorly, it could seriously degrade performance, even if during in-house tests the disk access did not make a big difference to overall run time. Gprof does not measure this type of performance liability.

We tested this by writing a test program that did some random data analysis, then adding a section where it wrote the data to a hard drive and randomly read back chunks to analyze (the random access slows down disk performance by causing cache misses). The overall execution time went up by a factor of 10, but the gprof profile changed very little. We have not tested this problem further here, but in many cases it must be taken into consideration, and other profiling tools can be used.
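A minimal sketch of a test program consistent with this description (fixed seed, bubblesort, then mean and standard deviation) is shown below; it is illustrative rather than the exact program used. Compiling with gcc/g++'s -pg option makes the program write gprof data to gmon.out when it exits.

#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <utility>
#include <vector>

// Bubblesort: the deliberately slow routine whose run time Table 1 reports.
void bubblesort(std::vector<double>& a) {
    for (size_t i = 0; i + 1 < a.size(); ++i)
        for (size_t j = 0; j + 1 < a.size() - i; ++j)
            if (a[j] > a[j + 1]) std::swap(a[j], a[j + 1]);
}

double mean(const std::vector<double>& a) {
    double s = 0;
    for (double x : a) s += x;
    return s / a.size();
}

double stddev(const std::vector<double>& a, double m) {
    double s = 0;
    for (double x : a) s += (x - m) * (x - m);
    return std::sqrt(s / a.size());
}

int main(int argc, char** argv) {
    size_t n = argc > 1 ? std::strtoul(argv[1], nullptr, 10) : 10000;
    std::srand(42);                        // fixed seed: identical data on every run
    std::vector<double> data(n);
    for (double& x : data) x = std::rand();
    bubblesort(data);
    double m = mean(data);
    std::printf("mean %.2f  stddev %.2f\n", m, stddev(data, m));
    return 0;
}

// Build and profile (sketch):  g++ -pg test.cpp -o test && ./test 50000
// Then read the profile:       gprof ./test gmon.out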

3.1 Results
We ran the test program, with the gprof code compiled in, using test data sizes ranging from 2,500 to 150,000 numbers. For each input size we ran the test program 30 times. Table 1 shows, for the different input data sizes, the mean estimated run time, standard deviation (SD), and relative standard deviation (RSD) for the BubbleSort routine we used, where RSD = 100 × (standard deviation / mean), expressed as a percentage.

Table 1. Estimated run time and standard deviation for bubblesort

Input Size   Mean Runtime (s)   SD (s)      RSD
2,500        0.00366            0.0048189   131.43%
5,000        0.026              0.004899    18.84%
7,500        0.06633            0.0065744   9.91%
10,000       0.123              0.0069041   5.61%
15,000       0.285              0.0084656   2.97%
20,000       0.508              0.0101324   1.99%
30,000       1.13867            0.008459    0.74%
40,000       2.03767            0.0098939   0.49%
50,000       3.178              0.01249     0.39%
75,000       7.14633            0.00836     0.12%
100,000      12.712             0.0090922   0.07%
150,000      28.5837            0.0166302   0.06%

What we would like to know is the variability of the measurement relative to the size of the estimated run time. If a given input produces an estimated run time of 3 seconds with a standard deviation of 2 seconds, we might not think it a very accurate measurement. But if that same standard deviation applied to a measurement of 20 minutes, then the accuracy, and our faith in the measurement, would be much higher. As Table 1 shows, and as we expected, the longer the estimated time for a function, the more accurate the measurement. The accuracy depends largely on two things: the size of the estimated run time for a function, and the sampling interval.
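For reference, the statistics reported in Table 1 can be recomputed from repeated measurements with a few lines of code. The sketch below is our own illustration with made-up sample values; it takes a set of per-run self-seconds estimates and computes the mean, standard deviation, and relative standard deviation.

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical gprof "self seconds" estimates for one function over several runs.
    std::vector<double> runs = {0.50, 0.51, 0.49, 0.52, 0.50, 0.53};

    double mean = 0;
    for (double r : runs) mean += r;
    mean /= runs.size();

    double var = 0;
    for (double r : runs) var += (r - mean) * (r - mean);
    double sd = std::sqrt(var / runs.size());

    double rsd = 100.0 * sd / mean;   // relative standard deviation, in percent
    std::printf("mean %.4f s  sd %.4f s  rsd %.2f%%\n", mean, sd, rsd);
    return 0;
}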

4. CONSEQUENCES FOR COMPONENT INDEXING
In this section we discuss questions two and three:

2. How does a potential reuser determine if a given component meets execution speed requirements?

3. What sort of indexing information regarding execution speed should be available for a component?

How should information produced by a profiler be used in the indexing of components? We suggest that the following fields be provided:

• Program name
• Input type
• Input size
• Machine specs
• Machine load
• Mean
• Standard deviation
• Relative standard deviation

Information like that in Table 1 above should be provided to give a potential reuser a picture of performance over a range of inputs.
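As an illustration of how these fields might be packaged for a component index, each profiled configuration could be stored as a record like the one sketched below. This is only one possible layout; the paper does not prescribe a concrete format.

#include <string>
#include <vector>

// One index record per (component, input type, input size, machine) combination.
struct SpeedIndexEntry {
    std::string programName;       // the component or routine name
    std::string inputType;         // e.g., "random integers"
    long        inputSize;         // e.g., 50000
    std::string machineSpecs;      // e.g., "Core2 Duo 2.66 GHz, 2 GB RAM"
    std::string machineLoad;       // e.g., "dedicated" or "shared"
    double      meanRunTime;       // seconds, averaged over repeated runs
    double      standardDeviation; // seconds
    double      relativeSD;        // percent: 100 * sd / mean
};

// A component's speed index holds one entry per measured configuration,
// analogous to the rows of Table 1.
using SpeedIndex = std::vector<SpeedIndexEntry>;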

5. CONCLUSIONS
This paper discussed how software profilers can be used to help design, evaluate, and index reusable components. Testing of a program using a sampling profiler, gprof, showed that longer execution times are needed to obtain reliable measures of execution speed. We discussed how to use profiling data for indexing reusable components.

6. REFERENCES
[1] Anderson, Berc, et al., "Continuous Profiling: Where Have All the Cycles Gone," ACM Transactions on Computer Systems, 15(4), Nov. 1997, pp. 357-390.
[2] T. Ball and J. Larus, "Optimally Profiling and Tracing Programs," ACM Transactions on Programming Languages and Systems, 16(3), July 1994, pp. 1319-1360.
[3] J. Bentley, Writing Efficient Programs, Prentice-Hall, Upper Saddle River, NJ, USA, 1982.
[4] J. Fenlason and R. Stallman, GNU gprof: The GNU Profiler, Free Software Foundation, 2000.
[5] W. Frakes and C. Terry, "Software Reuse Models and Metrics: A Survey," ACM Computing Surveys, 28(2), 1996, pp. 415-435.
[6] S. L. Graham, P. B. Kessler, and M. K. McKusick, "gprof: A Call Graph Execution Profiler," SIGPLAN Notices, 39(4), 2004, pp. 49-57.
[7] D. E. Knuth, "An Empirical Study of FORTRAN Programs," Software: Practice and Experience, 1, 1971, pp. 105-133.
[8] C. Ponder and R. J. Fateman, "Inaccuracies in Program Profilers," Software: Practice and Experience, 18(5), May 1988, pp. 459-467.
[9] J. M. Spivey, "Fast, Accurate Call Graph Profiling," Software: Practice and Experience, 34, 2004, pp. 249-264. doi:10.1002/spe.562
[10] K. V. Subramaniam and M. J. Thazhuthaveetil, "Effectiveness of Sampling Based Software Profilers," in Proceedings of the First International Conference on Software Testing, Reliability and Quality Assurance, 1994.
[11] D. A. Varley, "Practical Experience of the Limitations of Gprof," Software: Practice and Experience, 23(4), 1993, pp. 461-463.
[12] J. Whaley, "A Portable Sampling-Based Profiler for Java Virtual Machines," in Proceedings of the ACM 2000 Conference on Java Grande, June 2000, ACM Press, pp. 78-87.
[13] W. B. Frakes and D. Lea, "Design for Reuse and Object Oriented Reuse Methods," presented at the Sixth Annual Workshop on Institutionalizing Software Reuse (WISR '93), Owego, NY, 1993.
[14] R. Anguswamy, "Study of Factors Affecting the Design and Use of Reusable Components," PhD Dissertation, Computer Science and Applications, Virginia Polytechnic Institute and State University, 2013.