Identifying Sources of Operating System Jitter Through Fine-Grained Kernel Instrumentation
Pradipta De, Ravi Kothari, Vijay Mann
IBM Research, New Delhi, India
IEEE Cluster 2007
September 19, 2007
© 2007 IBM Corporation
Measuring OS Jitter: the Problem
OS jitter: interference due to the scheduling of daemons and the handling of interrupts
Can cause up to 100% performance degradation at 4,096 processors (Petrini et al., SC'03)
Low-jitter systems use specialized kernels on compute nodes
Identifying jitter sources is important for
    creating lightweight versions of commodity operating systems
    tuning "out of the box" commodity OSes for HPC applications
    detecting new jitter sources that get introduced with software upgrades
Very little information is available about the biggest contributors to OS jitter
Few tools can measure the impact of individual sources of OS jitter
Administrators resort to existing knowledge for tuning systems, which is error prone because new sources of OS jitter get introduced when systems are patched
Contributions of this paper
Design and implementation of a tool that can be used to
    identify sources of OS jitter and measure their impact
    compare system configurations in terms of their jitter impact
    detect new sources of jitter that get introduced over time
    detect scheduling patterns that can lead to jitter
Experimental results that identify the biggest contributors to OS jitter on off-the-shelf Linux at run level 3 (Fedora Core 5)
Validation of the methodology through the introduction of synthetic daemons
Methodology
1. Instrument the kernel to record start and end times (in memory) for all processes and interrupts – a kernel patch (Linux 2.6.17.7 and 2.6.20.7 kernels)
2. Expose the kernel data structures to user-level applications through a device driver memory map
3. Run a user-level micro-benchmark that executes several rounds
4. Analyze the user-level histogram and the scheduler/interrupt trace data; generate a master histogram whose samples consist of
    the runtime of all processes that caused the user-level benchmark to get descheduled
    the runtime of all interrupts that occurred while the user-level benchmark was running
Ideally, the user-level histogram and the master histogram should match (if every interruption experienced by the user-level benchmark is due to a context switch or an interrupt being handled); a sketch of step 4 follows below.
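A minimal C sketch of how the master histogram of step 4 could be assembled from the trace records. The struct layout, field names, and power-of-two bucketing are illustrative assumptions, not the tool's actual kernel data structures.

    #include <stdio.h>

    #define NBUCKETS 64

    /* One record from the memory-mapped scheduler or interrupt trace; the field
     * names are illustrative, not the tool's actual kernel data structures. */
    struct trace_entry {
        double start_us;     /* start of the interruption, in microseconds */
        double end_us;       /* end of the interruption, in microseconds   */
        const char *source;  /* process or interrupt name, e.g. "timer"    */
    };

    /* Map a duration onto one of NBUCKETS logarithmically spaced bins
     * (powers of two from 1 us upward), mirroring the log-scale plots. */
    static int bucket_of(double duration_us)
    {
        int b = 0;
        while (duration_us >= 2.0 && b < NBUCKETS - 1) {
            duration_us /= 2.0;
            b++;
        }
        return b;
    }

    /* Step 4: every process that descheduled the benchmark and every interrupt
     * that ran while it was running contributes its runtime as one sample. */
    static void build_master_histogram(const struct trace_entry *e, int n,
                                       long hist[NBUCKETS])
    {
        for (int i = 0; i < n; i++)
            hist[bucket_of(e[i].end_us - e[i].start_us)]++;
    }

    int main(void)
    {
        struct trace_entry trace[] = {          /* toy trace, not real data */
            { 0.0, 14.7, "timer" }, { 50.0, 99.2, "hidd" }, { 200.0, 306.6, "python" },
        };
        long hist[NBUCKETS] = { 0 };
        build_master_histogram(trace, 3, hist);
        for (int b = 0; b < NBUCKETS; b++)
            if (hist[b])
                printf("bucket %d: %ld samples\n", b, hist[b]);
        return 0;
    }

Comparing this master histogram against the histogram of user-level timestamp deltas is what lets the tool attribute each observed interruption to a concrete source.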
User-level micro-benchmark (pseudocode)

    /* iptr, sptr => memory-mapped pointers to the interrupt and scheduler device
       driver files; mapping them starts kernel-level tracing */
    iptr = mmap(interrupt_device_file); sptr = mmap(scheduler_device_file);
    start_scheduler_index = sptr->current_index; start_interrupt_index = iptr->current_index;
    /* Step 1: timestamp collection (critical section) */
    for i = 0 to N do ts[i] = rdtsc(); end for
    end_scheduler_index = sptr->current_index; end_interrupt_index = iptr->current_index;
    /* Step 2: dump the scheduler and interrupt traces recorded during the run */
    for j = start_scheduler_index to end_scheduler_index do
        read_and_print_to_file(start_time, end_time, process_name);
    end for
    for j = start_interrupt_index to end_interrupt_index do
        read_and_print_to_file(start_time, end_time, interrupt_name);
    end for
    /* Step 3: differences of successive samples (timestamp deltas) */
    for i = 0 to N-1 do ts[i] = ts[i+1] - ts[i]; end for
    /* generate the user-level histogram from the user-level delay data */
    add_to_distribution(ts);
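For concreteness, a minimal user-space sketch in C of the timestamping loop (step 1) and the delta computation (step 3). The rdtsc wrapper, the fixed sample count N, and printing the deltas instead of building a histogram are assumptions for illustration; the cycle-to-microsecond conversion uses the 2.8 GHz clock of the experimental machine.

    #include <stdint.h>
    #include <stdio.h>

    #define N        1000000     /* number of timestamp deltas (illustrative) */
    #define CPU_MHZ  2800.0      /* 2.8 GHz Xeon used in the experiments      */

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    static uint64_t ts[N + 1];

    int main(void)
    {
        /* Step 1: tight timestamping loop; any gap between successive samples
         * larger than the loop's own cost is an interruption (OS jitter). */
        for (int i = 0; i <= N; i++)
            ts[i] = rdtsc();

        /* Step 3: convert successive deltas to microseconds; a real run would
         * feed these into the user-level histogram rather than printing them. */
        for (int i = 0; i < N; i++)
            printf("%.3f\n", (double)(ts[i + 1] - ts[i]) / CPU_MHZ);
        return 0;
    }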
Experiments – 2.8 GHz Intel Xeon, 512 KB cache, 1 GB RAM
Experiment 1 – identifying all sources of jitter on Linux run level 3
x-axis: interruption duration in µs; y-axis: log10 of the number of samples in a bucket (frequency)
Figure: Parzen-window distributions for multiple processes – log10[F(X)] (from -4 to 1) vs. X = time in µs (1 to 10,000); curves: user_level_benchmark and master_from_tracing.
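The distribution curves on this and the following slides are Parzen-window (kernel density) estimates over the interruption samples. A minimal sketch of the technique, assuming a Gaussian kernel and an arbitrary bandwidth h (the slides do not state the kernel or bandwidth actually used, and the sample values below are toy data):

    #include <math.h>
    #include <stdio.h>

    /* Parzen-window (kernel density) estimate F(x) from n interruption samples,
     * using a Gaussian kernel of bandwidth h. Plotting log10 of this estimate
     * against x gives curves of the kind shown in the distribution plots. */
    static double parzen_estimate(const double *samples, int n, double h, double x)
    {
        const double inv_sqrt_2pi = 0.3989422804014327;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double u = (x - samples[i]) / h;
            sum += inv_sqrt_2pi * exp(-0.5 * u * u);
        }
        return sum / (n * h);
    }

    int main(void)
    {
        /* toy interruption durations in microseconds, not real measurements */
        double samples[] = { 4.5, 5.1, 9.7, 12.3, 101.2, 2300.0 };
        int n = sizeof(samples) / sizeof(samples[0]);
        for (double x = 1.0; x <= 10000.0; x *= 2.0)
            printf("%8.1f us   log10[F(x)] = %7.3f\n",
                   x, log10(parzen_estimate(samples, n, 10.0, x)));
        return 0;
    }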
Experiment 1 – overall picture
(Lowest/Highest = shortest/longest single interruption observed for that source; frequency = number of interruptions)

Noise Source     Lowest (µs)   Highest (µs)   Frequency   Mean (µs)   Total Jitter (µs)   Std Dev (µs)   Total Jitter %
timer            9.74          1042.05        76997       14.7        1131522.36          828.1          63.27
hidd             4.52          364.31         3404        49.17       167375.21           3450.97        9.35
python           4.52          220.17         1337        106.58      142494.64           5985.53        7.96
haldaddonstor    4.52          364.31         1522        61.03       92892.55            3708.57        5.19
ide1             10.92         64.74          3364        21.57       72569.8             1256.24        4.05
events0          4.65          101.33         433         81.18       35152.02            4452.23        1.96
eth0             7.12          80.35          1122        24.17       27115.4             1584.66        1.51
automount        4.98          173.85         156         162.72      25383.71            8703.52        1.41
sendmail         5.19          364.31         159         146.69      23323.57            8714.47        1.30
pdflush          4.93          220.17         161         100.7       16213.35            6389.97        0.90
idmapd           6.27          364.31         147         71.07       10446.99            5064.51        0.58
init             5.67          160.75         156         56.65       8836.82             3210.1         0.49
kblockd0         5.83          220.17         82          99.11       8127.18             6298.97        0.45
kjournald        0.92          154.15         181         39.61       7169.62             3531.18        0.40
kedac            4.7           110.16         705         9.71        6843.7              787.98         0.38
hald             348.69        363.31         13          353.48      4595.28             19435.05       0.25
watchdog0        0.92          7.67           735         5.47        4022.33             291.58         0.22
crond            6.01          164.43         18          91.21       1641.8              5460.71        0.09
syslogd          48.1          58.32          24          51.36       1232.68             2773.01        0.06
cupsd            117.92        349.48         2           236.06      472.12              19563.9        0.02
atd              130.07        134.3          3           132.09      396.28              8545.5         0.02
smartd           57.44         81.97          3           73.08       219.25              4781.35        0.01
runparts         160.76        164.43         1           164.24      164.24              0              0.01
xfs              36.34         36.86          1           36.49       36.49               0              0.002
Experiment 1 – who contributes where: zooming in on the peaks around 11–13 µs
Figure: stacked bar chart of total events (0 to 7,000) per bucket vs. mean time for a bucket (11.1 to 12.9 µs); contributing sources: others, events0, ide1, timer, haldaddonstor, hidd, python.
Experiment 1 – who contributes where: zooming in on the peaks around 100–110 µs
Figure: stacked bar chart of total events (0 to 80) per bucket vs. mean time for a bucket (101 to 107 µs); contributing sources: others, events0, ide1, timer, haldaddonstor, hidd, python.
Experiments – 2.8 GHz Xeon, 512 KB cache, 1 GB RAM
Experiment 2 – introduction of synthetic daemons to verify the methodology (a sketch of such a daemon follows below)
A. one synthetic daemon with a period of 10 seconds and an execution time of ~2300 µs
B. two synthetic daemons – one with a period of 10 seconds and the other with a period of 10.5 seconds, each with an execution time of ~2300 µs
C. two synthetic daemons – both with a period of 10 seconds and an execution time of ~1100 µs each
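As referenced above, a minimal sketch of such a synthetic jitter daemon: it sleeps for one period and then busy-loops for roughly the requested execution time. The command-line interface and the gettimeofday-based busy wait are illustrative assumptions; the slides do not describe how the actual daemons were implemented.

    #include <stdlib.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Busy-loop for approximately run_us microseconds to inject jitter. */
    static void burn_cpu(long run_us)
    {
        struct timeval start, now;
        gettimeofday(&start, NULL);
        do {
            gettimeofday(&now, NULL);
        } while ((now.tv_sec - start.tv_sec) * 1000000L +
                 (now.tv_usec - start.tv_usec) < run_us);
    }

    int main(int argc, char **argv)
    {
        /* e.g. ./dummydaemon 10 2300  -> period 10 s, execution time ~2300 us */
        long period_s = (argc > 1) ? atol(argv[1]) : 10;
        long run_us   = (argc > 2) ? atol(argv[2]) : 2300;

        for (;;) {
            sleep(period_s);    /* lie dormant for one period            */
            burn_cpu(run_us);   /* then interrupt whatever is running    */
        }
        return 0;
    }

Running one or two instances with the periods and execution times listed in A, B, and C above should produce an extra peak in the measured distributions, which is exactly what the following slides check.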
Experiments – 2.8 GHz Xeon, 512 KB cache, 1 GB RAM
Experiment 2A – introduction of synthetic daemons to verify the methodology
one synthetic daemon with a period of 10 seconds and an execution time of ~2300 µs
comparison of the master distribution with that of the default run level 3 shows the synthetic daemon
Figure: Parzen-window distributions for multiple processes – log10[F(X)] (from -4 to 1) vs. X = time in µs (10 to 1000); curves: def_run_level3 and 1_synthetic_daemon.
Experiments – 2.8 GHz Xeon, 512 KB cache, 1 GB RAM
Experiment 2A – introduction of synthetic daemons to verify the methodology
one synthetic daemon with a period of 10 seconds and an execution time of ~2300 µs
zooming in on the peak around 2000–3000 µs
Figure: stacked bar chart of total events (0 to 30) per bucket vs. mean time for a bucket (~2,308 to 2,480 µs); contributing combinations: dummydaemon_1_python_, dummydaemon_1_hidd_, dummydaemon_1_rpc.idmapd_init_hidd_, kedac_dummydaemon_1_, dummydaemon_1_.
Experiments – 2.8 GHz Xeon, 512 KB cache, 1 GB RAM
Experiment 2B – introduction of synthetic daemons to verify the methodology
two synthetic daemons – one with a period of 10 seconds and the other with a period of 10.5 seconds, each with an execution time of ~2300 µs
comparison of the master distribution with that of the default run level 3 shows the synthetic daemons
Figure: Parzen-window distributions for multiple processes – log10[F(X)] (from -4 to 0) vs. X = time in µs (10 to 1000); curves: def_run_level3 and 2_daemons_diff_period.
Experiments – 2.8 GHz Xeon, 512 KB cache, 1 GB RAM
Experiment 2B – introduction of synthetic daemons to verify the methodology
two synthetic daemons – one with a period of 10 seconds and the other with a period of 10.5 seconds, each with an execution time of ~2300 µs
zooming in on the peak around 2000–3000 µs
Figure: stacked bar chart of total events (0 to 30) per bucket vs. mean time for a bucket (~2,185 to 2,416 µs); contributing combinations: hidd_python_sendmail_sendmail_sendmail_, dummydaemon_1_hidd_, watchdog0_dummydaemon_1_, events0_dummydaemon_1_, dummydaemon_1_python_, dummydaemon_2_, dummydaemon_1_.
Experiments – 2.8 GHz Xeon, 512 KB cache, 1 GB RAM
Experiment 2C – introduction of synthetic daemons to verify the methodology
two synthetic daemons – both with a period of 10 seconds and an execution time of ~1100 µs each
Figure: Parzen-window distributions for multiple processes – log10[F(X)] (from -4 to 0) vs. X = time in µs (10 to 1000); curves: def_run_level3 and 2_daemons_same_period.
Experiments – 2.8 GHz Xeon, 512 KB cache, 1 GB RAM
Experiment 2C – introduction of synthetic daemons to verify the methodology
two synthetic daemons – both with a period of 10 seconds and an execution time of ~1100 µs each
zooming in on the peak around 2000–3000 µs
Figure: stacked bar chart of total events (0 to 16) per bucket vs. mean time for a bucket (~2,148 to 2,359 µs); contributing combinations: dummydaemon_1_dummydaemon_2_init_rpc.idmapd_haldaddonstor_hidd_, kedac_dummydaemon_1_dummydaemon_2_, dummydaemon_1_dummydaemon_2_hidd_, watchdog0_dummydaemon_1_dummydaemon_2_, dummydaemon_1_dummydaemon_2_.
Conclusions and Future Work
Design and implementation of a tool that can be used to
    identify sources of OS jitter and measure their impact
    compare system configurations in terms of their jitter impact
    detect new sources of jitter that get introduced over time
    detect scheduling patterns that can lead to jitter
Future work and work in progress
    The jitter traces collected using the tool provide valuable information
    They can be used to model jitter that is representative of a particular configuration
    Such a model can help predict the scaling behavior of a particular cluster running a particular OS (and a particular configuration)
Thank you!