Multi-Frequency Sinusoidal Perturbation Method for Dynamic Characterization of Multi-Processor Computer Servers
Eugenio Schuster and Kenny C. Gross
Multi-Frequency Sinusoidal Perturbation Method for Dynamic Characterization of Multi-Processor Computer Servers Eugenio Schuster Kenny C. Gross SMLI TR-2004-130 May 2004
Abstract:
Dynamic characterization and fault detection are carried out in enterprise servers using nonparametric identification techniques based on sinusoidal excitation. The introduction of subtle sinusoidal perturbations in computer load variables or physical variables allows us to obtain a dynamic input-output characterization in the frequency domain. The input-output relationship is described in terms of coupling coefficients between a wide variety of physical and performance variables at selected frequencies. This approach, new to the field of computer science, has been demonstrated in empirical studies to provide dynamic system characterization information that can be indispensable to datacenter operations personnel for the functions of performance management, capacity planning, quality-of-service (QoS) assurance, dynamic resource provisioning, and root cause analysis.
M/S MTV29-01 2600 Casey Avenue Mountain View, CA 94043
email addresses:
[email protected] [email protected]
© 2004 Sun Microsystems, Inc. All rights reserved. The SML Technical Report Series is published by Sun Microsystems Laboratories, of Sun Microsystems, Inc. Printed in U.S.A. Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an information retrieval system, without the prior written permission of the copyright owner.

TRADEMARKS Sun, Sun Microsystems, the Sun logo, and Java are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. UNIX is a registered trademark in the United States and other countries, exclusively licensed through X/Open Company, Ltd.

For information regarding the SML Technical Report Series, contact Jeanie Treichel, Editor-in-Chief. All technical reports are available online on our Website, http://research.sun.com/techrep/.
RAS Computer Analysis Laboratory Sun Microsystems Inc.
1 Introduction
An e-Commerce transaction server is a complex system with hundreds of resource, performance, and throughput parameters, which makes the relationships among variables difficult to study with traditional static analysis approaches. Currently, system performance for servers is characterized by testing their operation under maximum load, under random load, and with performance benchmarks that mimic typical user loads. These conventional approaches cannot fully characterize transfer function relationships among performance variables or establish cause/effect relationships between them. To determine such cause/effect relationships using conventional approaches, one could introduce a large step-function perturbation in a demand vector X and measure the responses in a variety of vectors of interest, which may include physical variables (e.g., from distributed temperature, voltage, and current transducers that are already built into the servers), system resource variables, or quality-of-service (QoS) variables derived from system performance parameters (Figure 1). However, if such experiments were conducted during times of high user activity, the step function would have to be quite large to infer accurate coupling coefficients because of the poor signal-to-noise ratio (weak coupling between the demand vector and the response vectors would make the situation even worse). Such maneuvers would likely cause system overload events, and would certainly interfere with the normal day-to-day operation of the system one is seeking to characterize. (This phenomenon is loosely referred to in the recent literature as a macroscopic variant of Heisenberg’s Uncertainty Principle.)

Dynamic characterization of complex systems such as enterprise compute servers and web servers can be achieved by introducing perturbations in one or more “input” variables and measuring the time-dependent responses in one or more “response” variables.
We quantify this relationship between input variables and response variables with a “dynamic coupling coefficient,” which may be a function of load or, more generally, a multivariate function of many input variables. In a dynamically executing system such as a web server, distributed synthetic transaction generators can be employed for real-time continuous monitoring of system transaction latencies. These “canary tests” provide QoS performance metrics on a 24x7 basis as a dynamical function of system load. Specifically, in order to measure the impact of some performance parameter X on another performance parameter Y, the synthetic transactions introduce an (ideally small) perturbation in X, from which the resulting effect on parameter Y, if any, can be measured. As an example, one might compress a 10 Mbyte file and attempt to discern the temperature effect on one or more ASIC modules on a system board. Using time domain techniques, such a measurement would very likely be impossible on a large, multi-user, multi-CPU server, because of the extremely small effect one is seeking to discern and the poor signal-to-noise ratio.

Figure 1: Step perturbations to measure interaction effects among related variables. Concept illustrated with noiseless variables. Real server variables are characterized by large, chaotic perturbations, necessitating very large step changes in variable 1 to produce observable changes in response variables.

The well-known sinusoidal excitation technique for estimation of transfer functions [1] allows us to translate this input-output effect to the frequency domain. The advantage of this technique is that we concentrate our effort on a small number of frequency points (the frequencies of the sinusoidal excitations) where the correlation or coupling between variables is clearly seen. The technique has already been adapted by one of the authors for the dynamic system characterization of chaotic, nonlinearly interacting physical variables in nuclear power plants [2, 3]. This method, used now for dynamical system characterization of large, multi-processor servers, is an elegant and powerful exploratory analysis tool to characterize complex system behavior, particularly the relationships among various dynamic system parameters.

The paper is organized as follows. Section 2 introduces the mathematical underpinnings of the sinusoidal excitation method as a nonparametric identification technique. The section is designed mainly for readers who are not familiar with frequency domain methods for transfer function estimation. Section 3 explains how the coupling coefficients are computed taking into account the limitations imposed by the discrete Fourier transform. Several experimental results are presented in Section 4. The paper is concluded by a summary in Section 5.
2 Mathematical Background
Transfer function: For a discrete-time, linear, time-invariant (LTI) system with impulse response $h[n]$, the output sequence $y[n]$ is related to the input sequence $u[n]$ through the convolution sum,

$$y[n] = h[n] * u[n] = \sum_{k=-\infty}^{\infty} h[k]\,u[n-k], \tag{1}$$

where $n$ is an integer number. It is useful to introduce a shorthand notation for the convolution sum (1). With that purpose we define the backward shift operator $q^{-k}$,

$$q^{-k}u[n] = u[n-k].$$

We can now write (1) as

$$y[n] = h[n] * u[n] = \sum_{k=-\infty}^{\infty} h[k]\,u[n-k] = \sum_{k=-\infty}^{\infty} h[k]\,q^{-k}u[n] = H(q)\,u[n], \tag{2}$$

where

$$H(q) = \sum_{k=-\infty}^{\infty} h[k]\,q^{-k}, \tag{3}$$

is the transfer function of the system. Strictly speaking, the term transfer function is reserved for the z-transform of the impulse response. The z-transform operator $\mathcal{Z}\{\cdot\}$ is defined as

$$\mathcal{Z}\{u[n]\} = U(z) = \sum_{n=-\infty}^{\infty} u[n]\,z^{-n}. \tag{4}$$
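The convolution sum (1) and the z-transform (4) can be illustrated numerically for finite-length sequences. The following is a minimal sketch and not part of the original report: the sequences `h` and `u`, the evaluation point `z0`, and the helper names `convolve` and `z_transform` are hypothetical, and both sequences are assumed to be causal and zero outside the listed samples.

```python
def convolve(h, u):
    """Convolution sum (1) for finite causal sequences (zero outside their support)."""
    y = [0.0] * (len(h) + len(u) - 1)
    for k, hk in enumerate(h):
        for m, um in enumerate(u):
            # h[k] * u[m] contributes to the output sample n = k + m
            y[k + m] += hk * um
    return y


def z_transform(x, z):
    """Z-transform (4) of a finite causal sequence x[0..L-1], evaluated at z != 0."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))


h = [1.0, 0.5, 0.25]        # hypothetical impulse response
u = [1.0, -1.0, 2.0, 0.0]   # hypothetical input sequence
y = convolve(h, u)

z0 = 0.8 + 0.3j             # arbitrary nonzero evaluation point
lhs = z_transform(y, z0)                       # transform of the convolution
rhs = z_transform(h, z0) * z_transform(u, z0)  # product of the transforms
```

For these sequences `lhs` and `rhs` agree to rounding error, which is the convolution property of the z-transform that turns the convolution sum into a product.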
Applying this operator to the convolution sum (2) and taking into account that $\mathcal{Z}\{q^{-k}u[n]\} = z^{-k}U(z)$, we can represent our system as

$$Y(z) = H(z)\,U(z), \tag{5}$$

where

$$H(z) = \sum_{k=-\infty}^{\infty} h[k]\,z^{-k}, \tag{6}$$

is the transfer function of our system, and according to definition (4), the z-transform of the impulse response $h[n]$.

Frequency response: Consider as an input sequence a complex exponential of radian frequency $\omega$, i.e., $u[n] = e^{j\omega n}$ for $-\infty < n < \infty$. The output of the system is given by

$$y[n] = h[n] * u[n] = \sum_{k=-\infty}^{\infty} h[k]\,u[n-k] = \sum_{k=-\infty}^{\infty} h[k]\,e^{j\omega(n-k)} = \left(\sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k}\right) e^{j\omega n}. \tag{7}$$

Defining

$$H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k}, \tag{8}$$

we can write the output sequence as

$$y[n] = H(e^{j\omega})\,e^{j\omega n} = \left|H(e^{j\omega})\right| e^{j\left(\omega n + \arg[H(e^{j\omega})]\right)}. \tag{9}$$
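Equation (9) can be checked numerically for a short impulse response. The sketch below is illustrative only: the impulse response `h`, the frequency `omega`, and the helper names are hypothetical, and the input $e^{j\omega n}$ is taken to be defined for all $n$ so that no start-up transient appears.

```python
import cmath
import math


def frequency_response(h, omega):
    """H(e^{j omega}) from Eq. (8) for a finite causal impulse response h[0..L-1]."""
    return sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))


def output_sample(h, omega, n):
    """y[n] from the convolution sum with u[n] = e^{j omega n} defined for all n."""
    return sum(hk * cmath.exp(1j * omega * (n - k)) for k, hk in enumerate(h))


h = [0.5, 0.3, 0.2]   # hypothetical impulse response
omega = 0.7           # radians per sample, hypothetical

H = frequency_response(h, omega)

# Largest deviation between the convolution output and H(e^{j omega}) e^{j omega n}
max_err = max(
    abs(output_sample(h, omega, n) - H * cmath.exp(1j * omega * n))
    for n in range(-5, 6)
)

gain, phase = abs(H), cmath.phase(H)   # amplitude change and phase shift in Eq. (9)

# The frequency response repeats with period 2*pi in omega
H_shifted = frequency_response(h, omega + 2.0 * math.pi)
```

Here `max_err` stays at rounding level, so the output is the input scaled by `gain` and shifted by `phase`.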
As a result, we have shown that the complex exponential $e^{j\omega n}$ is an eigenfunction of the LTI system with associated eigenvalue equal to $H(e^{j\omega})$. The eigenvalue $H(e^{j\omega})$ is called the frequency response of the system and describes the changes in amplitude and phase of the complex exponential input. An important distinction exists between continuous-time and discrete-time LTI systems. While in the continuous-time domain we need to specify the frequency response $H(\Omega)$ over the interval $-\infty < \Omega < \infty$, in the discrete-time domain we only need to specify the frequency response $H(e^{j\omega})$ over an interval of length $2\pi$, e.g., $-\pi < \omega \le \pi$. This property is based on the periodicity of the complex exponential. Using the fact that $e^{\pm j2\pi r} = 1$ for any integer $r$, we can show that

$$e^{-j(\omega + 2\pi r)n} = e^{-j\omega n}\,e^{-j2\pi rn} = e^{-j\omega n}. \tag{10}$$
Recall that for a discrete-time sequence, the Fourier transform and inverse Fourier transform are defined respectively as

$$U(e^{j\omega}) = \sum_{n=-\infty}^{\infty} u[n]\,e^{-j\omega n}, \tag{11}$$

$$u[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} U(e^{j\omega})\,e^{j\omega n}\,d\omega. \tag{12}$$
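The transform pair (11) and (12) can be sketched numerically for a finite-length sequence. This is an illustration under stated assumptions rather than the authors' code: the sequence `u`, the helper names, and the number of integration points `M` are hypothetical, and the integral in (12) is approximated by a midpoint rule, which integrates these trigonometric sums without discretization error as long as `M` exceeds the range of sample indices involved.

```python
import cmath
import math


def fourier_transform(u, omega):
    """Eq. (11) for a finite sequence u[0..L-1] that is zero elsewhere."""
    return sum(un * cmath.exp(-1j * omega * n) for n, un in enumerate(u))


def inverse_fourier_transform(u, n, M=64):
    """Eq. (12) approximated by a midpoint rule with M points on (-pi, pi)."""
    d_omega = 2.0 * math.pi / M
    total = 0.0 + 0.0j
    for m in range(M):
        omega = -math.pi + (m + 0.5) * d_omega
        total += fourier_transform(u, omega) * cmath.exp(1j * omega * n)
    return total * d_omega / (2.0 * math.pi)


u = [1.0, -2.0, 0.5, 3.0]   # hypothetical finite-length sequence

# Reconstruct samples n = -2, ..., 5; indices outside 0..3 should come back as zero
recovered = [inverse_fourier_transform(u, n) for n in range(-2, 6)]
```

The reconstructed values match `u` on its support and vanish outside it, up to rounding error.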
Comparing Eqs. (8) and (11), note that the frequency response of an LTI system is the Fourier transform of the impulse response $h[n]$. As we stated above, the frequency response is periodic. Likewise, the Fourier transform is periodic with period $2\pi$. The inverse Fourier transform (12) represents $u[n]$ as a superposition of infinitesimal complex exponentials over the interval $-\pi < \omega \le \pi$. The Fourier transform (11) determines how much of each frequency component over the interval $-\pi < \omega \le \pi$ is required to synthesize $u[n]$ using Eq. (12). The Fourier transform is usually referred to as the spectrum. Comparing Eqs. (11) and (4) we note that we can obtain the Fourier transform by evaluating the z-transform on the unit circle ($z = e^{j\omega}$). Based on this property, the frequency response $H(e^{j\omega})$ of a discrete-time LTI system with impulse response $h[n]$ can be obtained by evaluating the transfer function $H(z)$ at $z = e^{j\omega}$.

Relationship between sequences and sampled signals: A sequence $u[n]$ is generally a representation of a sampled signal. Given a continuous signal $u(t)$, its sampled version $u_s(t)$ can be written as $u_s(t) = u(t)\,s(t)$ where

$$s(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT_s), \tag{13}$$

and $T_s$ is the sampling period. In this case we write

$$u_s(t) = u(t)\sum_{n=-\infty}^{\infty} \delta(t - nT_s) = \sum_{n=-\infty}^{\infty} u(nT_s)\,\delta(t - nT_s), \tag{14}$$

and

$$u[n] = u(nT_s). \tag{15}$$
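Based on the definition of the continuous-time Fourier transform,

$$U(\Omega) = \int_{-\infty}^{\infty} u(t)\,e^{-j\Omega t}\,dt, \tag{16}$$

$$u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} U(\Omega)\,e^{j\Omega t}\,d\Omega, \tag{17}$$

we can obtain the continuous-time spectrum for the sampled signal

$$U_s(\Omega) = \sum_{n=-\infty}^{\infty} u(nT_s)\int_{-\infty}^{\infty} \delta(t - nT_s)\,e^{-j\Omega t}\,dt = \sum_{n=-\infty}^{\infty} u(nT_s)\,e^{-j\Omega nT_s}. \tag{18}$$

Using the Fourier transform (11) we can compute

$$U(e^{j\omega}) = \sum_{n=-\infty}^{\infty} u[n]\,e^{-j\omega n}. \tag{19}$$

Comparing Eqs. (18) and (19), and taking into account Eq. (15), we conclude that

$$U_s(\Omega) = U(e^{j\omega})\Big|_{\omega = \Omega T_s} = U(e^{j\Omega T_s}). \tag{20}$$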
The Fourier transform $U(e^{j\omega})$ is simply a frequency-scaled version of the continuous-time Fourier transform $U_s(\Omega)$ where the scale factor is given by

$$\omega = \Omega T_s = \frac{\Omega}{f_s} = 2\pi\frac{f}{f_s}. \tag{21}$$

The Nyquist theorem relates the sampling frequency $f_s = 1/T_s$ with the maximum frequency $f_{max}$ of the signal before sampling. To avoid aliasing distortion, it is required that

$$f_s > 2f_{max}. \tag{22}$$
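The scaling (21) and the Nyquist condition (22) can be illustrated with a short numerical sketch. The sampling frequency and the two test frequencies below are hypothetical values chosen for illustration, as are the helper names; the second frequency deliberately violates (22) so that its samples coincide with those of the first.

```python
import math


def digital_frequency(f_hz, fs_hz):
    """Scaling (21): omega = 2*pi*f/fs, in radians per sample."""
    return 2.0 * math.pi * f_hz / fs_hz


def satisfies_nyquist(f_max_hz, fs_hz):
    """Condition (22): fs > 2*f_max."""
    return fs_hz > 2.0 * f_max_hz


fs = 10.0                    # hypothetical sampling frequency in Hz
f_low, f_high = 1.0, 11.0    # f_high = f_low + fs, so f_high violates (22)

# Sample both cosines at t = n / fs
samples_low = [math.cos(2.0 * math.pi * f_low * n / fs) for n in range(20)]
samples_high = [math.cos(2.0 * math.pi * f_high * n / fs) for n in range(20)]

# Largest difference between the two sampled sequences
alias_gap = max(abs(a - b) for a, b in zip(samples_low, samples_high))
```

Because the two digital frequencies differ by exactly $2\pi$, `alias_gap` is at rounding level: once sampled, the 11 Hz signal cannot be told apart from the 1 Hz signal.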
Therefore, every time we sample with frequency $f_s$ we are assuming that the maximum frequency of the signal to be sampled is less than $f_s/2$. In other words, we are assuming that

$$U_s(\Omega)\ \begin{cases} \neq 0 & -2\pi\frac{f_s}{2} < \Omega \le 2\pi\frac{f_s}{2} \\ = 0 & \text{otherwise.} \end{cases} \tag{23}$$

Based on the scaling (21), we will have

$$U(e^{j\omega})\ \begin{cases} \neq 0 & -\pi < \omega \le \pi \\ = 0 & \text{otherwise,} \end{cases} \tag{24}$$

implying that the interval $-\pi < \omega \le \pi$ in the discrete-time domain corresponds to the interval $-\pi f_s < \Omega \le \pi f_s$ ($-f_s/2 < f \le f_s/2$) in the continuous-time domain.

Frequency response estimation by sinusoidal excitation: To make the model of our system (2) more realistic we add a noise sequence at the output to model those signals affecting the system that are out of our control. The discrete-time LTI system is now written as

$$y[n] = H(q)\,u[n] + v[n], \tag{25}$$

where the noise sequence $v[n]$ may represent not only real measurement noise but also other phenomena such as uncontrollable inputs or disturbances. We consider in this case finite-length sequences, i.e., $0 \le n \le N-1$. If the input sequence $u[n]$ is given by

$$u[n] = A\cos(\omega_o n) \tag{26}$$

with $0 \le \omega_o < \pi$, we can show, based on the fact that $\cos(\omega_o n) = (e^{j\omega_o n} + e^{-j\omega_o n})/2$ and our result (9) for complex exponentials, that the output will be

$$y[n] = A\left|H(e^{j\omega_o})\right|\cos\left(\omega_o n + \arg[H(e^{j\omega_o})]\right) + v[n]. \tag{27}$$
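Given the sums

$$I_c(N) = \frac{1}{N}\sum_{n=0}^{N-1} y[n]\cos(\omega_o n), \qquad I_s(N) = \frac{1}{N}\sum_{n=0}^{N-1} y[n]\sin(\omega_o n), \tag{28}$$

we can insert (27) in (28) to obtain

$$I_c(N) = \frac{1}{N}\sum_{n=0}^{N-1} A\left|H(e^{j\omega_o})\right|\cos\left(\omega_o n + \arg[H(e^{j\omega_o})]\right)\cos(\omega_o n) + \frac{1}{N}\sum_{n=0}^{N-1} v[n]\cos(\omega_o n) \tag{29}$$

$$I_c(N) = A\left|H(e^{j\omega_o})\right|\frac{1}{2N}\sum_{n=0}^{N-1}\left[\cos\left(2\omega_o n + \arg[H(e^{j\omega_o})]\right) + \cos\left(\arg[H(e^{j\omega_o})]\right)\right] + \frac{1}{N}\sum_{n=0}^{N-1} v[n]\cos(\omega_o n) \tag{30}$$

$$I_c(N) = \frac{A}{2}\left|H(e^{j\omega_o})\right|\cos\left(\arg[H(e^{j\omega_o})]\right) + \frac{A}{2}\left|H(e^{j\omega_o})\right|\frac{1}{N}\sum_{n=0}^{N-1}\cos\left(2\omega_o n + \arg[H(e^{j\omega_o})]\right) + \frac{1}{N}\sum_{n=0}^{N-1} v[n]\cos(\omega_o n) \tag{31}$$

and similarly

$$I_s(N) = -\frac{A}{2}\left|H(e^{j\omega_o})\right|\sin\left(\arg[H(e^{j\omega_o})]\right) + \frac{A}{2}\left|H(e^{j\omega_o})\right|\frac{1}{N}\sum_{n=0}^{N-1}\sin\left(2\omega_o n + \arg[H(e^{j\omega_o})]\right) + \frac{1}{N}\sum_{n=0}^{N-1} v[n]\sin(\omega_o n). \tag{32}$$

When $N \to \infty$, the second and third terms in the expressions for $I_c(N)$ and $I_s(N)$ tend to zero. We finally obtain

$$I_c(N) = \frac{A}{2}\left|H(e^{j\omega_o})\right|\cos\left(\arg[H(e^{j\omega_o})]\right), \tag{33}$$

$$I_s(N) = -\frac{A}{2}\left|H(e^{j\omega_o})\right|\sin\left(\arg[H(e^{j\omega_o})]\right). \tag{34}$$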
These two expressions suggest the following estimators for the frequency response:

$$\left|\hat{H}(e^{j\omega_o})\right| = \frac{\sqrt{I_c(N)^2 + I_s(N)^2}}{A/2}, \tag{35}$$

$$\arg[\hat{H}(e^{j\omega_o})] = -\arctan\frac{I_s(N)}{I_c(N)}. \tag{36}$$
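The estimators (35) and (36) can be exercised on simulated data. The following is a minimal sketch under stated assumptions, not the authors' implementation: the impulse response `h`, the amplitude, the noise level, and the seed are hypothetical; the output is generated in steady state (the cosine input is treated as defined for all $n$); the noise is taken as white and Gaussian; and `atan2` is used in place of the plain arctangent in (36) so that the quadrant of the phase is resolved. The excitation frequency is placed at $2\pi k/N$ for an integer $k$, in which case the double-frequency sums in (31) and (32) vanish for finite $N$.

```python
import math
import random


def simulate_output(h, A, omega_o, N, noise_std, rng):
    """Steady-state y[n] = sum_k h[k]*A*cos(omega_o*(n-k)) + v[n], as in (25)-(26)."""
    return [
        sum(hk * A * math.cos(omega_o * (n - k)) for k, hk in enumerate(h))
        + rng.gauss(0.0, noise_std)
        for n in range(N)
    ]


def estimate_frequency_response(y, A, omega_o):
    """Correlation sums (28) followed by the estimators (35)-(36)."""
    N = len(y)
    Ic = sum(yn * math.cos(omega_o * n) for n, yn in enumerate(y)) / N
    Is = sum(yn * math.sin(omega_o * n) for n, yn in enumerate(y)) / N
    gain = math.hypot(Ic, Is) / (A / 2.0)
    # atan2(-Is, Ic) equals -arctan(Is/Ic) when Ic > 0 and also resolves the quadrant
    phase = math.atan2(-Is, Ic)
    return gain, phase


rng = random.Random(0)              # fixed seed so the sketch is repeatable
h = [0.5, 0.3, 0.2]                 # hypothetical impulse response
N = 4000
omega_o = 2.0 * math.pi * 200 / N   # excitation frequency of the form 2*pi*k/N
A = 1.0

y = simulate_output(h, A, omega_o, N, noise_std=0.2, rng=rng)
gain_hat, phase_hat = estimate_frequency_response(y, A, omega_o)

# Reference values computed directly from Eq. (8)
H_true = sum(
    hk * complex(math.cos(omega_o * k), -math.sin(omega_o * k))
    for k, hk in enumerate(h)
)
gain_true = abs(H_true)
phase_true = math.atan2(H_true.imag, H_true.real)
```

With these settings the estimates land close to `gain_true` and `phase_true`; the remaining error comes from the noise sums in (31) and (32), which shrink as `N` grows.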
It is opportune to introduce at this moment the discrete Fourier transform (DFT) and inverse discrete Fourier transform, defined respectively as

$$U[k] = \sum_{n=0}^{N-1} u[n]\,e^{-j\frac{2\pi}{N}kn}, \tag{37}$$

$$u[n] = \frac{1}{N}\sum_{k=-N/2}^{N/2-1} U[k]\,e^{j\frac{2\pi}{N}kn}, \tag{38}$$

where $-\frac{N}{2} \le k \le \frac{N}{2} - 1$, assuming that $N$ is even. The discrete Fourier transform (37) gives us the Fourier transform (11) at $N$ equally spaced frequencies over the interval $-\pi \le \omega < \pi$:

$$U[k] = U(e^{j\omega})\Big|_{\omega = \frac{2\pi k}{N}}. \tag{39}$$
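The pair (37) and (38) can be written out directly for a short sequence. This sketch evaluates the sums term by term rather than using an FFT, and it indexes the DFT by $k = -N/2, \ldots, N/2 - 1$ to match the convention above; the test signal, its amplitude `A`, and its bin index `k_o` are hypothetical.

```python
import cmath
import math


def dft(u):
    """Eq. (37): U[k] for k = -N/2, ..., N/2 - 1 (N even), as a dict keyed by k."""
    N = len(u)
    return {
        k: sum(un * cmath.exp(-2j * math.pi * k * n / N) for n, un in enumerate(u))
        for k in range(-N // 2, N // 2)
    }


def inverse_dft(U, N):
    """Eq. (38): u[n] = (1/N) * sum over k of U[k] * e^{j 2 pi k n / N}."""
    return [
        sum(Uk * cmath.exp(2j * math.pi * k * n / N) for k, Uk in U.items()) / N
        for n in range(N)
    ]


N = 16
A, k_o = 2.0, 3   # hypothetical amplitude and bin index of a cosine test signal
u = [A * math.cos(2.0 * math.pi * k_o * n / N) for n in range(N)]

U = dft(u)
u_back = inverse_dft(U, N)

# Largest difference between the original samples and the reconstructed ones
roundtrip_error = max(abs(a - b) for a, b in zip(u, u_back))
```

The round trip reproduces `u` to rounding error, and for this cosine the only bins of `U` that are not negligible are `k = 3` and `k = -3`.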
In addition to its theoretical importance as a Fourier representation of sequences, the DFT plays a central role in digital signal processing because there exist efficient algorithms for its computation. These algorithms are usually referred to as the Fast Fourier Transform (FFT). The FFT is simply an efficient implementation of the DFT. In applications that are based on Fourier analysis of signals, it is the Fourier transform that is desired, while it is the DFT that is actually computed. For a finite-length signal, the DFT provides frequency-domain samples of the Fourier transform of the signal. The discrete Fourier transform of the output sequence $y[n]$ is given by

$$Y(\omega) = \sum_{n=0}^{N-1} y[n]\,e^{-j\omega n}, \qquad \omega = \frac{2\pi}{N}k \quad \left(-\frac{N}{2} \le k \le \frac{N}{2} - 1;\ -\pi \le \omega < \pi\right). \tag{40}$$
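Comparing this expression with (28) we can write

$$I_c(N) - jI_s(N) = \frac{1}{N}Y(\omega_o). \tag{41}$$

The discrete Fourier transform of the input sequence $u[n]$ is computed as

$$U(\omega) = \sum_{n=0}^{N-1} u[n]\,e^{-j\omega n} = \sum_{n=0}^{N-1} A\,\frac{e^{j\omega_o n} + e^{-j\omega_o n}}{2}\,e^{-j\omega n}, \qquad \omega = \frac{2\pi}{N}k \quad \left(-\frac{N}{2} \le k \le \frac{N}{2} - 1;\ -\pi \le \omega < \pi\right), \tag{42}$$

resulting in

$$U(\omega) = \begin{cases} N\frac{A}{2} & \text{for } \omega = \pm\omega_o \\ 0 & \text{otherwise} \end{cases}$$

if $\omega_o = \frac{2\pi}{N}k$ for some integer $-\frac{N}{2} \le k$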