A Case Study on Traffic Matrix Estimation Under Gaussian Distribution

Ilmari Juva
Pirkko Kuusela
Jorma Virtamo
March 15, 2004
Abstract

We report a case study on an iterative method of traffic matrix estimation under some simplifying assumptions about the distribution of the origin-destination traffic demands. The starting point of our work is the Vaton-Gravey iterative Bayesian method, but there are quite a few differences between that method and our setting. We assume that the demands follow a single Gaussian distribution instead of a modulated process. The normality assumption allows us to bypass the Markov Chain Monte Carlo step in the iterative method and to derive explicitly the expected values of the mean and covariance matrix of the traffic demands conditioned on the link counts. We show that under the assumption of a single underlying distribution the expected values of the mean and covariance estimates converge after the first step of the iteration. The method cannot improve on this unless a relation between mean and variance is imposed in order to make use of the covariance matrix estimate, or the distribution is assumed to be modulated over a regime of distributions.
1 Introduction

In many traffic engineering applications, knowledge of the underlying traffic volumes is assumed. The traffic matrix gives the amount of demanded traffic between each pair of nodes in the network. The traffic matrix cannot be measured directly, and there are not yet many methods to obtain it, although it is recognized that accurate traffic demand matrices are crucial for traffic engineering. The only information readily available is the link loads and the routing matrix. The traffic demands x between origin-destination (OD) pairs and the routing matrix A determine the link loads
y through the relation

y = Ax.   (1)
Since in any realistic network there are more OD pairs than links, the problem of solving x from A and y is strongly underdetermined, and an explicit solution cannot be found. The most promising methods proposed for inferring the traffic matrix from link loads include Bayesian inference techniques and network tomography. Bayesian methods compute conditional probability distributions for the elements of the traffic matrix, given the observed link loads. These methods usually employ Markov chain Monte Carlo simulation for computing the posterior probability. Network tomography uses more classical statistical methods, such as the expectation-maximization algorithm, for calculating the maximum likelihood estimate of the traffic matrix based on the link loads. The Vaton-Gravey method [1] consists of iteration and exchange of information between two boxes. The available data are the link counts on several successive time periods. The first box simulates the traffic matrix (OD counts) from the link counts, utilizing some prior information on the OD counts, at each fixed time period. For example, the traffic counts for each OD pair are assumed to constitute a Markov modulated process. The successive values for each OD pair are then fed into the second box, which updates the parameters of the Markov modulated process; these are then fed back into the first box as a Bayesian prior and the process is repeated. The first box involves running a Markov Chain Monte Carlo simulation and the second box computes a maximum likelihood estimate using the Expectation Maximization method.
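To make (1) concrete, the following minimal sketch (a hypothetical two-link, three-OD-pair network with made-up demand values, not taken from any measurement) shows why the system cannot simply be inverted: there are fewer link equations than OD unknowns.

```python
import numpy as np

# Hypothetical example: 3 OD pairs routed over 2 links, so y = A x
# has more unknowns (n = 3) than equations (m = 2).
A = np.array([[1, 0, 1],    # link 1 carries OD flows x1 and x3
              [0, 1, 1]])   # link 2 carries OD flows x2 and x3
x = np.array([5.0, 10.0, 15.0])   # true (unobservable) OD demands
y = A @ x                         # observed link loads

print(y)                          # [20. 25.]
print(np.linalg.matrix_rank(A))   # 2 < 3: x cannot be recovered from y alone
```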
Figure 1: The Vaton-Gravey iterative method. The first box estimates traffic matrices E[x|y] from the link counts y; the second box estimates the parameters of the OD flows (the bank of Markovian regimes), which are fed back to the first box.

In this report we consider explicitly a special case of the above idea. Our aim is to gain insight into the method and, in particular, into the output of the first box, by examining a model that is simple enough to be computed analytically. We assume that the OD pairs are independent and follow a single Gaussian distribution instead of a mixture of distributions. This reduces the complexity of the Vaton-Gravey method and allows an explicit analysis. Our prior consists
of the mean and the covariance matrix of the Gaussian distribution. The attractive feature of this approach is that the distribution conditioned on the link counts is again Gaussian. Thus we can skip the MCMC simulation by calculating analytically the expected output of the first box. We also point out that, contrary to the Vaton-Gravey method, in our approach the output of the first box is the whole conditional distribution, not just the mean of an OD pair conditioned on the link count observation. This is because the conditional means alone would result in a distribution that is flattened out and thus has a singular covariance matrix. See Figures 3-4 for an illustration of the conditional “cloud” in our approach. As in our case the output of the first box is the conditional distribution of x given y, the second box has only the function of taking the expectation of these over y. This yields the new estimate for the distribution of x, which is here considered constant in time. The estimate is then returned to the first box as a prior. It turns out that the expected value of the mean does not change in the iteration after the first conditioning on the link counts has been made. This result is proven later in the report.
Figure 2: Illustration of the method studied. The first box produces the conditional distribution f_{X|Y}(x) from the link counts y; the second box estimates the parameters of the OD flows (m, C).

The rest of the report is organized as follows. In Section 2 the conditional distribution of x given y is derived, and the expected values of the mean and covariance matrix estimates (m, C) are calculated. Section 3 illustrates the results of the previous section through example cases in a much simplified setting. In Section 4 we state that the iteration converges after the first step, and prove this result for the estimator of the mean. Section 5 gives some illustrative numerical examples, and in Section 6 the report is summarized and some final conclusions are drawn.
2 Conditional Gaussian distribution

2.1 Introduction

In this section we derive the equations for the conditional Gaussian distribution. These are well known results but are presented here for the sake of completeness. The distribution of the variable X, representing here the OD traffic amounts, is derived conditioned on y, the measured link loads. From the conditional distribution we are able to solve the mean and the covariance matrix and their expected values. The covariance matrix is solved in two ways. The conditional covariance approach is mathematically easier and more elegant, while the co-ordinate transformation method follows the idea of the algorithm more closely and is therefore probably more intuitive.
2.2 The conditional distribution

Let the n-vector x represent a multivariate Gaussian variable X with mean \mu and covariance matrix \Sigma,

f(x) \sim \exp\left(-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right).   (2)

Assume that we have a prior estimate (m, C) for (\mu, \Sigma). Using the prior estimate as the distribution of X, we wish to determine the distribution of X conditioned on

AX = y,   (3)
where y is an m-vector and A is an m \times n matrix with m < n. First we partition x and A as

x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad A = (A_1, A_2),

where x_1 is an m-vector, x_2 is an (n-m)-vector, A_1 is an m \times m matrix, and A_2 is an m \times (n-m) matrix. From (3) we have

x_1 = A_1^{-1}(y - A_2 x_2).   (4)
By making the corresponding partition in the exponent of (2), we get

(x_1 - m_1)^T B_{11} (x_1 - m_1) + (x_1 - m_1)^T B_{12} (x_2 - m_2) + (x_2 - m_2)^T B_{21} (x_1 - m_1) + (x_2 - m_2)^T B_{22} (x_2 - m_2),   (5)

with

C^{-1} = B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},

and substituting (4) into (5) we obtain

x_2^T \left( A_2^T (A_1^{-1})^T B_{11} A_1^{-1} A_2 - A_2^T (A_1^{-1})^T B_{12} - B_{21} A_1^{-1} A_2 + B_{22} \right) x_2
+ x_2^T \left( (B_{21} - A_2^T (A_1^{-1})^T B_{11})(A_1^{-1} y - m_1) + (A_2^T (A_1^{-1})^T B_{12} - B_{22}) m_2 \right)
+ (\text{transpose}) x_2 + \text{constant}.   (6)

Because of symmetry the multiplier of x_2 is just the transpose of the multiplier of x_2^T. We wish to write this as a complete square of the form

(x_2 - \tilde{m}_2)^T \tilde{C}_{22}^{-1} (x_2 - \tilde{m}_2) + \text{constant} = x_2^T \tilde{C}_{22}^{-1} x_2 - \left( x_2^T \tilde{C}_{22}^{-1} \tilde{m}_2 + (\text{transpose}) x_2 \right) + \text{constant}.   (7)

Now we can pick out the terms \tilde{C}_{22}^{-1} and \tilde{C}_{22}^{-1} \tilde{m}_2 from (6), which are the multipliers of the quadratic term and of the x_2^T term, respectively:

\tilde{C}_{22}^{-1} = A_2^T (A_1^{-1})^T B_{11} A_1^{-1} A_2 - A_2^T (A_1^{-1})^T B_{12} - B_{21} A_1^{-1} A_2 + B_{22},   (8)

\tilde{C}_{22}^{-1} \tilde{m}_2 = -\left( (B_{21} - A_2^T (A_1^{-1})^T B_{11})(A_1^{-1} y - m_1) + (A_2^T (A_1^{-1})^T B_{12} - B_{22}) m_2 \right).   (9)
2.3 Mean of the conditional distribution

It is possible to solve for \tilde{m}_2 from (8)-(9). This yields

\tilde{m}_2 = -\left( A_2^T (A_1^{-1})^T B_{11} A_1^{-1} A_2 - A_2^T (A_1^{-1})^T B_{12} - B_{21} A_1^{-1} A_2 + B_{22} \right)^{-1} \cdot \left( (B_{21} - A_2^T (A_1^{-1})^T B_{11})(A_1^{-1} y - m_1) + (A_2^T (A_1^{-1})^T B_{12} - B_{22}) m_2 \right).   (10)

Further, using (4) we can then solve for \tilde{m}_1,

\tilde{m}_1 = A_1^{-1} y - A_1^{-1} A_2 \tilde{m}_2.   (11)

So now we have

\tilde{m} = \begin{pmatrix} \tilde{m}_1 \\ \tilde{m}_2 \end{pmatrix}.   (12)

Then inserting (10) and (11) into (12) we obtain

\tilde{m} = \begin{pmatrix} A_1^{-1} y + A_1^{-1} A_2 \tilde{C}_{22} \left( (B_{21} - A_2^T (A_1^{-1})^T B_{11})(A_1^{-1} y - m_1) + (A_2^T (A_1^{-1})^T B_{12} - B_{22}) m_2 \right) \\ -\tilde{C}_{22} \left( (B_{21} - A_2^T (A_1^{-1})^T B_{11})(A_1^{-1} y - m_1) + (A_2^T (A_1^{-1})^T B_{12} - B_{22}) m_2 \right) \end{pmatrix}   (13)

= \begin{pmatrix} A_1^{-1} + A_1^{-1} A_2 \tilde{C}_{22} (B_{21} - A_2^T (A_1^{-1})^T B_{11}) A_1^{-1} \\ -\tilde{C}_{22} (B_{21} - A_2^T (A_1^{-1})^T B_{11}) A_1^{-1} \end{pmatrix} y + \begin{pmatrix} -A_1^{-1} A_2 \tilde{C}_{22} (B_{21} - A_2^T (A_1^{-1})^T B_{11}) & A_1^{-1} A_2 \tilde{C}_{22} (A_2^T (A_1^{-1})^T B_{12} - B_{22}) \\ \tilde{C}_{22} (B_{21} - A_2^T (A_1^{-1})^T B_{11}) & -\tilde{C}_{22} (A_2^T (A_1^{-1})^T B_{12} - B_{22}) \end{pmatrix} \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}.   (14)
Where by (8) we have

\tilde{C}_{22} = \left( A_2^T (A_1^{-1})^T B_{11} A_1^{-1} A_2 - A_2^T (A_1^{-1})^T B_{12} - B_{21} A_1^{-1} A_2 + B_{22} \right)^{-1}.

So the coefficient matrices for y and m depend on A and B only. We write

\tilde{m} = G y + H m,   (15)

where

G = \begin{pmatrix} A_1^{-1} + A_1^{-1} A_2 \tilde{C}_{22} (B_{21} - A_2^T (A_1^{-1})^T B_{11}) A_1^{-1} \\ -\tilde{C}_{22} (B_{21} - A_2^T (A_1^{-1})^T B_{11}) A_1^{-1} \end{pmatrix},   (16)

H = \begin{pmatrix} -A_1^{-1} A_2 \tilde{C}_{22} (B_{21} - A_2^T (A_1^{-1})^T B_{11}) & A_1^{-1} A_2 \tilde{C}_{22} (A_2^T (A_1^{-1})^T B_{12} - B_{22}) \\ \tilde{C}_{22} (B_{21} - A_2^T (A_1^{-1})^T B_{11}) & -\tilde{C}_{22} (A_2^T (A_1^{-1})^T B_{12} - B_{22}) \end{pmatrix}.   (17)
There are a lot of common components in the elements of G and H. This is more clearly visible if we arrange the terms using the following notation:

L = A_1^{-1} A_2,   (18)

J = \tilde{C}_{22} (B_{21} - (A_1^{-1} A_2)^T B_{11})   (19)
  = \tilde{C}_{22} (B_{21} - L^T B_{11}),   (20)

K = \tilde{C}_{22} ((A_1^{-1} A_2)^T B_{12} - B_{22})   (21)
  = \tilde{C}_{22} (L^T B_{12} - B_{22}).   (22)
Now G, H and \tilde{C}_{22} can be written as

G = \begin{pmatrix} A_1^{-1} + L J A_1^{-1} \\ -J A_1^{-1} \end{pmatrix},   (23)

H = \begin{pmatrix} -LJ & LK \\ J & -K \end{pmatrix},   (24)

\tilde{C}_{22} = (L^T B_{11} L - L^T B_{12} - B_{21} L + B_{22})^{-1}.   (25)
Equation (15) is conditioned on a particular y. The result of the iteration is the expected value E[\tilde{m}] over the sample of y's. If we substitute the relation (3) into (15) it yields

\tilde{m} = GAx + Hm,   (26)

and the result of the iteration round is the expected value of this, giving

m^{(i+1)} = E[\tilde{m}^{(i)}] = GA\mu + Hm^{(i)}.   (27)
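As a sanity check of (15)-(17) and (23)-(25), the following sketch (with randomly generated A, C, m and \mu; purely hypothetical values) builds G and H from the partitioned blocks and compares \tilde{m} = Gy + Hm with the standard conditional-Gaussian mean m + CA^T(ACA^T)^{-1}(y - Am), which it should equal.

```python
import numpy as np
rng = np.random.default_rng(0)

# Hypothetical dimensions: n OD pairs, m_links < n links.
n, m_links = 5, 3
A = rng.standard_normal((m_links, n))
R = rng.standard_normal((n, n))
C = R @ R.T + n * np.eye(n)            # prior covariance (positive definite)
m = rng.standard_normal(n)             # prior mean
mu = rng.standard_normal(n)            # "true" mean, used here only through y = A mu

B = np.linalg.inv(C)
A1, A2 = A[:, :m_links], A[:, m_links:]                     # A = (A1, A2)
B11, B12 = B[:m_links, :m_links], B[:m_links, m_links:]
B21, B22 = B[m_links:, :m_links], B[m_links:, m_links:]

L = np.linalg.solve(A1, A2)                                       # eq. (18)
Ct22 = np.linalg.inv(L.T @ B11 @ L - L.T @ B12 - B21 @ L + B22)   # eq. (25)
J = Ct22 @ (B21 - L.T @ B11)                                      # eq. (20)
K = Ct22 @ (L.T @ B12 - B22)                                      # eq. (22)
A1inv = np.linalg.inv(A1)

G = np.vstack([A1inv + L @ J @ A1inv, -J @ A1inv])                # eq. (23)
H = np.block([[-L @ J, L @ K], [J, -K]])                          # eq. (24)

y = A @ mu
m_tilde = G @ y + H @ m                                           # eq. (15)

# Standard conditional-Gaussian mean for comparison:
m_ref = m + C @ A.T @ np.linalg.solve(A @ C @ A.T, y - A @ m)
print(np.allclose(m_tilde, m_ref))   # True
```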
2.4 Covariance matrix of the conditional distribution

2.4.1 The conditional covariance approach

The term (i, j) of the covariance matrix of X is, by conditional covariance,

Cov[x_i, x_j] = E[Cov[x_i, x_j | y]] + Cov[E[x_i | y], E[x_j | y]]   (28)
             = E[Cov[x_i, x_j | y]] + Cov[\tilde{m}_i(y), \tilde{m}_j(y)].   (29)

So the whole covariance matrix can be calculated from

Cov[x, x^T] = E[Cov[x, x^T | y]] + Cov[\tilde{m}(y), \tilde{m}(y)].   (30)

The latter term, the covariance matrix of \tilde{m}, is

S = Cov[\tilde{m}(y), \tilde{m}(y)]   (31)
  = E[(\tilde{m}(y) - E[\tilde{m}(y)])(\tilde{m}(y) - E[\tilde{m}(y)])^T]   (32)
  = E[(Gy + Hm - GA\mu - Hm)(Gy + Hm - GA\mu - Hm)^T]   (33)
  = E[(GAx - GA\mu)(GAx - GA\mu)^T]   (34)
  = GA\, E[(x - \mu)(x - \mu)^T]\,(GA)^T   (35)
  = GA\Sigma(GA)^T,   (36)
where the Hm terms from equation (26) cancel out because m, the estimate from the previous round, is a constant. Note that the term in the middle of (36) is the covariance matrix of Y, \Sigma_Y = A\Sigma A^T, so we do not need to know \Sigma itself to calculate (36); needing \Sigma would of course be rather inconvenient, since it is exactly what we are trying to find out. Since

A_1^{-1} A = A_1^{-1}(A_1, A_2) = (I, L) =: M,   (37)

and using the notation M,

GA = \begin{pmatrix} (I + LJ)M \\ -JM \end{pmatrix},   (38)

we obtain

S = GA\Sigma(GA)^T   (39)
  = \begin{pmatrix} (I + LJ)M \\ -JM \end{pmatrix} \Sigma \left( ((I + LJ)M)^T \;\; (-JM)^T \right)   (40)
  = \begin{pmatrix} (I + LJ)M\Sigma((I + LJ)M)^T & (I + LJ)M\Sigma(-JM)^T \\ -JM\Sigma((I + LJ)M)^T & JM\Sigma(JM)^T \end{pmatrix}   (41)
  = \begin{pmatrix} (I + LJ)M\Sigma M^T(I + LJ)^T & -(I + LJ)M\Sigma M^T J^T \\ -JM\Sigma M^T(I + LJ)^T & JM\Sigma M^T J^T \end{pmatrix}.   (42)
This is the expected value of the sample covariance matrix of \tilde{m} and also corresponds to the one produced by an infinite sample of measurements y. Note that the n \times n matrix GA has rank at most the rank of A, which is m < n. Therefore GA is singular, and hence S is singular.

The first term of (30), the conditional covariance matrix Cov[x, x^T | y] of x conditioned on y, can be obtained by calculating

\tilde{C}_{11} = E[(x_1 - \tilde{m}_1)(x_1 - \tilde{m}_1)^T]
            = E[(A_1^{-1}y - A_1^{-1}A_2 x_2 - A_1^{-1}y + A_1^{-1}A_2 \tilde{m}_2)(y^T(A_1^{-1})^T - x_2^T A_2^T(A_1^{-1})^T - y^T(A_1^{-1})^T + \tilde{m}_2^T A_2^T(A_1^{-1})^T)]
            = E[(-A_1^{-1}A_2(x_2 - \tilde{m}_2))(-(x_2 - \tilde{m}_2)^T A_2^T(A_1^{-1})^T)]
            = A_1^{-1}A_2\, E[(x_2 - \tilde{m}_2)(x_2 - \tilde{m}_2)^T]\, A_2^T(A_1^{-1})^T
            = A_1^{-1}A_2 \tilde{C}_{22} A_2^T(A_1^{-1})^T
            = L\tilde{C}_{22}L^T,   (43)

\tilde{C}_{12} = E[(x_1 - \tilde{m}_1)(x_2 - \tilde{m}_2)^T]
            = E[(A_1^{-1}y - A_1^{-1}A_2 x_2 - A_1^{-1}y + A_1^{-1}A_2 \tilde{m}_2)(x_2 - \tilde{m}_2)^T]
            = E[(-A_1^{-1}A_2(x_2 - \tilde{m}_2))(x_2 - \tilde{m}_2)^T]
            = -A_1^{-1}A_2\, E[(x_2 - \tilde{m}_2)(x_2 - \tilde{m}_2)^T]
            = -A_1^{-1}A_2 \tilde{C}_{22}
            = -L\tilde{C}_{22},   (44)

\tilde{C}_{21} = E[(x_2 - \tilde{m}_2)(x_1 - \tilde{m}_1)^T] = -\tilde{C}_{22} A_2^T(A_1^{-1})^T = -\tilde{C}_{22}L^T,   (45)

\tilde{C}_{22} = E[(x_2 - \tilde{m}_2)(x_2 - \tilde{m}_2)^T] = (L^T B_{11} L - L^T B_{12} - B_{21} L + B_{22})^{-1},   (46)

\tilde{C} = Cov[x, x^T | y] = \begin{pmatrix} \tilde{C}_{11} & \tilde{C}_{12} \\ \tilde{C}_{21} & \tilde{C}_{22} \end{pmatrix}.   (47)

Since \tilde{C} in (47) depends on A and B only, that is, not on X, it is constant with regard to the expectation operation. Hence

E[Cov[x, x^T | y]] = E[\tilde{C}] = \tilde{C}.   (48)

So now the updated covariance matrix in (30) can be written as

C^{(i+1)} = \tilde{C} + S   (49)
         = \tilde{C} + GA\Sigma(GA)^T   (50)
         = \begin{pmatrix} L\tilde{C}_{22}L^T & -L\tilde{C}_{22} \\ -\tilde{C}_{22}L^T & \tilde{C}_{22} \end{pmatrix} + \begin{pmatrix} (I+LJ)M\Sigma M^T(I+LJ)^T & -(I+LJ)M\Sigma M^T J^T \\ -JM\Sigma M^T(I+LJ)^T & JM\Sigma M^T J^T \end{pmatrix}   (51)
         = \begin{pmatrix} L\tilde{C}_{22}L^T + (I+LJ)M\Sigma M^T(I+LJ)^T & -L\tilde{C}_{22} - (I+LJ)M\Sigma M^T J^T \\ -\tilde{C}_{22}L^T - JM\Sigma M^T(I+LJ)^T & \tilde{C}_{22} + JM\Sigma M^T J^T \end{pmatrix},   (52)

where L and M depend only on A and thus, along with the real covariance matrix \Sigma, do not change during the iteration. On the other hand, \tilde{C}_{22} and J depend on B and thus do change.

2.4.2 The co-ordinate transformation approach

The conditional covariance matrix for x was calculated in equations (43)-(47) as

\tilde{C} = \begin{pmatrix} \tilde{C}_{11} & \tilde{C}_{12} \\ \tilde{C}_{21} & \tilde{C}_{22} \end{pmatrix}.   (53)
Because of the condition (3) there cannot be any variance in the directions of the y vectors. All the variance is orthogonal to these directions, as is shown in figure 3 for the two dimensional case. So transformed co-ordinates can be selected such that there is variation only in n - m directions. The first m axes are the y vectors

y_i = e_i^T A,

where e_i is an m-vector whose ith element is one and all the others are zero. This way we obtain m directions of the new co-ordinate axes. Let us denote the rest of the co-ordinate axes by (z_1, \ldots, z_{n-m}). The z_i vectors have to be orthogonal to these and also be selected so that there are no covariance terms in the transformed covariance matrix. So let us make the transformation from X to X', such that the x'_i are the components along the unit vectors of the y_i and z_i, and T is the transformation matrix compiled from these unit vectors. Then the mean, the conditional covariance matrix and the real covariance matrix transform as

m' = Tm,   (54)
\tilde{C}' = T\tilde{C}T^T,   (55)
\Sigma' = T\Sigma T^T.   (56)

Figure 3: Sample conditioned on y, with 100 points generated from the conditional distribution.
Now we can draw a random sample with the mean and covariance given above, so that the values x'_1, \ldots, x'_m remain constant in the sample and x'_{m+1}, \ldots, x'_n vary. Note that since we hold the first m values constant, this approach is valid only when we have a prior in which all the variances are equal. Thus we get a random sample of the conditional distribution, and we then return these points to the original co-ordinates. Then we can pick the next measurement y and repeat the procedure. This way we get the distribution conditioned on y for several different y. This method is illustrated in figure 4.
Figure 4: Conditional sample. On the left with five different y values, on the right with 500 values.

The expectation of the covariance matrix of the sample cloud can be obtained by selecting the non-zero term from \tilde{C}' and the rest of the terms from \Sigma', so that the covariance between these terms is zero. This matrix C'^{(1)} is then returned to the original co-ordinates,

C^{(1)} = T^{-1} C'^{(1)} (T^{-1})^T.   (57)

An example of this is given in Section 3.4.
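The sampling construction described above can be sketched directly from (4) and (8)-(10): for each observed y, draw the free components x_2 from N(\tilde{m}_2, \tilde{C}_{22}) and recover x_1 from the constraint. The following illustration assumes the one-link example of Section 3 with the prior and true parameters of Section 5.2; these values are only an assumption for illustration, not necessarily the ones behind figures 3-4.

```python
import numpy as np
rng = np.random.default_rng(1)

# Conditional "cloud" as in figures 3-4, for the one-link, two-flow example.
A = np.array([[1.0, 1.0]])
m, C = np.array([2.0, 4.0]), np.eye(2)                    # prior estimate (m, C)
mu, Sigma = np.array([15.0, 10.0]), np.diag([9.0, 4.0])   # real distribution

B = np.linalg.inv(C)
A1, A2 = A[:, :1], A[:, 1:]
B11, B12, B21, B22 = B[:1, :1], B[:1, 1:], B[1:, :1], B[1:, 1:]
A1inv = np.linalg.inv(A1)
L = A1inv @ A2
Ct22 = np.linalg.inv(L.T @ B11 @ L - L.T @ B12 - B21 @ L + B22)   # eq. (25)

points = []
for _ in range(500):
    x_real = rng.multivariate_normal(mu, Sigma)    # a "true" traffic sample
    y = A @ x_real                                 # the observed link count
    # conditional mean of x2 given A x = y, eqs. (9)-(10):
    rhs = (B21 - A2.T @ A1inv.T @ B11) @ (A1inv @ y - m[:1]) \
        + (A2.T @ A1inv.T @ B12 - B22) @ m[1:]
    m2_tilde = -Ct22 @ rhs
    x2 = rng.multivariate_normal(m2_tilde, Ct22)   # free components
    x1 = A1inv @ (y - A2 @ x2)                     # eq. (4): constrained components
    points.append(np.concatenate([x1, x2]))

cloud = np.array(points)
print(cloud.mean(axis=0))   # close to E[m_tilde] = GA mu + H m
print(np.cov(cloud.T))      # close to C_tilde + S of eq. (49)
```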
3 Example: Network with one link

3.1 Introduction

To illustrate the behavior of the method we consider an example that is simplified to the extreme. Consider the network in figure 5. It has a single link and two traffic flows.
Figure 5: Two dimensional example network: flows x_1 and x_2 share the single link y_1 between nodes A and B.

This is an unrealistic example, since both x_1 and x_2 belong to the OD pair AB and there is no way to differentiate how much a specific x_i contributes to the link load y. However, because of this, the solution depends heavily on the prior estimate, and thus gives a good opportunity to study the effect of different priors, as well as to display the situation graphically. In this section we derive for this example network the same results as in Section 2 for the general case.
3.2 Mean

Let us first define

A = (1 \;\; 1),   (58)
\Sigma = \begin{pmatrix} r & 0 \\ 0 & s \end{pmatrix},   (59)
\Sigma_Y = A\Sigma A^T = r + s,   (60)
B = C^{-1} = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}.   (61)

Using the notation w for brevity,

w^{-1} = \tilde{C}_{22} = (b_{11} - b_{12} - b_{21} + b_{22})^{-1},   (62)
\tilde{C} = \begin{pmatrix} w^{-1} & -w^{-1} \\ -w^{-1} & w^{-1} \end{pmatrix},   (63)

equations (16)-(17) for G and H give

G = \begin{pmatrix} 1 - \frac{b_{11} - b_{21}}{w} \\ \frac{b_{11} - b_{21}}{w} \end{pmatrix} = \begin{pmatrix} \frac{b_{22} - b_{12}}{w} \\ \frac{b_{11} - b_{21}}{w} \end{pmatrix},   (64)

H = \frac{1}{w}\begin{pmatrix} b_{11} - b_{21} & b_{12} - b_{22} \\ b_{21} - b_{11} & b_{22} - b_{12} \end{pmatrix}.   (65)
Now we can calculate the estimate for the mean using equation (81):

E[\tilde{m}] = GA\mu + Hm   (66)
 = \frac{1}{w}\begin{pmatrix} (b_{22} - b_{12})(\mu_1 + \mu_2) \\ (b_{11} - b_{21})(\mu_1 + \mu_2) \end{pmatrix} + \frac{1}{w}\begin{pmatrix} (b_{11} - b_{21})m_1 + (b_{12} - b_{22})m_2 \\ (b_{21} - b_{11})m_1 + (b_{22} - b_{12})m_2 \end{pmatrix}   (67)
 = \frac{1}{w}\begin{pmatrix} (b_{11} - b_{21})m_1 + (b_{22} - b_{12})((\mu_1 + \mu_2) - m_2) \\ (b_{11} - b_{21})((\mu_1 + \mu_2) - m_1) + (b_{22} - b_{12})m_2 \end{pmatrix}.   (68)

It follows from this equation that the estimate E[\tilde{m}] is found by moving from the prior mean in a direction determined by the variances of the prior distribution. The narrower the prior distribution of a component, the more certain we are of that prior value, and the movement is then larger in the other direction, where the uncertainty is larger. This is shown in figure 6.
Figure 6: Covariance matrix of the prior distribution affects the mean of the resulting distribution.
3.3 Covariance matrix

Equation (36) for S now yields

S = GA\Sigma A^T G^T = \frac{r+s}{w^2}\begin{pmatrix} (b_{22} - b_{12})^2 & (b_{11} - b_{21})(b_{22} - b_{12}) \\ (b_{11} - b_{21})(b_{22} - b_{12}) & (b_{11} - b_{21})^2 \end{pmatrix},   (69)

and \tilde{C} is as given in (63), so the covariance matrix is

Cov[x, x^T] = \tilde{C} + S   (70)
 = \begin{pmatrix} w^{-1} & -w^{-1} \\ -w^{-1} & w^{-1} \end{pmatrix} + \frac{r+s}{w^2}\begin{pmatrix} (b_{22} - b_{12})^2 & (b_{11} - b_{21})(b_{22} - b_{12}) \\ (b_{11} - b_{21})(b_{22} - b_{12}) & (b_{11} - b_{21})^2 \end{pmatrix}   (71)
 = \frac{1}{w^2}\begin{pmatrix} (b_{22} - b_{12})^2(r+s) + w & (b_{11} - b_{21})(b_{22} - b_{12})(r+s) - w \\ (b_{11} - b_{21})(b_{22} - b_{12})(r+s) - w & (b_{11} - b_{21})^2(r+s) + w \end{pmatrix}.   (72)
3.4 Covariance matrix by transformation method

First we need to obtain the transformation matrix T. The direction of no variance in the two dimensional case is y_1 = (1 \;\; 1), which is the only row of the routing matrix A. The direction of free variability is then orthogonal to this, that is, z_1 = (-1 \;\; 1). Normalizing these we obtain

T = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}.   (73)
Then from (55)-(56),

\tilde{C}' = \begin{pmatrix} 2w^{-1} & 0 \\ 0 & 0 \end{pmatrix},   (74)

\Sigma' = \frac{1}{2}\begin{pmatrix} r+s & r-s \\ r-s & r+s \end{pmatrix}.   (75)

We pick the non-zero term from \tilde{C}' and from \Sigma' the terms that are not in the same row or column as the term already picked, since all covariances including the \tilde{C}' term are set to zero:

\tilde{C}'^{(1)} = \begin{pmatrix} 2w^{-1} & 0 \\ 0 & \frac{1}{2}(r+s) \end{pmatrix}.   (76)

And finally we use equation (57) to return this to the original co-ordinates,

C^{(1)} = \begin{pmatrix} \frac{1}{4}(r+s) + w^{-1} & \frac{1}{4}(r+s) - w^{-1} \\ \frac{1}{4}(r+s) - w^{-1} & \frac{1}{4}(r+s) + w^{-1} \end{pmatrix}.   (77)
The transformation method here is defined so that it gives correct answers only when the prior variances of x_1 and x_2 are the same. So we have to set b_{11} = b_{22} = b and b_{12} = b_{21} = 0 in (72). So now

Cov[x, x^T] = \frac{1}{w^2}\begin{pmatrix} b^2(r+s) + w & b^2(r+s) - w \\ b^2(r+s) - w & b^2(r+s) + w \end{pmatrix}   (78)
 = \begin{pmatrix} \frac{b^2(r+s)}{(2b)^2} + \frac{w}{w^2} & \frac{b^2(r+s)}{(2b)^2} - \frac{w}{w^2} \\ \frac{b^2(r+s)}{(2b)^2} - \frac{w}{w^2} & \frac{b^2(r+s)}{(2b)^2} + \frac{w}{w^2} \end{pmatrix}   (79)
 = \begin{pmatrix} \frac{1}{4}(r+s) + w^{-1} & \frac{1}{4}(r+s) - w^{-1} \\ \frac{1}{4}(r+s) - w^{-1} & \frac{1}{4}(r+s) + w^{-1} \end{pmatrix},   (80)

where we used the fact that w = b_{11} - b_{12} - b_{21} + b_{22} = 2b. We can see that (72) and (77) indeed yield the same result, as we would expect.
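A quick numerical check of the equality of (72) and (77), with hypothetical values of b, r and s:

```python
import numpy as np

b, r, s = 0.5, 9.0, 4.0   # illustrative values only
w = 2 * b                 # w = b11 - b12 - b21 + b22 with b11 = b22 = b, b12 = b21 = 0

# Equation (72) with b11 = b22 = b and b12 = b21 = 0:
cov_72 = (1 / w**2) * np.array([[b*b*(r+s) + w, b*b*(r+s) - w],
                                [b*b*(r+s) - w, b*b*(r+s) + w]])

# Equation (77), the transformation-method result:
cov_77 = np.array([[(r+s)/4 + 1/w, (r+s)/4 - 1/w],
                   [(r+s)/4 - 1/w, (r+s)/4 + 1/w]])

print(np.allclose(cov_72, cov_77))   # True
```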
4 Iteration

4.1 Introduction

In this section the equations for the iteration are given. We then show that in this situation, where we assume a single underlying distribution, the iteration is of no use, as the result converges to its final values after the first step. This is proven for the mean in the general case in Theorem 1. Examples show that the same holds for the covariance matrix in the two-dimensional case, and in fact in all the examples we have studied. The proof for the general case is left for future work.
4.2 General case

The expectation of the result of an iteration round is

m^{(i+1)} = GA\mu + Hm^{(i)},
C^{(i+1)} = \tilde{C} + GA\Sigma(GA)^T,   (81)

where G, H and \tilde{C} depend on C^{(i)} through B^{(i)}, while A and \Sigma remain constant.

Theorem 1. The expected value of the mean m does not change in the iteration after the first iteration. That is,

m^{(i+1)} = m^{(i)} \quad \forall i \geq 1.   (82)
Proof. From (27) we can see that

m^{(i)} = GA\mu + Hm^{(i-1)}   (83)
m^{(i+1)} = GA\mu + H(GA\mu + Hm^{(i-1)})   (84)
m^{(i+1)} - m^{(i)} = H(GA\mu + Hm^{(i-1)}) - Hm^{(i-1)}   (85)
 = HGA\mu + (HH - H)m^{(i-1)}.   (86)

So to prove that m^{(i+1)} = m^{(i)} we need to show that

HG = 0,   (87)
HH = H.   (88)

First let us show the following:

JL + K = \tilde{C}_{22}(B_{21} - L^T B_{11})L + \tilde{C}_{22}(L^T B_{12} - B_{22})   (89)
 = -\tilde{C}_{22}(L^T B_{11} L - L^T B_{12} - B_{21} L + B_{22})   (90)
 = -\tilde{C}_{22}(\tilde{C}_{22})^{-1}   (91)
 = -I,   (92)

and we can use this result to prove the relations:

HG = \begin{pmatrix} -LJ & LK \\ J & -K \end{pmatrix}\begin{pmatrix} A_1^{-1} + LJA_1^{-1} \\ -JA_1^{-1} \end{pmatrix} = \begin{pmatrix} -LJA_1^{-1} - LJLJA_1^{-1} - LKJA_1^{-1} \\ JA_1^{-1} + JLJA_1^{-1} + KJA_1^{-1} \end{pmatrix}   (93)
 = \begin{pmatrix} -L(I + JL + K)JA_1^{-1} \\ (I + JL + K)JA_1^{-1} \end{pmatrix}   (94)
 = \begin{pmatrix} -L(I - I)JA_1^{-1} \\ (I - I)JA_1^{-1} \end{pmatrix}   (95)
 = 0,   (96)

HH = \begin{pmatrix} -LJ & LK \\ J & -K \end{pmatrix}\begin{pmatrix} -LJ & LK \\ J & -K \end{pmatrix} = \begin{pmatrix} LJLJ + LKJ & -LJLK - LKK \\ -JLJ - KJ & JLK + KK \end{pmatrix}   (97)
 = \begin{pmatrix} -L(-JL - K)J & L(-JL - K)K \\ (-JL - K)J & -(-JL - K)K \end{pmatrix}   (98)
 = \begin{pmatrix} -L(I)J & L(I)K \\ (I)J & -(I)K \end{pmatrix}   (99)
 = \begin{pmatrix} -LJ & LK \\ J & -K \end{pmatrix} = H.   (100)

So

m^{(i+1)} - m^{(i)} = HGA\mu + (HH - H)m^{(i-1)}   (101)
 = 0 \cdot A\mu + (H - H)m^{(i-1)}   (102)
 = 0,   (103)
and this completes the proof.

Theorem 2. The expected value of the estimated covariance matrix C does not change in the iteration after the first iteration. That is,

C^{(i+1)} = C^{(i)} \quad \forall i \geq 1.   (104)

The proof has not been completed for the general case. In the following section it is proven for the two dimensional case. All numerical examples, on various topologies, indicate that the covariance matrix also converges after the first iteration step.
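The numerical evidence mentioned above is easy to reproduce. The sketch below (random A, \Sigma and prior; all values hypothetical) applies one iteration round (81) twice, using the equivalent closed forms G = CA^T(ACA^T)^{-1}, H = I - GA and \tilde{C} = C - GAC, and checks that neither the mean nor the covariance estimate changes in the second round, in line with Theorem 1 and the conjectured Theorem 2.

```python
import numpy as np
rng = np.random.default_rng(2)

def update(m, C, A, mu, Sigma):
    """One iteration round, eq. (81), written with the equivalent closed-form G and H."""
    G = C @ A.T @ np.linalg.inv(A @ C @ A.T)
    GA = G @ A
    H = np.eye(len(m)) - GA
    C_cond = C - GA @ C                      # conditional covariance E[Cov[x | y]]
    S = GA @ Sigma @ GA.T                    # covariance of the conditional means, eq. (36)
    return GA @ mu + H @ m, C_cond + S

n, links = 6, 3                              # hypothetical sizes
A = rng.standard_normal((links, n))
R = rng.standard_normal((n, n)); Sigma = R @ R.T + n * np.eye(n)   # "true" covariance
R = rng.standard_normal((n, n)); C0 = R @ R.T + n * np.eye(n)      # prior covariance
mu, m0 = rng.standard_normal(n), rng.standard_normal(n)

m1, C1 = update(m0, C0, A, mu, Sigma)
m2, C2 = update(m1, C1, A, mu, Sigma)
print(np.allclose(m1, m2), np.allclose(C1, C2))   # True True: no change after the first step
```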
4.3 Two dimensional case

4.3.1 Mean

In Theorem 1 it is shown that the result does not change after the first iteration. The proof is given in matrix form. Here the same result is proven, as an example, specifically for the two dimensional case. After the first iteration the estimate m^{(1)} = (m_1 \;\; m_2) is on the line that goes through the points (\mu_1 \;\; \mu_2), (\mu_1 + \mu_2 \;\; 0) and (0 \;\; \mu_1 + \mu_2), as can be seen from figure 8. It obviously satisfies

Am = A\mu,   (105)
m_1 + m_2 = \mu_1 + \mu_2,   (106)

and can thus be written

m^{(1)} = \begin{pmatrix} m_1 \\ (\mu_1 + \mu_2) - m_1 \end{pmatrix}.   (107)

Then from (68),

m^{(2)} = \frac{1}{w}\begin{pmatrix} (b_{11} - b_{21})m_1 + (b_{22} - b_{12})((\mu_1 + \mu_2) - m_2) \\ (b_{11} - b_{21})((\mu_1 + \mu_2) - m_1) + (b_{22} - b_{12})m_2 \end{pmatrix}   (108)
 = \frac{1}{w}\begin{pmatrix} (b_{11} - b_{21})m_1 + (b_{22} - b_{12})((\mu_1 + \mu_2) - ((\mu_1 + \mu_2) - m_1)) \\ (b_{11} - b_{21})((\mu_1 + \mu_2) - m_1) + (b_{22} - b_{12})((\mu_1 + \mu_2) - m_1) \end{pmatrix}   (109)
 = \frac{1}{w}\begin{pmatrix} (b_{11} - b_{12} - b_{21} + b_{22})\,m_1 \\ (b_{11} - b_{12} - b_{21} + b_{22})\,((\mu_1 + \mu_2) - m_1) \end{pmatrix}   (110)
 = \begin{pmatrix} m_1 \\ (\mu_1 + \mu_2) - m_1 \end{pmatrix} = m^{(1)}.   (111)
4.3.2 Covariance matrix

The sum of the elements of the covariance matrix Cov[x, x^T] after the first iteration is r + s, the sum of the real variances of the OD pair distributions. This can be seen from the fact that the coefficients of (r + s) in the elements of (69) sum to one, and the elements of (63) obviously sum to zero. So

C^{(1)} = Cov[x, x^T] = \begin{pmatrix} v_1 & c_{12} \\ c_{12} & v_2 \end{pmatrix},   (112)

where v_1 + v_2 + 2c_{12} = r + s. Then

B = (C^{(1)})^{-1} = \frac{1}{v_1 v_2 - c_{12}^2}\begin{pmatrix} v_2 & -c_{12} \\ -c_{12} & v_1 \end{pmatrix},   (113)

w = b_{11} - b_{12} - b_{21} + b_{22} = \frac{v_1 + v_2 + 2c_{12}}{v_1 v_2 - c_{12}^2}.   (114)

Let us look at the upper left element of (72) in the second iteration:

(C^{(2)})_{11} = \frac{(b_{22} - b_{12})^2(r+s)}{w^2} + w^{-1}   (115)
 = \frac{(v_1 + c_{12})^2 / (v_1 v_2 - c_{12}^2)^2}{\left( \frac{v_1 + v_2 + 2c_{12}}{v_1 v_2 - c_{12}^2} \right)^2}\,(r+s) + \frac{v_1 v_2 - c_{12}^2}{v_1 + v_2 + 2c_{12}}   (116)
 = \frac{(v_1 + c_{12})^2}{(v_1 + v_2 + 2c_{12})^2}\,(v_1 + v_2 + 2c_{12}) + \frac{v_1 v_2 - c_{12}^2}{v_1 + v_2 + 2c_{12}}   (117)
 = \frac{v_1^2 + 2v_1 c_{12} + c_{12}^2 + v_1 v_2 - c_{12}^2}{v_1 + v_2 + 2c_{12}}   (118)
 = \frac{v_1(v_1 + 2c_{12} + v_2)}{v_1 + v_2 + 2c_{12}}   (119)
 = v_1.   (120)

Similar results for the other elements can be obtained to show that

C^{(2)} = \begin{pmatrix} v_1 & c_{12} \\ c_{12} & v_2 \end{pmatrix} = C^{(1)}.   (121)
5 Numerical examples 5.1 Introduction In this section specific numerical examples are discussed to show the behavior of the method in different situations. First the two dimensional example of section 3 is studied. Then a more realistic, yet still very small, example network with two links and three separate OD pairs is considered. The topology for this network is shown in figure 7.
5.2 Two dimensional numerical example

Choosing the values

m = (2 \;\; 4),   (122)
C = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},   (123)
B = C^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},   (124)
\mu = (15 \;\; 10),   (125)
\Sigma = \begin{pmatrix} 9 & 0 \\ 0 & 4 \end{pmatrix},   (126)

and inserting these values into equation (81) (or into the equations specific to the two dimensional case, (68) and (72)) yields

\tilde{m}^{(i+1)} = (11.5 \;\; 13.5),   (127)
C^{(i+1)} = \begin{pmatrix} 3.75 & 2.75 \\ 2.75 & 3.75 \end{pmatrix}.   (128)
The results are illustrated in figure 8. In figure 9 the sample cloud of the transformation method is plotted in the same picture.
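For reference, the numbers (127)-(128) can be reproduced with a few lines, using the closed forms G = CA^T(ACA^T)^{-1} and H = I - GA (equivalent to (16)-(17)):

```python
import numpy as np

# One-step update (81) for the two dimensional example (122)-(126).
A = np.array([[1.0, 1.0]])
m, C = np.array([2.0, 4.0]), np.eye(2)
mu, Sigma = np.array([15.0, 10.0]), np.diag([9.0, 4.0])

G = C @ A.T @ np.linalg.inv(A @ C @ A.T)
GA = G @ A
m_new = GA @ mu + (np.eye(2) - GA) @ m
C_new = (C - GA @ C) + GA @ Sigma @ GA.T

print(m_new)   # [11.5 13.5]                 -> eq. (127)
print(C_new)   # [[3.75 2.75] [2.75 3.75]]   -> eq. (128)
```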
5.3 Three dimensional numerical example

Example 1.

\mu = (5 \;\; 10 \;\; 15),   (129)
\Sigma = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix},   (130)
m^{(0)} = (5 \;\; 5 \;\; 5),   (131)
Figure 7: Example network considered in section 5.3 (nodes A, B, C; links y_1, y_2; OD flows x_1, x_2, x_3).
Figure 8: Result of the iteration. The arrow shows the movement of the estimate from the prior distribution to the new estimate distribution, shown in red. The real distribution (\mu, \Sigma) is shown in black.
C^{(0)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (132)

With these starting values the method yields the following result:

m^{(1)} = (6.67 \;\; 11.67 \;\; 13.33),   (133)
C^{(1)} = \begin{pmatrix} 3.33 & -1.00 & 1.33 \\ -1.00 & 2.67 & 0.67 \\ 1.33 & 0.67 & 3.00 \end{pmatrix}.   (134)
In figure 10 the result is shown as two dimensional projections onto each co-ordinate plane. In the x_1, x_3-plane and the x_2, x_3-plane, the line on which the new mean estimate is located comes from the equation

A\mu = Ax,   (135)

which in the three-dimensional case becomes

\mu_1 + \mu_3 = x_1 + x_3,   (136)
\mu_2 + \mu_3 = x_2 + x_3,   (137)

and these are equations of planes in three dimensional space. The line in the x_1, x_2-plane follows from the difference of these two plane equations,

x_1 - x_2 = \mu_1 - \mu_2.   (138)

Figure 9: The sample cloud produced by the co-ordinate transformation method coincides with the explicit result obtained by the conditional covariance method.
Example 2. Let us change the prior estimate for the covariance matrix:

\mu = (5 \;\; 10 \;\; 15),   (139)
\Sigma = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix},   (140)
m^{(0)} = (5 \;\; 5 \;\; 5),   (141)
Figure 10: Example 1.
0
0
C (0) = 0 0
3
0
0
2
(142)
The result is now m(1) = ( 6.82 11.82 13.18 ) 1.67 −0.42 1.06
(143)
C (1) = −0.42 1.06
(144)
4.48
0.15
0.15
4.21
And is shown in figure 11. Notice that the estimate for the mean is very close to that in example 1, even though we changed the covariance prior. This is because the difference between m1 and m2 is fixed, because it is known from the difference between the measured link loads. The results move together in the iteration so uncertainty in the form of large prior variance for one of them does not affect the result as clearly as was the case in figure 6 for two-dimensions, where no limiting relations are known between elements of m. example3
µ = ( 5 10 15 ) 3 0 0
(145)
Σ = 0 4 0 0
(146)
22
0 5
25
x2
x2 25
x3 25
x3 25
20
20
20
15
15
15
10
10
10
5
5
5
5
10
15
20
25
x1
5
10
20
15
25
x1
5
10
15
20
Figure 11: Example 2 m(0) = ( 5 5 5 ) 2 0 0
(147)
C (0) = 0 0
(148)
4
0
0
1
Now we have changed the covariance prior so that the difference between variances of x2 and x3 is large. There is no relation known between them, so no the change affects the result more dramatically. This is shown in figure 12.
m(1) = ( 10 15 10 ) 3.82 1.10 1.18
(149)
C (1) = 1.10 5.38
(150)
1.18 0.89
0.89 1.82
6 Conclusion 6.1 Summary In this report a case study on traffic matrix estimation under Gaussian OD pair traffic distributions was made. We studied the behavior of a special case of the Vaton-Gravey method inferring estimates for OD pair traffic demands based on link counts and some prior distribution. In Section 2 the expected values for parameters of the conditional distribution of x conditioned on y were computed by utilizing the characteristics of 23
25
x2
x2 25
x3 25
x3 25
20
20
20
15
15
15
10
10
10
5
5
5
5
10
15
20
25
x1
5
10
15
20
25
x1
5
10
15
20
Figure 12: Example 3 the normal distribution. By writing the density function of the conditional distribution in the complete square form, we were able to pick out necessary terms to obtain expressions for conditional mean and covariance estimators. Then taking expectation of these over the link counts y, we obtained the estimators for the distribution of OD pairs. This method was illustrated through simple examples in Section 3. The main result of the report was given in Section 4 were we state that the iteration converges after first step, and proved this result for the estimator of the mean. However, the proof for the covariance matrix estimator has not yet been completed. This is left as future work. The result indicates that under the assumptions we make about the distributions there is no benefit of iteration. The first estimator is as accurate as it can come in this situation. In Section 5 some numerical examples were studied in simple topologies. In the degenerate case of one link and two OD pairs the prior chosen determined largely the result, since link count data obviously cannot give any indication about the OD pair traffic amounts. In realistic, yet still very simple, case of two links and three OD pairs the link counts give some information about the relative sizes of the OD pair means. Hence, the result is not completely dependent by the prior chosen. As the network topology grows larger to model more realistic communications network, there are more an more OD pair dependency information in the link counts.
6.2 Further work

Theorem 2 of Section 4.2 is still to be proven in the general, matrix-form case.
Considering further this kind of static model, where the expected traffic amounts of the OD pairs do not change as a function of time, one way of improving the result is to utilize the covariance matrix in the estimation of the mean. To do this we need to assume a relation between the mean and the variance of the OD pairs. In [2] the authors propose a power law relation, where the variance \sigma_i^2 of the ith OD pair depends on the corresponding mean \mu_i raised to the power c,

\sigma_i^2 = \phi\, \mu_i^c,   (151)

where \phi and c are parameters to be determined. With this kind of assumption we could get additional information into our estimate.
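Given rough per-OD estimates of means and variances, a relation of this form could be fitted, for example, by linear regression on the logarithms. The sketch below is only an illustration of this idea with made-up numbers; it is one of several possible estimators and is not the procedure of [2].

```python
import numpy as np

def fit_power_law(means, variances):
    """Fit sigma_i^2 = phi * mu_i^c by least squares on the logarithms."""
    c, log_phi = np.polyfit(np.log(means), np.log(variances), 1)
    return np.exp(log_phi), c

# Made-up OD-pair means and variances generated with phi = 0.8, c = 1.5:
mu = np.array([5.0, 10.0, 15.0, 40.0])
noise = np.exp(0.05 * np.random.default_rng(3).standard_normal(mu.size))
sigma2 = 0.8 * mu**1.5 * noise

phi, c = fit_power_law(mu, sigma2)
print(phi, c)   # close to the generating values 0.8 and 1.5
```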
References

[1] S. Vaton, A. Gravey, "Network tomography: an iterative Bayesian analysis", ITC 18, Berlin, August 2003.

[2] J. Cao, D. Davis, S. Vander Wiel, B. Yu, "Time-Varying Network Tomography: Router Link Data", Journal of the American Statistical Association, Vol. 95, No. 452, 2000.