EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 01 - 05 June 2008.
Optimal Scenario Tree Reduction for Stochastic Streamflows in Power Generation Planning Problems
Welington Luis de Oliveira∗; Claudia Sagastizábal; Débora Dias Jardim Penna; Maria Elvira Piñeiro Maceira; Jorge Machado Damázio
Electric Energy Research Center - CEPEL. Av. Horácio Macedo, 354, Cidade Universitária, Ilha do Fundão - Rio de Janeiro, RJ, CEP 21941-911.
∗ Phone: +55 21 2598 6059, e-mail:
[email protected]
1. Abstract
To be precise, mid-term operation planning of hydro-thermal power systems needs a large number of synthetic sequences to represent stochastic streamflows accurately. However, if the number of synthetic sequences is too large, the optimization planning problem may become too difficult to solve within reasonable computational time. This work employs a technique called Scenario Optimal Reduction, which minimizes the distance between the probability measures associated with different scenario trees, to select an optimal subset of synthetic sequences that represents the uncertainty well enough. Optimality of the subset is twofold: uncertainty is accurately represented, and the size of the reduced optimization planning problem can be controlled without impairing the quality of the corresponding optimal solution.

2. Keywords: Scenario Reduction, Stochastic Programming, Mid-Term Operation Planning Problem of Hydro-Thermal Systems.

3. Introduction
The Brazilian power system is predominantly hydraulic: hydro-generation represents over 80% of the total electricity produced in the country. The operation of hydro-plants is coupled both in time and in space, due to the presence of several power plants in cascade over the same basin. These features make the operation planning of the Brazilian Interconnected System (BIS) a huge and complex problem. For this reason, the problem is solved in several steps, which consider different time horizons and different degrees of detail to represent the BIS [1]. In hydro-dominated systems the most important source of uncertainty is the amount of water arriving into the reservoirs at each time of the planning period. In mid-term planning, an accurate representation of the streamflows at the hydroelectric plants is crucial. In this context, the streamflows are represented by synthetic sequences, generated by a periodic autoregressive model [2].
Naturally, in order to represent the underlying stochastic process accurately, a large number of synthetic sequences is needed. However, if this number is too large, the corresponding optimization problem may be too difficult to solve within reasonable computational time. It is then necessary to have some criterion for choosing a small subgroup of synthetic sequences among the many generated. Such a criterion should provide both statistical adherence and, ideally, optimal decisions close to those obtained when using all the sequences.

This work employs the Scenario Optimal Reduction (SOR) technique to select an optimal subset of synthetic sequences that represents the uncertainty well enough. The SOR technique, introduced in [3], makes use of the Fortet-Mourier metric and of duality theory in Linear Programming to compute the distance between two probability measures, corresponding, respectively, to a huge scenario tree with all the sequences and to a smaller scenario tree of fixed cardinality. The computed distance defines the objective function of an optimization problem that selects the best reduced tree by picking the most representative scenarios among all the sets of fixed cardinality. This higher-level optimization problem has a combinatorial nature and is essentially a set-covering problem [4]. Its solution is found by employing a heuristic procedure.

This work presents a methodology along the lines of [3], with the important difference that the reduced scenario tree is chosen so that the periodic autoregressive model defining the sequences is not modified. The optimal reduced tree, which covers the whole planning horizon, can be built stage by stage, by first generating and then reducing a set of scenarios for each stage. In this sense, the technique is suitable for keeping the computational burden under control, since the original huge tree does not have to be defined a
priori.

This work is organized as follows: Section 4 presents the scenario reduction problem; Section 5 gives the algorithm for constructing the scenario tree and discusses a statistical analysis of the methodology. Finally, Section 6 presents a numerical validation, with very satisfactory results, obtained by applying the SOR technique to the Brazilian hydraulic system.

4. Scenario Optimal Reduction
Let $X \subset \mathbb{R}^N$ be the feasible set of the operation variables for mid-term planning, such as thermal generation, energy interchanges between the subsystems∗, stored volumes in reservoirs, etc. Let $\Theta \subset \mathbb{R}^N$ be the sample space of the uncertainty, with $N$ streamflow scenarios $w_i$ and an associated probability distribution $P$ that assigns to each scenario $w_i$ a probability $p_i$, for $i \in I := \{1, \dots, N\}$. Given the cost function $f : \Theta \times \mathbb{R}^N \to \mathbb{R}$, the mid-term planning optimization problem can be written as
$$\min_{x \in X} E_P f(x), \quad \text{where} \quad E_P f(x) := \sum_{i=1}^{N} p_i f(w_i, x) \qquad (1)$$
is the expected value of the cost. When the number of scenarios is too large, solving problem (1) becomes very expensive. In this situation, it is convenient to choose a smaller subset of representative scenarios $\{w_{j_1}, w_{j_2}, \dots, w_{j_{N^{red}}}\}$ with $N^{red} < N$, and to solve instead the reduced problem (2), in which the expectation $E_{J,Q} f(x)$ is taken over the preserved scenarios only, $J$ being the index set of the discarded scenarios and $Q$ the probability distribution over the preserved ones. Given $\epsilon > 0$, we want to find an optimal pair $(J^*, Q^*)$, that is, the optimal index set of scenarios to be discarded and the optimal redistribution of probabilities over the preserved scenarios, that ensures that
$$\left\| \min_{x \in X} E_P f(x) - \min_{x \in X} E_{J^*,Q^*} f(x) \right\| \le \epsilon \quad \text{and} \quad S(Q^*) \subset S(P) + B(0, \rho), \qquad (3)$$
where $B(0, \rho)$ is a ball centered at $0 \in \mathbb{R}^n$ with radius $\rho$. Theorem 1 in [3] ensures that, under specific conditions satisfied by the cost and constraint functions defining the mid-term operation planning problem, for any $\epsilon > 0$ there exists a constant $\rho > 0$ such that relation (3) is satisfied. Intuitively, this means that it is possible to take almost the same optimal decisions (generation, interchanges, stored volumes, etc.) and, consequently, to achieve similar cost and optimal deficit from a smaller subset of scenarios, if this set is properly selected. A natural criterion yielding proximity between the solutions of the original (1) and reduced (2) problems is to minimize the gap between the functional values of the respective optimization problems:
$$\min_{J,Q} \| E_P f(x) - E_{J,Q} f(x) \| \quad \text{for each } x \in X. \qquad (4)$$
This is the main motivation for using the Fortet-Mourier metric for probability measures, explained below.
∗ A subsystem is understood to be each of the electrical systems in the North, Northeast, South and Southeast regions of Brazil.
4.1. The Fortet-Mourier Metric
Two scenarios $w$ and $w'$ are said to be close when their distance is small enough. This distance is measured by a function $d : \Theta \times \Theta \to \mathbb{R}_+$, for example a norm; see Section 5.3 below. The Fortet-Mourier metric defines the distance between two probability measures, in our case $P$ and $Q$. To motivate its definition, note that when the cost function is Lipschitz continuous in $w$ and $X$ is a bounded set, the following inequality holds:
$$\| f(w, \cdot) - f(w', \cdot) \| \le L\, d(w, w').$$
Here $L$ is the Lipschitz constant of $f$. Therefore, $L^{-1} f(w, \cdot)$ belongs to the set of functions
$$\Phi_d := \{ \phi : \Theta \to \mathbb{R};\ \| \phi(w) - \phi(w') \| \le d(w, w') \}. \qquad (5)$$
To determine the distance between two probability measures $P$ and $Q$, the Fortet-Mourier metric, denoted by $D_J(P, Q)$, uses $\Phi_d$ as the feasible set of the following optimization problem:
$$D_J(P, Q) := \sup_{\phi \in \Phi_d} \| E_P \phi(x) - E_{J,Q} \phi(x) \| \quad \text{for } x \in X. \qquad (6)$$
The SOR problem determines the optimal pair $(J^*, Q^*)$ by minimizing this metric over the set of admissible $J$ and $Q$.
4.2. Bi-level Formulation of the Scenario Reduction Problem
Problem (3) can be split into the following two levels:
$$\min_{J} \{ c(J) : J \subset I,\ \#J = N - N^{red} \} \quad \text{(first level problem)},$$
$$c(J) := \sup_{Q} \Big\{ D_J(P, Q) : Q \ge 0,\ \sum_{l \in I/J} q_l = 1 \Big\} \quad \text{(second level problem)}.$$
This bi-level formulation of the SOR problem has the purpose of separating the variables $J$ and $Q$ in order to obtain an explicit formulation for the second level problem. Namely, Theorem 2 in [3] shows that the optimal value $c(J)$ is given by the expression
$$c(J) = \sum_{j \in J} \min_{l \in I/J} d(w_j, w_l).$$
An optimal solution of the second level problem is given by
$$q_l = p_l + \sum_{j \in J_l} p_j, \quad \text{where} \quad J_l := \{ j \in J : l \in \arg\min_{l \in I/J} d(w_j, w_l) \}. \qquad (7)$$
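To make the second level quantities concrete, the following Python sketch evaluates the cost $c(J)$, applies the redistribution rule (7), and runs a greedy forward selection that adds one preserved scenario at a time, in the spirit of the heuristic discussed next. It is only an illustration under stated assumptions: the function names and the toy data are hypothetical, each term of the cost is weighted by $p_j$ (with equal probabilities this differs from the unweighted expression above only by a constant factor), and ties in the arg min are broken in favour of the lowest index.

```python
def reduction_cost(dist, probs, discarded):
    """c(J): each discarded scenario pays its (probability-weighted, by
    assumption) distance to the nearest preserved scenario."""
    kept = [l for l in range(len(probs)) if l not in discarded]
    return sum(probs[j] * min(dist[j][l] for l in kept) for j in discarded)


def redistribute(dist, probs, discarded):
    """Rule (7): every discarded scenario hands its probability to the
    nearest preserved scenario."""
    kept = [l for l in range(len(probs)) if l not in discarded]
    q = {l: probs[l] for l in kept}
    for j in discarded:
        nearest = min(kept, key=lambda l: dist[j][l])
        q[nearest] += probs[j]
    return q


def forward_selection(dist, probs, n_keep):
    """Greedy heuristic: at each iteration keep the scenario whose
    preservation gives the smallest cost for the remaining ones."""
    n = len(probs)
    kept, remaining = [], list(range(n))
    best = [float("inf")] * n  # distance to the nearest preserved scenario
    for _ in range(n_keep):
        def cost_if_kept(u):
            return sum(probs[j] * min(best[j], dist[j][u])
                       for j in remaining if j != u)
        u_star = min(remaining, key=cost_if_kept)
        kept.append(u_star)
        remaining.remove(u_star)
        for j in remaining:
            best[j] = min(best[j], dist[j][u_star])
    return sorted(kept), remaining


# Toy data (hypothetical): five scalar scenarios with unequal probabilities.
scenarios = [0.0, 1.0, 2.0, 10.0, 11.0]
probs = [0.2, 0.2, 0.2, 0.3, 0.1]
dist = [[abs(a - b) for b in scenarios] for a in scenarios]

kept, discarded = forward_selection(dist, probs, n_keep=2)
q = redistribute(dist, probs, discarded)
print(kept, discarded, q, reduction_cost(dist, probs, discarded))
```

On this toy example the heuristic keeps one scenario in each of the two clusters, and the probabilities of the discarded scenarios are moved to the nearest preserved one, so that the preserved probabilities still sum to one.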
Hence, the SOR problem could be solved exhaustively, by analyzing every possible set $J$ of fixed cardinality $N - N^{red}$. This choice is computationally impracticable, however, because of the large number of possible combinations of scenario sets. Since it is not possible to examine each candidate $J$, a heuristic algorithm given in [5], called fast forward reduction, is used. The main idea of this algorithm is to solve iteratively $\min\{c(J) : J \subset I,\ \#J = N - k\}$, where $k = 1, \dots, N^{red}$ is an iteration counter. For more information about the scenario reduction technique, we refer to [3, 5]. We now present a methodology along the lines of [5], with an important difference that allows us to use multivariate scenario trees generated by a periodic autoregressive model.

5. Construction of Streamflows Scenario Tree
The BIS operation planning uses synthetic sequences to represent the stochastic process of streamflows. The GEVAZP model, developed by CEPEL [6], generates synthetic sequences of monthly streamflows by means of a periodic autoregressive model. For more information, see [2].
With the information generated by the GEVAZP model, the next step is to determine the optimal generation for each power plant, subject to streamflow uncertainty and operational rules. The corresponding optimization problem is solved by the DECOMP model, developed by CEPEL [1]. In the following section we present some characteristics of the synthetic sequence generation. We show the autoregressive structure and some statistics that we are interested in preserving.

5.1. Some Characteristics of the Synthetic Sequences Generation
For the first month of the planning problem, let $\{Z_{t-1}, \dots, Z_{t-m}\}$ be past values of streamflows for a given hydroelectric plant. Typically, $m \le 12$. Let $\mu_t$ and $\sigma_t^2$ be, respectively, the historical mean and variance. The periodic autoregressive model defines the $t$-th streamflow as
$$Z_t = \mu_t + \sum_{i=1}^{p_t} \sigma_t \phi_i^t \left( \frac{Z_{t-i} - \mu_{t-i}}{\sigma_{t-i}} \right) + \sigma_t \xi_t, \qquad (8)$$
where $\xi_t$ is an i.i.d. noise with mean zero and variance one, and $p_t$ is the order of the model ($p_t < 12$). Using the notation $[\,\cdot\,|-]$ to express conditioning on the past, the theoretical mean of relation (8) is given by
$$E[Z_t|-] = \mu_t + \sum_{i=1}^{p_t} \sigma_t \phi_i^t \left( \frac{Z_{t-i} - \mu_{t-i}}{\sigma_{t-i}} \right), \qquad (9)$$
while the theoretical variance is
$$V[Z_t|-] = \sigma_t^2 \sigma_{\xi_t}^2. \qquad (10)$$
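Similarly to (8), at time $t+1$ the streamflow is given by
$$Z_{t+1} = \mu_{t+1} + \sum_{i=1}^{p_{t+1}} \sigma_{t+1} \phi_i^{t+1} \left( \frac{Z_{t+1-i} - \mu_{t+1-i}}{\sigma_{t+1-i}} \right) + \sigma_{t+1} \xi_{t+1}.$$
In this sense, we can use the value generated at time $t$ to generate the values at time $t+1$, which together with (9) yields the relation
$$Z_{t+1} = \mu_{t+1} + \sigma_{t+1} \phi_1^{t+1} \left( \frac{E[Z_t|-] + \sigma_t \xi_t - \mu_t}{\sigma_t} \right) + \sum_{i=2}^{p_{t+1}} \sigma_{t+1} \phi_i^{t+1} \left( \frac{Z_{t+1-i} - \mu_{t+1-i}}{\sigma_{t+1-i}} \right) + \sigma_{t+1} \xi_{t+1}.$$
As a result, the theoretical values for step $t+1$ are given by
$$E[Z_{t+1}|-] = \mu_{t+1} + \frac{\sigma_{t+1}}{\sigma_t} \phi_1^{t+1} \left( E[Z_t|-] - \mu_t \right) + \sum_{i=2}^{p_{t+1}} \sigma_{t+1} \phi_i^{t+1} \left( \frac{Z_{t+1-i} - \mu_{t+1-i}}{\sigma_{t+1-i}} \right) \qquad (11)$$
and
$$V[Z_{t+1}|-] = \left( \frac{\sigma_{t+1}}{\sigma_t} \phi_1^{t+1} \right)^2 V[Z_t|-] + \sigma_{t+1}^2 \sigma_{\xi_{t+1}}^2. \qquad (12)$$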
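As an illustration of relations (8)-(10) and (12), the following Python sketch draws streamflow values from a periodic autoregressive model of order one and evaluates the theoretical conditional moments. It is only a sketch under stated assumptions: the noise is taken as standard Gaussian (the text only requires zero mean and unit variance), and all function names and numerical values are hypothetical.

```python
import random


def conditional_mean(mu_t, sigma_t, phi_t, past_z, past_mu, past_sigma):
    """Relation (9); the lists are ordered by lag i = 1, ..., p_t."""
    return mu_t + sum(
        sigma_t * phi * (z - m) / s
        for phi, z, m, s in zip(phi_t, past_z, past_mu, past_sigma)
    )


def conditional_variance(sigma_t, noise_var=1.0):
    """Relation (10)."""
    return sigma_t ** 2 * noise_var


def sample_streamflow(mu_t, sigma_t, phi_t, past_z, past_mu, past_sigma, rng):
    """Relation (8), with Gaussian noise as an assumption of this sketch."""
    xi = rng.gauss(0.0, 1.0)
    mean = conditional_mean(mu_t, sigma_t, phi_t, past_z, past_mu, past_sigma)
    return mean + sigma_t * xi


def next_step_variance(sigma_next, sigma_t, phi1_next, var_t, noise_var=1.0):
    """Relation (12)."""
    return (sigma_next / sigma_t * phi1_next) ** 2 * var_t \
        + sigma_next ** 2 * noise_var


# Hypothetical data: mu_t = 100, sigma_t = 20, order p_t = 1 with phi = 0.5,
# previous streamflow 130 with historical mean 110 and deviation 40.
args = (100.0, 20.0, [0.5], [130.0], [110.0], [40.0])
rng = random.Random(0)
draws = [sample_streamflow(*args, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((z - sample_mean) ** 2 for z in draws) / (len(draws) - 1)

print(conditional_mean(*args), sample_mean)       # theoretical vs sample mean
print(conditional_variance(20.0), sample_var)     # theoretical vs sample variance
print(next_step_variance(10.0, 20.0, 0.5, conditional_variance(20.0)))
```

Comparing the sample moments of the generated values with the theoretical ones is the kind of check used later to validate a reduced tree.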
The reason for considering these statistics is to verify the statistics of the synthetic sequences. As will be seen in Section 6, the theoretical values are useful to validate a small tree constructed by the algorithm below.

The scenario reduction algorithm in [5] is applied to a time horizon of $T$ steps, going backward to time step $t = 1$. Scenario aggregation breaks down the autoregressive structure of the stochastic process. In our application, however, preserving the autoregressive structure of the stochastic streamflows is important, because it ensures that all streamflows at step $t+1$ can actually be computed, given the known values of the streamflows up to step $t$. To preserve the periodic autoregressive structure, we build trees by applying the scenario selection algorithm given in [5] in a forward manner, from time step $t$ to $T$. Moreover, we use the theoretical statistics (9)-(12) to calculate the distances between scenarios in order to build small trees that accurately represent the underlying stochastic process.
We now present the algorithm for building small, yet statistically accurate, scenario trees.

5.2. Algorithm for Building Scenarios Tree
Suppose we are interested in generating a scenario tree with 12 time steps and $N = 1000$ scenarios at each node, where each node corresponds to the streamflow arriving at one reservoir. This results in a total of $1000^{12}$ scenarios, which is impossible to store or even to manipulate. Our method allows working indirectly with all this information, using a tree approximation. More precisely, the generation of a scenario tree through the SOR strategy starts with the generation of $N$ scenarios for the first period, followed immediately by an application of the scenario selection algorithm. For each preserved scenario, $N$ scenarios are generated, and the scenario selection procedure is applied to each branch of the tree. This procedure is repeated until $t = T$. Figure 1 illustrates the scenario tree generation for $T = 2$ time steps.
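The stage-by-stage procedure just described can be sketched in Python as a loop that generates children only for the preserved nodes. This is a minimal sketch, not the actual implementation: the generator is a toy first-order autoregressive model, and the selection step is a simple stand-in (one representative per group of sorted children, carrying the probability of its group) rather than the scenario selection algorithm itself; both are passed in as functions so that they can be replaced. All names are hypothetical.

```python
import random


def build_tree(z0, n_stages, generate, select):
    """Generate, reduce and branch, one stage at a time.

    Returns the leaves as (path, probability) pairs; only preserved
    scenarios are ever expanded, so the full tree is never stored.
    """
    frontier = [((z0,), 1.0)]
    for stage in range(n_stages):
        next_frontier = []
        for path, prob in frontier:
            children = generate(path, stage)            # (value, cond. prob.)
            for value, q in select(children, stage):    # reduced children
                next_frontier.append((path + (value,), prob * q))
        frontier = next_frontier
    return frontier


def make_generator(n, rng):
    """Toy generator (assumption): Z_next = 0.5 * Z + Gaussian noise."""
    def generate(path, stage):
        return [(0.5 * path[-1] + rng.gauss(0.0, 1.0), 1.0 / n)
                for _ in range(n)]
    return generate


def make_selector(n_keep):
    """Stand-in for selection plus redistribution: split the sorted
    children into n_keep groups and keep the middle value of each group
    with the total probability of the group."""
    def select(children, stage):
        ordered = sorted(children)
        kept = []
        for g in range(n_keep):
            start = g * len(ordered) // n_keep
            end = (g + 1) * len(ordered) // n_keep
            group = ordered[start:end]
            kept.append((group[len(group) // 2][0], sum(p for _, p in group)))
        return kept
    return select


tree = build_tree(0.0, 3, make_generator(50, random.Random(1)), make_selector(4))
print(len(tree), sum(prob for _, prob in tree))
```

With 50 generated and 4 preserved scenarios per node over 3 stages, at most 50 children are held for any single node, and the final tree has $4^3 = 64$ leaves instead of $50^3$, which is the mechanism that keeps the $1000^{12}$ example above tractable.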
Figure 1: Generation and reduction stage by stage.

The SOR algorithm can be formally stated as follows.

Algorithm SOR.
Parameters: For $k = t, \dots, T$, let $N(k)^{red}$ be the number of scenarios to be preserved, satisfying $N(k)^{red} < N(k)$, and let $J^k \subset I^k := \{1, \dots, N(k)\}$ be the index subset of discarded scenarios.
• Step 0 (initialization) Set $k = t$.
• Step 1 (generation) Generate $N(k)$ scenarios.
• Step 2 (distances) Calculate the distances between the scenarios using some function $d : \Theta \times \Theta \to \mathbb{R}_+$.
• Step 3 (scenario selection) Apply some scenario selection algorithm (for example, the fast forward selection in [5]) to find $N(k)^{red}$ representative scenarios, whose indices belong to $I^k / J^k$.
• Step 4 (redistribution) Redistribute the probabilities of the scenarios by (7).
• Step 5 (update) Set $k = k + 1$. If $k \le T$ go to Step 1, else stop.

Thus, at each time step $k$, $N(k)$ scenarios are generated and then reduced to $N(k)^{red}$. We now comment on the distance function.

5.3. The Distance Function
The function $d$ from (7) plays a key role in scenario selection. As shown in [7], the distance function $d$ must satisfy certain conditions, such as continuity, non-negativity, and the triangle inequality. The function used in [7] is given by
$$d(w, w') = d_r(w, w') := \| w - w' \| \max\{1, \|w\|^r, \|w'\|^r\},$$
where $\| \cdot \|$ is a norm and $r > 1$ is a constant. For the mid-term operation planning problem there are usually more than $N^{plant} = 100$ hydroelectric plants, so each scenario $w_t^i$ is a vector in $\mathbb{R}^{N^{plant}}$ with coordinates $w_t^i(j)$. Hence, the distance function is
$$\check{d}(w_t, w_t') = \max_{j = 1, \dots, N^{plant}} \{ d_j(w_t, w_t') \} \qquad (13)$$
where
$$d_j(w_t, w_t') = \sqrt{c_t^j}\, |w_t(j) - w_t'(j)| \max\left\{ 1,\ c_t^j |w_t(j)|^2,\ c_t^j |w_t'(j)|^2 \right\},$$
$| \cdot |$ is the absolute value function, and the scaling factor $c_t^j$ can be taken, for example, equal to the ratio between the theoretical and sample variances, $V[Z_t|-] / S_t^2$, of the $j$-th hydroelectric plant. The reason for this choice of scaling factor is the following. When scenarios are discarded, the variance tends to decrease. Such a reduction is not desirable for the streamflow stochastic process, because it is convenient to keep extreme scenarios that represent more severe floods or droughts. The penalty terms $\sqrt{c_t^j}$ and $\max\{1,\ c_t^j |w_t(j)|^2,\ c_t^j |w_t'(j)|^2\}$ play a key role in stabilizing the values of the variance. In fact, for the final reduced tree, the use of such penalty terms avoids eliminating the most extreme scenarios.

The function $\check{d}(\cdot, \cdot)$ does not satisfy the triangle inequality. However, the results obtained with $\check{d}$ are better than those obtained with other functions (e.g. $\| \cdot \|_1$, $\| \cdot \|_2$ and $\| \cdot \|_\infty$). We now present some numerical results obtained by the SOR technique on a real-sized BIS configuration.

6. Numerical Results
This section evaluates, by means of statistical tests, the application of the SOR technique to hydrological sequences generated by the GEVAZP model on a real configuration of the mid-term operation planning problem. The case studied has the following features: history of streamflows between 1931 and 2006; planning horizon of 2 months, April and May; configuration with 111 hydroelectric plants. The number of scenarios is
• April: 500 generated scenarios and 120 preserved scenarios (24% of scenarios kept);
• May: 60000 generated scenarios and 960 preserved scenarios (1.6% of scenarios kept).
A first assessment of the quality of the preserved scenarios is made by comparing the statistics of both trees (original and reduced) with the statistics predicted by theory.

6.1. Mean and Standard Deviation
Figure 2 below shows the mean and standard deviation of the original and reduced trees.
Note that the corresponding statistics lie within the acceptability ranges represented by the error bars in the figure. We conclude that the statistics of the reduced tree obtained by the SOR technique are satisfactory in relation to the theoretical values.
[Plot panels: TRES MARIAS, ITA, TUCURUI and XINGO, 1st and 2nd periods; means and standard deviations in m3/s; series: Generated, Theoretical, Reduced.]
Figure 2: Mean and Standard Deviation

6.2. Linear Regression: Mean and Standard Deviation
Figure 3 shows the linear regression between the means and standard deviations of the original and reduced trees for all the hydro-plants in the configuration. Both the slope of each regression line and the coefficient of determination $R^2$ (which measures the fit of the regression line [8]) are expected to take values close to 1. Since the SOR technique is based on minimizing the differences between expected values (recall (4)), the excellent results for the statistical mean, confirmed by Figure 3, were predictable. Moreover, the function $\check{d}$ defined in (13) takes the variance of the scenarios into account. Hence, as shown in Figure 2 and on the right side of Figure 3, the standard deviations are effectively preserved in both time steps.
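The slope and $R^2$ used in this comparison can be computed as in the following Python sketch. It assumes a regression line through the origin, of the form $y = ax$, and the usual definition of $R^2$ as one minus the ratio of the residual to the total (centered) sum of squares; both choices are assumptions of the sketch, as are the function name and the per-plant values.

```python
def fit_through_origin(x, y):
    """Least-squares slope of y = a * x and the corresponding R^2."""
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, 1.0 - ss_res / ss_tot


# Hypothetical per-plant means (m3/s) of the reduced and generated trees.
reduced = [100.0, 200.0, 400.0, 800.0]
generated = [101.0, 199.0, 405.0, 795.0]
slope, r2 = fit_through_origin(reduced, generated)
print(slope, r2)
```

A slope and an $R^2$ both close to 1 indicate that the statistic of the reduced tree reproduces that of the generated tree across the plants.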
[Plot panels, axes Generated and Reduced: Means 1st per.: y = 0.99x, R² = 0.99987; Means 2nd per.: y = 0.99x, R² = 0.99921; S. deviations 1st per.: y = 0.97x, R² = 0.99827; S. deviations 2nd per.: y = 0.94x, R² = 0.99148.]
Figure 3: Linear Regressions: Means and Standard Deviations
6.3. Linear Regression: Spatial Correlations
The spatial correlation determines the hydrological dependencies among the hydroelectric plants. For example, hydro-plants in the same basin are positively correlated. Figure 4 shows the spatial correlations between the streamflows of the Três Marias, Itá, Tucuruí and Xingó hydroelectric plants and those of the other hydroelectric plants in the same configuration. For this study, the preserved scenarios represent only 24% of the total generated scenarios for the first month and 1.6% for the second one. However, as shown in Figure 4, this small percentage of scenarios still reproduces efficiently the spatial correlations between the hydroelectric plants of the configuration.
[Plot panels, axes Spatial corr.: Generated and Spatial corr.: Reduced: TRES MARIAS 1st per.: y = 1.02x, R² = 0.87438; TRES MARIAS 2nd per.: y = 1.05x, R² = 0.81157; ITA 1st per.: y = 1.03x, R² = 0.76412; ITA 2nd per.: y = 1.06x, R² = 0.70418; TUCURUI 1st per.: y = 1.07x, R² = 0.73505; TUCURUI 2nd per.: y = 1.10x, R² = 0.65201; XINGO 1st per.: y = 1.08x, R² = 0.576; XINGO 2nd per.: y = 1.11x, R² = 0.60722.]
Figure 4: Spatial Correlations

6.4. Goodness of Fit
Table 1 gives the values of the Kolmogorov-Smirnov [9] and Cramér-von Mises [10] tests, together with the corresponding critical values, for the distributions of each plant. These tests measure the adherence between two probability distributions. If the value is less than the critical value, the probability distributions of the large and small trees are considered to be adherent.
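The Kolmogorov-Smirnov comparison rests on the largest gap between two cumulative distribution functions. The Python sketch below computes that gap for two discrete distributions, such as the generated scenarios with their probabilities and the preserved scenarios with their redistributed probabilities. How this gap is scaled to be comparable with the critical values of Table 1 is not specified in the text, so the scaling is deliberately left out; the function name and the data are hypothetical.

```python
def ks_distance(values_p, probs_p, values_q, probs_q):
    """Largest absolute gap between the two cumulative distributions.

    Both distributions are step functions, so it is enough to compare
    them at the points where either of them jumps.
    """
    def cdf(values, probs, x):
        return sum(p for v, p in zip(values, probs) if v <= x)

    points = sorted(set(values_p) | set(values_q))
    return max(abs(cdf(values_p, probs_p, x) - cdf(values_q, probs_q, x))
               for x in points)


# Hypothetical data: four equiprobable scenarios reduced to two.
big_values, big_probs = [1.0, 2.0, 3.0, 4.0], [0.25, 0.25, 0.25, 0.25]
small_values, small_probs = [2.0, 4.0], [0.5, 0.5]
print(ks_distance(big_values, big_probs, small_values, small_probs))
```

In this toy case the gap is 0.25, reached just before each preserved scenario absorbs the probability of its discarded neighbour.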
Table 1: Goodness of Fit Test

Adhesion Test        Critical Value              H. Plant       1st per.   2nd per.
Kolmogorov-Smirnov   95% ⇒ 1.628, 99% ⇒ 1.358    Três Marias    1.039      1.006
                                                 Itá            1.411      1.517
                                                 Tucuruí        0.921      0.779
                                                 Xingó          0.725      0.964
Cramér-von Mises     95% ⇒ 0.461, 99% ⇒ 0.743    Três Marias    0.112      0.148
                                                 Itá            0.483      0.510
                                                 Tucuruí        0.089      0.111
                                                 Xingó          0.117      0.176
The values in Table 1 confirm once again the efficiency of the SOR technique. The values obtained are below the limits of acceptance. This result ensures adherence between the probability distributions P and Q.

7. Conclusion
The large amount of information needed to represent streamflow uncertainty accurately in the mid-term operation planning problem makes its solution computationally impracticable for the BIS. As shown by our results, the SOR technique is an effective tool for building scenario trees that are sufficiently representative and that provide good statistical adherence between the large and small scenario trees. The use of the Fortet-Mourier metric to define the SOR problem and to select a subgroup of scenarios from a large tree with many scenarios allows eliminating duplicated information and significantly reduces the CPU time. Namely, building a tree with 960 scenarios from a tree of 60000 took 14 minutes on a Pentium 4 3.00 GHz computer. The numerical experience shows satisfactorily low error levels in means, standard deviations, and spatial correlations when the SOR technique is applied.

8. References
[1] M.E.P. Maceira; L.A. Terry; J.M. Damazio; F.S. Costa; A.C.G. Melo, Chain of Models for Setting the Energy Dispatch and Spot Price in the Brazilian System, Power System Computation Conference - PSCC'02, 2002, Sevilla, Spain, June 24-28.
[2] G.E.P. Box; G.M. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, 1994, San Francisco, Third Edition.
[3] J. Dupačová; N. Gröwe-Kuska; W. Römisch, Scenario reduction in stochastic programming: An approach using probability metrics, Mathematical Programming, 2003, Ser. A 95, 493-51.
[4] T.H. Cormen; C.E. Leiserson; R.L. Rivest; C. Stein, Introduction to Algorithms, 2nd Edition, MIT Press and McGraw-Hill, 2001, Section 35.3, The set-covering problem, pp. 1033-1038.
[5] N. Gröwe-Kuska; H. Heitsch; W. Römisch, Scenario reduction and scenario tree construction for power management problems, IEEE Bologna Power Tech Proceedings, 2003, (A. Borghetti, C.A. Nucci, M. Paolone eds.), IEEE, pp. 2-4.
[6] M.E.P. Maceira; C.V. Bezerra, Stochastic Streamflow model for Hydroelectric Systems, in: Proceedings of 5th International Conference on Probabilistic Methods Applied to Power Systems, 1997, Vancouver, Canada, Sep., pp. 305-310.
[7] H. Heitsch; W. Römisch, Scenarios Reduction Algorithms in Stochastic Programming, Computational Optimization and Applications, 2003, 187-206.
[8] D.N. Gujarati, Basic Econometrics, McGraw-Hill, 2000, 3rd ed.
[9] F.J. Massey, The Kolmogorov-Smirnov Test for Goodness of Fit, Journal of the American Statistical Association, 46 (March 1956), pp. 68-77.
[10] S. Siegel, Nonparametric statistics for the behavioral sciences, New York, McGraw-Hill Book Company, 1956.