A Clustering-based Preprocessing on Feeder Power in Presence of Photovoltaic Power Plant

Navid Haghdadi, ECE Department, EAT Lab, University of Tehran, Tehran, Iran, [email protected]

Behzad Asaei, ECE Department, EAT Lab, University of Tehran, Tehran, Iran, [email protected]

Ziba Gandomkar, ECE Department, University of Tehran, Tehran, Iran, [email protected]

Abstract- The equivalent power of feeder (EPF), one of the inputs to the chronological simulation of a power network in the presence of a large-scale photovoltaic power plant, requires preprocessing in order to reduce simulation time. The Fuzzy C-means, Gustafson-Kessel, K-means, and K-medoid methods are applied to find a small number of representatives for the EPF. The proposed techniques are validated, and the optimal number of clusters is determined, by means of several clustering validity indices. A comparison with a conventional technique is made in terms of the invariant separation index.

Keywords- photovoltaic power plant; data clustering; fuzzy clustering; validity indices

[Figure 1. Summer and winter day load profile and EPF. Panels: (a) summer day, (b) winter day. Axes: P (MW) versus Time (Hour). Curves: Load, EPF.]

I. INTRODUCTION

The energy crisis, environmental problems, and global population growth are the main reasons for the rapid growth in the use of renewable energies such as solar power. To simulate a power network in the presence of a large-scale on-grid photovoltaic power plant (PVPP), one can take the output power of the PVPP injected into the network and the load profile of the feeder, and then calculate the equivalent power of feeder (EPF) by subtracting the PVPP output power from the feeder load profile. Fig. 1 shows these values for a specific feeder in Tehran on two days of a year. The peak values of the PVPP output power and the load profile are 4 MW and 10 MW, respectively. Each hourly interval is represented by its mean value. The variable nature of the PVPP output power causes the EPF to fluctuate across the days of the year.

There are several ways to evaluate the energy obtained from the PVPP [1], [2]. In one conventional method, the EPF data are classified into seasonal intervals and their mean values are used in the grid simulation program. A more precise approach is to gather the hourly irradiance and temperature for the past few years and calculate the EPF using the actual feeder load profile. The main drawback of this approach is its long simulation time, caused by the very large amount of data. Data preprocessing is therefore necessary to reduce the amount of data, such as the clustering-based preprocessing proposed in [3], [4]. Those approaches obtained good results using clustering quality control indices such as the mean absolute percentage error (MAPE) and the mean square error (MSE). In this paper, the result is evaluated by applying more comprehensive clustering validity indices.

Four years of hourly EPF data are used in the present article. The K-means, K-medoid, Fuzzy C-means (FCM), and Gustafson-Kessel (GK) methods are applied to the dataset. The optimal number of clusters and the best clustering method are then determined. To evaluate the proposed technique, it is compared with the conventional method.

II. DATA PREPARATION

The raw input data are the feeder load profile [7] and four years of hourly irradiation and temperature data [8]. The output power of the PVPP is calculated from the irradiation and temperature, and the EPF is obtained by subtracting this output power from the feeder load profile. The resulting dataset has 1461 rows and 24 columns: each row represents a day and each column represents an hour.

A. Calculating PVPP Output Power

The maximum output power of a photovoltaic array, $P_{\max}$, is simply an analytical function of the irradiance level $G$ and the array temperature $T$ [9]:

$$P_{\max} = f(G, T) \qquad (1)$$

• Irradiance [3]: The global irradiance $G$ consists of the diffuse irradiance $D$ and the direct beam irradiance $B$:

$$G = D + B \qquad (2)$$

The PVPP arrays are tilted in order to achieve more output power. The global irradiance on a tilted photovoltaic array, $G_T$, is the sum of three components:

$$G_T = D_T + B_T + R_T \qquad (3)$$

where $D_T$ is the diffuse component, $B_T$ is the direct beam, and $R_T$ is the reflected component. These parameters can be calculated straightforwardly from Eqs. (4)-(7).

[Eqs. (4)-(7): not legible in the source.]

In the above equations, $\theta_z$ is the zenith angle and $\theta$ is the angle between the beam irradiance on a tilted surface and the normal to that surface. The hourly diffuse and direct beam irradiance and the ambient temperature for Tehran, located at 35°44'N and 51°30'E, are collected from NASA [8]. The optimal tilt angle is taken as 35 degrees, as mentioned in [10], and the value of [symbol missing in the source] is taken as 0.3 because of the climate condition. All irradiance parameters are in W/m².
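As a rough illustration of Eq. (3), the sketch below sums a diffuse, a direct-beam, and a reflected component on a tilted plane. Because the bodies of Eqs. (4)-(7) are not legible here, it assumes the common isotropic-sky model, which may differ from the expressions the authors used. The function name `tilted_irradiance`, its argument names, and the default `albedo` value are hypothetical.

```python
import math

def tilted_irradiance(diffuse_h, beam_h, tilt_deg, zenith_deg, incidence_deg, albedo=0.2):
    """Global irradiance on a tilted plane (W/m^2), isotropic-sky assumption.

    diffuse_h, beam_h: diffuse and direct-beam irradiance on a horizontal plane.
    albedo: assumed ground reflectance (illustrative default, not from the paper).
    """
    beta = math.radians(tilt_deg)
    cos_zenith = math.cos(math.radians(zenith_deg))
    cos_incidence = math.cos(math.radians(incidence_deg))

    # Diffuse: a tilted plane sees the fraction (1 + cos(beta)) / 2 of the sky dome.
    diffuse_t = 0.5 * diffuse_h * (1.0 + math.cos(beta))

    # Direct beam: rescale from the horizontal to the tilted plane; zero when the
    # sun is at or below the horizon, or behind the panel.
    if cos_zenith > 1e-6 and cos_incidence > 0.0:
        beam_t = beam_h * cos_incidence / cos_zenith
    else:
        beam_t = 0.0

    # Reflected: ground-reflected share of the horizontal global irradiance.
    reflected_t = 0.5 * albedo * (diffuse_h + beam_h) * (1.0 - math.cos(beta))

    return diffuse_t + beam_t + reflected_t
```

For a horizontal plane (tilt of 0 degrees, incidence angle equal to the zenith angle) the function reduces to the sum of the diffuse and direct-beam inputs, which is a convenient sanity check.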

• Temperature: The temperature of the solar cell, $T$, can be calculated from the ambient temperature $T_a$, the irradiance level $G$, and the NOCT factor given in the solar cell datasheet by the following equation [11], with temperatures in °C and irradiance in W/m²:

$$T = T_a + \frac{\mathrm{NOCT} - 20}{800}\, G \qquad (8)$$

In the present article, the PVPP is assumed to consist of 20,000 REC-AE220 solar arrays, each with an NOCT of 47.5°C and a peak power of 220 W.

B. Load Profile

To implement the methods, the hourly load profile of an actual feeder in Tehran over the past four years is used [7]. The EPF is obtained from the PVPP output power $P_{PV}$ and the load profile $P_L$ for hourly intervals as follows:

$$\mathrm{EPF} = P_L - P_{PV} \qquad (9)$$

III. METHOD

Because of the uncertain nature of PV output power, the feeder power in the presence of the photovoltaic power plant is not linearly separable; consequently, its partitioning is a challenging problem. Conventionally, the data are partitioned into three clusters: spring and autumn together in one cluster, summer, and winter. To achieve more reliable clusters and cluster representatives, different crisp and fuzzy clustering methods are employed and compared on the EPF. Moreover, the number of clusters is an input of these techniques, so the optimal number of clusters should be found using some well-known validity indices.

A. Clustering Techniques

Before the different clustering algorithms are applied, the data are normalized: a linear operation maps the values to the range between 0 and 1. After normalization, the clustering methods are applied. A partition of the data set into clusters is either crisp or fuzzy. Although crisp partitioning methods are straightforward, they have computational problems in some cases. Both crisp and fuzzy methods are used in this article. K-means and K-medoid are used as examples of crisp partitioning methods [6]. To avoid numerical problems and achieve more reliable results, the FCM method is utilized. In this approach, each object belongs to every cluster with some membership grade. Because clusters can have various geometrical shapes, an extension of the standard FCM has been developed. This extension, the Gustafson-Kessel algorithm, computes an adaptive distance norm, so that each cluster has its own norm-inducing matrix.

B. Clustering Validation

Two problems remain challenging: selecting the appropriate number of clusters and choosing the best clustering technique. Validity indices, which indicate whether the method and the number of clusters fit the input data, are useful in solving these problems. Some validity indices are introduced in Table I [5], [6], [12].

TABLE I. VALIDITY INDICES DEFINITION

Index   Equation   Crisp or Fuzzy
DI      (10)       Crisp
ADI     (11)       Crisp
PC      (12)       Fuzzy
CE      (13)       Fuzzy
XB      (14)       Crisp and Fuzzy
SC      (15)       Crisp and Fuzzy
J       (16)       Crisp

[Equation bodies (10)-(16): not legible in the source.]

In Table I, C is the number of clusters, $C_i$ is the set of data points in the ith cluster, and d(x, y) is the Euclidean distance between two points x and y; μ is the membership grade and m is the weighting parameter. No index is perfect, and further evaluation should be done in order to validate the clustering methods. The Dunn Index (DI) and the Alternative Dunn Index (ADI), used in the validation of crisp partitioning, are useful when the data are well separated, compact, and dense. Relying only on these indices therefore seems insufficient for clustering our data. The other indices are mainly used to evaluate fuzzy clustering. The value of the partition coefficient (PC) lies between 1/C and 1; the closer the index is to 1/C, the fuzzier the partitioning. Another validity index involving only the membership grade is the classification entropy (CE). The main weaknesses of these two indices are their monotonic decrease with the number of clusters and the absence of a direct connection with the data in their calculation. The most popular fuzzy index is the Xie and Beni index (XB), in which two terms play roles: one measures compactness and the other represents cluster separation. SC is based on both the membership grade and the data structure; like XB, its definition considers separation and compactness. Within scatter measures the variance of the data in each cluster, and between scatter evaluates the distances between different clusters. For large values of J, one can expect that the partitioning method has produced well-separated clusters.

IV. RESULTS

In this section, the indices are calculated to find the optimal number of clusters. This optimal number is then used as an input of the different clustering methods, and these techniques are compared. Finally, by computing the within and between scatter of the clusters, the clustering-based method is compared with the conventional technique for preprocessing the EPF in the presence of a large-scale PVPP.

A. Optimal number of clusters

The number of clusters must be determined before the clustering methods are employed, because it is an essential input parameter of the different clustering techniques. The validity indices are plotted versus the number of clusters to find the optimal number. The fewer the clusters, the less expensive and time consuming the subsequent grid simulation is; consequently, we seek the minimum number of clusters. Fig. 2 shows the values of the validity indices for the K-medoid method versus the number of clusters. In terms of both DI and ADI, the optimal number of clusters is about 23: at this value, ADI approaches zero for the first time and DI is at a local minimum. Some helpful diagrams are shown in Fig. 3 for the GK method. CE is not plotted because this index is less informative, having no direct connection to the data structure. PC has the same problem and, as one can see in Fig. 3, it decreases monotonically as the number of clusters increases. Beyond the point where the number of clusters is optimal, SC and XB decrease only slightly as the number of clusters increases, so increasing the number of clusters for only a slightly better value seems unjustified, since it would make the grid simulation more time consuming. Assessing the different diagrams together, the appropriate value for C seems to be 25.

[Figure 2. The appropriate values for C. Panels: DI and ADI (×10⁻⁴), each plotted versus the number of clusters and marked "Appropriate Values for C".]
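The procedure of plotting a validity index against the number of clusters can be sketched as follows. This is only an illustration under stated assumptions: the daily profiles are synthetic, a plain Lloyd-style K-means stands in for the clustering methods used in the paper, and the Dunn index follows its usual definition (smallest between-cluster distance divided by the largest cluster diameter), which is assumed to correspond to DI in Table I. All function names are hypothetical.

```python
import numpy as np

def normalize(data):
    """Linearly rescale all values to the range [0, 1]."""
    lo, hi = data.min(), data.max()
    return (data - lo) / (hi - lo)

def assign(data, centers):
    """Index of the nearest center (Euclidean distance) for each row."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(data, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(data, centers)
        for k in range(n_clusters):
            members = data[labels == k]
            if len(members) > 0:  # an emptied cluster keeps its old center
                centers[k] = members.mean(axis=0)
    return assign(data, centers), centers

def dunn_index(data, labels):
    """Smallest between-cluster distance over largest cluster diameter."""
    pairwise = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    return pairwise[~same].min() / pairwise[same].max()

# Synthetic stand-in for the EPF matrix: 3 day types, 40 days each, 24 hours.
rng = np.random.default_rng(1)
hours = np.arange(24)
shapes = [6 + 3 * np.sin((hours - 6) * np.pi / 12),
          8 + 2 * np.sin((hours - 9) * np.pi / 12),
          5 + 1 * np.cos(hours * np.pi / 12)]
epf = np.vstack([s + rng.normal(0, 0.2, size=(40, 24)) for s in shapes])
data = normalize(epf)

# Sweep the number of clusters and report the index for each value.
for c in range(2, 7):
    labels, _ = kmeans(data, c)
    print(c, round(float(dunn_index(data, labels)), 3))
```

With real data the loop would cover the range of C of interest and be repeated for each clustering method and each index under consideration.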

TABLE II. VALIDITY INDICES COMPARISON IN DIFFERENT METHODS

            Validity Measurement Indices
Methods     DI      ADI    PC     CE      XB     SC
K-means     0.057   5.4    1.00   NaN     2.34   0.205
K-medoid    0.043   1.3    1.00   NaN     Inf    0.202
FCM         0.047   0.5    0.16   2.434   0.46   0.430
GK          0.012   1.4    0.04   3.198   0.10   0.422

For each index, the best method is determined; its cell is shaded gray in the original table.

TABLE III. CONVENTIONAL AND PROPOSED METHOD COMPARISON

Methods               Number of Clusters   Invariant separation index
Conventional method   3                    9.88
K-medoid              3                    242.61
K-medoid              23                   2.57 × 10^(…)
K-medoid              25                   1.92 × 10^(…)
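The definition of the invariant separation index reported in Table III is not recoverable from the text. A common scatter-matrix criterion of this kind is J = trace(S_W⁻¹ S_B), where S_W and S_B are the within-cluster and between-cluster scatter matrices; the sketch below assumes that definition and uses synthetic data, so its numbers are unrelated to those in the table. The function name is hypothetical.

```python
import numpy as np

def invariant_separation_index(data, labels):
    """J = trace(pinv(S_W) @ S_B) for a crisp partition (assumed definition)."""
    overall_mean = data.mean(axis=0)
    n_features = data.shape[1]
    s_within = np.zeros((n_features, n_features))
    s_between = np.zeros((n_features, n_features))
    for k in np.unique(labels):
        members = data[labels == k]
        center = members.mean(axis=0)
        centered = members - center
        s_within += centered.T @ centered          # spread inside cluster k
        offset = (center - overall_mean)[:, None]
        s_between += len(members) * (offset @ offset.T)  # spread of the centers
    # The pseudo-inverse keeps the sketch usable if S_W is singular.
    return float(np.trace(np.linalg.pinv(s_within) @ s_between))

rng = np.random.default_rng(0)
# Two well-separated synthetic groups of 2-D points.
group_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2))
group_b = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(50, 2))
points = np.vstack([group_a, group_b])

structure_labels = np.repeat([0, 1], 50)  # follows the real grouping
fixed_labels = np.tile([0, 1], 50)        # fixed split that ignores it

j_structure = invariant_separation_index(points, structure_labels)
j_fixed = invariant_separation_index(points, fixed_labels)
print(j_structure > j_fixed)
```

The fixed split plays the role of a partition chosen without looking at the data, in the spirit of a calendar-based seasonal split; a partition that follows the actual structure of the data yields a much larger J.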

B. Comparison between different clustering methods

As mentioned before, the number of clusters should be known a priori. The previous section showed that 25 is a proper value for C. K-means, K-medoid, FCM, and GK are therefore run with C = 25, and the validity indices are collected in Table II. PC and CE are not useful for crisp partitioning, since their values are always 1 and NaN, respectively. When crisp and fuzzy methods are compared, DI is the more popular index; as one can see in Table II, GK is the best method in terms of DI. XB and PC are commonly used in the validation of fuzzy clustering. Their values for GK are the smallest, so GK also has the best result in terms of XB and PC. ADI and SC are not reliable here, because their values for the different methods are close to each other.
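The remark that PC and CE are always 1 and NaN for crisp partitions can be checked with a small sketch. It assumes the usual definitions, PC = (1/N) Σ μ² and CE = −(1/N) Σ μ log μ, which are taken to correspond to the PC and CE entries of Table I; the membership matrices are made up for illustration and the function names are hypothetical. In a crisp partition every membership is 0 or 1, so PC equals 1, and the 0·log 0 terms evaluate to NaN in floating-point arithmetic.

```python
import numpy as np

def partition_coefficient(u):
    """PC = (1/N) * sum of squared memberships; u has shape (C, N)."""
    return float((u ** 2).sum() / u.shape[1])

def classification_entropy(u):
    """CE = -(1/N) * sum of u * log(u); NaN when any membership is 0."""
    # Silence the warnings raised by log(0) and 0 * (-inf).
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(-(u * np.log(u)).sum() / u.shape[1])

# Crisp partition of 4 points into 2 clusters (memberships are 0 or 1).
u_crisp = np.array([[1.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])
# Fuzzy partition of the same points (each column sums to 1).
u_fuzzy = np.array([[0.9, 0.8, 0.3, 0.1],
                    [0.1, 0.2, 0.7, 0.9]])

print(partition_coefficient(u_crisp), classification_entropy(u_crisp))
print(partition_coefficient(u_fuzzy), classification_entropy(u_fuzzy))
```

This is why the two indices carry no information when crisp methods are compared, and why they are only meaningful for fuzzy partitions such as those of FCM and GK.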

[Figure 3. Validity indices of FCM. Panels: DI, ADI (×10⁻⁴), XB, PC, and SC, each plotted versus the number of clusters (C).]

C. Comparing clustering-based method with conventional technique

One of the frequently used conventional methods splits the data into three subsets: spring and autumn, summer, and winter. The representatives are the mean values of the subsets. In this article, the invariant separation indices are computed to compare this technique with the clustering-based method. The result is shown in Table III. As seen in the table, the values of this index for the clustering-based methods are significantly higher than the value for the conventional method. Fig. 4 shows 11 well-separated representatives of the 25 clusters obtained with the K-medoid method. As seen in the figure, the values of the various clusters differ from each other to a good extent, so they are a good representation of all the data.

[Figure 4. The representatives of some clusters. Axes: Representatives (0 to 1) versus Time (Hour).]

V. CONCLUSION

The EPF is one of the inputs of the chronological simulation and analysis of a power network in the presence of a large-scale photovoltaic power plant. Since the simulation is time consuming when all four years of hourly intervals are considered, fuzzy and crisp clustering-based preprocessing techniques were proposed in this article in order to extract a few representatives of the data. The presented techniques were validated, and the optimal number of clusters was found, by means of several clustering validity indices. Finally, a comparison with the conventional technique was made. Because the clusters are not well separated, the fuzzy methods gave better results than the crisp ones. In terms of the invariant separation index, the presented method worked better than the conventional method, which splits the data into seasonal intervals without considering the random nature of photovoltaic output power. For example, the cloud profile, an important factor in photovoltaic output power, is erratic in a highland region such as Tehran. In addition, the mountainous and mild climate and overpopulation cause heavy air pollution in Tehran, which leads to fluctuation in the PVPP output power. Therefore, a precise analysis of the feeder power in the presence of a PVPP, such as the clustering-based methods, is necessary.

REFERENCES

[1] I. Abouzahr and R. Ramakumar, "An approach to assess the performance of utility-interactive photovoltaic systems," IEEE Transactions on Energy Conversion, vol. 8, no. 2, June 1993.
[2] B. H. Chowdhury, "Effect of central station photovoltaic plant on power system security," Proc. of 21st IEEE Photovoltaic Specialist Conference, Kissimmee, FL, May 1990.
[3] W. A. Omran, M. Kazerani, and M. M. A. Salama, "A clustering-based method for quantifying the effects of large on-grid PV systems," IEEE Transactions on Power Delivery, vol. 25, no. 4, pp. 2617-2625, Oct. 2010.
[4] A. Pregelj, M. Begovic, and A. Rohatgi, "Quantitative techniques for analysis of large data sets in renewable DG," IEEE Transactions on Power Systems, vol. 19, no. 3, August 2004.
[5] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "Clustering validity checking methods: part II," SIGMOD Rec., vol. 31, no. 3, pp. 19-27, 2002.
[6] B. Balasko, J. Abonyi, and B. Feil, "Fuzzy clustering and data analysis toolbox for use with Matlab," available on http://www.mathworks.com/
[7] www.tavanir.org.ir
[8] www.nasa.gov
[9] G. Farivar, B. Asaei, and M. A. Rezaei, "A novel analytical solution for the PV-arrays maximum power point tracking problem," IEEE International Conference on Power and Energy, Nov. 29-Dec. 1, 2010.
[10] E. Asl-Soleimani, S. Farhangi, and M. S. Zabihi, "The effect of tilt angle, air pollution on performance of photovoltaic systems in Tehran," Renewable Energy, vol. 24, pp. 459-468, 2001.
[11] P. Trinuruk, C. Sorapipatana, and D. Chenvidhya, "Estimating operating cell temperature of BIPV modules in Thailand," Renewable Energy, vol. 34, pp. 2515-2523, 2009.
[12] P. J. G. Lisboa, I. O. Ellis, A. R. Green, F. Ambrogi, and M. B. Dias, "Cluster-based visualisation with scatter matrices," Pattern Recognition Letters, vol. 29, pp. 1814-1823, 2008.