Rock Mech Rock Eng (2015) 48:375–385 DOI 10.1007/s00603-014-0569-x
TECHNICAL NOTE
K-means Algorithm Based on Particle Swarm Optimization for the Identification of Rock Discontinuity Sets
Yanyan Li • Qing Wang • Jianping Chen • Liming Xu • Shengyuan Song
Received: 16 July 2013 / Accepted: 5 March 2014 / Published online: 22 March 2014
© Springer-Verlag Wien 2014
Keywords Discontinuity sets • K-means algorithm • Particle swarm optimization (PSO) • Discontinuity orientation

List of Symbols
$\alpha$  Dip direction (°)
$\beta$  Dip angle (°)
$c_j$  Center of the jth cluster
$\theta$  Deviation angle from the mean orientation angle
$D$  Dimension of the search space
$D_s$  Discontinuity data set
$d$  Search direction
$e_i$  Unit normal vector of the ith discontinuity
$f$  Cost function (fitness value)
$iter$  Iteration number
$K$  Number of clusters
$n = [n_1, n_2, n_3]$  Normalized eigenvector
$M$  Orientation cosine matrix
$\mu_1, \mu_2$  Learning factors
$N$  Population size of particles
$n$  Total number of discontinuities
$n_j$  Number of discontinuities in the jth cluster
$P_i$  Personal best position of the ith particle
$P_g$  Global best position of the swarm
$POP$  Population of particles
$r_1, r_2$  Random variables ranging from 0 to 1
$S_j$  Discontinuity set
$V_i$  Velocity of the ith particle
$v$  Ray–Turi index
$w$  Inertia weight
$X_i$  Position of the ith particle

Y. Li • Q. Wang • J. Chen (corresponding author) • L. Xu • S. Song
College of Construction Engineering, Jilin University, Changchun 130026, China
1 Introduction

Discontinuities within a rock mass strongly influence the behavior of the rock mass (Kulatilake and Wu 1984; Zhou and Maerz 2002), endowing it with discontinuous, non-homogeneous, and anisotropic features (Zhang et al. 2012). Therefore, the careful collection and analysis of discontinuity data within rock masses is important for engineering applications (Hammah and Curran 1998). When interpreting discontinuity data, engineers try to delineate discontinuities into subsets based on similar orientations. Discontinuity orientations are commonly represented by the hemispherical projection of discontinuity poles for visualization. Based on this technique, the counting method, which contours stereographic projections of discontinuity poles, was introduced by Schmidt (1925). This method is convenient and easy to use because software packages for plotting and contouring pole information on stereonets are now available. However, the interpretation of the stereonets is subjective and depends greatly on the experience of the data analysts. Moreover, both the sampling bias and the size of the reference circle strongly influence the clustering results (Klose et al. 2005; Jimenez-Rodriguez and Sitar 2006). Such problems have led to the development of alternative methods for the automatic identification of discontinuity sets. Cluster analysis fulfils the essential requirement for developing an analogous approach that partitions discontinuities into subsets automatically.
The first method of tackling the problem of subjectivity in clustering discontinuities was proposed by Shanley and Mahtab (1976), and was then further improved by Mahtab and Yegulalp (1982). Dershowitz et al. (1996) introduced an iterative method for grouping discontinuities based on a probabilistic, geological approach. These methods assume some probabilistic structure of the discontinuity properties. Since determining a suitable probabilistic model of discontinuity properties may be difficult in some cases, more research has focused on developing cluster algorithms that use no prior probabilistic information. Based on fuzzy logic, Harrison (1992) first used the fuzzy K-means algorithm for the analysis of discontinuity orientation data. The algorithm used by Harrison has flaws due to the difficulty it encounters in dealing with spherical data. Subsequently, Hammah and Curran (1998) made some modifications to the fuzzy K-means algorithm that were specific to the clustering of discontinuity orientations. Considering that the clustering results of the fuzzy K-means algorithm are highly dependent on the distance metric, Hammah and Curran (1999) published a special discussion on the choice and use of various distance measures for clustering discontinuities. More recently, Zhou and Maerz (2002) adopted four methods to cluster discontinuity data, namely: nearest neighbor, K-means, fuzzy logic, and vector quantization methods. Klose et al. (2005) presented a clustering algorithm based on vector quantization, using the arc length between pole vectors on a unit sphere as a measure of their distance. To study the performance of this approach, the method of Shanley and Mahtab (1976) and the expert supervised method proposed by Pecher (1989) were chosen as benchmarks.
Jimenez-Rodriguez and Sitar (2006) established a spectral method that used eigenvectors of matrices constructed from measures of similarity between the data points to identify discontinuity sets. Benchmark test cases were also used to assess the performance of this method. Jimenez (2008) integrated the spectral method and the fuzzy K-means algorithm for clustering discontinuities based on their orientations. Tokhmechi et al. (2011) adopted the K-means algorithm to group discontinuities into subsets according to multiple properties, including dip, dip direction, roughness, hardness, ends, continuity, and aperture. Markovaara-Koivisto and Laine (2012) introduced an approach for analyzing and visualizing scanline data based on MATLAB tools, using the K-means algorithm for the clustering of discontinuities.

The K-means and fuzzy K-means are dynamic clustering algorithms. Compared with other clustering algorithms, they are well suited to the needs of rock engineers because of the specific classification problem encountered in the analysis of rock discontinuity data and because they are less susceptible to outliers than other algorithms (Hammah and Curran 1998). Both the K-means and fuzzy K-means require an initial assignment of cluster centers and the number of desired clusters. The clustering results of the two methods are influenced by the initial cluster centers, and both often converge to a local optimum. In order to obtain a global optimum, Xu et al. (2013) proposed a method using fuzzy set theory based on mutative scale chaos optimization for the classification of discontinuities.

Particle swarm optimization (PSO) is a stochastic, population-based evolutionary optimization algorithm. Unlike conventional methods, it is a global search method that can find the global optimum of a problem and is not sensitive to the initial values. In this study, a K-means algorithm based on particle swarm optimization (KPSO) is proposed for the identification of rock discontinuity sets based on discontinuity orientations.
2 Methods

2.1 Representation of Directional Data

A rock discontinuity is often assumed to have a planar structure, and its spatial orientation is expressed in terms of dip direction $\alpha$ ($0^\circ \le \alpha \le 360^\circ$) and dip angle $\beta$ ($0^\circ \le \beta \le 90^\circ$). The orientation of a discontinuity can also be represented by a unit normal vector $e_i$ (Fig. 1), which is often described by its direction cosines $e_i = (x_i, y_i, z_i)$ in a Cartesian coordinate system (Shanley and Mahtab 1976):

\[
\begin{cases}
x_i = \cos\alpha \sin\beta \\
y_i = \sin\alpha \sin\beta \\
z_i = \cos\beta
\end{cases}
\tag{1}
\]

Fig. 1 Representation of a rock discontinuity orientation, with the x-axis pointing north and the y-axis pointing east (Shanley and Mahtab 1976)
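As a small illustration of Eq. (1), the sketch below converts a dip direction and dip angle (in degrees) into direction cosines and back. The function names `pole_from_dip` and `dip_from_pole` are hypothetical, and the inverse conversion (including the flip to the upper hemisphere) is an assumption added for convenience rather than something stated in the text.

```python
import math

def pole_from_dip(dip_direction_deg, dip_angle_deg):
    """Direction cosines (x north, y east, z up) of the unit normal, Eq. (1)."""
    a = math.radians(dip_direction_deg)
    b = math.radians(dip_angle_deg)
    return (math.cos(a) * math.sin(b), math.sin(a) * math.sin(b), math.cos(b))

def dip_from_pole(e):
    """Assumed inverse of Eq. (1): returns (dip direction, dip angle) in degrees."""
    x, y, z = e
    if z < 0:  # use the upward-pointing normal
        x, y, z = -x, -y, -z
    dip_angle = math.degrees(math.acos(min(1.0, z)))
    dip_direction = math.degrees(math.atan2(y, x)) % 360.0
    return dip_direction, dip_angle
```

For a horizontal plane (dip angle 0) the dip direction is undefined, and this sketch simply returns 0 in that case.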
2.2 K-means Algorithm

The K-means algorithm is a classical algorithm for the classification of data and has been widely used in the field of data mining and knowledge discovery. This algorithm is designed to partition the discontinuity data set $D_s = (D_{s1}, D_{s2}, \ldots, D_{sn})$ into K subsets $S = (S_1, S_2, \ldots, S_K)$. The clustering results should follow these principles:

\[
\begin{cases}
D_s = \bigcup_{i=1}^{K} S_i, \quad S_i \ne \emptyset \quad (i = 1, 2, \ldots, K) \\
S_i \cap S_j = \emptyset \quad (i, j = 1, 2, \ldots, K;\ i \ne j)
\end{cases}
\tag{2}
\]

A traditional K-means algorithm for grouping discontinuity data involves the following steps (Steinley and Brusco 2007; Tokhmechi et al. 2011):

1. Randomly distribute the discontinuities to K clusters.
2. Calculate the center $c_j$ of each cluster according to the initial clustering results:

\[
c_j = \frac{1}{n_j} \sum_{e_i \in S_j} e_i
\tag{3}
\]

   where $n_j$ is the number of discontinuities in the jth cluster.
3. Compute the distance between each discontinuity and the centers of the K clusters, and allocate each discontinuity to its nearest cluster.
4. Update the center of each cluster using Eq. (3).
5. Calculate the cost function by:

\[
f = \sum_{j=1}^{K} \sum_{e_i \in S_j} d(e_i, c_j)
\tag{4}
\]

   where $d(e_i, c_j)$ is the distance between the ith discontinuity and the jth cluster center.
6. Repeat steps (3), (4), and (5) to minimize the cost function.
7. The iteration process ends when no discontinuity can be reallocated to a different cluster.
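The seven steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `kmeans`, the default Euclidean distance (`math.dist`), the shuffled round-robin initial assignment (used so that no cluster starts empty, as Eq. (2) requires), and the rule that an emptied cluster keeps its previous center are all assumptions made here.

```python
import math
import random

def kmeans(points, k, dist=math.dist, seed=0, max_iter=100):
    """Classical K-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 1: random initial assignment (assumed: shuffled round-robin labels).
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    centers = [None] * k
    for _ in range(max_iter):
        # Steps 2 and 4: cluster center as the arithmetic mean, Eq. (3).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumed: an emptied cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Step 3: allocate every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                      for p in points]
        # Step 7: stop when no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    # Step 5: cost function, Eq. (4).
    cost = sum(dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, cost
```

Because the result depends on the random initial assignment, running the sketch with different `seed` values on overlapping data can give different partitions, which is the sensitivity the paper sets out to address.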
2.3 Selection of Distance Measure and Computation of Cluster Centers

The distance metric $d(e_i, c_j)$ plays a crucial role in the clustering of discontinuities. The selection of a distance metric that is adequate for each particular case has been identified as a key issue for the successful application of clustering methods (Hammah and Curran 1999; Jimenez-Rodriguez and Sitar 2006). Two factors should be considered in the computation of the distance matrix: (1) the selection of the distance measure and (2) the computation of cluster centers.
Several distance measures have been used for measuring the distance between the orientations of discontinuities, such as the Euclidean distance measure, the distance measure based on the sine of the acute angle between discontinuity unit normal vectors, and the distance measure based on the acute angle itself. Our results show that the KPSO method performs well when the distance measure based on the acute angle between discontinuity unit normal vectors is employed. In this study, it is selected as the measure of distance, which is described as:

\[
d(e_i, c_j) = \arccos\left| e_i c_j^{T} \right|
\tag{5}
\]

Sub-vertical discontinuity sets sometimes consist of discontinuities with nearly opposite dip directions, and Eq. (3) has a drawback when calculating the centers of such sets. To find the correct centers of discontinuity sets, an eigenanalysis method is adopted (Markland 1974; Shanley and Mahtab 1976; Hammah and Curran 1998). Given a discontinuity data set with N unit vectors, the center can be calculated in the following manner: (1) compute the orientation cosine matrix M with elements given by the following formula and (2) find the largest eigenvalue of M and its corresponding normalized eigenvector $n = [n_1, n_2, n_3]$, which is the desired center of the N unit vectors:

\[
M = \begin{bmatrix}
\sum_{i=1}^{N} x_i x_i & \sum_{i=1}^{N} x_i y_i & \sum_{i=1}^{N} x_i z_i \\
\sum_{i=1}^{N} y_i x_i & \sum_{i=1}^{N} y_i y_i & \sum_{i=1}^{N} y_i z_i \\
\sum_{i=1}^{N} z_i x_i & \sum_{i=1}^{N} z_i y_i & \sum_{i=1}^{N} z_i z_i
\end{bmatrix}
\tag{6}
\]

where $(x_i, y_i, z_i)$ are the direction cosines of the ith discontinuity.

2.4 PSO Method

PSO was first proposed by Kennedy and Eberhart (1995) while studying the foraging behavior of a flock of birds. It is a swarm intelligence algorithm following five principles, namely, proximity, quality, diverse response, stability, and adaptability (Kennedy et al. 2001; Parsopoulos and Vrahatis 2007). In PSO, each particle is viewed as a potential solution to a problem, and the population formed by all the particles is called a swarm. A particle i in a D-dimensional space is represented by its position and velocity, which can be described as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ and $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$, respectively. Each particle flies around in the search space, with a velocity adjusted by its own flying experience and the experience of its companions. The position of each particle is changed by its velocity. The position and velocity of each particle are adjusted iteratively, and each particle changes its position continually to find the best solution to the problem. The personal best position of each particle and the global best position of the entire swarm are represented as $P_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$ and $P_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$, respectively. The position and velocity of a particle i can be updated by:

\[
v_{id} = w\, v_{id} + \mu_1 r_1 (p_{id} - x_{id}) + \mu_2 r_2 (p_{gd} - x_{id})
\tag{7}
\]

\[
x_{id} = x_{id} + v_{id}
\tag{8}
\]

where d is the search direction (d = 1, 2, …, D), w is the inertia weight ranging from 0.4 to 0.9, $\mu_1$ and $\mu_2$ are positive constants for learning ($\mu_1 = \mu_2 = 2$), and $r_1$ and $r_2$ are two random numbers in the range [0, 1] (Yagiz and Karahan 2011). The inertia weight w reflects the impact that the previous velocity of a particle has on its current velocity and plays a role in balancing global and local search. The PSO algorithm has a strong ability for global search when w takes a larger value, while a smaller inertia weight facilitates local search. A linearly decreasing inertia weight, starting at $w_{max} = 0.9$ and ending at $w_{min} = 0.4$, is used to control the balance between global and local search (Shi and Eberhart 1998; Yagiz and Karahan 2011; Kalyani and Swarup 2011):

\[
w = w_{max} - \frac{w_{max} - w_{min}}{iter_{max}} \times iter
\tag{9}
\]

where iter is the current iteration number and $iter_{max}$ is the total number of iterations.
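The update rules of Eqs. (7) to (9) translate almost directly into code. The following is a sketch under stated assumptions: the function names `inertia_weight` and `pso_step` are hypothetical, and drawing fresh values of $r_1$ and $r_2$ for every dimension is one common convention that the text does not confirm.

```python
import random

def inertia_weight(it, it_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight, Eq. (9)."""
    return w_max - (w_max - w_min) * it / it_max

def pso_step(x, v, p_best, g_best, w, mu1=2.0, mu2=2.0, rng=random):
    """Apply Eqs. (7) and (8) once to a single particle.

    x, v, p_best, g_best are equal-length lists of floats; rng is any object
    with a random() method returning values in [0, 1).
    """
    new_x, new_v = [], []
    for xd, vd, pd, gd in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()  # assumed: redrawn per dimension
        vd = w * vd + mu1 * r1 * (pd - xd) + mu2 * r2 * (gd - xd)  # Eq. (7)
        new_v.append(vd)
        new_x.append(xd + vd)  # Eq. (8)
    return new_x, new_v
```

With the default values, `inertia_weight` moves from 0.9 at the first iteration to 0.4 at the last, so early iterations favor global exploration and later ones favor local refinement, as described above.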
2.5 KPSO Method

The main idea of the KPSO method is to embed the K-means algorithm into PSO so as to combine their advantages and avoid their disadvantages. In KPSO, each particle needs to be encoded. An encoding method based on the clustering results of the K-means algorithm is adopted: the position of each particle corresponds to the centers of K discontinuity sets, where K is the desired number of sets; each particle has a velocity V to change its position and a fitness value f to evaluate the fitness of the position (i.e., the validity of the clustering results). In this study, the fitness value of each particle is the accumulated distance between the centers of the K discontinuity sets and their discontinuities, computed by Eq. (4). Because the number of discontinuity sets is K and the dimension of a discontinuity vector is 3, the position and velocity of each particle are 3 × K-dimensional variables. The fitness value of each particle is a one-dimensional variable. The coding structure of a particle i therefore consists of its position $X_i = (c_1, c_2, \ldots, c_K)$, its velocity $V_i$, and its fitness value $f(X_i)$ (Lin et al. 2012), where $c_j = (c_{j1}, c_{j2}, c_{j3})$ represents the center of the jth cluster (1 ≤ j ≤ K) and D = 3 × K.

The KPSO method is described as follows (Kalyani and Swarup 2011; Lin et al. 2012; Jin et al. 2013):

Step 1 Determine the number of clusters K and initialize the parameters, including the learning factors ($\mu_1$, $\mu_2$), the inertia weight ($w_{min}$, $w_{max}$), and the population size N.

Step 2 Initialize a population. A particle i (i = 1, 2, …, N) is initialized in the following manner: (1) randomly assign the discontinuities to K clusters and calculate the cluster centers using the eigenanalysis method; (2) initialize the position $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ of particle i by the K cluster centers and randomly initialize the velocity $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$; and (3) compute the fitness value $f(X_i)$ by Eq. (4). A population which consists of N particles can be generated:

\[
POP = \begin{bmatrix}
x_{11}, x_{12}, \ldots, x_{1D}, v_{11}, v_{12}, \ldots, v_{1D}, f(X_1) \\
x_{21}, x_{22}, \ldots, x_{2D}, v_{21}, v_{22}, \ldots, v_{2D}, f(X_2) \\
\vdots \\
x_{N1}, x_{N2}, \ldots, x_{ND}, v_{N1}, v_{N2}, \ldots, v_{ND}, f(X_N)
\end{bmatrix}
\tag{10}
\]

where $(x_{i1}, x_{i2}, \ldots, x_{iD}) = (c_{11}, c_{12}, c_{13}, c_{21}, c_{22}, c_{23}, \ldots, c_{K1}, c_{K2}, c_{K3})$ and D = 3 × K.

Step 3 Initialize each particle's personal best position $P_i$ and the swarm's global best position $P_g$. For each particle i, set $X_i$ to be $P_i$. Set the position of the particle with the minimum fitness value to be $P_g$.

Step 4 Update the position and velocity of each particle by Eqs. (7) and (8), respectively.

Step 5 Run the following steps of the K-means algorithm for each particle i:

1. Compute the distance between each discontinuity and the K cluster centers obtained from the new position of particle i, and assign each discontinuity to its nearest cluster.
2. Calculate the cluster centers using the eigenanalysis method, according to the clustering results.
3. Calculate the fitness value by Eq. (4).

Step 6 For each particle, update its position and fitness value according to the results of the K-means algorithm in step 5.

Step 7 For each particle i, compare the fitness value of its current position $X_i$ with that of its previous personal best position $P_i$. If $f(X_i) < f(P_i)$, then assign $X_i$ to $P_i$ (let $P_i = X_i$).

Step 8 For each particle i, compare the fitness value of its personal best position $P_i$ with that of the swarm's previous global best position $P_g$. If $f(P_i) < f(P_g)$, then assign $P_i$ to $P_g$ (let $P_g = P_i$).

Step 9 Check the convergence criterion, which is a maximum number of iterations. If converged, output the clustering results; otherwise, loop to step 4.

It should be noted that an empty cluster may be generated in step 5. If an empty cluster exists, a discontinuity from the other, non-empty clusters that has the maximum distance to the center of the cluster to which it belongs is randomly chosen as the center of the empty cluster. A flow chart of the overall process of the KPSO method is shown in Fig. 2.

Fig. 2 Flow chart of the KPSO method for the identification of discontinuity sets

Table 1 Parameters used for discrete fracture network (DFN) generation

| Set | Dip direction/dip angle (°) | Number of discontinuities | Distribution form | Parameters |
| --- | --- | --- | --- | --- |
| 1 | 50/50 | 110 | Fisher, $f(\theta) = c \sin\theta\, e^{c\cos\theta} / (e^{c} - e^{-c})$ | c = 20 |
| 2 | 110/65 | 80 | Fisher, $f(\theta) = c \sin\theta\, e^{c\cos\theta} / (e^{c} - e^{-c})$ | c = 13 |
| 3 | 200/25 | 60 | Exponential, $f(\theta) = \lambda e^{-\lambda\theta}$ for $\theta \ge 0$; 0 for $\theta < 0$ | λ = 4 |
| 4 | 280/35 | 40 | Uniform, $f(\theta) = 1/(b - a)$ for $a \le \theta \le b$; 0 otherwise | a = 0, b = 0.2π |
| 5 | 350/45 | 70 | Uniform, $f(\theta) = 1/(b - a)$ for $a \le \theta \le b$; 0 otherwise | a = 0.01π, b = 0.18π |

θ is the deviation angle from the mean orientation angle

Fig. 3 Poles of the artificial discontinuities

Fig. 4 Comparison of the clustering results of the K-means method and those of the new method with the same initial cluster centers. a The K-means method. b The new method

Table 2 Comparison of the clustering results of the K-means method and those of the new method with the same initial cluster centers

| Method | Set | Initial cluster centers (dip direction/dip angle, °) | Clustering results (dip direction/dip angle, °) | Assignment error rate |
| --- | --- | --- | --- | --- |
| K-means method | 1 | 43/45 | 36.3/48.8 | 29.7 % |
| | 2 | 85/52 | 74.3/53.2 | |
| | 3 | 110/60 | 110.9/71.7 | |
| | 4 | 211/20 | 219.4/25.1 | |
| | 5 | 351/35 | 341.0/44.7 | |
| The new method | 1 | 43/45 | 50.2/49.9 | 6.7 % |
| | 2 | 85/52 | 106.5/67.3 | |
| | 3 | 110/60 | 194.7/28.3 | |
| | 4 | 211/20 | 282.2/36.3 | |
| | 5 | 351/35 | 353.1/46.1 | |

Fig. 5 Clustering results of the KPSO method
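Step 5 of the KPSO method in Section 2.5 combines the acute-angle distance of Eq. (5), the eigenanalysis center of Eq. (6), and the fitness of Eq. (4). The sketch below shows one way to write that refinement for a single particle. It is illustrative only: the names `unit`, `angle_dist`, `eigen_center`, and `kmeans_refine` are hypothetical, the dominant eigenvector is found by power iteration (a substitute chosen here to avoid external libraries), the decoded centers are renormalized after the PSO update (an assumption), and the empty-cluster repair described in the text is omitted.

```python
import math

def unit(v):
    """Scale a 3-vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(t * t for t in v)) or 1.0
    return tuple(t / norm for t in v)

def angle_dist(e, c):
    """Acute angle (radians) between two unit vectors, Eq. (5)."""
    dot = abs(sum(a * b for a, b in zip(e, c)))
    return math.acos(min(1.0, dot))

def eigen_center(vectors, iters=200):
    """Dominant eigenvector of the orientation matrix M of Eq. (6),
    approximated by power iteration started from the first member."""
    m = [[sum(v[r] * v[c] for v in vectors) for c in range(3)] for r in range(3)]
    x = unit(vectors[0])
    for _ in range(iters):
        x = unit([sum(m[r][c] * x[c] for c in range(3)) for r in range(3)])
    return x if x[2] >= 0 else tuple(-t for t in x)  # report the upward normal

def kmeans_refine(position, poles, k):
    """One pass of step 5 for a particle whose position holds K centers."""
    # Decode the 3K-dimensional position into K centers (renormalized: assumed).
    centers = [unit(position[3 * j:3 * j + 3]) for j in range(k)]
    # Sub-step 1: assign each pole to its nearest center.
    labels = [min(range(k), key=lambda j: angle_dist(e, centers[j])) for e in poles]
    # Sub-step 2: recompute centers by eigenanalysis (empty clusters left as-is).
    for j in range(k):
        members = [e for e, lab in zip(poles, labels) if lab == j]
        if members:
            centers[j] = eigen_center(members)
    # Sub-step 3: fitness value, Eq. (4).
    fitness = sum(angle_dist(e, centers[lab]) for e, lab in zip(poles, labels))
    return [t for c in centers for t in c], labels, fitness
```

The eigenanalysis matters for sub-vertical sets: for two poles dipping 85° toward north and toward south, `eigen_center` returns a nearly horizontal center along the north-south axis, whereas the arithmetic mean of Eq. (3) would point almost vertically.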
3 Example Analyses

3.1 Application to an Artificial Data Set

In this section, a discrete fracture network (DFN) approach was used to generate synthetic discontinuity sets with various distributions, various dispersion levels, and various amounts of overlap between sets (Min et al. 2004; Xu and Dowd 2010). An artificial data set consisting of five clusters of discontinuity orientations was generated. Table 1 shows the parameters used for DFN generation. Every simulated orientation is represented by its equal-area, upper hemisphere projection for visualization (Fig. 3).

The K-means algorithm was applied to the artificial data set. The clustering results (Fig. 4a; Table 2) indicate that the K-means algorithm fails to identify the discontinuity sets when improper initial cluster centers are chosen. As listed in Table 2, the assignment error rate for this case is 29.7 %. The K-means algorithm is a local optimum method and suffers from difficulty in choosing proper initial cluster centers. Figure 4a shows that the K-means algorithm fails to distinguish discontinuities of set 4 from discontinuities of set 3, and some discontinuities belonging to set 4 were wrongly assigned to set 3.

With the same initial values, the KPSO method performed well in identifying the discontinuity sets. The results of the KPSO method for this case are shown in Fig. 4b and Table 2, indicating that the partitions it provides match the natural clusters defined in advance well. The assignment error rate of the KPSO method is only 6.7 %. The mean orientations of the re-grouped discontinuity sets are very similar to the orientations which were selected as mean orientations to generate the synthetic discontinuity sets. Compared with the clustering results of the K-means algorithm, an improvement is demonstrated.

To test the performance of the KPSO method, the orientations listed in Table 1, which were selected as mean orientations to generate the artificial data set, were chosen as the initial values for the KPSO method. The clustering results are shown in Fig. 5 and Table 3. For this case, the assignment error rate is 6.4 %.

Table 3 Clustering results of the new method

| Set | Initial cluster centers (dip direction/dip angle, °) | Clustering results (dip direction/dip angle, °) | Assignment error rate |
| --- | --- | --- | --- |
| 1 | 50/50 | 48.9/49.8 | 6.4 % |
| 2 | 110/65 | 106.9/67.1 | |
| 3 | 200/25 | 195.7/28.1 | |
| 4 | 280/35 | 280.3/36.4 | |
| 5 | 350/45 | 350.2/45.6 | |
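The text does not spell out how the assignment error rate is computed. One plausible reading, assumed here, is the fraction of discontinuities whose assigned set differs from the set they were generated from, with cluster labels already matched to the generating sets. The function name `assignment_error_rate` is hypothetical.

```python
def assignment_error_rate(true_sets, assigned_sets):
    """Fraction of items whose assigned set differs from the generating set.

    Assumes both sequences use the same set numbering (labels already matched).
    """
    if len(true_sets) != len(assigned_sets) or not true_sets:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(t != a for t, a in zip(true_sets, assigned_sets))
    return wrong / len(true_sets)
```

Under this reading, and purely as arithmetic, 107 misassigned poles out of the 360 listed in Table 1 would correspond to 29.7 %, and 23 out of 360 to 6.4 %.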
By comparing the results of the KPSO method with different initial cluster centers, it is found that this method is not sensitive to the initial cluster centers, and the clustering results are only slightly influenced by the initial values. A global optimum is achieved even when initial values which are improper for the K-means algorithm were chosen.

3.2 Application to a Real Data Set

The new algorithm was further applied to a discontinuity data set collected from a real survey at the dam site of the Songta hydropower station, which is located on the upper reaches of the Nu River, in the southwest of China. The rock mass in this area is mainly composed of biotite granite from the late Yanshanian (Cretaceous) period. Data collection was undertaken in adit PD271, at an elevation of 1,732 m on the left bank of the dam site. The strike of the adit is approximately E–W, which is perpendicular to the flow direction of the Nu River. The adit is 200 m long, with a cross section 2 m wide and 2 m high. A total of 242 discontinuities were sampled using the window sampling method, and the orientations were obtained by means of compass measurements. The map of the two-dimensional discontinuity traces is shown in Fig. 6. Figure 7 shows an equal-area, upper hemisphere projection of the pole vectors of the 242 discontinuities, together with a contoured stereonet of pole density.

Fig. 6 Two-dimensional discontinuity traces (horizontal scale 0 to 200 m)

Fig. 7 Data set of discontinuity orientations in a biotite granite rock mass

The contoured stereonet of pole density could help a well-trained expert to decide on the number of clusters. In Fig. 7, there is one area with a very high to high density and two areas with medium to low densities, and the boundaries between clusters are unclear. From expert experience, it seems reasonable to partition the discontinuities into three sets. A validity measure proposed by Ray and Turi (1999) was used as an auxiliary tool to determine the number of clusters K. The Ray–Turi index is described as follows:

\[
v = \frac{\dfrac{1}{n} \sum_{j=1}^{K} \sum_{e \in S_j} d(e, c_j)}{\min_{i \ne j} d(c_i, c_j)}
\tag{11}
\]

where n is the total number of discontinuities and $c_j$ is the center of the discontinuity set $S_j$. The ideal value of K is determined by the clustering which gives a minimum value for the validity measure. Table 4 shows that the Ray–Turi index achieves a minimum when the number of clusters K is three, which is in accordance with the contoured stereonet of pole density for the discontinuities. Therefore, the ideal number of discontinuity sets should be three.

Table 4 Validity measure for clustering of the real data with different numbers of sets

| K | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- |
| Ray–Turi index | 0.4004 | 0.3520 | 0.5630 | 0.6254 |

The results of the KPSO method with K = 3 discontinuity sets are shown in Figs. 8a and 9a and Table 5. To study the performance of the new method, the Shanley and Mahtab method was chosen as a benchmark. We also compared the clustering results of the new algorithm with those of the spectral clustering algorithm proposed by Jimenez-Rodriguez and Sitar. In addition, the Xu et al. method was selected for comparison. The clustering results of the four methods with K = 3 discontinuity sets are presented in Figs. 8 and 9 and Table 5. The results show that the KPSO clustering algorithm performs well in identifying the discontinuity sets and provides partitions that agree well with the results of the other methods considered. The clustering results of the KPSO method are very similar to those of the spectral clustering algorithm and the Xu et al. method. However, there are some differences between the clustering results of the Shanley and Mahtab method and those of the other methods. Figure 8b shows that the Shanley and Mahtab method assigned some discontinuities with high dip angles to the discontinuity set with low dip angles (set 3). Therefore, the mean dip angle of set 3 is noticeably enlarged. As listed in Table 5, the mean dip angle of set 3 computed with the Shanley and Mahtab method is 30.5°, which is larger than the angles computed with the spectral method (28.8°), the Xu et al. method (28.5°), and the new method (28.2°). Since the mean orientation of a discontinuity set is an important parameter for rock mass characterization, this enlarged mean dip angle computed with the Shanley and Mahtab method may cause some problems in rock engineering design. The results indicate that the other three methods outperform the Shanley and Mahtab method in the clustering of the discontinuities.

Fig. 8 Comparison of the clustering results of different methods with three discontinuity sets. a The new method. b Shanley and Mahtab method. c Spectral clustering method. d Xu et al. method

Fig. 9 Clustering results of different methods represented by two-dimensional discontinuity traces (red lines set 1; green lines set 2; purple lines set 3). a The new method. b Shanley and Mahtab method. c Spectral clustering method. d Xu et al. method (color figure online)

Table 5 Clustering results of the real data set using different methods (orientations given as dip direction/dip angle, °)

| Set | The new method: orientation | Numbers | Shanley and Mahtab method: orientation | Numbers | Spectral clustering method: orientation | Numbers | Xu et al. method: orientation | Numbers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 332.7/85.4 | 63 | 327.7/89.4 | 63 | 331.5/86.1 | 55 | 330.1/87.3 | 62 |
| 2 | 65.4/86.9 | 71 | 58.6/82.7 | 55 | 64.0/87.8 | 77 | 64.2/85.7 | 67 |
| 3 | 295.2/28.2 | 108 | 293.1/30.5 | 124 | 298.0/28.8 | 110 | 295.6/28.5 | 113 |

Figure 8 shows that the differences in the clustering results of the four methods lie mainly in the clustering of the poles on the boundaries between clusters. Such poles were extracted from the entire data set and are presented in Fig. 10. Considering that there is no "ground truth", it is difficult to decide which of the partitions is more valid (Klose et al. 2005). We analyzed the results of each method for the clustering of the poles shown in Fig. 10 and calculated the distance between these poles and the cluster centers to determine whether these poles were assigned to the cluster to which they are closest. The results are presented in Table 6. As shown in Table 6, for the clustering results of the new method, there is only one pole which was not assigned to its nearest cluster; for the results of the Shanley and Mahtab method, the spectral method, and the Xu et al. method, there are eight, six, and two poles which were not assigned to their nearest cluster, respectively. Thus, the results show that the new method performed slightly better than the other methods in successfully assigning the discontinuities to their nearest cluster.

Fig. 10 Discontinuity poles that were assigned to different clusters using different methods (poles numbered 1 to 29)

Table 6 Evaluation of the assignment of the poles using different methods

| Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| The new method | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × | ✓ |
| Shanley and Mahtab method | ✓ | ✓ | × | × | ✓ | ✓ | ✓ | ✓ | ✓ | × | × | ✓ | ✓ | ✓ | ✓ |
| Spectral clustering method | ✓ | ✓ | ✓ | ✓ | × | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × | × | ✓ |
| Xu et al. method | ✓ | ✓ | ✓ | ✓ | × | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × | ✓ |

| Method | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| The new method | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Shanley and Mahtab method | × | ✓ | ✓ | × | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × | × |
| Spectral clustering method | ✓ | × | × | ✓ | × | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Xu et al. method | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

A "✓" means that a pole was assigned to its nearest cluster center and a "×" indicates that it was not

4 Conclusions

The K-means algorithm, which is widely used in the field of data mining and pattern recognition, provides a simple yet very powerful tool for the identification of rock discontinuity sets. As it is sensitive to the initial cluster centers, the K-means algorithm suffers from difficulty in choosing proper initial cluster centers. Besides, it is a local optimum method, and the clustering results of the K-means algorithm obtained by minimizing a cost function are not guaranteed to be optimal.

This paper presents a K-means algorithm based on particle swarm optimization (KPSO) for the identification of discontinuity sets based on orientations. In the KPSO method, the acute angle between discontinuity unit normal vectors was chosen as the distance measure and the eigenanalysis method was used to calculate the centers of discontinuity sets. For an example case with an artificial data set, the KPSO method performed well in identifying the discontinuity sets, providing partitions that agreed well with the natural clusters defined in advance. Compared with the clustering results of the K-means algorithm, an improvement is demonstrated.

To study the performance of the proposed method, the Shanley and Mahtab method, the spectral clustering algorithm, and the Xu et al. method were chosen as benchmarks. By applying these methods to a discontinuity data set collected from a real survey, it is found that the clustering results of the KPSO method match well with the partitions provided by the other methods. We analyzed the results of each method for the clustering of poles on the boundaries between clusters to determine whether these poles were assigned to their nearest cluster. The results show that the new method performed slightly better than the other three methods in successfully assigning the discontinuities to their nearest cluster.

One major advantage of the new method is that it is not sensitive to the initial cluster centers, which overcomes the difficulty in choosing proper initial cluster centers. Moreover, it is a global optimization method.
Acknowledgments This work was supported by the State Key Program of National Natural Science of China (Grant No. 41330636), 2010 non-profit scientific special research funds of the Ministry of Water Resources (Grant No. 201001008), the Doctoral Program Foundation of Higher Education of China (Grant No. 20090061110054), and Jilin University's 985 project (Grant No. 450070021107). The authors are very grateful to the editor and the anonymous reviewers for their thorough and constructive reviews.
References
Dershowitz W, Busse R, Geier J, Uchida M (1996) A stochastic approach for fracture set definition. In: Aubertin M, Hassani F, Mitri H (eds) Proceedings of the 2nd North American rock mechanics symposium (NARMS ’96), rock mechanics: tools and techniques, Montreal, Quebec, June 1996, pp 1809–1813
Hammah RE, Curran JH (1998) Fuzzy cluster algorithm for the automatic identification of joint sets. Int J Rock Mech Min Sci 35(7):889–905. doi:10.1016/S0148-9062(98)00011-4
Hammah RE, Curran JH (1999) On distance measures for the fuzzy K-means algorithm for joint data. Rock Mech Rock Eng 32(1):1–27. doi:10.1007/s006030050041
Harrison JP (1992) Fuzzy objective functions applied to the analysis of discontinuity orientation data. In: Hudson JA (ed) Rock characterization. Proceedings of the international society of rock mechanics (ISRM) symposium, Eurock’92, Chester, UK, September 1992. Thomas Telford, British Geotechnical Society, London, pp 25–30
Jimenez R (2008) Fuzzy spectral clustering for identification of rock discontinuity sets. Rock Mech Rock Eng 41:929–939. doi:10.1007/s00603-007-0155-6
Jimenez-Rodriguez R, Sitar N (2006) A spectral method for clustering of rock discontinuity sets. Int J Rock Mech Min Sci 43:1052–1061. doi:10.1016/j.ijrmms.2006.02.003
Jin X, Liang YQ, Tian DP, Zhuang FZ (2013) Particle swarm optimization using dimension selection methods. Appl Math Comput 219:5185–5197. doi:10.1016/j.amc.2012.11.020
Kalyani S, Swarup KS (2011) Particle swarm optimization based K-means clustering approach for security assessment in power systems. Expert Syst Appl 38:10839–10846. doi:10.1016/j.eswa.2011.02.086
Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, Perth, Australia, November/December 1995, pp 1942–1948
Kennedy J, Eberhart RC, Shi Y (2001) Swarm intelligence. Morgan Kaufmann, San Francisco
Klose CD, Seo S, Obermayer K (2005) A new clustering approach for partitioning directional data. Int J Rock Mech Min Sci 42:315–321. doi:10.1016/j.ijrmms.2004.08.011
Kulatilake PHSW, Wu TH (1984) Estimation of mean trace length of discontinuities. Rock Mech Rock Eng 17:215–232. doi:10.1007/BF01032335
Lin YC, Tong N, Shi MJ, Fan KD, Yuan D, Qu LC, Fu Q (2012) K-means optimization clustering algorithm based on particle swarm optimization and multiclass merging. In: Jin D, Lin S (eds) Advances in computer science and information engineering. Springer, Berlin, vol 168, pp 569–578. doi:10.1007/978-3-642-30126-1_90
Mahtab MA, Yegulalp TM (1982) A rejection criterion for definition of clusters in orientation data. In: Goodman RE, Heuze FE (eds) Proceedings of the 23rd US symposium rock mechanics, Berkeley, California, August 1982, pp 116–123
Markland J (1974) The analysis of principal components of orientation data. Int J Rock Mech Min Sci Geomech Abstr 11(5):157–163. doi:10.1016/0148-9062(74)90882-1
Markovaara-Koivisto M, Laine E (2012) MATLAB script for analyzing and visualizing scanline data. Comput Geosci 40:185–193. doi:10.1016/j.cageo.2011.07.010
Min K-B, Jing L, Stephansson O (2004) Determining the equivalent permeability tensor for fractured rock masses using a stochastic REV approach: method and application to the field data from Sellafield, UK. Hydrogeol J 12(5):497–510. doi:10.1007/s10040-004-0331-7
Parsopoulos KE, Vrahatis MN (2007) Parameter selection and adaptation in unified particle swarm optimization. Math Comput Model 46:198–213. doi:10.1016/j.mcm.2006.12.019
Pecher A (1989) SchmidtMac—a program to display and analyze directional data. Comput Geosci 15(8):1315–1326. doi:10.1016/0098-3004(89)90095-2
Ray S, Turi RH (1999) Determination of number of clusters in K-means clustering and application in colour image segmentation. In: Pal NR, De AK, Das J (eds) Proceedings of the 4th international conference on advances in pattern recognition and digital techniques (ICAPRDT’99), Calcutta, India, December 1999. Narosa Publishing House, New Delhi, pp 137–143
Schmidt W (1925) XXIII. Gefügestatistik. Tschermaks Mineralogische und Petrographische Mitteilungen 38:392–423. doi:10.1007/BF02993943
Shanley RJ, Mahtab MA (1976) Delineation and analysis of clusters in orientation data. J Int Assoc Math Geol 8(1):9–23. doi:10.1007/BF01039681
Shi Y, Eberhart R (1998) A modified particle swarm optimizer. In: Proceedings of the IEEE international conference on evolutionary computers, Anchorage, Alaska, May 1998, pp 69–73. doi:10.1109/ICEC.1998.699146
Steinley D, Brusco MJ (2007) Initializing K-means batch clustering: a critical evaluation of several techniques. J Classif 24(1):99–121. doi:10.1007/s00357-007-0003-0
Tokhmechi B, Memarian H, Moshiri B, Rasouli A, Noubari HA (2011) Investigating the validity of conventional joint set clustering methods. Eng Geol 118:75–81. doi:10.1016/j.enggeo.2011.01.002
Xu C, Dowd P (2010) A new computer code for discrete fracture network modeling. Comput Geosci 36:292–301. doi:10.1016/j.cageo.2009.05.012
Xu LM, Chen JP, Wang Q, Zhou FJ (2013) Fuzzy C-means cluster analysis based on mutative scale chaos optimization algorithm for the grouping of discontinuity sets. Rock Mech Rock Eng 46:189–198. doi:10.1007/s00603-012-0244-z
Yagiz S, Karahan H (2011) Prediction of hard rock TBM penetration rate using particle swarm optimization. Int J Rock Mech Min Sci 48:427–433. doi:10.1016/j.ijrmms.2011.02.013
Zhang W, Chen JP, Liu C, Huang R, Li M, Zhang Y (2012) Determination of geometrical and structural representative volume elements at the Baihetan dam site. Rock Mech Rock Eng 45:409–419. doi:10.1007/s00603-011-0191-0
Zhou W, Maerz NH (2002) Implementation of multivariate clustering methods for characterizing discontinuities data from scanlines and oriented boreholes. Comput Geosci 28:827–839. doi:10.1016/S0098-3004(01)00111-X