A Wireless Data Clustering Method for Partial Match Queries

Ji Yeon Lee, Yon Dohn Chung, Yoon Joon Lee and Myoung Ho Kim
CS/TR-2000-155, June 2000
KAIST, Department of Computer Science
Abstract

This paper proposes a broadcast data clustering method for partial match queries in mobile distributed systems. An effective broadcast data clustering method enables mobile clients to access the data with short latency. Our method exploits a property of the Gray coding scheme: Gray codewords have high locality. We describe how the Gray code method effectively clusters wireless data for partial match queries, and we analyze and evaluate the performance of the Gray code clustering method through comparison with other methods.

Keywords: data clustering, partial match query, Gray code, wireless data broadcasting, mobile computing
1 Introduction

There are two important parameters related to wireless data broadcasting: access time and tuning time. The access time is the elapsed time from the point a client submits a query to the receipt of the required data through the broadcast channel. The tuning time is the amount of time a client spends listening to the channel. There have been studies on reducing the access time, such as caching and nonuniform broadcasting [1, 12]. Studies on reducing the tuning time, such as indexing and hashing, have also been made in the past [3, 6]. Most of these studies assume that the query qualification for the required data objects is based on a single attribute.

In this paper we focus on effective clustering of wireless data for partial match queries [8, 2, 9], which are widely used in various applications. In partial match queries, data objects are specified by more than one attribute. Effective clustering of wireless data enables mobile clients to access the data in a short access time. We propose a wireless data clustering method that utilizes some properties of the Gray code. The Gray code provides a sequence of codewords with high locality. To use Gray codes, we represent data records as bit-vectors using multiattribute hashing. After describing the method, we analyze it and evaluate it by experiments.

The rest of the paper is organized as follows. In Section 2 we introduce wireless data clustering and explain some properties of wireless data and the measure Query Distance (QD). We propose a clustering method for partial match queries in Section 3. In the method we use multiattribute hashing for the representation of data and query signatures. We analyze the proposed method in Section 4, and discuss the ordering of attributes for signature representation in Section 5. In Section 6 we describe the experimental results. In the last section we conclude the paper with future work.
2 Preliminaries

The clustering of wireless data is different from that of disk-resident data. Clustering wireless broadcast data means finding a broadcast schedule (i.e., the sequence of data broadcasting) such that mobile clients can access the data on the air with short latency. The access time of a mobile query is the duration from the query start time to the time all required data records have been downloaded. Figure 1 illustrates a wireless data broadcasting example, where five data records R1-R5 are broadcast based on the broadcast schedule σ = <R1, R2, R3, R4, R5>. If a mobile client accesses the data {R1, R4}¹ at the time of R2 in the figure, then the client completes its data retrieval at the end of R1 in the next bcast, since the data record R1 in the current bcast was passed over when the query started. If the query is issued before R1 is broadcast, then the query accesses both R1 and R4 in the current bcast. Since, as the figure shows, the access time of a mobile query depends on its start time, it is difficult to use the access time measure for developing clustering methods. Therefore, we use a clustering measure named Query Distance (QD) that was proposed in our previous work [4]. The Query Distance of a query represents the coherence degree of the data that the query accesses, and is independent of the query's start time.
[Figure 1: Query Processing on the Wireless Broadcast Data]

Definition 1 Let q_i be a query that accesses data records {R_i1, R_i2, ..., R_in_i}¹ (n_i is the number of data records that q_i accesses), let B be the size of one bcast, and let δ_j^i be the interval between the two data records R_ij and R_i(j+1) in the given schedule σ. The QD of query q_i on schedule σ is (see Figure 2):

QD_σ(q_i) = B − MAX(δ_k^i),  k = 1, ..., n_i.

¹ In this paper we assume there is no precedence relationship in accessing the data.

[Figure 2: Placement of the Set of Data Records of a Query; δ_n_i^i denotes the wraparound interval from R_in_i back to R_i1]
Definition 2 Given two schedules σ1 and σ2, if QD_σ1(q_i) is equal to QD_σ2(q_i) for all queries q_i, then σ1 is distance-equivalent to σ2, denoted by σ1 ≡ σ2.

Property 1 Suppose that QDS(q_i) is the set of data that query q_i accesses, N is the total number of data records broadcast in a single bcast, and there are two schedules organized as follows:

σ1 = <R1, R2, ..., R_i−1, R_i, R_i+1, ..., R_j−1, R_j, R_j+1, ..., R_N−1, R_N>
σ2 = <R1, R2, ..., R_i−1, R_j, R_i+1, ..., R_j−1, R_i, R_j+1, ..., R_N−1, R_N>

Then the following property holds:

QD_σ1(q_i) = QD_σ2(q_i), if R_i, R_j ∈ QDS(q_i).

Property 2 If a schedule σ2 is the mirror image of σ1 as below, then σ1 is distance-equivalent to σ2, i.e., σ1 ≡ σ2.

σ1 = <R1, R2, ..., R_i−1, R_i, R_i+1, ..., R_j−1, R_j, R_j+1, ..., R_N−1, R_N>
σ2 = <R_N, R_N−1, ..., R_j+1, R_j, R_j−1, ..., R_i+1, R_i, R_i−1, ..., R2, R1>
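The QD measure can be sketched in a few lines, assuming δ counts the slots strictly between two consecutive accessed records, measured cyclically over the bcast; this reading is consistent with the worked QD values in Section 4. The function name is ours, not the paper's.

```python
def qd(positions, B):
    """Query Distance: B minus the largest interval between consecutive
    accessed records, intervals measured cyclically and counted as the
    number of slots strictly between two accessed records."""
    ps = sorted(positions)
    gaps = [ps[k + 1] - ps[k] - 1 for k in range(len(ps) - 1)]
    gaps.append(ps[0] + B - ps[-1] - 1)  # wraparound interval
    return B - max(gaps)

# Figure 1 example: a query accessing {R1, R4} on sigma = <R1,...,R5>,
# i.e., positions 0 and 3 in a bcast of size B = 5.
print(qd([0, 3], 5))  # -> 3

# Property 2: the mirror schedule <R5,...,R1> maps position p to 4 - p
# and leaves the QD unchanged.
assert qd([4 - 0, 4 - 3], 5) == qd([0, 3], 5)
```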
3 The Gray Code Clustering Method

3.1 Gray Codes

The Gray coding scheme is one of the linear mapping schemes for multidimensional space. In the binary reflected Gray code, numbers are coded into binary strings such that successive strings differ in exactly one bit position. It has been observed that this single-bit difference between successive codewords is related to locality [5, 7]. Figure 3 illustrates the 3-bit binary reflected Gray codes with the corresponding binary codes. From now on, we use the term "Gray code" for "binary reflected Gray code" in this paper. The Gray value of a binary string is the order (or position) of the binary string in the Gray code. For instance, the Gray value of '110' is 4, which is equal to the binary value of '100.' In the figure the column 'decimal' represents the Gray and binary value of each codeword. The conversion of a Gray codeword to its order (i.e., Gray value) is described in [10].
decimal   Gray code   binary code
0         000         000
1         001         001
2         011         010
3         010         011
4         110         100
5         111         101
6         101         110
7         100         111

Figure 3: Illustration of the 3-bit binary reflected Gray code
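The conversions between a binary value and its Gray codeword can be sketched as follows; this is the standard construction (the function names are ours):

```python
def to_gray(n):
    """The n-th codeword of the binary reflected Gray code."""
    return n ^ (n >> 1)

def gray_value(g):
    """Order of codeword g in the Gray code (inverse of to_gray)."""
    v = 0
    while g:
        v ^= g
        g >>= 1
    return v

# Reproduce Figure 3 and the example from the text.
print([format(to_gray(i), '03b') for i in range(8)])
# -> ['000', '001', '011', '010', '110', '111', '101', '100']
print(gray_value(0b110))  # -> 4, the Gray value of '110'
```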
3.2 Signature Representation

In our method we represent data records and queries as data signatures and query signatures using multiattribute hashing. We use a running example to explain how data records and queries are represented as signatures. Suppose there is a wireless information system that broadcasts stock price information in real time. Each data record has four attributes: A1 (company), A2 (amount of sale), A3 (amount of purchase) and A4 (price: $). For each attribute, we assume the following simple hash functions.
h_company(x) =
  00   if x is a financial company
  01   if x is a manufacturing company
  10   if x is a computer and communication company
  11   otherwise

h_amount_of_sale(x) =
  00   if x < 10000
  01   if 10000 ≤ x < 20000
  10   if 20000 ≤ x < 30000
  11   if 30000 ≤ x

h_amount_of_purchase(x) =
  00   if x < 10000
  01   if 10000 ≤ x < 20000
  10   if 20000 ≤ x < 30000
  11   if 30000 ≤ x

h_price(x) =
  000  if x < 5
  001  if 5 ≤ x < 10
  010  if 10 ≤ x < 20
  011  if 20 ≤ x < 30
  100  if 30 ≤ x < 40
  101  if 40 ≤ x < 50
  110  if 50 ≤ x < 60
  111  if 60 ≤ x
A data signature is generated by concatenating the hashed l_i-bit vector of each attribute A_i. Therefore the signature becomes an l-bit binary vector, where l = Σ_{i=1}^{k} l_i (k is the number of attributes). Here, we assume that the attributes are arranged in the order <A1, A2, A3, A4>, and that l1 = l2 = l3 = 2 and l4 = 3. We discuss the arrangement of attributes and the bit allocation in Section 5. Based on the above hashing scheme, we can represent the data record {company = 'New York Bank', amount of sale = '13000', amount of purchase = '2000', price = '22'} as "000100011". Similarly, we can represent the partial match query {amount of sale ≥ '30000', '10' ≤ price < '20'} as "**11**010", where '*' denotes a don't-care condition.
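The signature construction above can be sketched as follows. The category labels passed to `h_company` and the helper `bucket` are our shorthand, not the paper's; the range boundaries come from the hash tables of Section 3.2.

```python
def bucket(x, bounds, width):
    """Hash a numeric value to the index of its range, as a bit string.
    bounds must be ascending; the index is the number of bounds <= x."""
    return format(sum(x >= b for b in bounds), f'0{width}b')

def h_company(kind):
    return {'financial': '00', 'manufacturing': '01',
            'computer/comm': '10'}.get(kind, '11')

h_sale = lambda x: bucket(x, (10000, 20000, 30000), 2)
h_purchase = lambda x: bucket(x, (10000, 20000, 30000), 2)
h_price = lambda x: bucket(x, (5, 10, 20, 30, 40, 50, 60), 3)

def signature(kind, sale, purchase, price):
    # Attributes concatenated in the assumed order <A1, A2, A3, A4>.
    return h_company(kind) + h_sale(sale) + h_purchase(purchase) + h_price(price)

# The running example: a financial company, sale 13000, purchase 2000, price 22.
print(signature('financial', 13000, 2000, 22))  # -> 000100011
```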
3.3 The Clustering Method

The proposed method consists of two steps: 1) representing the data as data signatures based on the given hash functions, and 2) sorting the data signatures based on their Gray values. When there are seven data records R1-R7 (in the following table), and the attributes and hash functions are the same as in Section 3.2, the clustering method works as follows.

id   company             amount of sale   amount of purchase   price ($)   record signature   Gray value   binary value
R1   New York Bank       5000             35000                69          000011111          21           31
R2   LA Broadcasting     15000            25000                49          110110101          294          437
R3   San Jose Computer   25000            15000                39          101001100          392          332
R4   Dallas Machine      35000            5000                 29          011100011          190          227
R5   Texas Electronics   500              45000                19          010011010          236          154
R6   Miami Industry      8000             31000                4           010011000          239          152
R7   Hawaii Comm.        24000            19000                32          101001100          392          332
In the first step of the proposed method, we represent each data record as a record signature, shown in the 'record signature' column of the table. In the second step, we sort the data based on their Gray values. As a result, we get the following schedules, where σ_Gray is the result of the Gray coding method (GCM) and σ_binary is the result of the binary coding method (BCM). The binary coding method sorts the data signatures based on their binary values.

σ_Gray = <R1, R4, R5, R6, R2, R3, R7>,  σ_binary = <R1, R6, R5, R4, R3, R7, R2>

In the sorting step, data records with the same Gray value or binary value are mutually interchangeable based on Property 1. (In this example R3 and R7 have the same Gray value.) Also, because of Property 2, it does not matter whether the set of data is sorted in increasing or decreasing order. In this paper we sort the data in non-decreasing order.
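The two steps above reduce to a single sort once the signatures are in hand; a minimal sketch using the record table of this section:

```python
def gray_value(g):
    """Order of codeword g in the binary reflected Gray code."""
    v = 0
    while g:
        v ^= g
        g >>= 1
    return v

# Record signatures from the table in Section 3.3.
records = {'R1': '000011111', 'R2': '110110101', 'R3': '101001100',
           'R4': '011100011', 'R5': '010011010', 'R6': '010011000',
           'R7': '101001100'}

gcm = sorted(records, key=lambda r: gray_value(int(records[r], 2)))
bcm = sorted(records, key=lambda r: int(records[r], 2))
print(gcm)  # -> ['R1', 'R4', 'R5', 'R6', 'R2', 'R3', 'R7']
print(bcm)  # -> ['R1', 'R6', 'R5', 'R4', 'R3', 'R7', 'R2']
```

Python's sort is stable, so the tied records R3 and R7 keep their table order, matching Property 1's observation that equal-valued records are interchangeable.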
4 Analysis

First, we briefly show the clustering effect of the Gray coding scheme. Figure 4 illustrates the 3-bit Gray and binary codewords, where the data records satisfied by the partial match query "*1*" are marked with '√' symbols. We can see that the Gray code clusters the data more effectively than the binary one, i.e., QD_Gray("*1*") = 4 and QD_binary("*1*") = 6.

decimal   Gray code       binary code
0         000             000
1         001             001
2         011   √         010   √
3         010   √         011   √
4         110   √         100
5         111   √         101
6         101             110   √
7         100             111   √

Figure 4: QD of the Partial Match Query "*1*"

Now, we analyze the Query Distance of a partial match query when the Gray and binary clustering methods are used. For convenience, we assume that each record signature corresponds to exactly one data record and vice versa. That is, if the number of bits is n, then it is assumed that there are 2^n data records, each with a unique signature representation.

In the BCM, the QD of a query denoted by the n-bit query signature
Notation    Meaning
n           the number of bits in a query signature
~q          an n-bit query signature; ~q = (b_n, b_n−1, ..., b_1)
b_i         the value of the i-th bit in ~q; b_i is one of '0', '1', or '*'
l           the location of a bit specified by '0' or '1' in ~q
un          the location of the leftmost bit in ~q that has the value '*'
mlb         the location of the leftmost bit in ~q that has the value '0' or '1'
QD_B(~q)    the Query Distance of ~q on the binary code
QD_G(~q)    the Query Distance of ~q on the Gray code

Figure 5: Notations
~q is as follows:

QD_B(~q) = 2^n − Σ_{specified bit locations l} 2^{l−1}
In the GCM, the QD of a query denoted by the n-bit query signature ~q is as follows:

QD_G(~q) = 2^n − Σ_{specified bit locations l} Φ(l)

where

Φ(l) =
  2^{n−1}   if l = n                                                     (1)
  2^l       if l = mlb and l ≠ n                                         (2)
  2^{l−1}   if l ≠ mlb and l > un                                        (3)
  0         if l ≠ mlb, l < un, b_l = 0 and b_{l+1} = '*'                (4)
  2^l       if l ≠ mlb, l < un, b_l = 1 and b_{l+1} = '*'                (5)
  2^l       if l ≠ mlb, l < un, b_l = 0, l+1 = mlb and l+1 ≠ n           (6)
  0         if l ≠ mlb, l < un, b_l = 1, l+1 = mlb and l+1 ≠ n           (7)
  0         if l ≠ mlb, l < un, b_l = 0, l+1 ≠ mlb and Φ(l+1) = 0        (8)
  2^l       if l ≠ mlb, l < un, b_l = 0, l+1 ≠ mlb and Φ(l+1) = 2^{l+1}  (9)
  2^l       if l ≠ mlb, l < un, b_l = 1, l+1 ≠ mlb and Φ(l+1) = 0        (10)
  0         if l ≠ mlb, l < un, b_l = 1, l+1 ≠ mlb and Φ(l+1) = 2^{l+1}  (11)
Example 1 Suppose that the number of bits (n) is 5 and a partial match query has the form '*0*10'. Then the QD_B and QD_G of the query are as follows:

QD_B(~q = (*,0,*,1,0)) = 2^5 − Σ_{l=1,2,4} 2^{l−1} = 32 − (8 + 2 + 1) = 21
QD_G(~q = (*,0,*,1,0)) = 2^5 − Σ_{l=1,2,4} Φ(l) = 32 − (16 + 4 + 2) = 10   □
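Example 1 can be checked by brute force, without the closed-form Φ(l): enumerate the records matching the query, place them on the binary or Gray schedule, and measure the largest cyclic interval directly (assuming, as before, that an interval counts the slots strictly between two accessed records):

```python
def gray_value(g):
    """Order of codeword g in the binary reflected Gray code."""
    v = 0
    while g:
        v ^= g
        g >>= 1
    return v

def qd(positions, B):
    """B minus the largest cyclic interval between accessed records."""
    ps = sorted(positions)
    gaps = [ps[k + 1] - ps[k] - 1 for k in range(len(ps) - 1)]
    gaps.append(ps[0] + B - ps[-1] - 1)  # wraparound interval
    return B - max(gaps)

n, B = 5, 32
query = '*0*10'  # bits b5 ... b1, left to right
matches = [v for v in range(B)
           if all(c == '*' or c == s
                  for c, s in zip(query, format(v, '05b')))]
print(qd(matches, B))                           # BCM -> 21
print(qd([gray_value(v) for v in matches], B))  # GCM -> 10
```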
By using the formulas QD_B and QD_G, we analyze the average QD gain of the GCM over the BCM, AQD_gain(n), as follows (for details, see Appendix 1):

AQD_gain(n) = 2(6^{n−1} − 1) / (5(3^n − 2^n))
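The closed form agrees with the direct sum from Appendix 1, Σ_{i=1}^{n−1} 2^i·3^{i−1} divided by the 3^n − 2^n partial match queries; a quick numeric check:

```python
def aqd_gain(n):
    """Closed form of the average QD gain of the GCM over the BCM."""
    return 2 * (6 ** (n - 1) - 1) / (5 * (3 ** n - 2 ** n))

def aqd_gain_direct(n):
    """Direct sum TSQD_gain(n) / (3^n - 2^n) from Appendix 1."""
    return sum(2 ** i * 3 ** (i - 1) for i in range(1, n)) / (3 ** n - 2 ** n)

for n in range(2, 14):
    assert abs(aqd_gain(n) - aqd_gain_direct(n)) < 1e-9
print(round(aqd_gain(5), 2))  # -> 2.45, matching 12.90 - 10.45 in Figure 7
```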
5 The Optimal Attribute Ordering

In the previous sections the attributes are assumed to be referenced with equal probabilities. However, if the reference probabilities of the attributes are not the same, we have to consider the following two issues in the signature representation of a data record:

1. the number of bits for each attribute;
2. the order of attributes in the record signature.

The first issue was studied in [2], and hence we can use its result without much modification. However, the second issue has not been studied in the past. Although [11] studied the ordering of attributes in the AG (Attribute Gray) encoding scheme, that work is for the disk-based environment. Thus, in this section, we propose a method that effectively finds the ordering of the attributes for the wireless environment.
The reference probability of attribute A_i is denoted by ref(A_i). We say an attribute is referenced when the value of the attribute is specified as '0' or '1'. Thus, the probability that the value of attribute A_i is '*' is 1 − ref(A_i). We use the weighted sum of QD (WSQD) as the criterion for a good attribute ordering. The WSQD is defined as Σ_{~q} prob(~q) · QD(~q). Here, prob(~q) is defined as Π_i ψ(A_i, ~q), where ψ(A_i, ~q) is (1/2) · ref(A_i) if the value of A_i in ~q is specified as '0' or '1', and 1 − ref(A_i) otherwise. In the definition of WSQD, QD(~q) is the Query Distance on the given schedule, i.e., QD_G(~q) in the GCM, QD_B(~q) in the BCM, and so on. For generality, we assume that each attribute is represented by one bit in the signature, i.e., attribute A_i is represented as bit b_i. We also assume that attributes are specified independently. We show the effects of attribute ordering with an example. Suppose that the number of data records is 8 and each data record has three attributes A1, A2 and A3, with reference probabilities ref(A1) = 0.2, ref(A2) = 0.4 and ref(A3) = 0.7. In representing the data signature, 6 (i.e., 3!) orderings are possible. Figure 6 shows the WSQD of the 6 orderings in the GCM and BCM. Now we describe the optimal arrangement (i.e., ordering) of attributes when using the GCM and BCM. An attribute ordering is optimal if its WSQD is minimum among all possible orderings.
Theorem 1 Let <A_[n], A_[n−1], ..., A_[2], A_[1]> be an arrangement of attributes for n-bit signature representation. In the BCM, an arrangement of attributes is optimal if the following condition holds:

ref(A_[i+1]) ≥ ref(A_[i]),  1 ≤ i ≤ n − 1.

Proof: See Appendix 2. □
Ordering        WSQD in the GCM   WSQD in the BCM
<A1, A2, A3>    4.668             5.644
<A1, A3, A2>    4.128             5.344
<A2, A1, A3>    4.668             5.244
<A2, A3, A1>    3.868             4.744
<A3, A1, A2>    4.128             4.344
<A3, A2, A1>    3.868             4.144

Figure 6: WSQD of Different Attribute Orderings: An Example
Theorem 2 Let <A_[n], A_[n−1], ..., A_[2], A_[1]> be an arrangement of attributes for n-bit signature representation. In the GCM, an arrangement of attributes is optimal if ref(A_[i]) ≥ ref(A_[i−1]) (2 ≤ i ≤ n − 1) and ref(A_[j+1]) ≥ ref(A_[n]) ≥ ref(A_[j]), where j is determined by the following condition:

((3^{n−1} − 2^{n−1}) · 2^j + 6^j) · ref(A_[j+1]) ≥ ((3^{n−1} − 2^{n−1}) · 2^{n−1}) · ref(A_[n]) ≥ ((3^{n−1} − 2^{n−1}) · 2^{j−1} + 6^{j−1}) · ref(A_[j])

Proof: See Appendix 3. □
6 Experimental Results

In this section we describe our two experiments and their results. The first experiment verifies the correctness of the formulas in Section 4, and the second experiment compares the optimal attribute ordering (of Section 5) with other orderings.
n    GCM      BCM      RCM      OPT
2    2.40     2.80     2.72     2.40
3    3.58     4.31     4.19     3.53
4    5.91     7.23     7.97     -
5    10.45    12.90    15.74    -
6    19.34    24.02    33.40    -
7    36.90    45.97    70.16    -
8    71.69    89.45    151.43   -
9    140.84   175.88   323.78   -
10   278.55   348.02   686.15   -
11   553.18   691.31   1446.32  -
12   1101.42  1376.60  3032.40  -
13   2196.48  2745.44  6317.40  -

Figure 7: AQD of all partial match queries

Figure 7 shows the result of our first experiment. In the experiment we compare four clustering methods: the Gray code clustering method (GCM), the binary code clustering method (BCM), a random clustering method (RCM) and the exhaustive search (i.e., optimal) method (OPT). The result verifies the analysis formulas described previously. (For instance, when the number of bits n is 5, AQD_gain(5) = QD_binary^avg(5) − QD_Gray^avg(5) = 12.90 − 10.45 = 2.45.) In addition, although our experiment with the exhaustive search method was done only for small numbers of bits due to its high computational complexity², we find that the performance of the proposed method is close to that of the optimal method.

² The complexity of the n-bit experiment is O(N log N), where N = (2^n)!.

Figure 8 shows the QD reduction ratios of the GCM and BCM compared
with the RCM. As the number of bits increases, the proposed method significantly reduces the QD compared with the RCM.

[Figure 8: QD reduction ratios of the GCM and BCM over the RCM (TSQD reduction ratio (%) versus number of bits)]

In the second experiment, we evaluate the effects of attribute ordering for signature representation. For the reference probability of each attribute, we assume a normal distribution. Figure 9 shows the result of this experiment. The measurement is the WSQD reduction ratio of the optimal attribute arrangement compared with the worst attribute arrangement, i.e., the reverse ordering of Theorems 1 and 2.

[Figure 9: Comparison of the Optimal and Worst Attribute Orderings (WSQD reduction ratio (%) versus number of bits)]
7 Conclusion

Most of the previous work on wireless data broadcasting considers data retrieval based on a single attribute, such as single-key indexing and data scheduling for a single attribute. Our work, in contrast, considers the multi-key data retrieval environment, especially data clustering for partial match queries. In this paper we proposed a wireless data clustering method for partial match queries based on the Gray code concept. In the method we represent each data record (and query) as a record (and query) signature through multiattribute hashing, and cluster the data based on their Gray values. We showed the clustering characteristics of the Gray code for wireless data and evaluated the performance analytically and experimentally. We also proposed an optimal attribute ordering strategy for signature representation. For future work, we will investigate clustering methods for range queries and the multi-key index structure.
References

[1] S. Acharya, R. Alonso, M. Franklin, and S. Zdonik. "Broadcast Disks: Data Management for Asymmetric Communication Environments". In Proceedings of ACM SIGMOD Conference, pages 199-210, 1995.
[2] A. V. Aho and J. D. Ullman. "Optimal Partial Match Retrieval when Fields are Independently Specified". ACM Transactions on Database Systems, 4(2):168-179, 1979.
[3] Y. D. Chung and M. H. Kim. "An Index Replication Scheme for Wireless Data Broadcasting". Journal of Systems and Software, 51:191-199, 2000.
[4] Y. D. Chung and M. H. Kim. "Effective Data Placement for Wireless Broadcast". Distributed and Parallel Databases, (to be published).
[5] C. Faloutsos. "Multiattribute Hashing Using Gray Codes". In Proceedings of ACM SIGMOD Conference, pages 227-238, 1986.
[6] T. Imielinski, S. Viswanathan, and B. R. Badrinath. "Data on Air: Organization and Access". IEEE Transactions on Knowledge and Data Engineering, 9(3), 1997.
[7] H. V. Jagadish. "Linear Clustering of Objects with Multiple Attributes". In Proceedings of ACM SIGMOD Conference, pages 332-342, 1990.
[8] M. H. Kim and S. Pramanik. "Optimal File Distribution For Partial Match Retrieval". In Proceedings of ACM SIGMOD Conference, pages 173-182, 1988.
[9] S. Moran. "On the Complexity of Designing Optimal Partial-Match Retrieval Systems". ACM Transactions on Database Systems, 8(4):543-551, 1983.
[10] E. M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice Hall Inc., 1977.
[11] D. Rotem. "Clustered Multiattribute Hash Files". In Proceedings of ACM PODS Conference, pages 225-234, 1989.
[12] C. Su, L. Tassiulas, and V. J. Tsotras. "Broadcast Scheduling for Information Distribution". Wireless Networks, 1998.
Appendix 1: Derivation of AQD_gain(n)

We first describe some notation for deriving AQD_gain(n). TSQD_B(n) is the total sum of the QD of all n-bit partial match queries on the binary code. TSQD_G(n) is the total sum of the QD of all n-bit partial match queries on the Gray code. And TSQD_gain(n) is TSQD_B(n) − TSQD_G(n). In the formulas for Φ(l) (in Section 4), the equations (4) and (5), (6) and (7), (8) and (9), and (10) and (11) are formed in a pairwise way. Thus, we can compute the difference between TSQD_B(n) and TSQD_G(n) by summing up the cases of Equation (2). In consequence, TSQD_gain(n) is computed as follows:

TSQD_gain(n) = TSQD_B(n) − TSQD_G(n) = Σ_{~q} (QD_B(~q) − QD_G(~q)) = Σ_{i=1}^{n−1} ((2 · 3^{i−1}) · 2^{i−1}) = Σ_{i=1}^{n−1} 2^i · 3^{i−1}

AQD_gain(n) is TSQD_gain(n) divided by the number of all n-bit partial match queries; there are 3^n − 2^n partial match queries over n-bit signatures. Thus:

AQD_gain(n) = TSQD_gain(n) / (3^n − 2^n) = (Σ_{i=1}^{n−1} 2^i · 3^{i−1}) / (3^n − 2^n) = 2(6^{n−1} − 1) / (5(3^n − 2^n))
Appendix 2: Proof of Theorem 1

In the BCM, the QD of an n-bit query ~q is 2^n − Σ_l 2^{l−1}, where l ranges over the locations of the specified bits. Because the QD is determined only by the specified bit values (i.e., '0' and '1'), we can compute TSQD_B as follows:

TSQD_B(n) = (3^n − 2^n) · 2^n − 2(3^{n−1} − 2^{n−1}) Σ_{i=1}^{n} 2^{i−1}

By applying the reference probability of each attribute, we can compute WSQD_B as follows:

WSQD_B(n) = (3^n − 2^n) · 2^n − (3^{n−1} − 2^{n−1}) Σ_{i=1}^{n} 2^{i−1} · ref(A_[i])

Since the terms (3^n − 2^n) · 2^n and (3^{n−1} − 2^{n−1}) are positive constants, we consider maximizing the term Σ_{i=1}^{n} 2^{i−1} · ref(A_[i]), which can be written as

2^{n−1} · ref(A_[n]) + 2^{n−2} · ref(A_[n−1]) + ... + 2^1 · ref(A_[2]) + 2^0 · ref(A_[1]).

Therefore, by a greedy argument, the above formula is maximized when ref(A_[i+1]) ≥ ref(A_[i]) for all i, 1 ≤ i ≤ n − 1. □
Appendix 3: Proof of Theorem 2

Since the formula QD_G(~q) (in Section 4) is complex, we compute TSQD_G(n) indirectly:

TSQD_G(n) = TSQD_B(n) − TSQD_gain(n) = (3^n − 2^n) · 2^n − 2(3^{n−1} − 2^{n−1}) Σ_{i=1}^{n} 2^{i−1} − Σ_{i=1}^{n−1} 2^i · 3^{i−1}

Thus, by adding the probability factor to the above formula, we compute WSQD_G(n) as follows:

WSQD_G(n) = (3^n − 2^n) · 2^n − (3^{n−1} − 2^{n−1}) Σ_{i=1}^{n} 2^{i−1} · ref(A_[i]) − (1/2) Σ_{i=1}^{n−1} 2^i · 3^{i−1} · ref(A_[i])

Since (3^n − 2^n) · 2^n is a positive constant, we consider maximizing the sum of the other two terms:

(3^{n−1} − 2^{n−1}) Σ_{i=1}^{n} 2^{i−1} · ref(A_[i]) + (1/2) Σ_{i=1}^{n−1} 2^i · 3^{i−1} · ref(A_[i])
= (3^{n−1} − 2^{n−1}) Σ_{i=1}^{n} 2^{i−1} · ref(A_[i]) + Σ_{i=1}^{n−1} 6^{i−1} · ref(A_[i])
= ((3^{n−1} − 2^{n−1}) · 2^0 + 6^0) · ref(A_[1])
  + ((3^{n−1} − 2^{n−1}) · 2^1 + 6^1) · ref(A_[2])
  + ((3^{n−1} − 2^{n−1}) · 2^2 + 6^2) · ref(A_[3])
  + ...
  + ((3^{n−1} − 2^{n−1}) · 2^{n−2} + 6^{n−2}) · ref(A_[n−1])
  + (3^{n−1} − 2^{n−1}) · 2^{n−1} · ref(A_[n])

To arrange the coefficients of ref(A_[i]) in increasing order, we seek the position j of ref(A_[n]) such that

((3^{n−1} − 2^{n−1}) · 2^j + 6^j) · ref(A_[j+1])             (12)
≥ ((3^{n−1} − 2^{n−1}) · 2^{n−1}) · ref(A_[n])              (13)
≥ ((3^{n−1} − 2^{n−1}) · 2^{j−1} + 6^{j−1}) · ref(A_[j])     (14)

Therefore, the above formula is maximized when the attributes are arranged so that ref(A_[i]) ≥ ref(A_[i−1]) for 2 ≤ i ≤ n − 1, and ref(A_[j+1]) ≥ ref(A_[n]) ≥ ref(A_[j]), where j is determined through inequalities (12)-(14). □