European Journal of Scientific Research ISSN 1450-216X Vol. 86 No 2 September, 2012, pp.254-263 © EuroJournals Publishing, Inc. 2012 http://www.europeanjournalofscientificresearch.com
Efficient Cluster Based Privacy Preservation Data Perturbation Technique in Multi-Partitioned Datasets V. S. Prakash Assistant Professor, Department of Computer Applications Bannari Amman Institute of Technology Sathyamangalam – 638401, Tamil Nadu, India E-mail:
[email protected] Tel: 91 - 98426 10318; Fax: 91 – 4295 226666 A. Shanmugam Principal, Bannari Amman Institute of Technology Sathyamangalam – 638401, Tamil Nadu, India E-mail:
[email protected] Tel: 91 – 98422 17170 P. Murugesan Assistant Professor (Senior Grade), Department of Computer Applications Bannari Amman Institute of Technology Sathyamangalam – 638401, Tamil Nadu, India E-mail:
[email protected] Tel: 91 - 98947 91601
Abstract
Multi-partitioned data comprises both horizontally and vertically partitioned data sets, which are a recent requirement of e-commerce and e-business data mining environments. In e-business data mining, privacy becomes a key concern in protecting an individual's data on service/product transactions. At the same time, the accuracy and disclosure of service/product data increase the volume of transactions with new and existing clients. In multiparty data mining, users contribute their individual data sets and expect to mine a global model based on the pooled data set. How to efficiently extract a high-quality model without violating each party's privacy is the most important challenge. Previous work presented a combinatorial function for privacy preservation in multi-partitioned datasets, but the scalability and authenticity of the individual datasets are very low. To balance data privacy against the transparency of an individual's data, a data perturbation scheme is presented with validation and authentication. A Gaussian distribution model is used for data perturbation to preserve the private data of individuals. Data authenticity for sharing is provided to the respective individuals along with a validated period of sharing. However, in multi-partitioned data distribution, data perturbation raises ambiguity between the horizontal and vertical partitions of the data. To overcome this ambiguity, we introduce divisive k-neighbor clusters for multi-partitioned data sets to achieve privacy-preserved data. The proposed cluster based privacy preserving data perturbation technique (CPPDP) is evaluated experimentally with benchmark data sets obtained from popular e-business/e-commerce sites (Amazon, eBay, etc.) in terms of
the ratio between data privacy and transparency, data perturbed object clusters, and the adversary effect of accessing unauthenticated data.
Keywords: Privacy Preservation, Multi-partitioned dataset, Data perturbation, Gaussian model.
1. Introduction
Privacy is becoming an increasingly important concern in many data-mining applications that deal with security, health care, behavioral, financial, and other types of sensitive data. It is becoming particularly important in counterterrorism and homeland defense-related applications. These applications may require creating profiles, constructing social network models, and detecting terrorist communications, among others, from privacy-sensitive data. Perturbation techniques are commonly used in situations where individuals can perturb their private data with some known random noise and report the perturbed data to the data miner. Since the distribution of the added noise is known, the data miner can reconstruct the original distribution using various statistical methods and mine the reconstructed data. When we examine the details of the perturbation techniques, adding noise and reconstructing the original distribution emerge as the two most important phases. During the noise addition phase, random noise from a known distribution (e.g. Gaussian noise with mean 0 and variance σ²) is added to the privacy-sensitive data. At present, all of the existing noise addition methods add the same amount of noise for all individuals. The real problem is not data mining itself, but the way data mining is performed. Privacy-preserving data mining (PPDM) is a promising approach in which privacy and data mining can coexist. It provides precise results without any loss of privacy during the data mining process. In general there are two major approaches in PPDM: (i) data modification based and (ii) cryptography based methods. The data modification based approach changes sensitive data in such a way that it loses its sensitive meaning. In this procedure the statistical properties of interest can be preserved, but exact values cannot be determined through the mining process.
Data perturbation has been described as a more effective application of data security in health care than de-identification/re-identification, owing to the greater likelihood that attacks linking public data sets to unique identifiers or subjects could succeed against the latter. Data perturbation is therefore regarded as a strong candidate for electronic health care data security. The probability distribution approach takes the data and replaces it with a sample from the same distribution or with the distribution itself. The value distortion approach perturbs data by multiplicative or additive noise, or other randomized procedures. It is considered to be more effective than the former form of perturbation. This approach builds decision tree classifiers where each element is perturbed with random noise, for instance from a Gaussian distribution. For data mining, the original data distribution is reconstructed from its perturbed version. However, critics point to the fact that random additive noise can be filtered out, which can result in privacy compromises. Data perturbation is considered a relatively simple and effective technique for protecting sensitive electronic data from unauthorized use. Previous work presented a combinatorial function to preserve the data sets that are shared among users; it does not allow third parties to seize the data, but the scalability and authenticity of the individual datasets are very low. In multi-partitioned data distribution, data perturbation raises ambiguity between the vertical and horizontal partitions of the data. In this work, we introduce divisive k-neighbor clusters for multi-partitioned data sets.
2. Literature Review
The major purpose of data mining is to extract previously unknown patterns from large collections of data. With the rapid development of software, hardware and networking technology, there is exceptional growth in the quantity of data collected. Organizations gather enormous volumes of data from assorted databases, which also contain sensitive and confidential information about individuals. The paper [1] explores the possibility of using multiplicative random projection matrices for privacy-preserving distributed data mining. It specifically considers the problem of computing statistical aggregates like the inner product matrix, correlation coefficient matrix, and Euclidean distance matrix from distributed privacy-sensitive data possibly owned by multiple parties. Recently, data perturbation schemes have become popular for privacy-preserving data mining [9] due to the relatively low cost of deploying them compared to cryptographic techniques. Privacy-preserving data mining (PPDM) schemes provide a new direction for solving this problem: PPDM produces valid data mining results without learning the underlying data values. The authors of [2] proposed a framework that permits systematic transformation of the original data using a randomized data perturbation technique; the modified data is then returned as the result of a client's query through a cryptographic approach. PPDM addresses the problem of performing data mining tasks [4] without any direct access to the original data sets, since the providers maintain privacy over their data. A protocol [6] can be employed to support such queries in a privacy-sensitive manner; the idea is to determine a radius enclosing (only) the top-k matches to a query. A preliminary formulation of gradient descent [7] and a geometric data perturbation technique [8] are used for data privacy preservation in multi-party collaborative mining [10].
Privacy preservation has been extended to machine learning algorithms that can also be employed for data mining. A secure protocol has been proposed for privacy-preserving genetic algorithms [11] for rule discovery. The major challenge is to allow two parties to securely evaluate the fitness value of each chromosome. The same authors also addressed privacy-preserving linear Fisher discriminant analysis [12] for two parties, which seeks to separate different classes as far as possible. In recent years, clustering has developed into one of the primary methods of analyzing large datasets. In particular, clustering [5] is a vital component of real-time image compression and exploitation algorithms, such as vector quantization, segmentation of SAR, EO/IR, and hyper-spectral imagery, group tracking, and behavior pattern analysis. Clustering schemes are widely used in data compression, pattern recognition, and data mining, but the problem of employing them in real-time systems has not been a focus of most algorithm designers. The perturbation method [3] has been widely studied for privacy-preserving data mining. In this scheme, random noise from a known distribution is added to the privacy-sensitive data before the data is sent to the data miner. Because of the added noise, loss of information versus preservation of privacy is always a trade-off in perturbation-based approaches. In this work we have implemented a divisive k-means clustering technique for privacy preservation in multi-partitioned datasets.
3. Proposed Efficient Cluster Based Privacy Preservation Data Perturbation Technique in Multi-Partitioned Datasets
The proposed work is designed for privacy preservation in multi-partitioned datasets by adopting divisive k-neighbor clusters. To evaluate the trade-off between data privacy and transparency of an individual's data, a data perturbation technique is presented with validation and authentication. The most commonly used data perturbation technique, the Gaussian distribution model, is adopted to preserve the private data of individuals. Data authenticity for sharing is provided to the respective individuals along with a validated period of sharing. However, in multi-partitioned data distribution, data perturbation raises ambiguity between the vertical and horizontal partitions of the data. The architecture diagram of the proposed cluster based privacy preservation data perturbation technique in multi-partitioned datasets (CPPDP) is shown in Fig. 3.1.
Figure 3.1: Architecture diagram of the proposed CPPDP
[Diagram boxes: Dataset; Partition the dataset horizontally and vertically; Data perturbation technique; Gaussian distribution model; Ambiguity occurs in dataset; Divisive k neighbor clusters; Achieves privacy preservation for individual data]
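The partitioning step named in Figure 3.1 can be pictured with a small sketch. The table, column names and values below are hypothetical and are not taken from the paper's datasets; the helper functions are illustrative names only. The sketch assumes that a vertical partition keeps a shared key column in both parts so the original table can be restored by a join.

```python
# Hypothetical transaction table; columns and values are illustrative only.
columns = ["customer_id", "product", "amount", "city"]
rows = [
    (1, "book", 12.5, "Chennai"),
    (2, "phone", 199.0, "Erode"),
    (3, "laptop", 650.0, "Salem"),
    (4, "pen", 1.2, "Chennai"),
]

def horizontal_partition(rows, predicate):
    """Place different rows into different tables (same columns in both)."""
    matching = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matching, rest

def vertical_partition(columns, rows, key, left_columns):
    """Place different columns into different tables; both keep the key."""
    idx = {name: i for i, name in enumerate(columns)}
    right_columns = [c for c in columns if c != key and c not in left_columns]
    left = [{key: r[idx[key]], **{c: r[idx[c]] for c in left_columns}} for r in rows]
    right = [{key: r[idx[key]], **{c: r[idx[c]] for c in right_columns}} for r in rows]
    return left, right

def rejoin(left, right, key):
    """Restore the original records by joining the two vertical parts on the key."""
    right_by_key = {rec[key]: rec for rec in right}
    return [{**rec, **right_by_key[rec[key]]} for rec in left]

# Horizontal split: rows with amount below 100 versus the rest.
cheap, expensive = horizontal_partition(rows, lambda r: r[2] < 100)
# Vertical split: purchase details in one table, location in the other.
left, right = vertical_partition(columns, rows, "customer_id", ["product", "amount"])
restored = rejoin(left, right, "customer_id")
print(len(cheap), len(expensive), restored[0])
```

The join in `rejoin` plays the role of the view that restores the original table from its vertical parts.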
A multi-partitioned data set consists of data that has been divided from a logical database. The data is partitioned in two ways: horizontal partitioning and vertical partitioning. Horizontal partitioning places different rows into different tables. A common form of vertical partitioning is to split dynamic data from static data in a table where the dynamic data is not used as often as the static data. Creating a view across the two tables restores the original table with a performance penalty, although performance improves when accessing the static data, e.g. for statistical analysis. As Fig. 3.1 shows, privacy preservation is achieved by adopting divisive k-means clustering, which begins with a single, all-inclusive cluster and, at every step, splits a cluster until only singleton clusters of individual points remain. In this case, we need to decide, at every step, which cluster to split and how to perform the split. Divisive k-means clustering is used in the privacy preservation mechanism to overcome the problem that the data perturbation technique raises ambiguity among the clustered datasets, which results in unreliability. Divisive K-means is based on the idea that a center point can represent a cluster. In particular, for K-means we use the notion of a centroid, which is the mean or median point of a cluster of data points. Note that a centroid almost never corresponds to an actual data point.
3.1. Overview of Data Perturbation Technique for Multi-Partitioned Datasets
In the data perturbation scheme for multi-partitioned datasets, the privacy of the data can be protected by perturbing sensitive data with randomization algorithms before releasing it to the data miner. The perturbed version of the data is then used to mine patterns and models.
The algorithm is chosen so that aggregate properties of the data can be recovered with sufficient accuracy while individual entries are significantly distorted. In this scheme, privacy of confidential data is obtained by adding a small noise component drawn from a probability distribution. The process of data perturbation using the Gaussian distribution model is shown in Fig. 3.2.
Figure 3.2: Process of perturbation technique
[Diagram boxes: Dataset; Apply Gaussian distribution; Perturb the data; Met uncertainty in data]
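A minimal sketch of additive Gaussian perturbation may help here. Each record x_i receives independent noise y_i drawn from a Gaussian with mean 0 and standard deviation σ, and z_i = x_i + y_i is released. The data values, the seed, σ = 25 and the function names are assumptions made for illustration, not values from the paper. The variance estimate relies on the standard identity Var(Z) = Var(X) + σ² for independent noise, which is one simple way an aggregate property can be recovered while individual values stay distorted.

```python
import random
import statistics

def perturb(records, sigma, rng):
    """Return z_i = x_i + y_i, with y_i drawn independently from N(0, sigma^2)."""
    return [x + rng.gauss(0.0, sigma) for x in records]

def estimate_original_variance(perturbed, sigma):
    """Estimate Var(X) from perturbed data via Var(Z) = Var(X) + sigma^2."""
    return max(statistics.pvariance(perturbed) - sigma ** 2, 0.0)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical purchase amounts; not taken from the paper's datasets.
original = [rng.uniform(10.0, 200.0) for _ in range(5000)]
sigma = 25.0             # assumed noise level
released = perturb(original, sigma, rng)

print("mean, original vs released:",
      statistics.mean(original), statistics.mean(released))
print("variance, original vs estimated:",
      statistics.pvariance(original), estimate_original_variance(released, sigma))
```

Individual released values differ from the originals by the drawn noise, while the mean and the noise-corrected variance stay close to those of the original data.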
Consider a collection of data records denoted by X = {x1, …, xN}. For each record xi ∈ X, a noise component is added, which is drawn from a probability distribution. Commonly used distributions are the uniform distribution over an interval [−α, α] and the Gaussian distribution with mean μ = 0 and standard deviation σ. These noise components are drawn independently and are denoted y1, …, yN. Thus, the new set of distorted records is given by x1 + y1, …, xN + yN. We denote this new set of records by z1, …, zN. In general, it is assumed that the variance of the added noise is large enough that the original record values cannot be easily guessed from the distorted data. This results in uncertainty, and the frequent set of partitions of the data becomes accessible in the cluster group, which can be confusing when preserving the users' individual product data.
3.2. Implementing Divisive Cluster based Technique for Privacy Preservation in Multi-Partitioned Dataset
Hierarchical clustering is of two types. One is agglomerative clustering (bottom up), in which two groups are merged if the distance between them is less than a threshold; the other is divisive clustering (top down), in which one group is split into two if the inter-group distance is higher than a threshold value. In this work, we present divisive k-means clustering for privacy preservation in multi-partitioned datasets. This variant of hierarchical clustering is called top-down clustering or divisive clustering. We start with all data points in one cluster. The cluster is split using a flat clustering algorithm. This procedure is applied recursively until every data point is in its own singleton cluster. K-Means algorithms are popular and widely used clustering methods. They partition the data to minimize the criterion:
$$E = \sum_{j=1}^{K} \sum_{x_i \in S_j} d^2(x_i, s_j) \qquad (1)$$
where $K$ is the number of clusters, $s_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i$ is the center of cluster $S_j$, and
d(a, b) is the Euclidean distance. Divisive K-Means clustering algorithms typically involve choosing a random initial partition or set of centers, then repeatedly re-computing the centers based on the partition and re-evaluating the partition based on the centers. Such a process can be shown to converge to a local minimum, whereas the problem of finding the global minimum is NP-hard. We propose a hierarchical divisive K-Means algorithm that minimizes the same criterion as standard K-Means, with the clustering process organized as a hierarchical process. For comparison, given a set of N items to be clustered and an N×N distance (or similarity) matrix, the basic bottom-up (agglomerative) hierarchical clustering procedure is:
1. Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters be the same as the distances (similarities) between the items they contain.
2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that you now have one cluster fewer.
3. Compute the distances (similarities) between the new cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
Divisive clustering, by contrast, starts from the top, treating the entire dataset as one cluster. It repeatedly splits a current cluster (a leaf node in a binary tree) until the number of clusters reaches a predefined value K, or some other stopping criterion is met. The most important issue here is how to select the next candidate cluster to split.
3.2.1. Cluster Split
A common approach is to select the cluster with the largest size to split. This approach gives priority to producing size-balanced clusters. It can be written as: choose p according to p = arg min_k (1/n_k), where n_k is the number of items in cluster C_k. This is a reasonable approach. However, natural clusters are not restricted to the case where every cluster has the same size.
3.2.2. Identifying the Similarity in Clusters
The self-similarity of cluster C_k is s(C_k, C_k) ≡ s_kk, which is to be maximized in clustering. Define the average self-similarity of each cluster as s̄_kk = s_kk / n_k². A cluster C_k with large s̄_kk indicates that the cluster members are more homogeneous; if we define similarity as the inverse of distance, w_ij = 1/d_ij, we may say that the cluster members are closer to each other in Euclidean space, i.e., C_k is dense or tight. A cluster with small s̄_kk is less homogeneous, or loose. A goal of the min-max clustering principle is to make clusters as compact and balanced as possible.
Therefore, our priority in splitting clusters is to increase the average similarity of all clusters. Our criterion is to choose the loosest cluster p (the one with the smallest average similarity) to split, p = arg min_k (s_kk / n_k²).
3.2.3. Cluster Cohesion
For a cluster with a given average similarity, there can be many different shapes. An elongated cluster (or a cluster consisting of two well-separated sub-clusters) could have the same average similarity as a highly spherical cluster. Cluster cohesion is the minimum value of the MinMaxCut objective function when the cluster is split into two sub-clusters. A cluster k with small cohesion h_k can be meaningfully split into two. Consequently, a cohesion-based criterion is to select the cluster p with the smallest cohesion among the current leaf clusters: p = arg min_k h_k. The combination of cohesion with average similarity could be a good cluster selection criterion.
3.2.4. Stopping Criterion
Two stopping criteria are used for terminating the divisive procedure: (i) stop when the number of leaf nodes reaches the pre-defined K; (ii) stop when the criterion evaluated over the current leaf clusters exceeds a threshold value (J_stop). Theorem 1 implies that as the divisive procedure continues and the number of leaf clusters increases, the clustering efficiency also increases. Since the criterion measures the overlap between different clusters (properly weighted against self-similarities), a value above J_stop (Theorem 2) implies that the current clusters are already highly homogeneous and it is better not to split them further. In applications where we do not know the exact K, we prefer to use (ii) as the stopping criterion.
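The top-down procedure of Section 3.2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' Java implementation: it selects the largest leaf cluster to split (the size-based choice of Section 3.2.1), splits it with a flat 2-means pass, and stops when the number of leaves reaches K (stopping criterion (i)). The function names, the sample data and the fallback for degenerate splits are all assumptions. The average-similarity or cohesion criteria could replace the `key=len` selection.

```python
import random

def centroid(points):
    """Mean point s_j of a cluster S_j."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def sq_dist(a, b):
    """Squared Euclidean distance d^2(a, b)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(clusters):
    """Criterion E of Eq. (1): squared distances to each cluster's centroid."""
    total = 0.0
    for cluster in clusters:
        center = centroid(cluster)
        total += sum(sq_dist(p, center) for p in cluster)
    return total

def two_means(points, rng, iterations=20):
    """Split one cluster in two with a flat 2-means pass."""
    centers = rng.sample(points, 2)
    halves = ([], [])
    for _ in range(iterations):
        halves = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            halves[nearer].append(p)
        if not halves[0] or not halves[1]:
            break
        centers = [centroid(h) for h in halves]
    if not halves[0] or not halves[1]:
        # Assumed fallback for degenerate cases (e.g. duplicate points).
        mid = len(points) // 2
        return points[:mid], points[mid:]
    return halves

def divisive_kmeans(points, k, rng):
    """Start with one all-inclusive cluster and split until k leaves exist."""
    leaves = [list(points)]
    while len(leaves) < k:
        splittable = [c for c in leaves if len(c) > 1]
        if not splittable:
            break                            # only singleton clusters remain
        target = max(splittable, key=len)    # size-based selection (3.2.1)
        leaves.remove(target)
        leaves.extend(two_means(target, rng))
    return leaves

rng = random.Random(0)
# Hypothetical 2-D data: two well-separated groups of 30 points each.
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)] + \
       [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)]
clusters = divisive_kmeans(data, 3, rng)
print([len(c) for c in clusters], sse(clusters))
```

Setting k equal to the number of points continues the splitting until every item is in its own singleton cluster, which is the end state described in the text.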
4. Experimental Evaluation
The proposed cluster based privacy preservation data perturbation technique in multi-partitioned datasets (CPPDP) has been implemented in Java. The experiments were run on an Intel P-IV machine with 2 GB memory and a 3 GHz dual processor CPU. We compare the proposed CPPDP with an existing privacy-preserving data partitioning approach that uses a combinatorial function (CF) for multi-partitioned data on individual partition data sets. When the combinatorial function is used for multi-partitioned datasets, the scalability of the products/services is low. In the proposed CPPDP, by contrast, the strength of the data set to be shared remains the same from the beginning of the splitting process that employs the divisive k-means clustering algorithm. The performance of the proposed CPPDP is measured in terms of:
a. data privacy and transparency,
b. data perturbed object clusters,
c. adversary effect of accessing unauthenticated data.
5. Results and Discussion
The proposed cluster based data perturbation technique is designed for privacy preservation in multi-partitioned datasets, in which users can share their files with other users over safe and secure communication. An experimental evaluation has been carried out with a benchmark dataset to estimate the performance of the proposed CPPDP. The tables and graphs below describe the performance of the proposed CPPDP and compare the results with an existing combinatorial function (CF) for multi-partitioned datasets.
Table 5.1: No. of transactions vs. Data privacy and transparency

No. of transactions   Data privacy and transparency (%)
                      Proposed CPPDP   Existing CF
5                     48               19
10                    42               26
15                    59               12
20                    60               21
25                    65               33
Table 5.1 describes the users' data privacy and transparency rate when they are involved in communication with other users in the environment. The outcome of the proposed cluster based data perturbation technique for privacy preservation in multi-partitioned datasets is compared with an existing combinatorial function (CF) for multi-partitioned datasets.
Figure 5.1: No. of transactions vs. Data privacy and transparency
[Plot: x-axis No. of transactions (5 to 25); y-axis Data privacy and transparency (%) (0 to 70); series: Proposed CPPDP, Existing CF]
Fig 5.1 describes the users' data privacy and transparency rate based on the number of transactions they carry out with other users. In the proposed CPPDP, the divisive k-means clustering technique is used for privacy preservation; it splits the dataset until every item in the dataset forms a singleton cluster. Since the data in the dataset are clustered, the privacy and transparency of the data are high, meaning that an unauthorized user cannot observe the data in the multi-partitioned dataset held by the user. The privacy and transparency of the users' data are therefore high compared to the existing combinatorial function (CF) for multi-partitioned datasets, which clusters the objects in a simple manner; the privacy and transparency rate is 30-40% higher in the proposed CPPDP.
Table 5.2: No. of objects vs. perturbed data clustering efficiency

No. of objects   Perturbed data clustering efficiency (%)
                 Proposed CPPDP   Existing CF
10               24               10
20               32               16
30               41               24
40               53               32
50               60               43
Table 5.2 describes the perturbed data clustering efficiency for privacy preservation of data in the network. The outcome of the proposed cluster based data perturbation technique for privacy preservation in multi-partitioned datasets is compared with an existing combinatorial function (CF) for multi-partitioned datasets.
Figure 5.2: No. of objects vs. perturbed data clustering efficiency
[Plot: x-axis No. of objects (10 to 50); y-axis Perturbed data clustering efficiency (%) (0 to 70); series: Proposed CPPDP, Existing CF]
Fig 5.2 describes the clustering efficiency of the multi-partitioned dataset based on the number of objects present in the dataset. To evaluate the trade-off between data privacy and transparency of an individual's data, the existing data perturbation technique uses the Gaussian distribution model to preserve the private data of individuals. However, in multi-partitioned data distribution, data perturbation raises ambiguity between the vertical and horizontal partitions of the data. To overcome this ambiguity, divisive k-neighbor clusters are introduced for multi-partitioned data sets. In the proposed cluster based data perturbation technique, the multi-partitioned dataset is clustered using the divisive k-means clustering technique, which clusters the dataset efficiently; the clustering efficiency is 35-40% higher in the proposed CPPDP.
Table 5.3: No. of data vs. adversary effect of accessing unauthenticated data

No. of data   Adversary effect of accessing unauthenticated data (%)
              Proposed CPPDP   Existing CF
10            7                14
20            4                15
30            13               18
40            11               24
50            15               22
Table 5.3 describes the adversary effect of accessing unauthenticated data and the efficiency of privacy preservation of data in the network. The outcome of the proposed cluster based data perturbation technique for privacy preservation in multi-partitioned datasets is compared with an existing combinatorial function (CF) for multi-partitioned datasets.
Figure 5.3: No. of data vs. adversary effect of accessing unauthenticated data
[Plot: x-axis No. of data (10 to 50); y-axis Adversary effect on accessing unauthorized data (%) (0 to 30); series: Proposed CPPDP, Existing CF]
Fig 5.3 describes the adversary effect of accessing unauthenticated data of the users involved in the communication, based on the number of objects present. For privacy preservation in the proposed approach, we have implemented the divisive k-means clustering technique, which follows the top-down approach to cluster the multi-partitioned dataset. Since the objects in the dataset are split until each item forms a singleton cluster, there is less chance of the multi-partitioned data set being compromised by adversaries. The adversary rate of accessing the multi-partitioned data is therefore lower than with the existing combinatorial function (CF) for multi-partitioned datasets; the adversary effect of accessing unauthorized data is 30-40% lower in the proposed CPPDP. In summary, the proposed work provides a privacy preservation mechanism for multi-partitioned datasets by adopting the divisive k-means clustering technique. The proposed cluster based data perturbation technique for privacy preservation in multi-partitioned datasets evaluates the trade-off between data privacy and transparency of an individual's data and increases the volume of transactions with new and existing clients.
6. Conclusion
The proposed work provides a privacy preservation mechanism by adopting the divisive k-means clustering technique. The proposed divisive k-means clustering scheme eliminates the ambiguity that occurs across horizontally and vertically partitioned datasets by adopting an efficient cluster selection criterion with a pre-defined k value, and the clustering process stops only after the
stopping criterion is met. Compared to the existing combinatorial function (CF) for multi-partitioned datasets, the proposed cluster based data perturbation technique for privacy preservation in multi-partitioned datasets performs better. An experimental evaluation has been carried out with benchmark data sets obtained from popular e-business/e-commerce sites (Amazon, eBay, etc.) to estimate the performance of the proposed technique in terms of data privacy and transparency, adversary rate, and clustering efficiency.
References
[1] Kun Liu, Hillol Kargupta, et al., "Random Projection-Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining", IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, January 2006.
[2] P. Kamakshi, Dr. A. Vinaya Babu, "Preserving Privacy and Sharing the Data in Distributed Environment using Cryptographic Technique on Perturbed data", Journal of Computing, vol. 2, issue 4, April 2010, ISSN 2151-9617.
[3] Li Liu, Murat Kantarcioglu, et al., "The applicability of the perturbation based privacy preserving data mining for real-world data", ScienceDirect, Data & Knowledge Engineering 65 (2008) 5–21.
[4] Yingpeng Sang, Hong Shen, et al., "Effective Reconstruction of Data Perturbed by Random Projections", IEEE Transactions on Computers, vol. 61, no. 1, January 2012.
[5] Dmitriy Fradkin, et al., "Image Compression in Real-Time Multiprocessor Systems Using Divisive K-Means Clustering", International Conference on Integration of Knowledge Intensive Multi-Agent Systems, 2003.
[6] Jaideep Vaidya, et al., "Privacy-Preserving Kth Element Score over Vertically Partitioned Data", IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 2, February 2009.
[7] Shuguo Han, et al., "Privacy-Preserving Gradient-Descent Methods", IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, June 2010.
[8] Keke Chen, et al., "Privacy-Preserving Multiparty Collaborative Mining with Geometric Data Perturbation", IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 12, December 2009.
[9] K. Chen and L. Liu, "Space Adaptation: Privacy-Preserving Multiparty Collaborative Mining with Geometric Perturbation," Proc. IEEE Conf. Principles on Distributed Computing, 2007.
[10] K. Chen and L. Liu, "Towards Attack-Resilient Geometric Data Perturbation," Proc. SIAM Data Mining Conf., 2007.
[11] S. Han and W.K. Ng, "Privacy-Preserving Genetic Algorithms for Rule Discovery," Proc. Ninth Int'l Conf. Data Warehousing and Knowledge Discovery (DaWak), pp. 407-417, Sept. 2007.
[12] S. Han and W.K. Ng, "Privacy-Preserving Linear Fisher Discriminant Analysis," Proc. 12th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), May 2008.