
Big Data Privacy: Issues and Challenges

Brijesh B. Mehta, Udai Pratap Rao
Computer Engineering Department, Sardar Vallabhbhai National Institute of Technology, Surat

Introduction

The term “Big Data” refers to large and complex data sets made up of a variety of structured and unstructured data which are too big, too fast, or too hard to be managed by traditional data management techniques [6]. Big data can be characterized by 4 Vs [8]:
• Volume: size of data
• Variety: structured or unstructured
• Velocity: static or stream data
• Veracity: accuracy and reliability of data

General Architecture of Big Data

Lu et al. [7] discuss a general architecture of big data, which is shown in Fig 1.

Fig 1 General Architecture of Data [7]

Multi-source big data collection

The Executive Office of the President of the USA [5] gives a partial list of sources of big data:
• Public Web
• Social Media
• Mobile Applications
• Geo-spatial Data
• Surveys
• Traditional off-line documents scanned by optical character recognition into electronic form
• Sensors and Radio-Frequency Identification (RFID) chips
• GPS chips

Intra/inter big data processing

Fig 2 Word Count using Map Reduce [1]
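
The word-count flow named in the Fig 2 caption can be sketched in a few lines of plain Python. This is only a single-process illustration of the map, shuffle, and reduce phases, not a distributed implementation; the function names and the sample input lines are assumptions made for this sketch and are not taken from the figure.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted counts by their key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical input; each string stands for one input split.
    sample = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(sample))
```

In a real MapReduce framework the map and reduce functions run on different nodes and the shuffle moves data across the network; here all three are ordinary function calls so the data flow is easy to follow.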

Distributed big data storing

Types of NoSQL databases [2]:
• Wide Column Store / Column Families
• Document Store
• Key Value / Tuple Store
• Graph Databases
• Multimodel Databases
• Object Databases
• Multidimensional Databases

Privacy Issues

Some of the privacy issues with big data:
• Privacy in big mobile data: over-collection of user data
• Privacy in e-governance: misuse of data
• Privacy in social media data: multi-source data analysis

Fig 3 White House Survey Results [3]

Research Challenges

Conclusion

Why can existing methods of privacy preservation not be applied to big data?
• Multiple data sources: because data is collected from multiple sources, the risk of re-identification is always present.
• Variety and velocity of data: anonymization techniques such as generalization, suppression, and anatomization cannot be directly applied to big data, because it is difficult to distinguish between sensitive and non-sensitive attributes in unstructured data.
• Feature extraction (identifying sensitive attributes) is one of the most challenging tasks in privacy-preserving mining of unstructured data.
• It is hard to maintain the trade-off between privacy and data utility.
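
To make the terms generalization and suppression concrete, the following minimal sketch applies them to a small structured table and checks the result for k-anonymity. The field names (`age`, `zip`, `disease`), the choice of `age` and `zip` as quasi-identifiers, and all helper names are assumptions made for illustration. The sketch works only because the quasi-identifiers are known in advance, which is exactly what is hard to establish for unstructured data.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Generalization: keep the first `keep` characters and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def group_sizes(records):
    """Count how many records share each (age, zip) combination."""
    return Counter((r["age"], r["zip"]) for r in records)


def anonymize(records, k):
    """Generalize the quasi-identifiers, then suppress groups smaller than k."""
    generalized = [
        {
            "age": generalize_age(r["age"]),
            "zip": generalize_zip(r["zip"]),
            "disease": r["disease"],  # sensitive attribute, left unchanged
        }
        for r in records
    ]
    sizes = group_sizes(generalized)
    # Suppression: drop records whose group has fewer than k members.
    return [r for r in generalized if sizes[(r["age"], r["zip"])] >= k]


def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(size >= k for size in group_sizes(records).values())


if __name__ == "__main__":
    # Hypothetical records, invented for this example.
    table = [
        {"age": 34, "zip": "39500", "disease": "flu"},
        {"age": 36, "zip": "39507", "disease": "asthma"},
        {"age": 38, "zip": "39501", "disease": "flu"},
        {"age": 52, "zip": "40001", "disease": "diabetes"},
    ]
    released = anonymize(table, k=2)
    print(released, is_k_anonymous(released, k=2))
```

The suppressed record illustrates the privacy and utility trade-off: the released table is safer but carries less information than the original.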

Some existing privacy-preserving techniques can be useful for big data privacy with little modification (e.g., a de-identification technique such as k-anonymity can be modified so that it also works with unstructured data). According to the Big Data Working Group [4], homomorphic encryption and differential privacy are among the promising technologies for preserving privacy in big data.
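
As a rough illustration of differential privacy, the sketch below answers a counting query with the standard Laplace mechanism. It is a generic textbook construction, not a method described by the Big Data Working Group; the function names, the sample ages, and the choice of epsilon are assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy `predicate`.

    Adding or removing one record changes a count by at most 1, so the
    query has sensitivity 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical ages, invented for this example.
    ages = [23, 35, 41, 52, 67, 29]
    noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
    print(f"noisy count of people aged 40 or over: {noisy:.2f}")
```

A smaller epsilon gives stronger privacy but noisier answers, which is one concrete form of the privacy and utility trade-off.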

Top ten big data security and privacy challenges

The Big Data Working Group [4] at the Cloud Security Alliance, a non-profit organization from Nevada, USA, has identified the top ten big data security and privacy challenges:
1. Scalable and composable privacy-preserving data analytics
2. Cryptographically enforced data-centric security
3. Granular access control
4. Secure computations in distributed programming frameworks
5. Security best practices for non-relational data stores
6. Secure data storage and transactional logs
7. Granular audits
8. Data provenance
9. End-point validation and filtering
10. Real-time security monitoring

References

[1] http://xiaochongzhang.me/blog/wp-content/uploads/2013/05/MapReduce_Work%_Structure.png, Accessed: November 4, 2014.
[2] http://nosql-database.org/, Accessed: February 4, 2014.
[3] http://www.whitehouse.gov/issues/technology/big-data-review, Accessed: November 4, 2014.
[4] Expanded Top Ten Big Data Security and Privacy Challenges. Technical report, April 2013.
[5] Big Data: Seizing Opportunities, Preserving Values. Technical report, Executive Office of the President, Washington, D.C., May 2014.
[6] K. Grolinger, M. Hayes, W. Higashino, A. L’Heureux, D. Allison, and M. Capretz. Challenges for MapReduce in big data. In 2014 IEEE World Congress on Services (SERVICES), pages 182–189, June 2014.
[7] R. Lu, H. Zhu, X. Liu, J. Liu, and J. Shao. Toward efficient and privacy-preserving computing in big data era. IEEE Network, 28(4):46–50, July 2014.
[8] F. J. Ohlhorst. Big Data Analytics: Turning Big Data into Big Money. Wiley Publishing, 1st edition, 2012.