International Journal of Big Data Security Intelligence Vol. 2, No. 1 (2015) pp.21-26 http://dx.doi.org/10.21742/ijbdsi.2015.2.1.03
Security and Privacy of Big Data in Various Applications
Abeer M. AlMutairi 1, Rawan Abdullah Turdi AlBukhary 1 and Jayaprakash Kar 2
1,2
Department of Information Technology, Faculty of Computing & Information Technology Information Security Research Group, Department of Information Systems, Faculty of Computing & Information Technology
[email protected]
Abstract
With the growing use of technology and its resources, such as computers, the Internet, mobile phones and social networks, the amount of existing data is growing as well. The result of this expansion is called Big Data. Big data is becoming an essential source of information that feeds organizations and influences economies and international decisions by increasing operational efficiency, improving performance and even supporting the detection of international threats and attacks. The data contain private and sensitive information that needs to be kept secure. In this article, we discuss the most important security issues of big data in various applications.
Keywords: Threat, Cloud security, Structured data.
1. Introduction
The invention of computers and the Internet initially resulted in a manageable amount of electronic data that was stored in basic storage utilities. More recently, social networks and smartphones were introduced. They simplified the use of the Internet and made it reachable to almost everyone. Usage increased day by day, and the amount of data increased with it, resulting in Big Data. Big Data refers to a massive amount of data that is rapidly growing and changing. It consists of text, images, documents, audio, video and other file types. It is stored mostly in the cloud, where it is accessible from anywhere at any time through the Internet. Having this amount of data makes it possible for organizations and businesses to process and analyze the data to develop, enhance, and maintain their business. For example, in health care, big data can be used in many ways to help patients, their doctors, hospitals, pharmacies, ambulance systems, etc. According to [1], data can be divided into two types: sensitive data, which regularly contain sensitive and private information important to the user, and non-sensitive data, which are not as important to the user and cause no harm if they are published, processed or stored. Despite the several advantages that big data provides, one main issue arises from this situation: security. How can security be achieved for sensitive data? How can data be protected from unauthorized or harmful use? And how can users' privacy be kept intact? Recently the term "Big Data" has become a hot topic not just in the IT industry, but also in academia, industry, and governments, where continuous advancements in technologies and telecommunications lead to an exponential growth of data.
Many definitions describe the concept of big data and differentiate it from typical data in different ways, depending on its volume, velocity, and value to organizations and
ISSN: 2205-8524 IJBDSI Copyright ⓒ 2015 GV School Publication
enterprises. The Gartner research company defined big data as "high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making" [7]. Nowadays big data is becoming an essential source of information that feeds organizations and impacts economies and international decisions. Governments and organizations that harness big data gain many advantages, such as [6]:
• Increasing operational efficiency and operating margins by taking advantage of detailed customer data [17].
• Improving performance by collecting more accurate and detailed performance data.
• Targeting users based on their needs.
• Improving decision making and minimizing risks.
• Inventing new business areas, services and products.
• Improving the threat detection capabilities of governments.
Although Big Data can benefit organizations and enterprises, the high volume and variety of the data create the need for mechanisms to process and analyze it. Big data is composed of the following data formats [7]:
• Structured data: Data in fixed fields, as in relational databases or spreadsheets.
• Unstructured data: Usually contains the most sensitive data, such as free-form text (e.g., books, articles, bodies of email messages) and untagged audio, image and video data.
• Semi-structured data: Data that do not conform to fixed fields but contain markers and tags that separate data elements, such as XML- or HTML-tagged text.
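To make the three formats concrete, the following short Python sketch shows one small example of each. It is only an illustration and is not taken from [7]; the field names, the sample values and the XML snippet are invented for this purpose.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed fields, as in a relational table or a spreadsheet row.
structured_src = "patient_id,age,ward\n17,42,3\n"
structured = list(csv.DictReader(io.StringIO(structured_src)))

# Unstructured: free-form text with no fields or tags.
unstructured = "Patient reports mild headache since Monday; no fever."

# Semi-structured: no fixed schema, but tags separate the data elements.
semi_src = "<visit><patient id='17'/><note>mild headache</note></visit>"
semi = ET.fromstring(semi_src)

print(structured[0]["age"])       # a field can be looked up by column name
print(len(unstructured.split()))  # only generic text operations apply
print(semi.find("patient").get("id"), semi.find("note").text)
```

The structured record can be queried by field name and the semi-structured one by tag, while the free text offers no such handle. This is one reason why sensitive content in unstructured data is harder to locate and protect.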
2. Security Goals
The most important security goals for big data are:
• Privacy
• Authenticity, Integrity
• Access Control
In this article we briefly review these security requirements of big data in various applications, including healthcare, social networks, government, e-commerce, finance and science.
3. Healthcare Applications
Harsh et al. [3] describe the evolution of healthcare filing systems and the use of Electronic Healthcare Records, which are mostly stored in the cloud. They also point out multiple practices that could invade users' privacy and rights. The authors encourage researchers to enhance technologies that keep users' data private and secure, especially given the increase in data volumes and the refusal of cloud hosting companies to grant centralized access to cloud management, which they provide as a shared, distributed service instead. David et al. [4] describe the need to store patients' records in order to analyze and use them to improve many e-health sectors. In addition, they highlight the need to keep those records secure while they are stored waiting to be processed. To investigate this, they simulated three technical environments for storing those records. The first was the Plain Text environment, where no encryption is applied to the records. The second was Microsoft Encryption, using Microsoft built-in tools. The third was the Advanced
Encryption Standard (AES) in conjunction with the Bucket Index in DaaS. Results indicated that AES-DaaS is more efficient and scales better than MS Encryption. The authors also advise encrypting only certain sensitive attributes, so that secrecy and privacy in big data are achieved without overwhelming the physical IT resources.
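The idea of pairing an encrypted attribute with a bucket index can be sketched as follows. This is a minimal illustration and not the implementation evaluated in [4]: the bucket width, the function names and the sample ages are assumptions, and the AES step itself is omitted because it requires a third-party library. The server keeps only a coarse bucket identifier next to each encrypted value, so it can narrow a range query without seeing plaintext, and the client discards false positives after decryption.

```python
BUCKET_WIDTH = 10  # hypothetical bucket size chosen for this example


def bucket_of(value: int) -> int:
    """Coarse bucket id that is stored next to the ciphertext."""
    return value // BUCKET_WIDTH


def buckets_for_range(lo: int, hi: int) -> set:
    """Bucket ids the server must scan to answer lo <= value <= hi."""
    return set(range(bucket_of(lo), bucket_of(hi) + 1))


# Client-side view: the sensitive attribute in plaintext (illustrative ages).
ages = {"r1": 34, "r2": 47, "r3": 41, "r4": 58}

# Server-side view: record id -> bucket id only (ciphertexts omitted here).
server_index = {rid: bucket_of(age) for rid, age in ages.items()}


def range_query(lo: int, hi: int) -> list:
    wanted = buckets_for_range(lo, hi)
    # Step 1 (server): select candidate records by bucket id alone.
    candidates = [rid for rid, b in server_index.items() if b in wanted]
    # Step 2 (client): after decryption, drop the false positives.
    return sorted(rid for rid in candidates if lo <= ages[rid] <= hi)


print(range_query(40, 45))
```

Wider buckets reveal less about each value but return more false positives for the client to filter, so the bucket width trades privacy against query cost.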
4. Social Networks
Social networks are one of the main sources of big data, which can harm the user as much as it can be useful, or even more. The authors of [5] mention the areas or organizations that could use and analyze users' data: for example, organizations for marketing purposes, governments for surveillance purposes and even insurance companies to raise premiums. The paper discusses the issues that could result from using social networks and focuses on those related to location. The authors suggest three models to increase privacy awareness and invite researchers to test and apply those models. Hao et al. [1] discuss the growth of big data in social networks, focusing on media data. The article proposes a security scheme that encrypts the data to achieve confidentiality for big data stored in any format. Before encryption, it adds a fingerprint in order to establish relationships between users. This fingerprint is what makes the scheme special and what makes the encryption suitable for social networks. Tests of the scheme showed that it is secure, efficient and has low computation time.
5. Finance
Banks now deal with big data just like other sectors. The researchers in [11] propose a system design that handles the big data of the banking system while taking into account its complexity and the multiple requirements a banking system must meet in processing, privacy, safety and even storage. The authors of [14] introduce a privacy approach for big data applications, including financial ones, that is based on Order Preserving Encryption (OPE), where fields are encrypted according to their sensitivity. In addition, the approach is concerned with information shared between multiple organizations; for that, the authors suggest keeping private information on private clouds, where it is strictly handled and processed.
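The defining property of OPE is that comparing two ciphertexts gives the same result as comparing the underlying plaintexts, so an untrusted server can sort or range-filter encrypted fields. The toy Python sketch below illustrates only that property. It is not the scheme of [14] and it is not secure: it uses the non-cryptographic `random` module, and the key, domain size, range size and sample balances are all assumptions made for the example.

```python
import random


def make_ope_table(key: int, domain_size: int, range_size: int) -> list:
    """Toy order-preserving map: plaintext i -> i-th smallest value of a
    keyed random sample drawn from a much larger ciphertext range."""
    rng = random.Random(key)  # NOT cryptographically secure; illustration only
    return sorted(rng.sample(range(range_size), domain_size))


# Hypothetical parameters: plaintexts 0..99, ciphertexts below 10,000.
table = make_ope_table(key=2015, domain_size=100, range_size=10_000)


def encrypt(m: int) -> int:
    return table[m]


balances = [73, 5, 41, 41, 90]  # illustrative field values
ciphertexts = [encrypt(b) for b in balances]

# Sorting by ciphertext yields the same record order as sorting by plaintext.
order_plain = sorted(range(len(balances)), key=lambda i: balances[i])
order_cipher = sorted(range(len(balances)), key=lambda i: ciphertexts[i])
print(order_plain == order_cipher)
```

Because the mapping is deterministic and reveals order by design, OPE leaks more than conventional encryption. That is consistent with applying it selectively, according to how sensitive each field is.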
6. E-commerce
E-commerce is another application that produces big data, and its security and privacy issues must be resolved before security breaches spread. An approach is proposed in [12] for e-commerce data stored in the cloud. It uses an enhanced k-means clustering algorithm along with encryption to provide security and a hash function for authenticity. The scheme is designed for cloud data stored on more than one machine. Researchers have also suggested a quality model for improving B2C e-commerce websites that focuses on four quality factors [13]: privacy, security, content and design. The model is expected to use a small amount of data and is thus suitable for the increasing number of transactions and users accessing e-commerce websites. The authors suggest applying the model to real working websites in future work.
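How a hash function can support authenticity for data spread over several machines can be sketched as follows. The source [12] states only that a hash function is used; the choice of HMAC with SHA-256, the key, the node names and the sample records below are our assumptions for illustration. A tag is computed for each chunk before upload and checked again after download.

```python
import hashlib
import hmac

KEY = b"demo-key-not-for-production"  # hypothetical shared secret


def tag(chunk: bytes) -> str:
    """Keyed hash (HMAC-SHA-256) of one data chunk."""
    return hmac.new(KEY, chunk, hashlib.sha256).hexdigest()


def verify(chunk: bytes, expected: str) -> bool:
    """Constant-time comparison of the recomputed tag with the stored one."""
    return hmac.compare_digest(tag(chunk), expected)


# A data set split across two hypothetical storage nodes.
chunks = {
    "node-a": b"order=1001;total=59.90",
    "node-b": b"order=1002;total=12.00",
}
tags = {node: tag(data) for node, data in chunks.items()}

tampered = b"order=1002;total=0.01"
print(verify(chunks["node-a"], tags["node-a"]))  # unmodified chunk
print(verify(tampered, tags["node-b"]))          # modified chunk
```

A keyed hash is used here because a plain digest can be recomputed by anyone who alters the data, whereas a valid tag requires knowledge of the key.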
7. Government
Due to the enormous growth in technologies, many governments attempt to transfer their functions and services into digital government, where government agencies and public sectors exchange a high volume of information between heterogeneous information systems. Such systems must achieve authenticity, integrity, availability, non-repudiation, controllability and reliability, along with privacy, which was the only focus of traditional government security. Zhao [15] suggests that the Chinese government must construct its e-government based on several principles and security mechanisms for protection, monitoring, and recovery. The author concludes that e-government information security systems need mutual, coordinated efforts, as well as improvement, to cope with the importance of big data. Governments adopt methods for collecting, processing, and analyzing big data to observe people's online activities and detect suspicious behavior and criminal activities, in order to prevent and detect crimes and protect national security. Randall et al. [16] raise the concern of people's awareness of the collection and use of their personal data by governments and the possibility of losing their privacy. The authors suggest adopting a privacy-preserving approach for government datasets. Jeong et al. [18] analyze privacy invasion based on many kinds of incidents in South Korea from 2003 to 2013 in order to study policy issues for ICT security. The study shows that privacy invasions increased to over 200 after 2011 in South Korea because of differences in the culture and technology of big data, and it points to the need to design security policies according to these differences. The authors advise governments to establish an organization responsible for ICT policy and management to handle the new big data era.
8. Telecommunication
The evolution of networks and smartphones has increased the wealth of information in the telecommunications industry and among communication service providers: data is generated whenever a subscriber uses the Internet, which leads to an ever-growing flow of data and makes these companies a source of big data. Nowadays, communication service providers are aware of their big data and of how to use it to obtain valuable information about their customers' behavior and lifestyle. However, providing secure electronic communication services to customers and preserving their privacy while they use the Internet or make calls is very challenging. Gurkaynak et al. [19] say that the dynamic nature of electronic communication technologies, together with the security concerns and challenges regarding electronic communications in the big data era, forces national and international authorities to find unified solutions to these challenges. They evaluate various legal frameworks for electronic communications security adopted by European regulators and whether they are consistent with Turkish law. The authors argue that the Turkish legal framework for electronic communications security is not comprehensive enough to embrace future developments in electronic communications. Martucci et al. [20] argue that using cloud computing solutions is essential for the telecommunication industry. However, cloud computing brings privacy and security challenges that need to be taken into account before integrating it with telecommunications providers. The authors list the requirements for privacy, security and trust related to the integration of the telecommunication industry and cloud computing services.
9. Science
Science, in all its branches, is one of the most important areas affected by the evolution of big data and its consequences. The work in [8] emphasizes the data-driven research that is now possible because of big data and the positively different outcomes expected from the use of larger amounts of data and more sophisticated digital tools. Another paper [9] introduces the use of data mining on big data to improve all aspects of science and engineering. The authors propose a theorem called HACE, which focuses on the advent of big data and the powerful computers needed to process it. The theorem suggests a three-tier model, one tier of which is dedicated to keeping data private and secure alongside useful processing. One more area of science, closer to Information Technology than those mentioned previously, is software engineering. In [10], the authors present a privacy extension to the standard UML approach in which all privacy requirements are apparent to the software engineer without the need to go through details. The proposed extension is tried with a single stakeholder as well as with two or more stakeholders, each having their own privacy specifications.
10. Conclusions
In this article, we have discussed critical security issues of big data in various applications, including healthcare, social networks, finance, e-commerce, government, telecommunication and science. These applications store massive volumes of data. The high-volume, high-velocity, and high-variety nature of big data creates the need for mechanisms that secure the data against different kinds of attack and disclosure [17]. Organizations must also carefully adopt security policies and techniques to protect big data from different kinds of attacks.
References
[1] W. Hao, "Secure Sensitive Data Sharing on a Big Data Platform", Tsinghua Science and Technology, vol. 17, no. 1, pp. 72-80, (2015).
[2] J. Kar, "Provably Secure Online/Off-line Identity-Based Signature Scheme for Wireless Sensor Network", International Journal of Network Security, Taiwan, vol. 16, no. 01, pp. 26-36, Jan. (2014).
[3] K. Harsh and S. Ravi, "Big Data Security and Privacy Issues in Healthcare", 2014 IEEE International Congress on Big Data, pp. 762-765, (2014).
[4] D. Shin, T. Sahama and R. Gajanayake, "Secured e-health data retrieval in DaaS and Big Data", 15th IEEE International Conference on e-Health Networking, Applications and Services, pp. 255-259, (2013).
[5] M. Smith, C. Szongott, B. Henne and G. von Voigt, "Big Data Privacy Issues in Public Social Media", Digital Ecosystems Technologies (DEST), 6th IEEE International Conference, pp. 1-6, (2012).
[6] C. Tankard, "Big data security", Network Security, vol. 2012, pp. 5-8, (2012).
[7] R. Devakunchari, "Handling big data with Hadoop toolkit", in Information Communication and Embedded Systems (ICICES), 2014 International Conference on, pp. 1-5, (2014).
[8] Jacob D, "Data-driven medicinal chemistry in the era of big data", Drug Discovery Today, vol. 19, no. 7, pp. 859-868, (2014).
[9] X. Wu, Z. Zhu, G.Q. Wu and W. Ding, "Data Mining with Big Data", IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97-107, (2014).
[10] D.N. Jutla, P. Bodorik and S. Ali, "Engineering Privacy for Big Data Apps with the Unified Modeling Language", IEEE International Congress on Big Data, pp. 38-45, (2013).
[11] A. Munar, E. Chiner and I. Sales, "A Big Data Financial Information Management Architecture for Global Banking", 2014 International Conference on Future Internet of Things and Cloud, pp. 385-388, (2014).
[12] D. Mittal, D. Kaur and A. Aggarwal, "Secure Data Mining in Cloud using Homomorphism Encryption", Cloud Computing in Emerging Markets (CCEM), IEEE International Conference, pp. 1-7, (2014).
[13] R.M. Al-Dwairi and M.A. Kamala, "Business-to-Consumer E-Commerce Web Sites: Vulnerabilities, Threats and Quality Evaluation Model", Electronics, Communications and Computer (CONIELECOMP), 20th International Conference, pp. 206-211, (2010).
[14] M. Ahmadian, A. Paya and D.C. Marinescu, "Security of Applications Involving Multiple Organizations - Order Preserving Encryption in Hybrid Cloud Environments", IEEE 28th International Parallel & Distributed Processing Symposium Workshops, pp. 894-903, (2014).
[15] D. Zhao, "On Several Major Issues of the Construction of Chinese E-government Information Security System", in Business Computing and Global Informatization (BCGIN), International Conference, pp. 274-277, (2011).
[16] S. M. Randall, A. M. Ferrante, J. H. Boyd, J. K. Bauer and J. B. Semmens, "Privacy preserving record linkage on large real world datasets", Journal of Biomedical Informatics, vol. 50, pp. 205-212, (2014).
[17] J. Kar and M.R. Mishra, "Mitigate Threats and Security Metrics in Cloud Computing", MAGNT Research Report, vol. 03, no. 4, pp. 159-166, (2015).
[18] H. Moon, H. S. Chou, S. H. Jeong and J. Park, "Policy Design Based on Risk at Big Data Era: Case Study of Privacy Invasion in South Korea", in Big Data (BigData Congress), 2014 IEEE International Congress on, pp. 756-759, (2014).
[19] G. Gurkaynak, I. Yilmaz and N. P. Taskiran, "Protecting the communication: Data protection and security measures under telecommunications regulations in the digital age", Computer Law & Security Review, vol. 30, pp. 179-189, (2014).
[20] L. A. Martucci, A. Zuccato, B. Smeets, S. M. Habib, T. Johansson and N. Shahmehri, "Privacy, security and trust in cloud computing: The perspective of the telecommunication industry", in Ubiquitous Intelligence & Computing and 9th International Conference on Autonomic & Trusted Computing (UIC/ATC), (2012).