Survey of Big Data Information Security

BigR&I 2016

Survey of Big Data Information Security Natalia Miloslavskaya and Aida Makhmudova National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russia

Vienna, 23 August 2016


CONTENT
Introduction
1. Related Works' Analysis
2. Availability Improvement
3. Integrity and Privacy Improvement
4. Research Objective
5. Secure Big Data Mining Algorithm Design
Conclusion



Introduction (1/3)

3V big data model: Volume, Velocity, Variety.
Additional features: Veracity, Variability, Value, Visibility.

The Big Data technology is used widely in the world due to:
• the large amounts of data generated in the daily activities of various organizations;
• the ability to analyze effectively the big volumes of organizational information collected from different sources;
• cost minimization of the design of large-scale and reliable data storages;
• high availability and scalability of data processing;
• higher information system network performance.


Introduction (2/3)

Big Data IS Problems: 

The growth of targeted attacks on organizations Secure data collection and centralized data integration from multiple heterogeneous sources The protected access both to the "raw" (unprocessed) data and already obtained analysis results

In particular for network monitoring systems:  

Big data volumes should be handled correctly and efficiently to identify the IS incidents All the network traffic contains important information transmitted in user shared environment and requiring to ensure its integrity, availability and confidentiality Vienna, 23 August 2016

BigR&I2016

 


Introduction (3/3)

The amount of data involved in the implementation of modern large-scale projects is not acceptable for processing within the traditional network architectures.

The Big Data storages and architectures are employed more and more frequently, but they are not always designed to meet the requirements of ensuring their IS.


1 Related Works’ Analysis

In March 2012, the US government invested $200 million in "The Big Data Research and Development Initiative" to improve the means and methods of proper organization and rapid analysis of large digital data volumes with existing availability problems. The Big Data architecture assumes that the methods for data storage, analysis and use vary considerably depending on the applications. With several hundred terabytes of data, this can lead to system failures (e.g. in bio-engineering).

Research in the area of Big Data analysis:
1) availability ensuring in distributed computing environments;
2) integrity and privacy ensuring.


2 Availability Improvement (1/2)

Some Big Data processing requirements:
1) reliable storage infrastructures focused on providing frequent access to data with a large number of the corresponding transactions;
2) sufficient communication channel bandwidth in the data centers;
3) concurrent access to data processed by different applications simultaneously (this demands the precise coordination of downloads from these applications with the extraction and transfer processes carried out, and a guarantee of data availability during the request); the slowdown of queries' processing and the decrease of their performance are unacceptable for crucial applications;
4) high availability support and access control to Big Data throughout their life cycle, applying to the data (experimental or "raw") collection, extraction from the storage, internal filtering, categorization, processing, archiving and analysis.


An organization's Intranet where Big Data analysis is carried out is a distributed computing infrastructure. Distributed computing: computing based on open standards and protocols of access to computing resources, applications and data on the Intranet. Its usage requires effective solutions (like products from IBM, Oracle, Microsoft, etc.). The availability of Big Data primarily affects business decisions. With increasing data volumes, the issue becomes more complicated because, in addition to the data itself, one needs to work with additional parameters characterizing this data.


2 Availability Improvement (2/2)

Analyzed works:
1. NiagaraCQ (J. Chen and F. Tian): uses the adaptive scheme of incremental query grouping. Drawbacks: the redirection of the requests into the newly formed groups is not specified.

2. The distributed model of C. Olston and J. Widom: uses centralized processing of the data requests. Drawbacks: the technique responds to requests only with some approximation and works only in cases that do not require high query processing precision.

3. The architecture of D. Estrin and R. Govindan: consists of three components called the collectors, analyzers and dumps. Drawbacks: the problem of supporting the model in a dynamic environment is not solved.

Note: The relational DBs are not applicable as the storage systems because of their limited horizontal scalability and consistency and the absence of means for the presentation of structured relations of their elements. The alternative: non-relational DBs (such as NoSQL) with their scalability and flexibility. But they have, in particular, a limited capacity to provide access control and to ensure IS in general (for example, program optimization of techniques to ensure the integrity of transactions with Big Data has a negative impact on the DB performance and scalability).


3 Integrity and Privacy Improvement (1/3)

Distributed Big Data mining requires the information exchange's participants to jointly compute functions on the basis of their protected data, preserving its integrity and confidentiality (secure multiparty computation).

Ensuring the integrity of data and of its transmission during the collection, delivery, acquisition, integration, categorization, correlation, analysis and further use, as well as the integrity of the Intranet components themselves, is critical to make the right management decisions. In addition to Big Data confidentiality, it is necessary to solve the problem of ensuring the data integrity, which can be violated by the substitution of the data source or the data itself.


The distributed Big Data mining and the use of confidential computation protocols overlap, since for both processes the computation of functions is performed by multiple information systems' users without the need to disclose the input data to each other (e.g. the millionaires' problem by Andrew Yao).
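The idea of joint computation without disclosing inputs can be illustrated with a toy additive secret-sharing sum, which is a much simpler mechanism than Yao's protocol itself (a hedged sketch; the modulus, party count and input values are illustrative, not from the surveyed works):

```python
import random

FIELD = 2**61 - 1  # illustrative large prime modulus; shares are uniform mod FIELD

def share(secret, n=2):
    """Split a secret into n additive shares whose sum mod FIELD is the secret."""
    shares = [random.randrange(FIELD) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % FIELD)
    return shares

def reconstruct(shares):
    return sum(shares) % FIELD

# Two parties hold private inputs a and b:
a, b = 1_000_000, 2_500_000
a1, a2 = share(a)
b1, b2 = share(b)
# Each party adds only the shares it holds; neither sees the other's input.
s1 = (a1 + b1) % FIELD
s2 = (a2 + b2) % FIELD
joint_sum = reconstruct([s1, s2])
assert joint_sum == a + b
```

Because the sharing is linear, the parties can compute the sum of their inputs while each individual share remains uniformly random and reveals nothing on its own.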

9

SURVEY OF BIG DATA INFORMATION SECURITY

3 Integrity and Privacy Improvement (2/3)

Most of the algorithms are based
• either on the use of cryptographic algorithms,
• or on data transformation or corruption methods, which can be divided into several types:
1) the perturbation method, which considers confidentiality and integrity in relation to the information received in response to requests from external objects to the statistical DBs;
2) the use of additive and multiplicative noise generated from the probability distribution of the data values;
3) data anonymization.
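Type (2) above, additive and multiplicative noise, can be sketched in a few lines (an illustrative sketch; the noise scales and the sample data are invented for demonstration):

```python
import random

def perturb_additive(values, scale):
    """Mask each value with independent zero-mean Gaussian noise;
    individual values are hidden, aggregate statistics are roughly preserved."""
    return [v + random.gauss(0.0, scale) for v in values]

def perturb_multiplicative(values, scale):
    """Scale each value by a random factor drawn near 1."""
    return [v * random.gauss(1.0, scale) for v in values]

salaries = [48_000, 52_000, 61_000, 75_000]
masked = perturb_additive(salaries, scale=500.0)
# The published values differ from the originals record by record,
# while the mean of the masked list stays close to the true mean.
```

The choice of the noise distribution and scale is exactly the trade-off discussed later: larger noise gives stronger privacy but degrades the utility of the analysis results.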


Two types of attackers for the secure computation protocols:
1) "semi-honest" adversaries: parties that perform the actions of the cryptographic protocol correctly but also try to get additional information about the data of the other parties during its execution;
2) "malicious" adversaries: an attacker of this type can deviate from the protocol execution and send forged messages to other parties in order to reveal their confidential data.


3 Integrity and Privacy Improvement (3/3)

Analyzed works:
1. Constant round protocol (A. Yao): uses the secure computing of probability functions. Drawbacks: the method is generally applicable for the Big Data mining, but is not suitable to solve the issues related to the distributed Big Data analysis.
2. The homomorphic and commutative data encryption. Drawbacks: high computational costs; it requires an entirely synchronized, distributed computing environment.
3. The perturbation and spectral analysis model. Drawbacks: subjected to the known-sample attack.
4. The multiplicative distortion methods. Drawbacks: difficulties in identifying the information disclosure and corruption may occur; susceptible to the so-called IO attack.
5. The k-anonymity model (L. Sweeney). Drawbacks: the model is subjected to the homogeneity attack if an intruder has initial information about the confidential data set.
6. The ε-differential model. Drawbacks: only a limited number of requests may be answered, after which the database must be stopped in order to prevent leakage of confidential information.
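Sweeney's k-anonymity condition from item 5 is easy to state operationally: every combination of quasi-identifier values must occur in at least k rows. A minimal check might look as follows (the table, quasi-identifiers and generalized values are illustrative):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True iff every quasi-identifier combination appears in >= k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

records = [
    {"zip": "476**", "age": "20-29", "disease": "flu"},
    {"zip": "476**", "age": "20-29", "disease": "cold"},
    {"zip": "479**", "age": "30-39", "disease": "flu"},
    {"zip": "479**", "age": "30-39", "disease": "asthma"},
]
print(is_k_anonymous(records, ["zip", "age"], k=2))  # True
print(is_k_anonymous(records, ["zip", "age"], k=3))  # False
```

Note that passing this check does not rule out the homogeneity attack listed as the model's drawback: if all k rows of a group shared the same sensitive value, the attribute would still leak.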


4 Research Objective

The analyzed methods and algorithms have significant drawbacks in general: they are subjected to a number of attacks and do not possess the necessary properties such as sufficient performance and scalability.

Research objective: the formulation and substantiation of specific recommendations for developing the Big Data secure analysis algorithm based on the IS requirements (namely, integrity, availability and confidentiality) with respect to both the analysis itself and the analyzed data (initial, intermediate and received as the result of the analysis).

To achieve the above-mentioned goals, a solution must be offered that provides the required levels of privacy and integrity in heterogeneous interconnected computing environments via distributed multi-objective optimization that is effective in terms of communication.


5 Secure Big Data Mining Algorithm Design (1/3)

To address the problem of providing Big Data availability and scalability: the usage of the entity-based model that formalizes the user requests to the data.

The entity is designed independently from the actual data storage. End-users, with the help of a formal query language, can formulate requests using certain entity-based terms. To perform the requests, mappings are provided that describe the relationship between the entity-based terms and their representations in the data sources.


[R. Möller, V. Haarslev, M. Vessel, D. Calvanese, G. Giacomo, D. Lembo, M. Lenzerini, A. Amoroso, G. Esposito, D. Lembo, P. Urbano, etc.]
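The mapping idea — entity-level terms rewritten into their source-schema representations — can be sketched roughly as follows (all names, tables and columns are hypothetical, invented for illustration; the cited works define their own formalisms):

```python
# Hypothetical mapping: entity name -> (source table, field -> source column).
MAPPINGS = {
    "Patient": ("hospital_db.t_pat", {"name": "pat_nm", "age": "pat_age_yrs"}),
}

def translate(entity, fields, condition=None):
    """Rewrite an entity-level query into SQL over the underlying source table."""
    table, columns = MAPPINGS[entity]
    select = ", ".join(columns[f] for f in fields)
    sql = f"SELECT {select} FROM {table}"
    if condition:
        field, op, value = condition
        sql += f" WHERE {columns[field]} {op} {value!r}"
    return sql

print(translate("Patient", ["name", "age"], ("age", ">", 40)))
# SELECT pat_nm, pat_age_yrs FROM hospital_db.t_pat WHERE pat_age_yrs > 40
```

The end-user only ever sees the entity terms ("Patient", "age"); the cryptic source schema stays hidden behind the mapping, which is the point made on the next slide about abstraction and comprehensibility.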


5 Secure Big Data Mining Algorithm Design (2/3)

The entity-based approach's applicability to address the issues of users' access to Big Data:
1) the involvement of end-users and IT experts is not required for writing special-purpose software code;
2) the data can be stored in relational DBs; in many cases, the transfer of large volumes of complex data sets is impractical, and it is necessary to ensure scalability, to use the existing optimized data structures (tables) and to avoid the query complexity increase caused by data fragmentation;
3) a flexible query language is provided that corresponds to data conceptualization on the end-user side;
4) the entity can be used to hide the details and set the abstractions; this is important in the cases where the applied source scheme is too sophisticated for the end-users to understand;
5) the relationship between entity concepts and relational data is clear when performing mappings, which allows the IT experts working with DBMS to easily understand the specific user requests;
6) the approach is flexible enough and suggests using an infrastructure that is simpler to design and support.


5 Secure Big Data Mining Algorithm Design (3/3)

To address the problem of providing Big Data integrity and privacy: it should be considered in terms of multi-objective optimization, where it is necessary to find the optimal solution between the two possibly conflicting objectives for every user:
1) the maximization of data privacy and integrity (or the minimization of the threat of their violation);
2) the minimization of the costs related to the support of adequate levels of confidentiality and integrity.

The trade-off between the privacy and integrity values and the corresponding costs should be provided in order to reach efficiency when working with Big Data.
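One simple way to operationalize a trade-off between these two objectives is scalarization: score each candidate protection level by its privacy benefit minus its weighted cost and pick the best one (a minimal sketch only; the weights, protection levels and scores are invented for illustration, and the paper's algorithm is not specified at this level of detail):

```python
def best_tradeoff(options, w_privacy=1.0, w_cost=0.8):
    """options: list of (protection_level, privacy_score, cost).
    Scalarize the two conflicting objectives into one score and
    return the option with the best privacy-minus-cost balance."""
    return max(options, key=lambda o: w_privacy * o[1] - w_cost * o[2])

levels = [
    ("none", 0.0, 0.0),   # no protection, no cost
    ("noise", 0.6, 0.3),  # moderate masking, moderate overhead
    ("mpc", 0.9, 1.0),    # strong protection, expensive computation
]
print(best_tradeoff(levels)[0])  # noise
```

Under these illustrative weights the middle option wins: stronger protection exists, but its cost outweighs the extra privacy, which is exactly the kind of per-user balancing the slide describes. A fuller treatment would compute the whole Pareto front rather than one scalarized optimum.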


Conclusion

The Big Data IS challenges are the subject of many scientists' research and works. However, the methods proposed in these works suffer from insufficient scalability and performance for the Big Data analysis and from exposure to certain attacks aimed at the violation of the IS properties. To solve these issues, the following main concepts of the secure Big Data mining algorithm design providing the IS properties were formulated:
• a formalized approach where the user's request would be converted into a form clear for the processing by data sources (entity-based model);
• confidentiality and integrity should be considered in terms of multi-objective optimization, where it is necessary to find the optimal solution between the two possibly conflicting objectives (the privacy and integrity level and the related costs).

Future research: to form the more specific requirements for the algorithm implementation, its development and testing for the Big Data mining, ensuring the IS properties.


BigR&I 2016 Natalia Miloslavskaya


[email protected]
