Security Intelligence Centers for Big Data Processing

BigR&I-2017

Natalia Miloslavskaya
National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)
“Information Security of Banking Systems” Department

Prague, 21 August 2017

CONTENT

Introduction
1. Big IS-related Data
2. Security Intelligence Concept
3. Security Intelligence Centers (SICs)
4. SIC Business Logic
5. SIC’s Data Architecture
Conclusion

Introduction (1/3)

Gartner: by 2020, 30% of global enterprises will have been directly compromised by an independent group of cybercriminals or cyberactivists.

Verizon 2015 Data Breach Investigations Report: in 60% of network breaches, hackers compromise the network within minutes.

E.g.: an organization with 1 head office and a few regional branches
~ 25 different network devices and information protection tools (IPTs) to monitor (like firewalls (FWs), identity and access management system (IAMS), intrusion detection/prevention system (IDPS), web server, mail server, database server, proxy, antivirus, security appliances, etc.)
~ a 30,000 events/sec ongoing rate (this is not a peak, but a rate averaged over the course of 24 hrs)
=> log data with 750,000 potential events to review (some events will be strictly operational, some will be IS-related, and some could apply to both)
=> let’s reduce the interesting event types to 25%, yielding an analysis field of 187,500 items.

What if the number of devices is greater than 25? Plus monitoring of outsiders and malware, as well as possible insiders’ abuse. This data expands exponentially when factors such as device operating system (OS) versions and time are included in the calculations.
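
The estimate above can be reproduced as a small calculation. This is only a sketch: it assumes the 750,000 figure is the product of 25 monitored devices and 30,000 events each, a reading that matches the slide's numbers but is not stated explicitly. The function name `analysis_field` is illustrative.

```python
def analysis_field(devices, events_per_device, interesting_share):
    """Return (potential events, events left after filtering).

    Assumption: potential events = devices * events_per_device,
    which reproduces the 750,000 figure for 25 devices.
    """
    potential = devices * events_per_device
    return potential, int(potential * interesting_share)


# The slide's example: 25 devices, 30,000 events, 25% of event types kept.
potential, to_review = analysis_field(25, 30_000, 0.25)
print(potential, to_review)
```

Under the same assumption, raising the device count scales both numbers linearly, which is why the review workload quickly outgrows manual analysis.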

Introduction (2/3)

Another important factor for IS incident detection: the correlation of real-time and near-real-time monitoring data with stored data (usually 30 days for log data used for operational IS analysis and one year for searchable full archives).

The ever-increasing volume of IS-related information is one issue, but the real problem for protecting intranets is the speed with which things related to IS happen.

=> IS incident detection and response with the appropriate accuracy level must be maximally automated (where the involvement of IS professionals is not required)
=> Not only monitoring, but full-time management of all IS-related data is required

Preventive IS controls based on risk assessment prevent the majority of IS incidents, but not all of them. A unified, inclusive, scalable, effective and efficient system with all the necessary “best-of-breed” tools and measures will allow security analysts to better understand their current intranet landscape, mitigate and promptly respond to IS threats and truly manage IS for their intranets!

Introduction (3/3)

=> An IS management system is required for “on-the-fly” IS incident detection, minimizing loss and destruction, mitigating exploited vulnerabilities and restoring network structure and services.

It will do this through gathering, analysis and filtering of raw big IS-related data, which are then collected into appropriate databases, thoroughly and comprehensively analyzed, presented and visualized in different reports, and transferred for IPT reconfiguration and online IS management.

Big IS-related data problem: volume, velocity, variety, veracity, variability, value and visibility!

The late 1990s: Security Operations Center (SOC) for IS incident monitoring with the right IPTs and skilled staff in place.

2010-2011: Security Intelligence Center (SIC) to deal in a timely manner with network- and higher-level IS events, with an integrated IS architecture providing full visibility and control and context-driven security intelligence in one place.

Paper’s goal: to analyze and develop the Security Intelligence concept as a preparatory stage for creating our own Network Security Intelligence Center.

1. Big IS-related Data (1/2)

A tremendous amount of heterogeneous data characterizes the current state of a network, and events that seem unrelated at first glance take place in it. The data are generated from raw data considered in a particular context. They come not just from separate DNS servers, domain controllers, proxies and IPTs; they also describe the current device configurations, the characteristics of network traffic, the functioning of applications and network services, and the activity and specific actions of individual end-users, and they contain e-mails, phone records, web-based content, metadata, digitized audio and video, GPS locations, business process data, the organization’s internal documents and analytical data…

Enormous volumes of data in different formats, for different purposes and often with different policies and even compliance requirements:

1. Data locked up in disparate security devices, applications and databases
2. Data collected from end-point products, applications, etc. (such as application execution logs, validation logs of federated IAMSs and so on), creating another silo, such as another database where that data is stored with no communication or coordination with the first one
3. Data in streams with the following unique features: huge or possibly infinite volume, dynamically changing, flowing in and out in a fixed order, demanding fast (often real-time) response and so on. Typical examples: various kinds of time-series data and data produced in a dynamic network environment, such as network traffic (consisting of network packets that are specific to a particular network protocol), telecommunications, video surveillance, Website click streams, IS monitoring, sensor networks and so on
4. Data segregated by the organization’s business units and operations groups (often related to sensitive data)

1. Big IS-related Data (2/2)

The problem of structuring, consolidating, visually presenting and securely processing siloed data, so that timely and informed IS management decisions can be made for an organization’s intranet, becomes very acute.

In general, big IS-related data processing is aimed at data mining: extracting or “mining” (discovering) knowledge from large amounts of data.

It integrates various techniques from multiple disciplines: DBs and data warehouses, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing and spatial or temporal data analysis…

All these techniques are well-integrated in the Security Intelligence concept.

2. Security Intelligence Concept (1/3)

Security Operations Center (SOC): a centralized unit that deals with security issues and detects, analyzes, responds to, reports on and prevents IS incidents + a team primarily composed of security analysts

SOC is the IS management (ISM) eyes

Any organization urgently needs a real-time view, in a concrete context, of what is happening right now in order to find something unusual; maximum automation of all routine operations and IS incident responses that do not require an expert’s decision-making; and the ability to continually stay ahead of intruders. These needs call for a more advanced IS management center than a traditional SOC.

Security Intelligence (SI) goal: to provide proactive, predictive (forward-looking), actionable and comprehensive protection and insight into IS that reduces IS risks and operational effort for any organization through advanced analytics.

(Cisco Systems’ contribution is great!)

2. Security Intelligence Concept (2/3)

The evolution of IS ensuring approaches

Perimeter Defense
Timeframe: 2000-2004
Objective: Perimeter defense, log consolidation and correlation
Architecture: Distributed architecture with maturing of log management and IS analytics
Data sources: Small numbers of sources supported out of the box
Number of devices managed: Dozens to hundreds
Events per second: 1,000 to 5,000
Storage: Hundreds of gigabytes
Analytics: Event filtering, basic event correlation
End users: Perimeter security team (web services)
Breach response: Slow (can take years to discover), manual gathering of data and device info
Major limitations: Limited log file formats; small number of supported devices; not scalable, manual analysis; many false positives/negatives

Logging and Compliance
Timeframe: 2005-2009
Objective: Real-time detection, log consolidation, deeper reporting and analytics, forensics
Architecture: Single console and deeper network flow analytics, less intrusive and separated from data center
Data sources: Larger variety of log data sources
Number of devices managed: Hundreds to thousands
Events per second: 10,000+
Storage: Terabytes
Analytics: Advanced correlation, analytics limited by data type (log only)
End users: IT security and compliance teams
Breach response: Faster (often takes months or years to discover), but limited analytics prevent quick response
Major limitations: Performance issues with large data sets; limited data (log only) analytics; false positives/negatives

Security Intelligence
Timeframe: 2010-present
Objective: Log management, application/user activity monitoring, IS threat detection, IS risk management, compliance
Architecture: Integrated IS management solution, deeply embedded into systems
Data sources: All relevant IS data across the entire organization’s intranet
Number of devices managed, events per second, storage: Unlimited, based on unique scaling requirements of each deployment
Analytics: Advanced analytics including all network events (VPN, IDPS, etc.), network and application context, user data via IAMSs
End users: IT security and compliance teams, operators, auditors, analysts
Breach response: Real-time/near-real-time discovery of breaches, often with same-day remediation
Major limitations: Standards governing bodies not yet formed and integration with third-party products/sources still labor intensive

2. Security Intelligence Concept (3/3)

SI advantages:

1) 24x7 security coverage, combining local monitoring observations, external SI and internal threat intelligence, and continuously recorded history in one place without requiring full-time staffing;
2) A holistic approach with Defense-in-Depth (Castle) strategies, meaning that an organization looks at every aspect of its IS threat management in relation to every other aspect and views IS as more than a matter of mitigating IS risk;
3) Proactive and predictive monitoring of IS threats, based on predefined meaningful IS metrics for making faster, more-informed, smarter decisions through real-time integration;
4) Alignment of IS risk ranking and management with business needs, based on Business Impact Analysis;
5) Built-in IS risk framework, vulnerability assessment, patch management, audit, etc. functions;
6) Better understanding of an organization’s overall exposure, supported by cross-channel visibility in a single view with comprehensive reporting dashboards and entity link analysis to reveal hidden relationships and suspicious associations among users, accounts or other entities early in their life cycles;
7) Advanced context-based analytics, meaning the ability to automatically and consistently correlate the observed activity of applications, hosts and users, their geo-location, network traffic telemetry, white/black listing, events, etc. with system, application, network, server, IPT and other logs and with patterns and trends;
8) Behavior-based cross-correlation that triggers priority alerts and automated responses based on IS risk scores tied to specific services and combinations of events or thresholds of changes in these indicators;
9) Baseline-driven anomaly detection based on atypical actions;
10) Increasing efficiency (by reducing the costs and complexity of IS incident response and improving attack detection accuracy by instantly understanding the entire attack kill chain) via launching a unified defense against IS threats based on centralized case management and a common repository for cross-channel data; etc.
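
Baseline-driven anomaly detection (advantage 9) can be illustrated with a very small statistical check. The sketch below is not the method used by any particular SI product: it assumes a baseline of historical per-interval counts and flags a new observation that lies more than a chosen number of standard deviations from the baseline mean. The name `is_anomalous` and the default threshold of 3 are assumptions made for this example.

```python
from statistics import mean, stdev


def is_anomalous(baseline, observed, threshold=3.0):
    """Flag `observed` if it deviates from the baseline mean by more
    than `threshold` sample standard deviations (a z-score test).

    `baseline` needs at least two values for stdev() to be defined.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is atypical.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical baseline: logins per hour for one account.
logins_per_hour = [100, 102, 98, 101, 99]
print(is_anomalous(logins_per_hour, 101))  # close to the baseline
print(is_anomalous(logins_per_hour, 150))  # far outside the baseline
```

A production system would maintain baselines per user, host or service and update them over time; this example only shows the comparison step.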

3. Security Intelligence Centers (1/4)

Security Intelligence Center (SIC) with an integrated IS architecture and SIEM 2.0 provides full visibility and control and context-driven SI in one place to deal in a timely manner with network- and, more importantly, higher-level IS events (SICs have existed since 2011)

SIC is the ISM brain with wide-open eyes

3. Security Intelligence Centers (2/4)

SICs combine a number of technologies, based on big IS-related data processing:

• Data collection capabilities, fully integrated and centralized log and event management with scale-out architecture for complete situational awareness and a robust unified knowledge base, including relevant internal historic data and data from external sources
• Big IS-related data filtering and processing from the viewpoint of any applicable attack to find an IS incident and its source, consider its type, weigh its consequences, visualize its vector, highlight network areas and services at high risk, associate all the target systems, prioritize countermeasures and offer mitigation solutions with weighted impact relevance
• Scalable, multi-dimensional, content-specific centralization and aggregation of data from disparate silos with a single point of reference, which is distributed in order to allow for scalability, high availability and ensuing advanced correlation, normalization, categorization, pattern recognition and analysis by the 2nd-generation security information and event management (SIEM) systems in a separate interconnected network environment, maintaining the recorded relationships of every network connection, file execution and modification, registry modification and so on
• Network visibility and advanced threat detection of IDPSs, which are run over the big data store as a kind of analytics for network behavior anomaly detection
• Logical group analytics applied to services and resources for issue prioritization, continuous asset profiling and impact analysis
• IS risk management over the initial big IS-related data, reducing the number of IS incidents and ensuring compliance
• Predictive, actionable, intelligent, behavior- and process-driven response to all IS events (keeping in mind the respective requirements identified above) and further IS incident handling (detection, alerting, reporting, response, escalation management)
• Identification, tracking and recovery of the intranet’s assets to be protected after different IS incidents
• Vulnerability scanning followed by device configuration and patch management
• Automated behavioral white-listing and statistical baselining, deep packet and flow inspection and application content insight afforded by next-generation FWs and forensic tools with fully recognized digital evidence
• Automated compliance and audit assurance for broad verification of IS controls and detection of IS policy violations
• IS knowledge management, promoting an integrated approach to identifying, capturing, evaluating, retrieving and sharing IS knowledge.

3. Security Intelligence Centers (3/4)

[Figure: Security Intelligence usage in SIC operation for big IS-related data processing]

3. Security Intelligence Centers (4/4)

• Preprocessing:
  o Data cleaning/cleansing removes noise and irrelevant and inconsistent data from the collection. Data for cleaning are retrieved from various interconnected and autonomous sources.
  o Data integration combines data from multiple heterogeneous sources.
  o Data selection, relevant to the analysis task, yields a reduced representation of the data set that keeps the integrity of the original data set in a reduced volume.
  o During data transformation, the selected data are transformed or consolidated into formats appropriate for mining.
• Mining data streams involves the efficient discovery of general patterns and dynamic changes within stream data. Intelligent analysis methods and tools discover and extract potentially useful data patterns. In the case of IS-related data stream processing, multilevel, multidimensional online analysis and mining are performed on stream data.
• Security analysts are interested in higher and multiple levels of abstraction to predict or detect IS events. Pattern evaluation identifies the truly interesting and useful patterns, using given validation measures.
• Knowledge representation, the final phase of the IS-related knowledge-discovery process, is necessary for visualization in order to detect IS incidents and threats, respond to them and predict possible IS risks.
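
The four preprocessing steps can be sketched as a chain of small functions. This is an illustrative sketch only: the record layout (`ts`, `source`, `type`), the function names and the specific cleaning and transformation rules are assumptions made for the example, not part of the described system.

```python
from datetime import datetime, timezone

REQUIRED = ("ts", "source", "type")  # hypothetical minimal record schema


def clean(records):
    """Data cleaning: drop records with missing fields and exact repeats."""
    seen, out = set(), []
    for r in records:
        if any(not r.get(k) for k in REQUIRED):
            continue  # noise or inconsistent record
        key = tuple(r[k] for k in REQUIRED)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [r for src in sources for r in src]


def select(records, relevant_types):
    """Data selection: keep only event types relevant to the task."""
    return [r for r in records if r["type"] in relevant_types]


def transform(records):
    """Data transformation: normalize ISO-8601 timestamps (with a UTC
    offset) to integer UTC epoch seconds."""
    out = []
    for r in records:
        ts = datetime.fromisoformat(r["ts"]).astimezone(timezone.utc)
        out.append({**r, "ts": int(ts.timestamp())})
    return out


def preprocess(relevant_types, *sources):
    cleaned = (clean(src) for src in sources)
    return transform(select(integrate(*cleaned), relevant_types))


# Two hypothetical sources reporting in different time zones.
fw = [{"ts": "2017-08-21T10:00:00+00:00", "source": "fw1", "type": "deny"},
      {"ts": "", "source": "fw1", "type": "deny"}]
ids = [{"ts": "2017-08-21T12:00:00+02:00", "source": "ids1", "type": "alert"},
       {"ts": "2017-08-21T12:00:01+02:00", "source": "ids1", "type": "heartbeat"}]
events = preprocess({"deny", "alert"}, fw, ids)
```

After preprocessing, the two remaining events carry the same normalized timestamp even though the sources used different offsets, which is what makes later correlation possible.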

4. SIC Business Logic (1/5)

SICs are the next evolutionary step after SOCs. SOC business logic, containing five operational layers, was described by R. Bidou:

1) IS event-based and status-based message generators (like applications, servers, services, perimeter and boundary points, internal resources, etc.) (depicted onwards as E Boxes)
2) IS-related data acquisition layer on the basis of SIEM with IS event collectors (C Boxes)
3) Formatted and aggregated message database (D Box)
4) IS incident analysis engine (A Box) and knowledge base (K Box) including attack characteristics/signatures, protected intranet model, IS policies, IS incident history, etc.
5) IS incident reporting and response management software (R Box)
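
The flow of a message through the five layers can be sketched in a few lines. Everything below is a toy model built for illustration: the `Message` type, the function names and the substring-based signature matching are assumptions of this example, not Bidou's design or any real SIEM API.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    source: str
    text: str


def e_box(source, lines):
    """E Box: a generator of raw event messages."""
    return [Message(source, line) for line in lines]


def c_box(messages):
    """C Box: collects messages and brings them to one format
    (here simply trimmed and lower-cased)."""
    return [Message(m.source, m.text.strip().lower()) for m in messages]


@dataclass
class DBox:
    """D Box: database of formatted messages."""
    rows: list = field(default_factory=list)

    def store(self, messages):
        self.rows.extend(messages)


def a_box(d_box, k_box):
    """A Box: matches stored messages against signatures from the K Box."""
    return [(m, sig) for m in d_box.rows
            for sig in k_box["signatures"] if sig in m.text]


def r_box(incidents):
    """R Box: turns detected incidents into reports."""
    return [f"ALERT {m.source}: matched '{sig}'" for m, sig in incidents]


k_box = {"signatures": ["failed login"]}  # hypothetical knowledge base
d_box = DBox()
d_box.store(c_box(e_box("web1", ["  Failed LOGIN for root ", "GET /index.html"])))
reports = r_box(a_box(d_box, k_box))
```

The point of the layering is that each box can be replaced or scaled independently, as long as the message format between layers stays stable.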

4. SIC Business Logic (2/5)

Based on the analysis of the SIC’s objectives and outcomes, we proposed to extend the picture with two additional types of boxes.

Figure legend: ND – a network device; AP – an application; DS – a data stream

4. SIC Business Logic (3/5)

U Box: a service with the ability to initiate the execution of some commands by itself. This then maps to an automated way of responding to IS incidents. An IPT configuration change also maps to changes in the monitoring mechanism.

The U Box’s goal is to manage IPTs with an appropriate interface to enter commands and a mechanism to monitor the changes, including the following:

• Periodic collection of configuration files (configs) from IPTs and their storage in the K Box, using standard protocols (Telnet/SSH/HTTP/HTTPS) by sending the “show configuration” command to the IPT. The time period is unique for each organization and/or IPT and depends on the IS requirements, network load, IPT’s…
• IPT configuration change tracking in two ways (or their combination): 1) comparison of the 2 latest config versions and a search for non-matching lines. If any are found, the U Box checks the criticality of this IS incident in the K Box and, if the change is unauthorized, generates a message and writes it to the D Box; 2) an IPT itself can usually generate a message reporting that a rule was added to its config; the message then comes to the D Box via the E and C Boxes
• Managing IPT config versioning. The U Box keeps links to different config versions in the K Box and, when necessary (for file recovery, rollback to a previous version, or a change of IS policies/operating mode), follows the appropriate link to the desired file and then applies it to the IPT. The U Box also detects discrepancies and manages them
• IPT access account storage. To collect the configs, the U Box should keep IPT access accounts having the minimum privileges necessary to make changes in IPTs’ configurations
• Obtaining a set of commands from the K Box or directly through the U Box’s interface for altering IPT setups.
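
The first change-tracking method (comparing the two latest config versions and searching for non-matching lines) can be sketched with Python's standard `difflib` module. The function names, the message format and the idea of passing a set of pre-approved lines are assumptions of this example; the slides do not describe how the U Box decides that a change is authorized.

```python
import difflib


def changed_lines(previous, latest):
    """Return (removed, added) lines between two config versions."""
    removed, added = [], []
    for line in difflib.ndiff(previous.splitlines(), latest.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


def track_change(previous, latest, authorized_lines):
    """Build one message per changed line that is not pre-approved.

    `authorized_lines` stands in for the K Box lookup (an assumption).
    """
    removed, added = changed_lines(previous, latest)
    return [f"unauthorized config change: {kind} '{text}'"
            for kind, lines in (("removed", removed), ("added", added))
            for text in lines
            if text not in authorized_lines]


# Hypothetical firewall rules; the new version opens Telnet (port 23).
old_cfg = "permit tcp any host 10.0.0.5 eq 443\ndeny ip any any"
new_cfg = ("permit tcp any host 10.0.0.5 eq 443\n"
           "permit tcp any any eq 23\n"
           "deny ip any any")
messages = track_change(old_cfg, new_cfg, authorized_lines=set())
```

`ndiff` is used here rather than a unified diff because its two-character line prefixes are unambiguous even when a config line itself starts with dashes or plus signs.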

4. SIC Business Logic (4/5)

TPI Box: manages IS trends (on attacks, vulnerabilities, IS threats and incidents), predictions (in a particular environment, based on IS trends and its historical data) + IS information interchange and sharing with other partner organizations or unified databases on an international scale (such as trusted external threat intelligence sources).

TPI Boxes are connected with R and K Boxes directly and with U and A Boxes indirectly:
1) Directly with R Boxes for reaction activation (in SICs these actions can be both reactive (as in SOCs) and proactive), and via them with U Boxes for rapid IPT reconfiguration
2) Directly with K Boxes for mining data and creating new knowledge, and via them with A Boxes for more comprehensive analysis

Successful testing of our approach: the NRNU MEPhI’s intranet subnetwork. The system kernel is a software module in Visual Basic, which implements the described logic and connects together all the system elements (authentication module, software parameter setting module, user interface, data storage, IPTs config module, analytical module, etc.).

4. SIC Business Logic (5/5)

Plans:
• To introduce a multilevel hierarchy of D Boxes
• To implement processes for managing the storage lifecycle of the data derived from the C Boxes

Key benefits of this improvement:
• Reducing the DB query processing time due to the smaller amount of data at the lower level (which holds current, short-lived data for operational work)
• Ensuring compliance with regulatory requirements by archiving critical data to cheaper storage media and automating the data rotation process

In the long-term perspective: to process the other IS-related data on intranet IS events mentioned above, not only IPTs’ configs.
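
A two-level D Box with automated rotation could look roughly like this. It is a minimal in-memory sketch under stated assumptions: the class name `TieredDBox`, the `hot_ttl` parameter and the use of plain lists in place of real databases and archive media are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TieredDBox:
    """Two-level message store: a small 'hot' level for operational
    queries and an 'archive' level for long-term retention."""
    hot_ttl: float                      # seconds a record stays hot
    hot: list = field(default_factory=list)
    archive: list = field(default_factory=list)

    def store(self, ts, message):
        self.hot.append((ts, message))

    def rotate(self, now):
        """Move records older than hot_ttl from hot to archive."""
        cutoff = now - self.hot_ttl
        self.archive.extend(r for r in self.hot if r[0] < cutoff)
        self.hot = [r for r in self.hot if r[0] >= cutoff]

    def query_recent(self, predicate):
        """Operational queries scan only the smaller hot level."""
        return [m for _, m in self.hot if predicate(m)]


box = TieredDBox(hot_ttl=3600)
box.store(0, "old deny")
box.store(5000, "new deny")
box.rotate(now=6000)
recent = box.query_recent(lambda m: "deny" in m)
```

Because operational queries touch only the hot level, their cost depends on the retention window rather than on the total amount of data ever collected, which is the first benefit listed above.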

5. SIC’s Data Architecture (1/2)

For big IS-related data processing in SICs we proposed to use “data lakes” (DLs): a massively scalable storage repository that holds a vast amount of raw data in its native format (“as is”) until it is needed + processing systems (engines) that can ingest data without compromising its structure. DLs are typically built to handle large and quickly arriving volumes of unstructured data (in contrast to data warehouses, which process highly structured data).

DLs use dynamic (not pre-built static) analytical applications, which can exploit static, dynamic or both types of data. The data in DLs become accessible as soon as they are created (in contrast to data warehouses, which are designed for slowly changing data), which is very important for real-time IS decision-making!

This simplified architecture = D Boxes + other boxes on top of them + K Boxes + integration with the rest of the organization’s IT infrastructure.
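
The “store as is, interpret on read” idea behind a data lake can be shown with a toy example. This sketch is an assumption-laden illustration, not a real data lake engine: the `DataLake` class, its `ingest` and `read` methods and the sample records are all hypothetical.

```python
import json


class DataLake:
    """Keeps every record in its native format; structure is applied
    only when a consumer reads the data (schema-on-read)."""

    def __init__(self):
        self._objects = []

    def ingest(self, source, raw):
        # No parsing or validation at write time: data is stored as is
        # and is readable immediately after this call returns.
        self._objects.append({"source": source, "raw": raw})

    def read(self, source, parser):
        """Apply a caller-supplied parser to the raw data of one source."""
        return [parser(o["raw"]) for o in self._objects
                if o["source"] == source]


lake = DataLake()
lake.ingest("ids", '{"sig": "scan", "sev": 3}')        # JSON alert
lake.ingest("fw", "DENY tcp 10.0.0.1 -> 10.0.0.5:23")  # plain-text log

ids_alerts = lake.read("ids", json.loads)
fw_fields = lake.read("fw", str.split)
```

A data warehouse would instead require both sources to be mapped to one predefined schema before loading, which delays access to new data.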

5. SIC’s Data Architecture (2/2)

The data going into DLs contain logs, sensor data (e.g., from the IoT), low-level customer behavior (e.g., Website click streams), social media, document collections (e.g., e-mail and customer files), geo-location trails, images, video and audio and other data useful for integrated analysis.

The SIC’s DLs should be well-managed and protected, have scale-out architectures with high availability, centralized cataloging and indexing, and a shared-access model from any permitted modern device, and use agile analytics and advanced data lineage (tracking).

All IS-related data can be regarded as time-sensitive structured & unstructured “in-flight” fast data (FD), as it requires immediate processing in order to infer respective events that can lead to the activation of applicable IS controls. FD should be gathered and acted upon right away (with low latency and processing of data streams at speed) = application of big data analytics to smaller data sets in near-real/real-time to solve a particular task. FD requires a streaming system capable of delivering events as fast as they come in and a data store capable of processing each item as fast as it arrives.

Technologies for real-time analytics in SICs: processing in memory/in database, in-memory analytics, massively parallel programming and others.
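
Processing each item as it arrives can be illustrated with a sliding-window counter that raises a flag when too many events fall into a short time span. This is a simplified single-process sketch; the class name, the window length and the threshold are assumptions for the example, and a real SIC would rely on a streaming platform rather than on in-process code.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events inside a moving time window and reports when the
    count exceeds a threshold."""

    def __init__(self, window, threshold):
        self.window = window          # window length in seconds
        self.threshold = threshold    # maximum tolerated event count
        self._times = deque()

    def observe(self, ts):
        """Record an event at time `ts` (non-decreasing) and return
        True if the window now holds more than `threshold` events."""
        self._times.append(ts)
        # Drop events that have slid out of the window.
        while self._times and self._times[0] <= ts - self.window:
            self._times.popleft()
        return len(self._times) > self.threshold


# Hypothetical stream of failed-login timestamps (seconds).
counter = SlidingWindowCounter(window=10, threshold=3)
flags = [counter.observe(t) for t in [0, 1, 2, 3, 20]]
```

Each call does a small, bounded amount of work per event, so the decision is available with low latency instead of after a batch job over stored data.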

Conclusion

To empower the autonomy of network security management within one organization and to deepen its knowledge of the computing environment, our research is aimed at uniting all the advantages of a SIC and a Network Operations Center (NOC), with their unique and joint toolkits and techniques, in a unified Network Security Intelligence Center (NSIC).

NOC and SIC perfectly complement each other => NSIC

The NSIC’s key objective: to move SI to organizations’ NOCs, allowing them to stay ahead of IS challenges while being fully integrated around their main business processes.

• Changes the security model from reactive to proactive
• Supports more effective responses to IS incidents
• Enhances communications between the network and security teams, management and board members
• Drives IS investment strategies
• More directly connects IS priorities with business risk management priorities, etc.

The research in this area has just begun.

SOC is the ISM eyes

SIC is the ISM brain with wide-open eyes

NSIC is the ISM big rapid brain with wide-open eyes and skillful hands

Natalia Miloslavskaya