Securely explore your data
DATA MODELING AND INDEXING FOR APACHE ACCUMULO
Sqrrl Webinar Series, October 2013
Adam Fuchs, CTO, Sqrrl Data, Inc.
RECAP
In our September Webinar: Sqrrl, Apache Accumulo, and Cell-Level Security
1. Introduction to Sqrrl and Accumulo
2. Security In The Wild
3. Sqrrl and Accumulo Technology
4. The Data-Centric Security Ecosystem
Sqrrl Data, Inc. Confidential and Proprietary
TODAY’S DISCUSSION
Data Modeling and Indexing for Apache Accumulo
1. Sqrrl and Accumulo Technology Review
2. Table Designs
   1. Dynamic Documents
   2. Graphs
   3. Inverted Indexes
3. Putting It All Together with Sqrrl
LAYERED ARCHITECTURE
Turtles all the way down...
Application
  Sqrrl API over Apache Thrift RPC (JSON, Graph, Aggregation, Search, etc.)
Sqrrl Enterprise
  Accumulo RPC (Sorted Key/Value I/O)
  Hadoop RPC (File I/O)
ACCUMULO DATA FORMAT
An Accumulo key is a 5-tuple, consisting of:
• Row: Controls Atomicity
• Column Family: Controls Locality
• Column Qualifier: Controls Uniqueness
• Visibility Label: Controls Access
• Timestamp: Controls Versioning

Accumulo Key/Value Example

Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…
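The key/value example above can be mimicked in a few lines of plain Python to show how the key components interact. This is a minimal sketch, not Accumulo code: the `Key`, `sort_order`, `is_visible`, and `scan` names are hypothetical, entries are ordered by row, column family, column qualifier, and visibility with the newest timestamp first, and the visibility check handles only OR-ed labels such as `JD|PCP_JD` (real visibility expressions also allow `&` and parentheses).

```python
from typing import NamedTuple


class Key(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_order(key):
    # Ascending by row, family, qualifier, visibility; newest timestamp first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def is_visible(expression, authorizations):
    # Simplifying assumption: the expression is a plain OR of labels.
    return any(label in authorizations for label in expression.split("|"))


# The four entries from the example table, deliberately out of order.
table = {
    Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513): "1010110110100…",
    Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912): "183",
    Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912): "Patient suffers from an acute …",
    Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801): "Pass",
}


def scan(table, authorizations):
    # Return entries in sorted key order, dropping cells the reader may not see.
    for key in sorted(table, key=sort_order):
        if is_visible(key.visibility, authorizations):
            yield key, table[key]


for key, value in scan(table, {"PCP_JD"}):
    print(key.family, key.qualifier, value)
```

A reader holding only the `PCP_JD` authorization sees the notes and the cholesterol result, while the mental health and X-ray cells are filtered out before they leave the scan.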
THE ACCUMULO CLIENT API
Instance: new ZooKeeperInstance(...) or new MockInstance()
  getConnector(...) -> Connector
Connector
  TableOperations, InstanceOperations, SecurityOperations
  createScanner(...) -> Scanner
  createBatchScanner(...) -> BatchScanner
  createBatchWriter(...) -> BatchWriter
Scanner, BatchScanner (with Range, IteratorOption)
  iterator() -> Map.Entry of Key, Value
BatchWriter
  addMutation(...) with a Mutation
ACCUMULO TECHNOLOGY
Strengths
• Shared-Nothing => Scalability
• Micro-Batching for Efficient Random I/O
• High Concurrency, Low Latency for Denormalized Data
• Sparse, Flexible Schema supports dynamic and diverse data models
• Cell-level Security promotes sharing
Weaknesses
• Sorting induces write multiplication factor
• Sparse schema support induces additional storage overhead
Tablet Data Flow (diagram)
• Writes: In-Memory Map and Write Ahead Log (For Recovery)
• Minor Compaction: In-Memory Map -> Iterator Tree -> Sorted, Indexed File
• Merging / Major Compaction: Sorted, Indexed Files -> Iterator Tree -> Sorted, Indexed File
• Reads (Scan): In-Memory Map and Sorted, Indexed Files -> Iterator Tree
Cluster components (diagram)
• ZooKeeper: Delegate Authority
• Master: Assign/Balance Tablets
• Tablet Servers hosting Tablets: Store/Replicate in HDFS
• Applications: Read/Write to Tablet Servers
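The tablet data flow can be illustrated with a toy log-structured store. This is a rough sketch under stated assumptions, not Accumulo's implementation: the `Tablet` class and its method names are hypothetical, the write-ahead log is omitted, files are plain sorted lists, and the iterator tree is reduced to a single rule that keeps the newest version of each key.

```python
import heapq


class Tablet:
    def __init__(self, memory_limit=2):
        self.memory = {}              # in-memory map: key -> (sequence, value)
        self.files = []               # each "file" is a sorted list of (key, -sequence, value)
        self.memory_limit = memory_limit
        self.sequence = 0

    def write(self, key, value):
        self.sequence += 1
        self.memory[key] = (self.sequence, value)
        if len(self.memory) >= self.memory_limit:
            self.minor_compact()

    def _memory_run(self):
        # Sorted view of the in-memory map; negating the sequence puts newer entries first.
        return sorted((k, -s, v) for k, (s, v) in self.memory.items())

    def minor_compact(self):
        # Flush the in-memory map to a new sorted file.
        if self.memory:
            self.files.append(self._memory_run())
            self.memory = {}

    def _merged(self):
        # Merge all sorted sources and keep only the newest version of each key.
        last = object()
        for entry in heapq.merge(*self.files, self._memory_run()):
            if entry[0] != last:
                yield entry
                last = entry[0]

    def scan(self):
        return [(key, value) for key, _, value in self._merged()]

    def major_compact(self):
        # Rewrite every file (and any buffered writes) into one sorted file.
        self.minor_compact()
        self.files = [list(self._merged())]


tablet = Tablet(memory_limit=2)
tablet.write("b", 1)
tablet.write("a", 2)      # memory limit reached: minor compaction
tablet.write("b", 3)      # newer version of "b", still in memory
print(tablet.scan())
```

Each write is later rewritten by compactions, which is one way to read the "write multiplication factor" listed under Weaknesses: sorted files keep reads cheap at the cost of copying data more than once.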
PROXY/NETFLOW EXAMPLE

Source    Destination   Port   Bytes In     Bytes Out    Protocol
10.1.2.3  google.com    80     73,824       15,632       http
10.1.2.4  facebook.com  443    10,328       13,284,129   https
10.1.2.4  google.com    80     623,249      93,125       http
10.1.2.3  abcd1234.ru   31337  158          523,698,104  unknown
10.1.2.3  netflix.com   443    434,855,357  1,392,994    https
10.1.2.4  google.com    443    23,084       583,331      https
10.1.2.3  10.1.2.5      22     204          158          ssh
INDEXES AND QFDS
Input: Logs/Observations
• Immutable
• Append-Only
Transformation
Indexes / Question-Focused Datasets
• Real-Time
• Online
• Sorted
• Grouped
• Aggregated
QFD KEY GENERATION

Source    Destination  Port  Bytes In  Bytes Out  Protocol
10.1.2.3  google.com   80    73,824    15,632     http

Key -> Value
10.1.2.3, Bytes In -> +73,824
10.1.2.3, Bytes Out -> +15,632
10.1.2.3, Ports Used -> +{80}
10.1.2.3, Protocols Used -> +{http}

Hosts QFD: 0x00 . . . 0xFF
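The key generation step above can be sketched as a function that turns one flow record into keyed contributions. The function name `qfd_contributions` and the dictionary field names are assumptions made for illustration; the four emitted keys are the ones listed on the slide, and each value is a delta (a number to add or a set to union) rather than a final total.

```python
def qfd_contributions(record):
    """Yield ((host, column), delta) pairs for one flow record."""
    host = record["Source"]
    yield (host, "Bytes In"), record["Bytes In"]
    yield (host, "Bytes Out"), record["Bytes Out"]
    yield (host, "Ports Used"), {record["Port"]}
    yield (host, "Protocols Used"), {record["Protocol"]}


record = {
    "Source": "10.1.2.3",
    "Destination": "google.com",
    "Port": 80,
    "Bytes In": 73824,
    "Bytes Out": 15632,
    "Protocol": "http",
}

for key, delta in qfd_contributions(record):
    print(key, "->", delta)
```

Because every contribution is keyed by host, all deltas for the same host sort next to each other in the Hosts QFD and can be merged without reading the original log.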
HOSTS QFD WITH AGGREGATION

IP        Ports Used            Protos Used                  Total Bytes In       Total Bytes Out  Ports Hosted  Protos Hosted
10.1.2.3  {22, 80, 443, 31337}  {http, https, ssh, unknown}  434,931,543          525,106,888      -             -
10.1.2.4  {80, 443}             {http, https}                656,661              13,960,585       -             -
10.1.2.5  -                     -                            158 + 3,215 = 3,373  204              {22}          {ssh}

New Contribution: (10.1.2.5, Total Bytes In -> +3,215)
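The aggregation shown in the Hosts QFD can be sketched as a merge of deltas into a keyed table. This is an illustrative sketch, not Sqrrl or Accumulo code: `combine`, `apply`, and `ingest` are hypothetical names, counters merge by addition and sets by union, and the destination-side columns (bytes swapped, Ports Hosted, Protos Hosted) are an assumption inferred from the 10.1.2.5 row. Unlike the slide's table, the sketch also creates rows for external destinations.

```python
def combine(current, delta):
    # Sets merge by union, counters by addition.
    if isinstance(delta, set):
        return (current or set()) | delta
    return (current or 0) + delta


def apply(qfd, host, column, delta):
    key = (host, column)
    qfd[key] = combine(qfd.get(key), delta)


def ingest(qfd, source, destination, port, bytes_in, bytes_out, protocol):
    # Source side: the host that opened the connection.
    apply(qfd, source, "Total Bytes In", bytes_in)
    apply(qfd, source, "Total Bytes Out", bytes_out)
    apply(qfd, source, "Ports Used", {port})
    apply(qfd, source, "Protos Used", {protocol})
    # Destination side (assumed): traffic seen from the serving host.
    apply(qfd, destination, "Total Bytes In", bytes_out)
    apply(qfd, destination, "Total Bytes Out", bytes_in)
    apply(qfd, destination, "Ports Hosted", {port})
    apply(qfd, destination, "Protos Hosted", {protocol})


# A subset of the proxy/netflow example records.
flows = [
    ("10.1.2.4", "facebook.com", 443, 10328, 13284129, "https"),
    ("10.1.2.4", "google.com", 80, 623249, 93125, "http"),
    ("10.1.2.4", "google.com", 443, 23084, 583331, "https"),
    ("10.1.2.3", "10.1.2.5", 22, 204, 158, "ssh"),
]

hosts_qfd = {}
for flow in flows:
    ingest(hosts_qfd, *flow)

# The new contribution from the slide: (10.1.2.5, Total Bytes In -> +3,215).
apply(hosts_qfd, "10.1.2.5", "Total Bytes In", 3215)
print(hosts_qfd[("10.1.2.5", "Total Bytes In")])
```

Since addition and set union are associative and commutative, contributions can arrive in any order and be merged incrementally, which is what lets the dataset stay current as new log records are appended.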
CONNECTIVITY GRAPH
Nodes: 10.1.2.3, 10.1.2.4, 10.1.2.5, google.com, facebook.com, abcd1234.ru, netflix.com

Row       Col. Fam.  Col. Qual.    Val.
10.1.2.3  Contacts   10.1.2.5      -
10.1.2.3  Contacts   abcd1234.ru   -
10.1.2.3  Contacts   google.com    -
10.1.2.3  Contacts   netflix.com   -
10.1.2.4  Contacts   facebook.com  -
10.1.2.4  Contacts   google.com    -

Row           Col. Fam.  Col. Qual.  Val.
10.1.2.5      Serves     10.1.2.3    -
abcd1234.ru   Serves     10.1.2.3    -
facebook.com  Serves     10.1.2.4    -
google.com    Serves     10.1.2.3    -
google.com    Serves     10.1.2.4    -
netflix.com   Serves     10.1.2.3    -
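The two edge tables above store every connection twice, once per direction, so that neighbors in either direction can be read with a single range scan. The following is a minimal sketch of that idea over a sorted in-memory list; `put`, `add_edge`, and `neighbors` are hypothetical helper names, and the list of tuples merely stands in for a sorted key/value table.

```python
import bisect


def put(table, entry):
    # Keep the table sorted and skip entries that are already present.
    index = bisect.bisect_left(table, entry)
    if index == len(table) or table[index] != entry:
        table.insert(index, entry)


def add_edge(table, source, destination):
    put(table, (source, "Contacts", destination))   # forward edge
    put(table, (destination, "Serves", source))     # reverse edge


def neighbors(table, node, family):
    # Range scan: start at the first entry for (node, family) and stop
    # as soon as the row or column family changes.
    start = bisect.bisect_left(table, (node, family, ""))
    found = []
    for row, fam, qualifier in table[start:]:
        if (row, fam) != (node, family):
            break
        found.append(qualifier)
    return found


graph = []
for source, destination in [
    ("10.1.2.3", "google.com"),
    ("10.1.2.4", "facebook.com"),
    ("10.1.2.4", "google.com"),
    ("10.1.2.3", "abcd1234.ru"),
    ("10.1.2.3", "netflix.com"),
    ("10.1.2.4", "google.com"),   # repeated flow, stored once
    ("10.1.2.3", "10.1.2.5"),
]:
    add_edge(graph, source, destination)

print(neighbors(graph, "10.1.2.3", "Contacts"))
print(neighbors(graph, "google.com", "Serves"))
```

Storing the reverse edge doubles the writes, but it avoids a full-table scan when asking which hosts contacted a given server.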
INVERTED INDEXING
[Table: Forward Index and Inverted Index, each described by Row, Column Family, Column Qualifier, and Value]
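One common way to lay out a forward and an inverted index over a sorted key/value store is sketched below. The cell contents of the slide's table are not available here, so the layout is an assumption: the forward index uses the document ID as the row, the inverted index uses the term as the row and the document ID as the column qualifier. The names `build` and `lookup`, the field name `text`, and the sample documents are all hypothetical.

```python
import bisect


def build(documents):
    forward, inverted = [], []
    for doc_id, text in documents.items():
        # Forward index: row = document ID, value = content.
        forward.append((doc_id, "text", "", text))
        # Inverted index: row = term, column qualifier = document ID.
        for term in set(text.lower().split()):
            inverted.append((term, "text", doc_id, ""))
    return sorted(forward), sorted(inverted)


def lookup(inverted, term):
    # Range scan over all entries whose row equals the term.
    start = bisect.bisect_left(inverted, (term,))
    doc_ids = []
    for row, _, doc_id, _ in inverted[start:]:
        if row != term:
            break
        doc_ids.append(doc_id)
    return doc_ids


documents = {
    "doc1": "accumulo sorts keys",
    "doc2": "accumulo indexes keys",
}
forward, inverted = build(documents)

print(lookup(inverted, "accumulo"))
print(lookup(inverted, "sorts"))
```

A multi-term query can then intersect the document ID lists returned for each term before fetching the matching rows from the forward index.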
ADVANCED INDEXING
[Table: Shard Table, described by Row, Column Family, Column Qualifier (Tuples), and Value, with sections “Docs”, “Inv. Index”, “Field Index”, and “Geo”]
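A shard table keeps a document and its index entries in the same row, so a query can find matching terms and fetch the documents without leaving the shard. The sketch below is only one plausible reading of the slide, whose cell contents are not available: the shard row is assumed to be a hash of the document ID, the column families reuse the slide's section labels, the qualifier tuples are guesses, and the “Geo” section is left out. `NUM_SHARDS`, `shard_of`, `shard_entries`, `prefix_scan`, and `search` are hypothetical names.

```python
import bisect
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(doc_id):
    # Assumption: documents are assigned to shards by hashing their ID.
    return "shard-%02d" % (zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS)


def shard_entries(doc_id, fields):
    row = shard_of(doc_id)
    for field, value in fields.items():
        yield (row, "Docs", (doc_id, field), value)
        for term in set(value.lower().split()):
            yield (row, "Inv. Index", (term, doc_id), "")
            yield (row, "Field Index", (field, term, doc_id), "")


def prefix_scan(table, row, family, prefix):
    # Scan entries in one row and family whose qualifier tuple starts with prefix.
    start = bisect.bisect_left(table, (row, family, prefix))
    for entry in table[start:]:
        if entry[0] != row or entry[1] != family or entry[2][:len(prefix)] != prefix:
            break
        yield entry


def search(table, term):
    results = {}
    for shard in sorted({entry[0] for entry in table}):
        # Index lookup and document fetch both stay inside the same row.
        for _, _, (_, doc_id), _ in prefix_scan(table, shard, "Inv. Index", (term,)):
            results[doc_id] = {
                qualifier[1]: value
                for _, _, qualifier, value in prefix_scan(table, shard, "Docs", (doc_id,))
            }
    return results


docs = {
    "flow1": {"destination": "google.com", "protocol": "http"},
    "flow7": {"destination": "10.1.2.5", "protocol": "ssh"},
}
table = sorted(e for doc_id, fields in docs.items() for e in shard_entries(doc_id, fields))

print(search(table, "ssh"))
```

Because each shard answers the query independently, the per-shard scans can run in parallel across tablet servers and their results can simply be concatenated.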
SQRRL ENTERPRISE
Simple API for Advanced Accumulo Usage
• Dynamic Documents
  • JSON I/O support
  • Cell-level Security and Efficient Aggregation Extensions
• Dynamic Graphs
  • Co-partitioned with Documents for Integrated Search and Discovery
• Search
  • Lucene Query Syntax
  • Accumulo Indexes Preserve Security Model
• Processing
  • SQL-Like Language for Transforming and Aggregating Results
  • Parallel Slicing and Extraction
REAL-TIME OPERATIONAL APPS
Contact us for a demo
HOW TO LEARN MORE
Download our White Paper
• www.sqrrl.com/whitepaper
Watch a video
• www.sqrrl.com/downloads#videos
Request a demo or one-on-one workshop
• www.sqrrl.com/contact
Come meet us
• Accumulo Meetup (October 28, New York)
• Strata + Hadoop World (October 28-30, New York)
• IBM IOD (November 4-7, Las Vegas)
• SC13 (November 18-21, Denver)
THANK YOU
Thanks for attending! To keep up to date with Sqrrl, check out our social media sites:
www.twitter.com/sqrrl_inc
www.linkedin.com/company/sqrrl