In-situ processing of big raster data with command line tools*
Russian Supercomputing Days 2016, 26–27 Sep., Moscow
* This work was partially supported by the Russian Foundation for Basic Research (grant #16-37-00416).
Sources of raster data

Challenges in managing raster data
Array or Raster DBMS: manages multidimensional raster data
http://www.narccap.ucar.edu/users/user-meeting-08/handout/netcdf-diagram.png
Two key approaches

1. In-situ processing: needs new techniques and algorithms
   File → Read → Process
   + No import
   – Diverse file formats

2. Import, then process: easier to process data in a single format
   File → convert → Import → DB → Read → Process
   + Single format
   – Time-consuming, error-prone
Reason for the in-situ approach: powerful raster file formats. The file-centric model resulted in a broad set of raster file formats highly optimized for a particular purpose and subject domain.
Example: GeoTIFF "represents an effort by over 160 different … companies and organizations to establish an interchange format for georeferenced raster imagery" http://trac.osgeo.org/geotiff/
Some raster file formats support chunking, compression, multidimensional arrays, data types, hierarchical namespaces, metadata
Command line tools
Along with raster file formats, decades of development and feedback have resulted in numerous feature-rich, elaborate, free and quality-assured tools, optimized mostly for a single machine.
Example: NetCDF common operators (NCO) – tools for multidimensional arrays in NetCDF format
• in development since about 1995
• take advantage of multicore CPUs via OpenMP
• work on a single machine
ncatted – NetCDF Attribute Editor
ncap2 – NetCDF Arithmetic Processor
ncks – NetCDF "Kitchen Sink" …
Data processing: the new delegation approach (this work)
Delegate in-situ processing to existing command line tools: File → Read → External exe
+ No import
+ Powerful storage
+ Direct exe call
+ Rich functionality
+ Optimized algorithms
– Diverse file formats
– Diverse tools
It is not the same as streaming: File → convert → Import → DB → Export → convert → External exe
Two time-consuming phases: conversion on import and conversion on export
State-of-the-art

| System          | In-situ   | Delegate  | Distributed | 1st release  |
| ChronosServer   | YES       | Direct    | YES         | ?            |
| SciDB           | NO        | Streaming | YES         | ~ 2008**     |
| Oracle Spatial* | NO        | NO        | YES         | < 2005       |
| ArcGIS IS*      | YES       | NO        | NO****      | > 2000       |
| RasDaMan        | YES/NO*** | NO        | YES/NO***   | ~ 1999       |
| MonetDB SciQL   | NO        | NO        | NO          | not finished |
| Intel TileDB    | NO        | NO        | NO          | 04.04.2016   |

SciDB is the only free and distributed array DBMS available for comparison.
* Commercial
** Now (in 2016, 8 years later) it still has a very limited set of operations
*** YES in the paid, enterprise version; no performance evaluation ever published
**** Data are the same on each server or retrieved from centralized storage
SciDB – Scientific DB
• NoSQL: AQL – Array Query Language, AFL – Array Functional Language
• Distributed, general-purpose multidimensional array DBMS
https://en.wikipedia.org/wiki/Michael_Stonebraker
ChronosServer
1000s of files: complex naming, diverse coordinate systems, formats, data types, …

| Product | Period | Datasets | Format | File name example |
| AMIP/DOE Reanalysis 2 | Year, 6 hours | 1 | NetCDF | uwnd.10m.gauss.1979.nc, uwnd.10m.gauss.1980.nc |
| MODIS L3 Atmosphere | Day, Day | > 600 | HDF4 | MOD08_D3.A2000061.051.2010273210218.hdf |
| CFSR | Month, 1 hour | 1 | Grib2 | ocnsst.l.gdas.198401.grb2, ocnsst.l.gdas.198402.grb2 |
| MERRA | Day, 1..24 hrs | 1..* | HDF4 | MERRA200.prod.assim.tavg1_2d_lnd_Nx.20000718.hdf |
| Aura satellite, OMI radiometer | Day, Day | 14 | HDF5 | OMI-Aura_L3OMSO2e_2004m1001_v003011m0526t144250.he5 |
ChronosServer: hierarchical dataset namespace. Reason: too many datasets.
Separate groups with dots: r2.wind.10m.v r2.pressure.msl …
http://wikience.org
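A minimal sketch of such a dot-separated namespace (illustrative only, not ChronosServer code): dotted dataset names are stored in a tree, so `r2.wind.10m` naturally groups `r2.wind.10m.u` and `r2.wind.10m.v`.

```python
from collections import defaultdict

# Recursive dictionary: each node of the namespace tree is itself a tree.
def tree():
    return defaultdict(tree)

def register(root, dotted_name):
    """Insert a dataset name like 'r2.wind.10m.u' into the namespace tree."""
    node = root
    for part in dotted_name.split("."):
        node = node[part]

root = tree()
for name in ["r2.wind.10m.u", "r2.wind.10m.v", "r2.pressure.msl"]:
    register(root, name)

print(sorted(root["r2"]["wind"]["10m"]))   # ['u', 'v']
```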
SELECT DATA FROM r2.wind.10m.u WHERE TIME = 01.01.2000 00:00
ChronosServer abstraction layers
User view: a time series of grids
Reality: N files of diverse formats and naming conventions on K cluster nodes, replicated
ChronosServer command language (NoSQL)
• Command names = names of command line tools
• Command syntax = command line (key-value pairs)
• Command options =
  • a subset of the tool's command line options (no path and file options)
  • ChronosServer-specific options
• File names → dataset names
• Commands are modified and submitted to the OS shell

ChronosServer command:
ncap2 -D 4 -O -alias u,r2.wind.10m.u -alias v,r2.wind.10m.v -alias ws,r2.wind.10m.uv.ws -s "$(ws)=sqrt($(u)*$(u) + $(v)*$(v));"

Command submitted to the OS shell:
/usr/local/bin/ncap2 -D 4 -O -v -s "ws=sqrt(uwnd*uwnd + vwnd*vwnd);"
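The rewriting step can be sketched as follows; `rewrite_command` and the dataset-to-variable mapping are hypothetical illustrations of the idea (strip the `-alias` pairs, substitute `$(name)` placeholders with per-file variable names), not ChronosServer internals.

```python
import re

def rewrite_command(command, dataset_to_var):
    """Turn a ChronosServer-style ncap2 command into a plain shell command."""
    # Collect "-alias <name>,<dataset>" pairs and drop them from the command.
    aliases = dict(re.findall(r"-alias\s+(\w+),(\S+)", command))
    command = re.sub(r"\s*-alias\s+\w+,\S+", "", command)
    # Substitute each "$(name)" with the variable its dataset maps to.
    for name, dataset in aliases.items():
        command = command.replace("$(%s)" % name, dataset_to_var[dataset])
    return command

cmd = ('ncap2 -O -alias u,r2.wind.10m.u -alias v,r2.wind.10m.v '
      '-alias ws,r2.wind.10m.uv.ws '
      '-s "$(ws)=sqrt($(u)*$(u) + $(v)*$(v));"')
mapping = {"r2.wind.10m.u": "uwnd", "r2.wind.10m.v": "vwnd",
           "r2.wind.10m.uv.ws": "ws"}
print(rewrite_command(cmd, mapping))
# ncap2 -O -s "ws=sqrt(uwnd*uwnd + vwnd*vwnd);"
```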
Distributed execution of a single raster processing operation

1. A client sends a command to the Gate over the Internet, e.g. find max:
   ncap2 -alias u,r2.wind.10m.u -alias umax,r2.wnd.umax -s "$(umax)=$(u).max($time)"
2. The Gate parses the command, runs a malicious-command check, and modifies the command
3. The Gate finds the cluster nodes holding the data
4. The Gate chooses the nodes
5. The Gate sends the command parameters to the chosen nodes
6. Each node launches the external tool on each file from a given sample
7. Intermediate results are collected on a cluster node, which calculates the final result
Benefits of the new delegation approach
• Avoid learning a new language: well-known command line syntax vs. a new SQL dialect
• No steep learning curve: use the familiar functionality of known command line tools
• Documentation: reuse the tool's documentation; (most) docs of a tool = docs of the respective command
• Output conformance: output files are formatted as if the tool was launched manually
• Language independence: use tools written in any programming language
• Community support: bug fixes, new functionality, and usage suggestions via mailing lists
• Zero-knowledge development (0-know dev.): one has to know nothing about ChronosServer in order to develop a tool
ChronosServer and SciDB performance comparison
Test data: northward and eastward wind speed
• 6-hourly forecast of U- and V-wind at 10 m
• 4-times daily values
• Gaussian grid 94 × 192
• NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) • 1979 – 2015 (≈ 54 020 time steps) • 3.63 GB NetCDF3 files in total
Consumer example: choose a location for a wind farm (retrospective data) and optimize its operation (forecast data). For calendar year 2014, the electricity produced from wind power in the United States amounted to 181.79 terawatt-hours, or 4.44% of all generated electrical energy. https://en.wikipedia.org/wiki/Wind_power_in_the_United_States
Experimental setup (1)
1 machine, but the results are representative (linear scalability). Reason: SciDB (complex cluster deployment + unable to import large data volumes in a reasonable time frame, next slides)
Ubuntu 14.04, VirtualBox on Windows 10
• 2 cores, Intel Core i5-3210M, 2.50 GHz
• 4 GB RAM
• SSD OCZ Vertex 4
sudo hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 9140 MB in 2.00 seconds = 4573.28 MB/sec
Timing buffered disk reads: 668 MB in 3.01 seconds = 222.04 MB/sec
Experimental setup (2)
ChronosServer
• 100% Java
• Java 1.7.0_75
• OpenJDK IcedTea 2.6.4, 64 bit
• -Xmx 978 MB (max heap size)
• 1 gate, 1 worker
NetCDF Operators
• C++
• v4.6.0 (May 2016)
• OpenMP SMP threading
• no run-time optimizations

SciDB
• v15.12, latest (Apr 2016), C++
Recommended parameters:
• 4 SciDB instances
• 0 redundancy
• 4 execution and prefetch threads
• 1 prefetch queue size
• 1 operator thread
• 128 MB array cache
• etc.
Experimental setup (3)
Cold and hot execution: run the same query several times (use case: repeated experiments with the same data)
• C1, C2, C3 – query executed for the first time
• H2, H3 – query executed for the second and third time
Cold time = (C1 + C2 + C3) / 3
Hot time = (C1 + H2 + H3) / 3
Free the pagecache, dentries and inodes before each C1, C2, C3:
free && sync && echo 3 > /proc/sys/vm/drop_caches && free
Unlike ChronosServer's, SciDB's hot and cold runtimes are the same.
http://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system
SciDB data import
1. No out-of-the-box import tool
   – supports import from CSV files
2. Requires software development for data import
   – it took 3 weeks to develop and debug a self-crafted Java SciDB import tool
3. The import procedure looks like this (very simplified):
   Open NetCDF file
   Read metadata (array shape, etc.)
   Create corresponding SciDB arrays to add data into
   while (t < time_max) {
       read 2D array for time t
       convert array to CSV string
       save CSV string to CSV file
       feed CSV file to SciDB
       t++
   }
4. Significant manual intervention; error-prone and slow (next slides)
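The per-time-step conversion in the loop above can be sketched in Python; the NetCDF read and the actual SciDB load are stubbed out, and the `lat,lon,value` CSV layout is an assumption for illustration, not the exact format of the original import tool.

```python
def slice_to_csv(grid):
    """Flatten one 2D (lat x lon) slice into 'lat,lon,value' CSV lines."""
    lines = []
    for lat, row in enumerate(grid):
        for lon, value in enumerate(row):
            lines.append("%d,%d,%s" % (lat, lon, value))
    return "\n".join(lines)

# Toy 2 x 2 slice instead of a real 94 x 192 one read from NetCDF.
grid = [[1.5, 2.0],
        [3.5, 4.0]]
print(slice_to_csv(grid))
# 0,0,1.5
# 0,1,2.0
# 1,0,3.5
# 1,1,4.0
```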
Initialization phase

SciDB data import
• only U-wind @ 10 m
• 50.2 MB in NetCDF3
• only the single year 1979: 1460 time steps
≈ 12 minutes (36× slower)
Estimate: 14.76 hours for U- and V-wind, 1979 – 2015

ChronosServer data discovery
• 803 datasets
• 6.78 GB in NetCDF3,4; HDF4,5; Grib2
• Satellite & climate reanalysis products*
≈ 20 seconds cold (8 sec hot)
Estimate: linear scalability in the number of files

CODING & DEBUGGING TIME NOT TAKEN INTO ACCOUNT
New data: just copy the files onto a computer and register the dataset in XML
* «Data» top menu at http://www.wikience.org
Data import lesson
Large data volumes cannot be imported into SciDB in a reasonable time frame. SciDB performance is therefore evaluated further only on U- and V-wind for the year 1979.
Test: simple statistics
Result: a single 94 × 192 grid; each cell is the max (min, average) for the year 1979

Execution time, seconds (Ratio = SciDB / ChronosServer):

| Operation | SciDB | ChronosServer, cold | ChronosServer, hot | Ratio, cold | Ratio, hot |
| Max       | 13.46 | 4.43                | 3.10               | 3.04        | 4.34       |
| Min       | 12.87 | 4.71                | 3.33               | 2.73        | 3.86       |
| Average   | 21.42 | 4.71                | 3.23               | 4.55        | 6.63       |
ChronosServer is faster even though SciDB operates on its own internal storage. Both are NoSQL systems.
SciDB: store(aggregate(r2_u10m, max(value), lat, lon), r2_u10m_max);
ChronosServer: ncap2 -s "$(r2.wind.10m.uv.umax)=$(r2.wind.10m.u).max($time)"
ChronosServer benefits from native OS caching in hot mode.
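What both queries above compute is a per-cell maximum over the time axis. A pure-Python equivalent (illustrative only; both systems of course use their own optimized implementations) could look like this:

```python
def max_over_time(series):
    """Reduce a 3D array (time x lat x lon) to a 2D grid of per-cell maxima."""
    nlat, nlon = len(series[0]), len(series[0][0])
    return [[max(step[i][j] for step in series) for j in range(nlon)]
            for i in range(nlat)]

# Toy 3 x 2 x 2 array: 3 time steps over a 2 x 2 grid
# (instead of 1460 steps over 94 x 192).
data = [[[1, 5], [2, 0]],
        [[4, 1], [9, 3]],
        [[2, 2], [2, 2]]]
print(max_over_time(data))   # [[4, 5], [9, 3]]
```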
Test: user-defined expressions
• Originally, a climate reanalysis provides wind speed separately for the Eastward (u) and Northward (v) directions. These are two different vectors.
• However, most applications prefer wind speed (ws) and wind direction (wd, azimuth) values instead.
• Test: calculate wind speed from the U- and V-values.
ws = √(u² + v²)   (u, v, and the resulting ws form a vector triangle)
More U and V details are at http://www.wikience.org/documentation/wind-speed-and-direction-tutorial/
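The formula can be sanity-checked with a few lines of Python:

```python
import math

def wind_speed(u, v):
    """Wind speed from eastward (u) and northward (v) components."""
    return math.sqrt(u * u + v * v)

# A 3-4-5 triangle makes the result easy to verify by hand.
print(wind_speed(3.0, 4.0))   # 5.0
```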
Wind speed calculation – which syntax is cleaner and easier?

SciDB:
iquery -a -n --query "store(project(apply(join(r2_u10m, r2_v10m), ws, float(sqrt(r2_u10m.value * r2_u10m.value + r2_v10m.value * r2_v10m.value))), ws), r2_ws10m);"

ChronosServer:
ncap2 -alias u,r2.wind.10m.u -alias v,r2.wind.10m.v -alias ws,r2.wind.10m.uv.ws -s "$(ws)=sqrt($(u)*$(u) + $(v)*$(v));"
Wind speed calculation
Result: a 1460 × 94 × 192 grid (time × lat × lon); each cell is a wind speed value

Execution time, seconds (Ratio = SciDB / ChronosServer):

| Operation        | SciDB | ChronosServer, cold | ChronosServer, hot | Ratio, cold | Ratio, hot |
| Wind speed calc. | 25.75 | 3.50                | 2.10               | 7.36        | 12.26      |

• ncap2 runs in 1 thread (no OpenMP)
• SciDB runs 4 instances
Chunking: row-major disk layout, reading a 6 × 1 slice of a 12 × 12 raster
(a) no chunking*: read 1 × 2 portions, 6 storage requests
(b) 2 × 2 chunks, one chunk 6 × 6: read 2 chunks, 50% of all data
(c) 4 × 4 chunks, one chunk 3 × 3: read 2 chunks, 12.5% of data
* the whole array is a single chunk
** reads may also involve uncompressing data
Note that an SSD is not the solution
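The 12 × 12 example can be verified with a small sketch (illustrative only, not tied to any particular system) that counts how many chunks a 6 × 1 column read touches, assuming the read starts at row 3 so that it crosses a chunk boundary:

```python
def chunks_touched(read_rows, read_cols, chunk_rows, chunk_cols):
    """Count chunks overlapped by a rectangular read (inclusive index ranges)."""
    r0, r1 = read_rows
    c0, c1 = read_cols
    return ((r1 // chunk_rows - r0 // chunk_rows + 1) *
            (c1 // chunk_cols - c0 // chunk_cols + 1))

total_cells = 12 * 12                      # the 12 x 12 raster
for chunk in (6, 3):                       # 6x6 chunks (2x2 grid), 3x3 chunks (4x4 grid)
    n = chunks_touched((3, 8), (0, 0), chunk, chunk)   # 6x1 slice, rows 3..8
    pct = n * chunk * chunk * 100.0 / total_cells
    print("%dx%d chunks: %d chunks read, %.1f%% of data" % (chunk, chunk, n, pct))
# 6x6 chunks: 2 chunks read, 50.0% of data
# 3x3 chunks: 2 chunks read, 12.5% of data
```

The same arithmetic explains why smaller chunks read a smaller fraction of the data for this access pattern, at the price of more storage requests.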
Optimal chunk shape
Query patterns: always read the whole array; random access; read a time series for a single point (3D array)
• Chunk shape is a crucial performance parameter
• Chunk shape depends on data and workload
• An optimal chunk shape may not exist for all access patterns
• It is difficult to guess a good chunk shape a priori
• Solution: lots of tuning and experimentation
A raster DBMS must be able to quickly alter the chunk shape
Alter chunk size – query syntax (new chunk sizes: 10 × 10 × 8)

SciDB:
iquery -a -n --query "store(redimension(r2_u10m, [time=0:*,10,0, lat=0:93,10,0, lon=0:191,8,0]), r2_u10m_10x10x8);"

ChronosServer:
ncks -4 --cnk_map dmn --cnk_plc g2d --cnk_dmn time,10 --cnk_dmn lat,10 --cnk_dmn lon,8 r2.wind.u10m.u r2.wind.u10m_ch10x10x8

For SciDB, this is the fastest way: http://forum.paradigm4.com/t/fastestway-to-alter-chunk-size/
Alter chunk size – results

Execution time, seconds (Ratio = SciDB / ChronosServer):

| Operation        | SciDB  | ChronosServer, cold | ChronosServer, hot | Ratio, cold | Ratio, hot |
| Chunk 100×20×16  | 56.19  | 1.68                | 0.374              | 33.45       | 150.24     |
| Chunk 10×10×8    | 222.11 | 1.98                | 1.15               | 112.18      | 193.14     |
• ncks runs 2 threads (OpenMP)
• SciDB runs 4 instances
Hot mode is important: re-chunking the same data happens quite often during chunk size tuning
Summary table

Execution time, seconds (Ratio = SciDB / ChronosServer):

| Operation        | SciDB  | ChronosServer, cold | ChronosServer, hot | Ratio, cold | Ratio, hot |
| Data import      | 720.13 | 19.82               | 7.96               | 36.33       | 90.47      |
| Max              | 13.46  | 4.43                | 3.10               | 3.04        | 4.34       |
| Min              | 12.87  | 4.71                | 3.33               | 2.73        | 3.86       |
| Average          | 21.42  | 4.71                | 3.23               | 4.55        | 6.63       |
| Wind speed calc. | 25.75  | 3.50                | 2.10               | 7.36        | 12.26      |
| Chunk 100×20×16  | 56.19  | 1.68                | 0.374              | 33.45       | 150.24     |
| Chunk 10×10×8    | 222.11 | 1.98                | 1.15               | 112.18      | 193.14     |
On average, ChronosServer is 3× to 193× faster than SciDB.
Appendix A. In-situ data processing benefits • Leverage powerful raster file formats – chunking, compression, multidimensional arrays, data types, hierarchical namespaces, metadata – re-implementation for an emerging in-db storage engine results in yet another raster file format
• Avoid conversion bottleneck – processing may take less time than data import – ability to process data before new data portion arrives
• Avoid additional space usage – most data owners never delete/modify source files
• Reduce DBMS dependence – easier to migrate to other DBMS
Appendix B. SciDB configuration
[mydb]
server-0=127.0.0.1,3
db_user=mydb
install_root=/opt/scidb/15.12/install
pluginsdir=/opt/scidb/15.12/install/lib/scidb/plugins
logconf=/opt/scidb/15.12/install/share/scidb/log1.properties
base-path=/home/scidb/scidb_data
base-port=1239
interface=eth0
redundancy=0
security=trust
execution-threads=4
result-prefetch-threads=4
result-prefetch-queue-size=1
operator-threads=1
mem-array-threshold=128
smgr-cache-size=128
merge-sort-buffer=64
sg-receive-queue-size=16
sg-send-queue-size=4
Appendix C. SciDB: no hierarchical namespace
scidb@scidb-vm:~$ iquery -a --query "list('arrays');"
{No} name,uaid,aid,schema,availability,temporary
{0} 'foo',17600,17600,'foo [i=0:93,94,0,j=0:191,192,0]',true,false
{1} 'fooScaled',17602,17602,'fooScaled [i=0:93,94,0,j=0:191,192,0]',true,false
{2} 'IHI_ACCELEROMETER',1,1,'IHI_ACCELEROMETER …