In-situ processing of big raster data with command line tools*

Russian Supercomputing Days 2016, 26–27 September, Moscow

* This work was partially supported by the Russian Foundation for Basic Research (grant #16-37-00416).

Sources of raster data

Challenges in managing raster data

Array (raster) DBMS: manages multidimensional raster data

http://www.narccap.ucar.edu/users/user-meeting-08/handout/netcdf-diagram.png

2 key approaches

In-situ processing: needs new techniques and algorithms
  – diverse file formats
  File → no import → read → process

Import, then process: easier to process data in a single format
  File → convert → import (time-consuming, error-prone) → DB → read → process

Reason for the in-situ approach: powerful raster file formats. The file-centric model has resulted in a broad set of raster file formats, each highly optimized for a particular purpose and subject domain. Example: GeoTIFF "represents an effort by over 160 different … companies and organizations to establish an interchange format for georeferenced raster imagery" (http://trac.osgeo.org/geotiff/).

Some raster file formats support chunking, compression, multidimensional arrays, rich data types, hierarchical namespaces, and metadata.
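These features can be inspected directly from the command line; a minimal sketch, assuming a NetCDF-4 file with the hypothetical name data.nc:

# print the file header plus the special per-variable attributes
# (_Storage, _ChunkSizes, _DeflateLevel, …) of a NetCDF-4/HDF5 file
ncdump -h -s data.nc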


Command line tools

Along with raster file formats, decades of development and user feedback have produced numerous feature-rich, elaborate, free, and quality-assured tools, optimized mostly for a single machine.

Example: NetCDF Operators (NCO) – tools for multidimensional arrays in NetCDF format
• in development since about 1995
• take advantage of multicore CPUs via OpenMP
• work on a single machine

ncatted – NetCDF Attribute Editor
ncap2 – NetCDF Arithmetic Processor
ncks – NetCDF "Kitchen Sink"
…
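For illustration, a few typical NCO invocations; this is a sketch, and the file and variable names (in.nc, uwnd) are hypothetical:

# ncatted: overwrite (o) the character (c) attribute "units" of variable uwnd
ncatted -a units,uwnd,o,c,"m/s" in.nc

# ncks: extract a hyperslab (variable uwnd, first four time steps) into a new file
ncks -v uwnd -d time,0,3 in.nc subset.nc

# ncap2: per-cell arithmetic written to a new variable and file
ncap2 -s "uwnd2=uwnd*uwnd" in.nc squared.nc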

Data processing: the new delegation approach (this work)

Delegate in-situ processing to existing command line tools:
  File → read → external executable

+ no import
+ powerful storage
+ direct executable call
+ rich functionality
+ optimized algorithms
– diverse file formats
– diverse tools

It is not the same as streaming:
  File → convert → import → DB → export → convert → external executable
Two time-consuming phases (import and export).

State-of-the-art

System          | In-situ   | Delegate  | Distributed | 1st release
ChronosServer   | YES       | Direct    | YES         | ?
SciDB           | NO        | Streaming | YES         | ~ 2008**
Oracle Spatial* | NO        | NO        | YES         | < 2005
ArcGIS IS*      | YES       | NO        | NO****      | > 2000
RasDaMan        | YES/NO*** | NO        | YES/NO***   | ~ 1999
MonetDB SciQL   | NO        | NO        | NO          | not finished
Intel TileDB    | NO        | NO        | NO          | 04.04.2016

SciDB is the only free and distributed array DBMS available for comparison.
* Commercial
** Now (in 2016, 8 years later) it still has a very limited set of operations
*** YES in the paid, enterprise version; no performance evaluation has ever been published
**** Data are the same on each server or are retrieved from centralized storage


SciDB

SciDB – Scientific DB
• NoSQL
• AQL – Array Query Language
• AFL – Array Functional Language
• distributed, general-purpose multidimensional array DBMS

https://en.wikipedia.org/wiki/Michael_Stonebraker

ChronosServer


1000s of files

Complex naming, diverse coordinate systems, formats, data types, …

Product                        | Period (per file, time step) | Datasets | Format | File name example
AMIP/DOE Reanalysis 2          | Year, 6 hours                | 1        | NetCDF | uwnd.10m.gauss.1979.nc, uwnd.10m.gauss.1980.nc
MODIS L3 Atmosphere            | Day, Day                     | > 600    | HDF4   | MOD08_D3.A2000061.051.2010273210218.hdf
CFSR                           | Month, 1 hour                | 1        | Grib2  | ocnsst.l.gdas.198401.grb2, ocnsst.l.gdas.198402.grb2
MERRA                          | Day, 1..24 hrs               | 1..*     | HDF4   | MERRA200.prod.assim.tavg1_2d_lnd_Nx.20000718.hdf
Aura satellite, OMI radiometer | Day, Day                     | 14       | HDF5   | OMI-Aura_L3OMSO2e_2004m1001_v003011m0526t144250.he5
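Such heterogeneous files are usually examined with format-specific command line tools before any processing; a sketch using the example file names above (the availability of ncdump, wgrib2 and gdalinfo is assumed):

# NetCDF: print the header (dimensions, variables, attributes)
ncdump -h uwnd.10m.gauss.1979.nc

# Grib2: print the record inventory
wgrib2 -s ocnsst.l.gdas.198401.grb2

# HDF4/HDF5 and most other raster formats: generic metadata via GDAL
gdalinfo MOD08_D3.A2000061.051.2010273210218.hdf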

ChronosServer: hierarchical dataset namespace Reason: too many datasets

Separate groups with dots: r2.wind.10m.v, r2.pressure.msl, …

http://wikience.org

SELECT DATA FROM r2.wind.10m.u WHERE TIME = 01.01.2000 00:00

ChronosServer abstraction layers

▲ User view: a time series of grids
► Reality: N files of diverse formats and naming conventions on K cluster nodes, replicated

ChronosServer command language (NoSQL)
• Command names = names of command line tools
• Command syntax = command line (key-value pairs)
• Command options =
  • a subset of the tool's command line options (no path and file options)
  • ChronosServer-specific options
• File names → dataset names
• Commands are modified and submitted to the OS shell

ChronosServer command:
ncap2 -D 4 -O -alias u,r2.wind.10m.u -alias v,r2.wind.10m.v -alias ws,r2.wind.10m.uv.ws -s "$(ws)=sqrt($(u)*$(u) + $(v)*$(v));"

Command submitted to the OS shell:
/usr/local/bin/ncap2 -D 4 -O -v -s "ws=sqrt(uwnd*uwnd + vwnd*vwnd);"

Distributed execution of a single raster processing operation

1. The client sends a command over the Internet to the Gate, e.g. find max:
   ncap2 -alias u,r2.wind.10m.u -alias umax,r2.wnd.umax -s "$(umax)=$(u).max($time)"
2. The Gate parses the command, performs a malicious-command check, and modifies the command.
3. The Gate finds the cluster nodes holding the data.
4. The Gate chooses the nodes.
5. Command parameters (instructions) are sent to the chosen cluster nodes; the external tool is launched on each file from a given sample.
6. The nodes return their intermediate results.
7. Intermediate results are collected on a cluster node, which calculates the final result.
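For intuition, a rough sketch of what a single worker node might run for the "find max" command above; the directory layout, the output paths, and the variable name uwnd are assumptions, not the actual ChronosServer internals:

# one ncap2 call per local file of the dataset, each producing an intermediate per-file maximum
for f in /data/r2.wind.10m.u/*.nc; do
  /usr/local/bin/ncap2 -O -s 'umax=uwnd.max($time)' "$f" "/tmp/$(basename "$f" .nc).umax.nc"
done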

Benefits of the new delegation approach
• Avoid learning a new language: well-known command line syntax vs. a new SQL dialect
• No steep learning curve: use familiar functionality of known command line tools
• Documentation: reuse the tool's documentation; (most) tool docs = docs of the respective command
• Output conformance: output files are formatted as if the tool was launched manually
• Language independence: use tools written in any programming language
• Community support: bug fixes, new functionality, and usage suggestions via mailing lists
• Zero-knowledge development (0-know dev.): one needs to know nothing about ChronosServer in order to develop a tool


ChronosServer and SciDB performance comparison

Test data: northward and eastward wind speed
• 6-hourly forecast of U- and V-wind at 10 m
• 4-times daily values
• Gaussian grid 94 × 192
• NCEP/DOE AMIP-II Reanalysis (Reanalysis-2)
• 1979 – 2015 (≈ 54 020 time steps)
• 3.63 GB of NetCDF3 files in total

Consumers example: choose a location for a wind farm (retrospective data) and optimize its operation (forecast data).
For calendar year 2014, the electricity produced from wind power in the United States amounted to 181.79 terawatt-hours, or 4.44% of all generated electrical energy.
https://en.wikipedia.org/wiki/Wind_power_in_the_United_States

Experimental setup (1)

1 machine, but the results are representative (linear scalability).
Reason: SciDB requires a complex cluster deployment and is unable to import large data volumes in a reasonable time frame (next slides).

Ubuntu 14.04, VirtualBox on Windows 10
• 2 cores, Intel Core i5-3210M, 2.50 GHz
• 4 GB RAM
• SSD OCZ Vertex 4

sudo hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 9140 MB in 2.00 seconds = 4573.28 MB/sec
Timing buffered disk reads: 668 MB in 3.01 seconds = 222.04 MB/sec

Experimental setup (2)

ChronosServer
• 100% Java
• Java 1.7.0_75
• OpenJDK IcedTea 2.6.4, 64 bit
• -Xmx 978 MB (max heap size)
• 1 gate, 1 worker

NetCDF Operators (NCO)
• C++
• v4.6.0 (May 2016)
• OpenMP SMP threading
• no run-time optimizations

SciDB
• v15.12, latest (Apr 2015), C++
Recommended parameters:
• 4 SciDB instances
• 0 redundancy
• 4 execution and prefetch threads
• 1 prefetch queue size
• 1 operator thread
• 128 MB array cache
• etc.

Experimental setup (3)

Cold and hot execution: run the same query several times (use case: repeated experiments with the same data).
• C1, C2, C3 – query executed for the first time (caches dropped beforehand)
• H2, H3 – query executed for the second and third time

Cold time = (C1 + C2 + C3) / 3
Hot time = (C1 + H2 + H3) / 3

Free the pagecache, dentries and inodes before each of C1, C2, C3:
free && sync && echo 3 > /proc/sys/vm/drop_caches && free

Unlike ChronosServer, SciDB hot and cold runtimes are the same.
http://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system
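A minimal sketch of this measurement loop, assuming the query is an NCO call on a hypothetical file (the actual queries appear on the following slides):

# three cold runs: drop the page cache, then time the query
for i in 1 2 3; do
  sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
  /usr/bin/time -f "C$i: %e s" ncap2 -O -s 'umax=uwnd.max($time)' uwnd.10m.gauss.1979.nc /tmp/out.nc
done

# two hot runs: repeat the same command without dropping the caches
/usr/bin/time -f "H2: %e s" ncap2 -O -s 'umax=uwnd.max($time)' uwnd.10m.gauss.1979.nc /tmp/out.nc
/usr/bin/time -f "H3: %e s" ncap2 -O -s 'umax=uwnd.max($time)' uwnd.10m.gauss.1979.nc /tmp/out.nc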

SciDB data import

1. No out-of-the-box import tool
   – supports import from CSV files only
2. Requires software development for data import
   – it took 3 weeks to develop and debug a self-crafted Java SciDB import tool
3. The import procedure looks like this (very simplified):

   open NetCDF file
   read metadata (array shape, etc.)
   create corresponding SciDB arrays to add data into
   while (t < time_max) {
       read 2D array for time t
       convert array to CSV string
       save CSV string to CSV file
       feed CSV file to SciDB
       t++
   }

4. Significant manual intervention; error-prone and slow (next slides)

Initialization phase

SciDB data import
• only U-wind @ 10 m
• 50.2 MB in NetCDF3
• only the single year 1979: 1460 time steps
≈ 12 minutes (36x slower than ChronosServer data discovery)
CODING & DEBUGGING TIME NOT TAKEN INTO ACCOUNT
Estimate: 14.76 hours for U- and V-wind, 1979 – 2015 (≈ 12 min × 37 years × 2 variables)

ChronosServer data discovery
• 803 datasets
• 6.78 GB in NetCDF3/4, HDF4/5, Grib2
• satellite & climate reanalysis products*
≈ 20 seconds cold (8 seconds hot)
New data: just copy the files onto a computer and register the dataset in XML.
Estimate: linear scalability in the number of files

* «Data» top menu at http://www.wikience.org

Data import lesson

It was not possible to import large data volumes into SciDB in a reasonable time frame. SciDB performance is therefore evaluated further only on the U- and V-wind data for the year 1979.

Test: simple statistics
Result: a single 94 × 192 grid; each cell holds the maximum (respectively minimum, average) over the year 1979

Execution time, seconds
Operation | SciDB | ChronosServer (Cold / Hot) | Ratio, SciDB / ChronosServer (Cold / Hot)
Max       | 13.46 | 4.43 / 3.10                | 3.04 / 4.34
Min       | 12.87 | 4.71 / 3.33                | 2.73 / 3.86
Average   | 21.42 | 4.71 / 3.23                | 4.55 / 6.63

ChronosServer is faster even though SciDB operates on its own internal storage. Both are NoSQL systems.

SciDB:   store(aggregate(r2_u10m, max(value), lat, lon), r2_u10m_max);
Chronos: ncap2 -s "$(r2.wind.10m.uv.umax)=$(r2.wind.10m.u).max($time)"

ChronosServer benefits from native OS caching during hot mode.

Test: user-defined expressions
• Originally, a climate reanalysis provides wind speed separately for the eastward (u) and northward (v) directions. These are two different vectors.
• However, most applications prefer to have wind speed (ws) and wind direction (wd, azimuth) values instead.
• Test: calculate wind speed from the U- and V-values:

  ws = sqrt(u² + v²)

More details on U and V are at http://www.wikience.org/documentation/wind-speed-and-direction-tutorial/

Wind speed calculation – which syntax is cleaner and easier?

SciDB:
iquery -a -n --query "store( project( apply( join(r2_u10m, r2_v10m), ws, float(sqrt(r2_u10m.value * r2_u10m.value + r2_v10m.value * r2_v10m.value)) ), ws), r2_ws10m);"

ChronosServer:
ncap2 -alias u,r2.wind.10m.u -alias v,r2.wind.10m.v -alias ws,r2.wind.10m.uv.ws -s "$(ws)=sqrt($(u)*$(u) + $(v)*$(v));"

Wind speed calculation
Result: a 1460 × 94 × 192 grid (time × lat × lon); each cell is a wind speed value

Execution time, seconds
Operation        | SciDB | ChronosServer (Cold / Hot) | Ratio, SciDB / ChronosServer (Cold / Hot)
Wind speed calc. | 25.75 | 3.50 / 2.10                | 7.36 / 12.26

• ncap2 runs in 1 thread (no OpenMP)
• SciDB runs 4 instances

Chunking: row-major disk layout, reading a 6 × 1 slice of a 12 × 12 raster

(a) no chunking*: the slice is read in 1 × 2 portions, requiring 6 storage requests
(b) 2 × 2 chunks, each chunk 6 × 6: the slice touches 2 chunks, so 50% of all data is read (2 × 36 of 144 cells)
(c) 4 × 4 chunks, each chunk 3 × 3: the slice touches 2 chunks, so only 12.5% of the data is read (2 × 9 of 144 cells)

* with no chunking, the whole array is a single chunk
** reads may also involve uncompressing data

Note that an SSD is not the solution.

Optimal chunk shape

Query patterns range from always reading the whole array to random access, e.g. reading the time series for a single point of a 3D array.

• Chunk shape is a crucial performance parameter
• Chunk shape depends on the data and the workload
• An optimal chunk shape may not exist for all access patterns
• It is difficult to guess a good chunk shape a priori
• Solution: lots of tuning and experimentation
⇒ A raster DBMS must be able to quickly alter the chunk shape

Alter chunk size – query syntax

SciDB:
iquery -a -n --query "store( redimension(r2_u10m, [time=0:*,10,0, lat=0:93,10,0, lon=0:191,8,0]), r2_u10m_10x10x8);"

ChronosServer:
ncks -4 --cnk_map dmn --cnk_plc g2d --cnk_dmn time,10 --cnk_dmn lat,10 --cnk_dmn lon,8 r2.wind.u10m.u r2.wind.u10m_ch10x10x8

For SciDB, this is the fastest way: http://forum.paradigm4.com/t/fastestway-to-alter-chunk-size/

The chunk sizes are the 10, 10, 8 values in both commands.
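On the NCO side, the resulting layout can be checked with ncdump; a sketch, assuming out.nc is one of the NetCDF-4 files produced for the re-chunked dataset (hypothetical path):

# the special attribute _ChunkSizes shows the per-dimension chunk shape
ncdump -h -s out.nc | grep _ChunkSizes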

Alter chunk size – results

Execution time, seconds
Operation       | SciDB  | ChronosServer (Cold / Hot) | Ratio, SciDB / ChronosServer (Cold / Hot)
Chunk 100×20×16 | 56.19  | 1.68 / 0.374               | 33.45 / 150.24
Chunk 10×10×8   | 222.11 | 1.98 / 1.15                | 112.18 / 193.14

• ncks runs 2 threads (OpenMP)
• SciDB runs 4 instances

Hot mode is important: the same data are re-chunked quite often during chunk size tuning.

Summary table

Execution time, seconds
Operation        | SciDB  | ChronosServer (Cold / Hot) | Ratio, SciDB / ChronosServer (Cold / Hot)
Data import      | 720.13 | 19.82 / 7.96               | 36.33 / 90.47
Max              | 13.46  | 4.43 / 3.10                | 3.04 / 4.34
Min              | 12.87  | 4.71 / 3.33                | 2.73 / 3.86
Average          | 21.42  | 4.71 / 3.23                | 4.55 / 6.63
Wind speed calc. | 25.75  | 3.50 / 2.10                | 7.36 / 12.26
Chunk 100×20×16  | 56.19  | 1.68 / 0.374               | 33.45 / 150.24
Chunk 10×10×8    | 222.11 | 1.98 / 1.15                | 112.18 / 193.14

On average, ChronosServer is 3x to 193x faster than SciDB.

Appendix A. In-situ data processing benefits
• Leverage powerful raster file formats
  – chunking, compression, multidimensional arrays, data types, hierarchical namespaces, metadata
  – re-implementation for an emerging in-DB storage engine results in yet another raster file format
• Avoid the conversion bottleneck
  – processing may take less time than data import
  – ability to process data before a new data portion arrives
• Avoid additional space usage
  – most data owners never delete or modify source files
• Reduce DBMS dependence
  – easier to migrate to another DBMS


Appendix B. SciDB configuration

[mydb]
server-0=127.0.0.1,3
db_user=mydb
install_root=/opt/scidb/15.12/install
pluginsdir=/opt/scidb/15.12/install/lib/scidb/plugins
logconf=/opt/scidb/15.12/install/share/scidb/log1.properties
base-path=/home/scidb/scidb_data
base-port=1239
interface=eth0
redundancy=0
security=trust
execution-threads=4
result-prefetch-threads=4
result-prefetch-queue-size=1
operator-threads=1
mem-array-threshold=128
smgr-cache-size=128
merge-sort-buffer=64
sg-receive-queue-size=16
sg-send-queue-size=4

Appendix C. SciDB: no hierarchical namespace

scidb@scidb-vm:~$ iquery -a --query "list('arrays');"
{No} name,uaid,aid,schema,availability,temporary
{0} 'foo',17600,17600,'foo [i=0:93,94,0,j=0:191,192,0]',true,false
{1} 'fooScaled',17602,17602,'fooScaled [i=0:93,94,0,j=0:191,192,0]',true,false
{2} 'IHI_ACCELEROMETER',1,1,'IHI_ACCELEROMETER …