Sep 16, 2011 - A single file gets too big (inefficient seeks, too much metadata to read, etc. ..... Make sure to run ANA
GeoServer on steroids All you wanted to know about how to make GeoServer faster but you never asked (or you did and no one answered)
Ing. Andrea Aime, GeoSolutions Ing. Simone Giannecchini, GeoSolutions
FOSS4G 2011, Denver 12th-16th September 2011
GeoSolutions
Founded in Italy in late 2006
Expertise
•
Image Processing, GeoSpatial Data Fusion
•
Java, Java Enterprise, C++, Python
•
JPEG2000, JPIP, Advanced 2D visualization
Supporting/Developing FOSS4G projects
GeoTools, GeoServer
GeoBatch, GeoNetwork
Clients
Public Agencies
Private Companies
http://www.geo-solutions.it FOSS4G 2011, Denver 12th-16th September 2011
Preparing raster inputs
FOSS4G 2011, Denver 12th-16th September 2011
Raster Data CheckList
Objectives Fast extraction of a subset of the data Fast extraction of overviews Check-list Avoid having to open a large number of files per request Avoid parsing of complex structures Avoid on-the-fly reprojection (if possible) Get to know your bottlenecks CPU vs Disk Access Time vs Memory Experiment with Format, compression, different color models, tile size, overviews, configuration (in GeoServer of course) FOSS4G 2011, Denver 12th-16th September 2011
Problematic Formats
PNG/JPEG direct serving Bad formats (especially in Java) No tiling (or rarely supported) Chew a lot of memory and CPU for decompression Mitigate with external overviews NetCDF/grib1 and similar formats Complex formats (often with many subdatasets) Often contains un-calibrated data Must usually use multiple dimensions
Use ImageMosaic
Must usually massage the data before serving
e.g. transpose X,Y, FOSS4G 2011, Denver 12th-16th September 2011
Problematic Formats
Ascii Grid, GTOPO30, IDRISI and similar formats are bad
No internal tiling, no compression, no internal overviews
JPEG2000 (with Kakadu)
ASCII formats are bad
Extensible and rich, not (always) fast Can be difficult to tune for performance (might require specific encoding options)
ECW and MrSID
Why bother it’s proprietary? FOSS4G 2011, Denver 12th-16th September 2011
Choosing Formats and Layouts
To remember: GeoTiff is a swiss knife But you don’t want to cut a tree with it! Tremendously flexible, good fir for most (not all) use cases BigTiff pushes the GeoTiff limits farther Single File VS Mosaic VS Pyramids Use single GeoTiff when Overviews and Tiling stay within 4GB No additional dimensions Consider BigTiff for very large file (> 4 GB) Support for tiling Support for Overviews Can be inefficient with very large files + small tiling FOSS4G 2011, Denver 12th-16th September 2011
Choosing Formats and Layouts
Use ImageMosaic when:
A single file gets too big (inefficient seeks, too much metadata to read, etc..) Multiple Dimensions (time, elevation, others..) Avoid mosaics made of many very small files Single granules can be large Use Tiling + Overviews + Compression on granules
Use ImagePyramid when:
Tremendously large dataset
Need to serve at all scales
Too many files / too large files Especially low resolution
For single granules (< 2Gb) GeoTiff is generally a good fit FOSS4G 2011, Denver 12th-16th September 2011
Choosing Formats and Layouts
Examples:
Small dataset: single 2GB GeoTiff file
Medium dataset: single 40GB BigTiff
Large dataset: 400GB mosaic made of 10GB BigTiff files Extra large: 4TB of imagery, built as pyramid of mosaics of BigTiff/GeoTiff files to keep the file count low
FOSS4G 2011, Denver 12th-16th September 2011
GeoTiff preparation
STEP 0: get to know your data
gdalinfo utility is your friend
CheckList Missing CRS
Missing georeferencing
Fix with gdal_translate
Missing Overviews
Add a World File Fix with gdal_translate
Bad Tiling
Add a .prj file Fix with gdal_translate
Use gdaladdo
Compression
FOSS4G 2011, Denver 12th-16th September 2011
Use gdal_translate
GeoTiff preparation
STEP 1: fix and optimize with gdal_translate Inner Tiling
CRS and GeoReferencing
gdal_translate -co "TILED=YES" -co "BLOCKXSIZE=512" -co "BLOCKYSIZE=512" in.tif out.tif Check also GeoTiff driver creation options here
gdal_translate –a_srs “EPSG:32619” –a_ullr 285409.2 2014405.2 287536.8 2011947.6 in.tif out.tif
STEP 2: add overviews with gdal_addo
Leverages on tiff support for multipage files and reduced resolution pages gdaladdo -r cubic output.tif 2 4 8 16 32 64 128 Choose the resampling algorithm wisely Chose the tile size and compression wisely (use GDAL_TIFF_OVR_BLOCKSIZE)
Consider external overviews FOSS4G 2011, Denver 12th-16th September 2011
GeoTiff preparation
FOSS4G 2011, Denver 12th-16th September 2011
GeoTiff preparation
Compression
Consider when disk speed/space is an issue
Control it with gdal_translate and creation options
GeoTiff tiles can be compressed
LZW/Deflate are good for lossless compression
JPEG is good for visually lossless compression
From experience
Use LZW/Deflate on geophysical data (DEM, acquisitions) USE JPEG visually lossless with Photometric Interpretation to YCbCr for RGB FOSS4G 2011, Denver 12th-16th September 2011
Time, Elevation and other dimensions
Use Cases:
MetOc data (support for time, elevation)
Data with additional indipendent dimensions
WorkFlow
Split in multiple GeoTiff files
Optimize the files individually
Use ImageMosaic
Use a DBMS for indexing granules
Use File Name based property collectors to turn properties into DB rows attributes Filter by time, elevation and other attributes via OGC and CQL filters
Check back up slides for more info! FOSS4G 2011, Denver 12th-16th September 2011
Time, Elevation and other dimensions
Indexing multiple dimensions with DB support (video here)
datastore.properties
timeregex.properties
stringregex.properties indexer.properties
FOSS4G 2011, Denver 12th-16th September 2011
Time, Elevation and other dimensions
FOSS4G 2011, Denver 12th-16th September 2011
Proper Mosaic Preparation
ImageMosaic stitches single granules together with basic processing
Filtered selection
Overviews/Decimation on read
Over/DownSampling in memory
ColorMask (optional)
Mosaic/Stitch
ColorMask again (optional)
Optimize files as if you were serving them individually Keep a balance between number and dimensions of granules FOSS4G 2011, Denver 12th-16th September 2011
Proper Mosaic Configuration
STEP 0: Configure Coverage Access (see slide 22) STEP 1: Configure Mosaic Parameters ALLOW_MULTITHREADING
Use a proper Tile Size
In-memory processing, must not be too large Disk tiling should larger
If memory is scarce:
Load data from different granules in parallel Needs USE_JAI_IMAGE_READ set to false (Immediate Mode)
USE_JAI_IMAGREAD to true USE_MULTITHREADING to false*
Otherwise
USE_JAI_IMAGREAD to false ALLOW_MULTITHREADING to true
FOSS4G 2011, Denver 12th-16th September 2011
Proper Mosaic Configuration
Optional (Advanced): Configure Mosaic Parameters Directly Caching
Load the index in memory (using JTS SRTree) Super fast granule lookup, good for shapefiles Bad if you have additional dimension to filter on Based on Soft References, controlled via Java switch SoftRefLRUPolicyMSPerMB
ExpandToRGB
Expand colormapped imagery to RGB in memory Trade performance for quality
SuggestedSPI Default ImageIO Decoder class to use Don’t touch unless expert
FOSS4G 2011, Denver 12th-16th September 2011
Proper Pyramid Preparation
Use gdal_retile for creating the pyramid
Prepare the list of tiles to be retiled
Create the pyramid with GDAL retile (grab a coffee!)
Chunks should not be too small (here 2048x2048) Too many files is bad anyway Use internal Tiling for Larger chunks size If the input dataset is huge use the useDirForEachRow option Too many files in a dir is bad practice Make sure the number of level is consistent Too few bad performance at high scale FOSS4G 2011, Denver 12th-16th September 2011
Proper Pyramid Configuration
STEP 0: Configure Coverage Access (see slide 22) STEP 1: Configure Pyramid Parameters ALLOW_MULTITHREADING
Use a proper Tile Size
ImagePyramid relies on ImageMosaic
In-memory processing, must not be too large Disk tiling should larger
If memory is scarce:
Load data from different granules in parallel Needs USE_JAI_IMAGE_READ set to false (Immediate Mode)
USE_JAI_IMAGREAD to true USE_MULTITHREADING to false*
Otherwise
USE_JAI_IMAGREAD to false ALLOW_MULTITHREADING to true
FOSS4G 2011, Denver 12th-16th September 2011
Proper Pyramid Configuration
Optional (Advanced): Configure Mosaic Parameters Directly Caching
Load the index in memory (using JTS SRTree) Super fast granule lookup, good for shapefiles Bad if you have additional dimension to filter on Based on Soft References, controlled via Java switch SoftRefLRUPolicyMSPerMB
ExpandToRGB
Expand colormapped imagery to RGB in memory Trade performance for quality
SuggestedSPI Default ImageIO Decoder class to use Don’t touch unless expert
FOSS4G 2011, Denver 12th-16th September 2011
Proper GDAL Formats Configuration
Fix Missing/Improper CRS with PRJ or coverage config Fix Missing GeoReferencing with World File Make sure GDAL_DATA is properly configured Use a proper Tile Size
If memory is scarce:
In-memory processing, must not be too large Fundamental for striped data! JNI overhead Disk tiling should larger USE_JAI_IMAGREAD to true USE_MULTITHREADING to true*
Otherwise
USE_JAI_IMAGREAD to false USE_MULTITHREADING is ignored
FOSS4G 2011, Denver 12th-16th September 2011
Proper JPEG2000 Kakadu Configuration
Fix Missing/Improper CRS with PRJ or coverage config Fix Missing GeoReferencing with World File Make sure Kakadu dll/so is properly loaded Use a proper Tile Size
If memory is scarce:
In-memory processing Must not be too large Disk tiling should larger USE_JAI_IMAGREAD to true USE_MULTITHREADING to true*
Otherwise
USE_JAI_IMAGREAD to false USE_MULTITHREADING is ignored
FOSS4G 2011, Denver 12th-16th September 2011
Proper GeoServer Coverage Options Configuration
Make sure native JAI and Image is installed
Enable ImageIO native acceleration Enable JAI Mosaicking native acceleration Give JAI enough memory Don’t raise JAI memory Threshold too high Rule of thumb: use 2 X #Core Tile Threads (check next slide)
Enable Tile Recycling only on trunk Enable Tile Recycling if memory is not a problem FOSS4G 2011, Denver 12th-16th September 2011
Proper GeoServer Coverage Options Configuration
Multithreaded Granule Loading Allows to fine tuning multithreading for ImageMosaic Orthogonal to JAI Tile Threads Rule of Thumb: use 2 X #Core Tile Threads Perform testing to fine tune depending on layer configuration as well as on typical requests
ImageIO Cache threshold
decide when we switch to disk cache (very large WCS requests)
FOSS4G 2011, Denver 12th-16th September 2011
Reprojection Performance Vs Quality
GeoServer 2.1.x reprojects raster data using a piecewiselinear algorithm The area is divided in rectangular blocks, each having its own affine transform
The transformation between the full trigonometric expressions and the linear ones is driven by a tolerance, default value is 0.333 Larger value will make reprojection faster, but lower the quality -Dorg.geotools.referencing.resampleTolerance=0.5
FOSS4G 2011, Denver 12th-16th September 2011
Preparing vector inputs
FOSS4G 2011, Denver 12th-16th September 2011
Vector data checklikst
What do we want from vector data:
Binary data
No complex parsing of data structures
Fast extraction of a geographic subset
Fast filtering on the most commonly used attributes
FOSS4G 2011, Denver 12th-16th September 2011
Choosing a format
Slow formats
WFS
GML
DXF
Good formats, local and indexable
Shapefile
Directory of shapefiles
SDE
Spatial databases: PostGIS, Oracle Spatial, DB2, MySQL*, SQL server*
FOSS4G 2011, Denver 12th-16th September 2011
Shapefiles vs DBMS
Speed comparison vs spatial extent depicted:
Shapefile
Shapefile very fast when rendering the full dataset Database faster when extracting a small subset of a very large data set
no attribute indexing, avoid if filtering on attribute is important (filtering == reading less data, not applying symbols)
Database
Rich support for complex native filters Use connection pooling (preferably via JNDI) Validate connections (with proper pooling)
FOSS4G 2011, Denver 12th-16th September 2011
Shapefile preparation
Remove .qix file if present, let GeoServer 2.1.x rebuild it (more efficient) If there are large DBF attributes that are not in use, get rid of them using ogr2ogr, e.g.: ogr2ogr -select FULLNAME,MTFCC arealm.shp tl_2010_08013_arealm.shp
If on Linux, enable memory mapping, faster, more scalable (but will kill Windows):
FOSS4G 2011, Denver 12th-16th September 2011
Shapefile filtering
Stuck with shapefiles and have scale dependent rules like the following?
Show highways first
Show all streets when zoomed in
Use ogr2ogr to build two shapefiles, one with just the highways, one with everything, and build two layers, e.g.: ogr2ogr -sql "SELECT * FROM tl_2010_08013_roads WHERE MTFCC in ('S1100', 'S1200')" primaryRoads.shp tl_2010_08013_roads.shp
FOSS4G 2011, Denver 12th-16th September 2011
PostGIS specific hints
PostgreSQL out of the box configured for very small hardware: http://wiki.postgresql.org/wiki/Performance_Optimization
Make sure to run ANALYZE after data imports (updates optimizer stats) As usual, avoid large joins in SQL views, consider materialized views If the dataset is massive, CLUSTER on the spatial index:
http://postgis.refractions.net/documentation/manual1.3/ch05.html
Careful with prepared statements (bad performance) FOSS4G 2011, Denver 12th-16th September 2011
Optimize styling
FOSS4G 2011, Denver 12th-16th September 2011
Use scale dependencies
Never show too much data
the map should be readable, not a graphic blob. Rule of thumb: 1000 features max in the display
FOSS4G 2011, Denver 12th-16th September 2011
Labeling
Labeling conflict resolution is expensive, limit to the most inner zooms Halo is important for readability, but adds significant overhead Careful with maxDisplacement, makes for various label location attempts
FOSS4G 2011, Denver 12th-16th September 2011
FeatureTypeStyle
GeoServer uses SLD FeatureTypeStyle objects as Z layers for painting Each one allocates its own rendering surface (which can use a lot of memory), use as few as possible
FOSS4G 2011, Denver 12th-16th September 2011
Use translucency sparingly
Translucent display is expensive, use it sparingly
FOSS4G 2011, Denver 12th-16th September 2011
Scale dependent rules
Too often forgotten or little used, yet very important:
Hide layers when too zoomed in (raster/vector example) Progressively show details Add more expensive rendering when there are less features
Key to any high performance / good looking map
FOSS4G 2011, Denver 12th-16th September 2011
Example
FOSS4G 2011, Denver 12th-16th September 2011
Hide as you zoom in
Add a MinScaleDenominator to the rule This will make the layer disappear at 1:75000 (towards 1:1)
FOSS4G 2011, Denver 12th-16th September 2011
Alternative rendering
Simple rendering at low scale (up to 1:2000) More complex rendering when zoomed in (1:1999 and above)
FOSS4G 2011, Denver 12th-16th September 2011
Alternative rendering
FOSS4G 2011, Denver 12th-16th September 2011
Point symbols
• 600 loc for 6 different points types • Painful… FOSS4G 2011, Denver 12th-16th September 2011
Prepare data
alter table pointlm add column image varchar;
update pointlm set image = 'shop_supermarket.p.16.png' where MTFCC = 'C3081' and (FULLNAME like '%Shopping%' or FULLNAME like '%Mall%'); update pointlm set image = 'peak.png' where MTFCC = 'C3022'
update pointlm set image = 'amenity_prison.p.20.png' where MTFCC = 'K1236';
update pointlm set image = 'museum.p.16.png' where MTFCC = 'K2165';
update pointlm set image = 'airport.p.16.png' where MTFCC = 'K2451';
update pointlm set image = 'school.png' where MTFCC = 'K2543';
update pointlm set image = 'christian3.p.14.png' where MTFCC = 'K2582';
update pointlm set image = 'gate2.png' where MTFCC = 'K3066';
FOSS4G 2011, Denver 12th-16th September 2011
Dynamic symbolizers
FOSS4G 2011, Denver 12th-16th September 2011
Output tuning
FOSS4G 2011, Denver 12th-16th September 2011
WMS output formats JPEG
PNG 8bit
PNG 24bit
23.8KB
66KB
169.4KB
27KB
27KB
64KB
Compression artifacts
Color reduction FOSS4G 2011, Denver 12th-16th September 2011
Large size
WFS output formats 35 30 25 20 15 10 5 0
Dimension MB
HTTP GZip compression is transparent in GeoServer, make sure proxies keep it (or pay 10x price) FOSS4G 2011, Denver 12th-16th September 2011
Tile caching
FOSS4G 2011, Denver 12th-16th September 2011
Tile caching with GeoWebCache
Tile oriented maps, fixed zoom levels and fixed grid
Useful for stable layers, backgrounds
Protocols: WMTS, TMS, WMS-C, Google Maps/Earth, VE
Speedup compared to dynamic WMS: 10 to 100 times, assuming tiles are already cached (whole layer preseeded) Suitable for:
Mostly static layer No (or few) dynamic parameters (CQL filters, SLD params, SQL query params, time/elevation, format options) FOSS4G 2011, Denver 12th-16th September 2011
Embedded GWC advantage
No double encoding when using meta-tiling, faster seeding
FOSS4G 2011, Denver 12th-16th September 2011
Space considerations
Seeding Colorado, assuming 8 cores, one layer, 0.1 sec 756x756 metatile, 15KB for each tile
Do yours: http://tinyurl.com/3apkpss
Not enough disk space? Set a disk quota Zoom level 13 14 15 16 17 18 19 20
Tile count
Size (MB)
58,377 232,870 929,475 3,713,893 14,855,572 59,396,070 237,584,280 950,273,037
1 4 14 57 227 906 3,625 14,500
Time to seed Time to seed (hours) (days) 0 0 0 0 0 0 1 0 6 0 23 1 92 4 367 15
FOSS4G 2011, Denver 12th-16th September 2011
Resource control
FOSS4G 2011, Denver 12th-16th September 2011
WMS request limits
Max memory per request: avoid large requests, allows to size the server memory (max concurrent request * max memory)
Max time per request: avoid requests taking too much time (e.g., using a custom style provided with dynamic SLD in the request) Max errors: best effort renderer, but handling errors takes time
FOSS4G 2011, Denver 12th-16th September 2011
WFS request limits
Max feature returned, configured as a global limit
Return feature bbox: reduce amount of generated GML
Per layer max feature count
FOSS4G 2011, Denver 12th-16th September 2011
WCS request limits
FOSS4G 2011, Denver 12th-16th September 2011
Control flow
Control how many requests are executed in parallel, queue others:
Increase throughput
Control memory usage
Enforce fairness
More info here FOSS4G 2011, Denver 12th-16th September 2011
Control flow
17%
$GEOSERVER_DATA_DIR/controlflow.properties # don't allow more than 16 GetMap requests in parallel ows.wms.getmap=16 FOSS4G 2011, Denver 12th-16th September 2011
Auditing
Log each and every request
Log contents driven by customizable template
Summarize and analyze requests with offline tools
More info here
FOSS4G 2011, Denver 12th-16th September 2011
JVM and deploy configuration
FOSS4G 2011, Denver 12th-16th September 2011
Premise
The options discussed here are not going to help visibly if you did not prepare the data and the styles They are finishing touches that can get performance up once the major data bottlenecks have been dealt with
Check “Running in production” instructions here
FOSS4G 2011, Denver 12th-16th September 2011
JVM settings
--server: enables the server JIT compiler --Xms2048m -Xmx2048m: sets the JVM use two gigabytes of memory --XX:+UseParallelOldGC -XX:+UserParallelGC: enables multi-threaded garbage collections, useful if you have more than two cores --XX:NewRatio=2: informs the JVM there will be a high number of short lived objects --XX:+AggressiveOpt: enable experimental optimizations that will be defaults in future versions of the JVM
FOSS4G 2011, Denver 12th-16th September 2011
Native JAI and JDK
Install native JAI and use a recent Sun JDK! Benchmark over a small data set (the effect is not as visible on larger ones)
FOSS4G 2011, Denver 12th-16th September 2011
Setup a local cluster
Java2D locks when drawing antialiased vectors
Limits scalability severely
Use Apache mod_proxy_balance and setup a GeoServer each 2/4 cores mod_proxy_balance
GeoServer
GeoServer
GeoServer
FOSS4G 2011, Denver 12th-16th September 2011
Clustering advantage
FOSS4G 2010 vector benchmarks (roads/buildings/isolines and so on, over the entire Spain) GeoServer was benchmarked without local clustering
66%
FOSS4G 2011, Denver 12th-16th September 2011
Benchmarking
FOSS4G 2011, Denver 12th-16th September 2011
Using JMeter
Good benchmarking tool Allows to setup multiple thread groups, different parallelelism and request count, to ramp up the load
Can use CSV files to generate semi-randomized requests
Reports results in a simple table
http://jakarta.apache.org/jmeter/
FOSS4G 2011, Denver 12th-16th September 2011
Using JMeter
Thread group: how many threads Loop: how many requests
HTTP sampler: the request CSV: read request params from CSV Summary table
FOSS4G 2011, Denver 12th-16th September 2011
Generating the CSV
Simple randomized generation tool built during WMS shootouts, wms_request.py Generate csv with the bbox and width/height to be used in JMeter scripts: ./wms_request.py -count 1200 -region -180 -90 180 90 -minres 0.002 -maxres 0.1 -minsize 256 256 -maxsize 1024 1024
Get it here along with a corresponding JMeter script: http://demo1.geo-solutions.it/share/jmeter_2011.zip
FOSS4G 2011, Denver 12th-16th September 2011
Checking results
Results table
Run the benchmarks 2-3 times, let the results stabilize
Save the results, check other optimizations, compare the results
FOSS4G 2011, Denver 12th-16th September 2011
Real world deploy
FOSS4G 2011, Denver 12th-16th September 2011
Deploy configuration
FOSS4G 2011, Denver 12th-16th September 2011
Raster data
Whole Italy at 50cm per pixel Over 4TB, updated fully every 3 years (old data still available for historical access) Custom pyramid
100 m per pixel: one image
20m per pixel: mosaic of 20 tiles
4m per pixel: mosaic of few hundred tiles
0.5m per pixel: 9000 tiles
Each tile is 10000x10000, with overviews
FOSS4G 2011, Denver 12th-16th September 2011
Vector data
Cadastral data for the whole Italy, with full history (interval of validity for each parcel) 100 million polygons A query extracts a subset relative to a certain time interval and area the user is allowed to see
No data from this table is ever shown below 1:50000 (SLD scale dependencies) Physical table level partitioning (Oracle style) of the table based on geographic area to parallelize and cluster data loading, plus spatial indexing and indexes on commonly filtered upon attributes
FOSS4G 2011, Denver 12th-16th September 2011
The End
Questions?
[email protected]
[email protected] FOSS4G 2011, Denver 12th-16th September 2011