Large Image Correction and Warping in a Cluster

Performance vs. Accuracy Trade-offs for Large-scale Image Analysis Applications

Vijay S Kumar, Tahsin Kurc, Jun Kong, Umit Catalyurek, Metin Gurcan, Joel Saltz
Department of Biomedical Informatics, The Ohio State University

IEEE Cluster 2007, September 18, 2007

Organization of the talk
• Motivation: digital microscopy, pathology
• Image analysis for cancer prognosis studies
• Adaptive data analysis
• Basic image-analysis algorithm
• Adaptive processing heuristics
• Parallel execution on cluster
• Performance results
• Conclusions

Motivating application scenarios
Large data volumes, high data rates, complex data processing operations

Pathology
• Digital microscopy: 500 MB/min
• 2D images acquired at up to 150x magnification (60 gigabytes per slice, uncompressed)
• Whole-slide image analysis: high turnaround times

Astronomy
• Large survey telescopes: 12 TB/night
• Science requirements on performance are stringent
• Near-real-time data analysis

Adaptive applications
• Applications can execute at different performance levels
  • Higher performance level ⇒ lower accuracy of output
• Trade-off controlled by application-level parameters
• Application-level quality-of-service (QoS) requirements:
  • Maximize average accuracy across all tasks within a deadline
  • Given a minimum average accuracy per task, minimize time

Resolution    Processing time    Accuracy
256 x 256     1088 s             0.509
128 x 128     483 s              0.483
64 x 64       374 s              0.446

Adaptive multi-resolution strategy used for image analysis

GOAL: Framework to support QoS in large-scale image analysis
• Application: express performance parameters, QoS requirements
• System: estimate accuracy vs. performance characteristics; adaptive processing (map requirements to parameter values)
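The two QoS requirements can be illustrated with the three performance levels in the table above. This is only a minimal sketch: it assumes each level is a (resolution, time, accuracy) triple, and the function names are hypothetical rather than part of the framework.

```python
# Performance levels from the table: (tile resolution, time in s, accuracy).
LEVELS = [
    (256, 1088.0, 0.509),
    (128, 483.0, 0.483),
    (64, 374.0, 0.446),
]


def best_accuracy_within(deadline_s, levels=LEVELS):
    """Most accurate level whose processing time fits the deadline, or None."""
    feasible = [lvl for lvl in levels if lvl[1] <= deadline_s]
    return max(feasible, key=lambda lvl: lvl[2]) if feasible else None


def fastest_with_accuracy(min_accuracy, levels=LEVELS):
    """Fastest level that reaches the required accuracy, or None."""
    feasible = [lvl for lvl in levels if lvl[2] >= min_accuracy]
    return min(feasible, key=lambda lvl: lvl[1]) if feasible else None
```

For example, a 500 s deadline selects the 128 x 128 level, while a minimum accuracy of 0.5 forces the 256 x 256 level.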

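Adaptive multi-resolution strategy
[Figure: flowchart. Whole-slide image → image tile → multi-resolution decomposition → feature vector (f1, f2, ..., fSN) at resolution Ri → classification → "Confident?". Yes: tile "finalized" in the classification map. No: return to the multi-resolution decomposition. The labels SR and SP also appear in the diagram.]
Computer-aided classification of neuroblastoma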

Problem definition
• Process the tiles in an image such that user QoS requirements are met.
• Maps to a scheduling problem: given a set of image tiles and a user QoS requirement:
  • Determine an optimal order of execution of the tiles
  • Determine the resolution at which each tile must be processed

Simulation studies – Ideal conditions
• Processing time and confidence for a tile at each resolution known a priori
• Baseline performance (RAND):
  • Pick a tile at random, process the tile until it is finalized
• QoS 1: Maximize number of finalized tiles within time 't':
  • Process tiles in increasing order of their processing times
• QoS 2: Maximize average confidence for image within time 't':
  • Process tiles in decreasing order of maximum gain in confidence per unit processing time
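The three orderings can be sketched as follows. The data model is an assumption of this sketch, not taken from the slides: each tile is a list of (time, confidence) pairs, lowest resolution first, and a tile is finalized at the first resolution whose confidence reaches the threshold. "Maximum gain per unit time" is interpreted here as the best ratio of confidence to cumulative time over all resolutions.

```python
import random


def time_to_finalize(tile, threshold):
    """Cumulative time until confidence first reaches the threshold
    (or until the highest resolution has been processed)."""
    total = 0.0
    for t, conf in tile:
        total += t
        if conf >= threshold:
            break
    return total


def finalized_within(tiles, order, threshold, budget):
    """Count tiles finalized when processed in `order` within `budget` seconds."""
    elapsed, done = 0.0, 0
    for i in order:
        cost = time_to_finalize(tiles[i], threshold)
        if elapsed + cost > budget:
            break
        elapsed += cost
        done += 1
    return done


def order_random(tiles, seed=0):
    """RAND baseline: a random permutation of the tile indices."""
    order = list(range(len(tiles)))
    random.Random(seed).shuffle(order)
    return order


def order_shortest_first(tiles, threshold):
    """QoS 1 ordering: increasing time needed to finalize each tile."""
    return sorted(range(len(tiles)),
                  key=lambda i: time_to_finalize(tiles[i], threshold))


def order_by_gain_rate(tiles):
    """QoS 2 ordering: decreasing best confidence per unit of cumulative time."""
    def max_rate(tile):
        best, total = 0.0, 0.0
        for t, conf in tile:
            total += t
            best = max(best, conf / total)
        return best
    return sorted(range(len(tiles)), key=lambda i: -max_rate(tiles[i]))
```

Because every value is known in advance, these orderings can be computed once up front; that assumption is what the later heuristics have to relax.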

Lessons learnt
• Scope for improvement
• Ordering tile execution can help bridge the gap to a certain extent
• How do we account for non-ideal conditions?

Tile-based heuristic
• Initial static phase (only insertions)
  • Process each tile at lowest resolution
  • Insert entry into queue
• Dynamic phase (insertions and deletions)
  • Process entry at top of queue at next higher resolution
  • If tile gets finalized, delete queue entry
  • Queue is dynamically updated
• Use same priority queue data structure for different QoS requirements
  • Modify insertion operation accordingly

Queue entry: Tile ID, Resolution, Confidence, Proc. time

[Figure: priority queue for tile ordering]
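The two phases can be sketched with a binary heap. This is an illustration under stated assumptions, not the authors' implementation: `process` and `priority` are hypothetical callbacks, resolutions are numbered from 0, a tile is treated as finalized once its confidence reaches the threshold or no finer resolution remains, and the time budget is only checked in the dynamic phase.

```python
import heapq
import itertools


def tile_based_schedule(tile_ids, process, priority, threshold, max_res, budget):
    """Return {tile_id: (resolution, confidence)} reached within the budget.

    process(tile_id, res) -> (confidence, proc_time)   # hypothetical callback
    priority(entry) -> number; a larger value is served earlier
    """
    heap, tie = [], itertools.count()  # the counter breaks ties between entries
    state, elapsed = {}, 0.0

    def visit(tile_id, res, prev_conf):
        nonlocal elapsed
        conf, t = process(tile_id, res)
        elapsed += t
        state[tile_id] = (res, conf)
        # Only tiles that are not yet finalized are (re)inserted into the queue.
        if conf < threshold and res < max_res:
            entry = {"tile": tile_id, "res": res, "conf": conf,
                     "time": t, "prev_conf": prev_conf}
            heapq.heappush(heap, (-priority(entry), next(tie), entry))

    # Static phase: every tile at the lowest resolution, insertions only.
    for tile_id in tile_ids:
        visit(tile_id, 0, prev_conf=0.0)

    # Dynamic phase: refine the entry at the top of the queue.
    while heap and elapsed < budget:
        _, _, entry = heapq.heappop(heap)
        visit(entry["tile"], entry["res"] + 1, prev_conf=entry["conf"])
    return state
```

Popping an entry removes it from the queue, so a finalized tile simply never returns. Only the `priority` callback needs to change between QoS requirements.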

Priority queue insertion scheme
• QoS: Maximize number of finalized tiles
  • Tile with confidence value closer to threshold gets higher priority
  • Better chance of being finalized at resolution+1
• QoS: Maximize average accuracy across tiles
  • Tile achieving higher increase in confidence per unit of processing time between resolution-1 and resolution gets higher priority
  • Prospect for greater benefit at resolution+1
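The two insertion rules amount to two priority functions over a queue entry. In this sketch an entry is assumed to be a dict with the hypothetical keys `conf` (confidence at the current resolution), `prev_conf` (confidence at the previous resolution), and `time` (processing time at the current resolution, assumed positive). A larger return value means higher priority.

```python
def priority_finalized_tiles(entry, threshold):
    """QoS 'maximize finalized tiles': the smaller the gap between the
    tile's confidence and the threshold, the higher the priority."""
    return -(threshold - entry["conf"])


def priority_average_accuracy(entry):
    """QoS 'maximize average accuracy': confidence gained per unit of
    processing time between the previous and the current resolution."""
    return (entry["conf"] - entry["prev_conf"]) / entry["time"]
```

A tile at confidence 0.45 with threshold 0.5 therefore outranks one at 0.30 under the first rule, even if the second tile has been improving faster.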

[Figure: two sketches of accuracy (A) vs. time (t), one labelled "Well-behaved characteristics" and the other "Unpredictable characteristics"]
In practice, accuracy vs. performance could be nonmonotone in nature
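Whether a tile is well-behaved or nonmonotone can be checked directly from its per-resolution confidence values. The sketch below assumes each tile's curve is a list of confidences ordered from lowest to highest resolution; the function names are illustrative.

```python
def is_monotone(confidences):
    """True if confidence never decreases as resolution increases."""
    return all(b >= a for a, b in zip(confidences, confidences[1:]))


def nonmonotone_fraction(curves):
    """Fraction of tiles whose confidence drops at some higher resolution."""
    if not curves:
        return 0.0
    return sum(not is_monotone(c) for c in curves) / len(curves)
```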

Accuracy vs. performance characteristics
• > 70% of tiles showed nonmonotone characteristics
• Tile-based heuristic is not suited for such cases
• Impossible to estimate behavior for a single tile in isolation

[Figure: two plots of confidence (0 to 1) vs. resolution (1 to 4)]

• Spatially close tiles ⇒ similar features ⇒ similar accuracy-performance characteristics
• Group neighboring tiles into regions
• Region characterized by its representative tiles
• Hierarchical region-based heuristic

[Figure: plots of confidence (0 to 1) vs. resolution (1 to 4)]

Hierarchical region-based algorithm
• Uses same priority queue structure. Process a tile completely.
• Queue entry now corresponds to a rectangular region.
• Static partitioning approach:
  • Choice of region dimensions: one size will not work for all images
  • Simple, not efficient: does not help converge on favorable tiles

Queue entry: Region ID, Resolution, Confidence, Proc. time
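The static partitioning approach can be sketched as a fixed grid over the tile coordinates. The slides do not say how representative tiles are chosen, so picking the tile nearest the region's centre is purely an assumption of this sketch, as are the function names.

```python
def partition_into_regions(n_rows, n_cols, region_h, region_w):
    """Split an n_rows x n_cols grid of tiles into rectangular regions of at
    most region_h x region_w tiles. Each region is a list of (row, col)."""
    regions = []
    for r0 in range(0, n_rows, region_h):
        for c0 in range(0, n_cols, region_w):
            regions.append([(r, c)
                            for r in range(r0, min(r0 + region_h, n_rows))
                            for c in range(c0, min(c0 + region_w, n_cols))])
    return regions


def representative_tile(region):
    """Tile nearest the centre of the region (assumed selection rule)."""
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    centre = ((min(rows) + max(rows)) / 2, (min(cols) + max(cols)) / 2)
    return min(region,
               key=lambda rc: (rc[0] - centre[0]) ** 2 + (rc[1] - centre[1]) ** 2)
```

The fixed `region_h` and `region_w` are the weakness the slide points out: a size that suits one image may be too coarse or too fine for another.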


Architecture
Middleware support: DataCutter
• Component-stream framework
• Ideal for data analysis workflows
• Combined task- and data-parallelism
• MPI underlying substrate

Parallel execution
[Figure: master-worker diagram. A worker sends a request to the master; the master, which holds the regions, replies with a tile; the worker processes the tile and returns its confidence together with the next request.]
• Static assignment: leads to load imbalance
  • With dynamic redistribution: higher inter-processor communication
• Demand-driven strategy:
  • Assign 'next' tile/region to worker with earliest request
  • Dynamically adapts to changes in load on workers and network traffic
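The difference between static and demand-driven assignment can be illustrated with a small simulation. It is a simplified model, not the DataCutter implementation: tile costs are known numbers, communication is free, static assignment is round-robin, and a worker "requests" the next tile the moment it becomes idle.

```python
import heapq


def static_makespan(costs, n_workers):
    """Round-robin static assignment: worker w gets tiles w, w+n, w+2n, ..."""
    loads = [0.0] * n_workers
    for i, c in enumerate(costs):
        loads[i % n_workers] += c
    return max(loads)


def demand_driven_makespan(costs, n_workers):
    """Each tile goes to the worker with the earliest request, i.e. the
    worker that becomes idle first."""
    idle_at = [(0.0, w) for w in range(n_workers)]  # (time idle, worker id)
    heapq.heapify(idle_at)
    finish = 0.0
    for c in costs:
        t, w = heapq.heappop(idle_at)
        heapq.heappush(idle_at, (t + c, w))
        finish = max(finish, t + c)
    return finish
```

With tile costs `[10, 1, 1, 1, 10, 1, 1, 1]` on two workers, the static scheme puts both expensive tiles on the same worker, while the demand-driven scheme spreads them out.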

Experimental test bed
• Cluster of 32 nodes (NSF Research Infrastructure Grant)
  • 2.4 GHz AMD Opteron dual-processor nodes
  • 8 GB of memory
  • 2 x 250 GB SATA disks locally installed on each node (joined into a 437 GB RAID0 volume)
  • Interconnected by both an InfiniBand and a 1 Gbps Ethernet network
• Datasets
  • Image 1: 69,840 x 59,359 pixels (11 GB); 4,075 tiles
  • Image 2: 108,640 x 70,601 pixels (22 GB); 16,011 tiles

Quality aspects: Experiment 1
• QoS: Maximize number of finalized tiles within time 't'
• Image 1
• 16 workers
• Threshold: 0.5

[Figure: % extra finalized tiles (0 to 80) vs. time (100 s to 3600 s)]

On average, the hierarchical region-based heuristic finalizes 21% more tiles than the baseline scheme

Quality aspects: Experiment 2
• QoS: Maximize average accuracy across whole image within time 't'
• Image 1
• 16 workers
• Threshold: 0.7

[Figure: % increase in average confidence (0 to 50) vs. time (100 s to 5100 s)]

On average, the confidence for the hierarchical region-based heuristic is 19% greater than that of the baseline scheme

Performance tests: Scaling number of nodes
[Figure: processing time (0 s to 8000 s) vs. average confidence per tile (0 to 0.7), with one curve each for 12, 16, 24, and 32 nodes]
As the number of nodes is doubled, the region-based heuristic achieves the same average confidence in half the time

Performance tests: Scaling image size

Across different image sizes, scaling is approximately, though not strictly, linear.

Related Work
• Large-scale image analysis
  • Virtual microscopy: BrainMaps.org (UC Davis)
• Support for distributed adaptive applications
  • Language and compiler-based approaches (Adve et al., Agrawal et al.)
    • Prediction of execution time
    • Assume data invariance
  • Middleware-based approaches
    • ActiveHarmony, ERDoS, EPIQ, CQoS
    • Focus on time-related requirements
    • Adaptation to changes in resources and execution environment
    • Fine-grained adaptation (at a single data-component level)
• Our framework can be complemented with benefits from the above approaches

Conclusions
• Framework for distributed adaptive processing of image tiles
• Support for different user QoS requirements
• Tile-ordering heuristics help bridge the gap between basic and optimal execution strategies
• Scales well with compute nodes and image data size
• Ongoing work:
  • Express application parameters and QoS requirements to the system
  • Improve heuristics depending on application characteristics
  • Realistic user requirements

Performance vs. Accuracy Trade-offs for Large-scale Image Analysis Applications

Thank you!

http://www.bmi.osu.edu