From Concurrency to Parallelism: an illustrated guide to multi-core parallelism in Clojure

David Edgar Liebke

clojure-conj, Durham, NC, 23 October 2010

summary

Concurrency is commonly mistaken for parallelism, but the two are distinct concepts. Concurrency is concerned with managing access to shared state from different threads, whereas parallelism is concerned with utilizing multiple processors/cores to improve the performance of a computation. Clojure has successfully improved the state of concurrent programming with its many concurrency primitives, and now the goal is to do the same for multi-core parallel programming by introducing new parallel processing features that work with Clojure's existing data structures. Clojure's original parallel processing function, pmap, will soon be joined by pvmap and pvreduce, based on JSR 166 and Doug Lea's Fork/Join Framework. From these building blocks, and the fjvtree function that underlies pvmap and pvreduce, higher-level parallel functions can be developed. This talk will provide an illustrated walkthrough of the algorithms underlying pmap, pvmap, and pvreduce, comparing their strengths, weaknesses, and performance characteristics, and will conclude with an example of using these primitives to write a parallel version of Clojure's filter function.


outline

pmap: algorithm, weaknesses, performance characteristics
chunking: chunked performance characteristics
fork-join: deques, basic algorithm
persistent-vector: overview
fjvtree, pvmap, and pvreduce: algorithm, performance characteristics
pvfilter: implementation, performance characteristics


pmap: lazy meets parallel

pmap

processors: 2, threads: 4

The pmap function is semi-lazy, meaning it tries to keep just enough threads running so that all the processors are utilized, but it doesn't eagerly process its entire input. It does this by using futures to invoke the function being mapped on just enough input values, but no more.

[Animated diagram: four futures run computations (comp) at a time; as each result is consumed, another future is started on the next input value. Legend: completed work, current threads.]
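For reference, a simplified sketch of this scheme, close in spirit to (though not a verbatim copy of) clojure.core/pmap's single-collection case: one future per element is created lazily, and the output sequence stays (+ 2 n-cpus) futures ahead of the element currently being dereferenced.

(defn pmap-sketch [f coll]
  (let [n    (+ 2 (.availableProcessors (Runtime/getRuntime)))
        rets (map #(future (f %)) coll)                   ;; one future per element, created lazily
        step (fn step [[x & xs :as vs] fs]
               (lazy-seq
                 (if (seq fs)
                   (cons (deref x) (step xs (rest fs)))   ;; realizing (rest fs) launches one more future
                   (map deref vs))))]                     ;; tail: no more futures left to launch
    (step rets (drop n rets))))

(pmap-sketch inc (range 10))   ;; => (1 2 3 4 5 6 7 8 9 10)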

pmap: when lazy isn't parallel

pmap

processors: 2, threads: 4

Because pmap is semi-lazy, the process consuming its output must do so at a faster rate than the process producing it, otherwise all the processors won't be utilized.

[Animated diagram: a slow consumer drains results one at a time, so only one or two computations (comp) are in flight and the processors sit mostly idle. Legend: completed work, current threads.]
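A quick way to see this is to consume pmap's output more slowly than the mapped function produces it; a sketch (the work function and the millisecond figures are illustrative, not from the talk):

(defn work [x]
  (Thread/sleep 20)        ;; roughly 20 ms of simulated work per element
  (* x x))

;; fast consumer: wall time is roughly (items * 20 ms) divided by the core count
(time (dorun (pmap work (range 64))))

;; slow consumer: each deref is followed by 50 ms of consumer-side work,
;; so throughput collapses to the consumer's rate regardless of core count
(time (doseq [r (pmap work (range 64))]
        (Thread/sleep 50)))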

pmap: uneven loads

pmap

processors: 2, threads: 4

If one or more processors are returning results at a slower rate than the others, there is no way to redistribute the load. The more responsive processors will complete their work and sit idle until the slowest completes and its result can be consumed.

[Animated diagram: one long-running computation (comp) blocks the output sequence; the other threads finish their items and idle until the straggler's result is consumed. Legend: completed work, current threads.]
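An illustrative straggler workload (the timings are made up for the example): every tenth element takes far longer, and because pmap hands out work in input order and its results are consumed in order, the idle core cannot take over the cheap elements queued behind the expensive one.

(defn uneven-work [x]
  (Thread/sleep (if (zero? (mod x 10)) 500 10))   ;; occasional expensive element
  x)

(time (dorun (pmap uneven-work (range 40))))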

pmap performance characteristics

pmap

Compared to map (2 cores). Results are the median of 50 samples, where a test function was pmapped over a vector of 100 values, broken out by the per-element cost of the test function: Tf < 0.05 msec, 0.05 msec < Tf < 0.1 msec, and Tf > 0.1 msec.

[Chart: relative performance of pmap vs map for each Tf band.]

pmap: chunky style

pmap

processors: 2, threads: 4, chunk size: 32

One strategy for handling situations where the consuming process cannot keep up with the producing process, and hence under-utilizes the processors, is to partition the input data into chunks and pass those to pmap instead of individual values.

pmap over chunks:

(pmap (fn [chunk] (map f chunk)) (partition 32 coll))

[Animated diagram: each future now covers a 32-element chunk, so the producer stays ahead of the consumer and all four threads stay busy. Legend: completed work, current threads.]
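A reusable version of that idiom, as a sketch (chunked-pmap is not a clojure.core function): note the doall, which forces each chunk's work inside its own future; without it, the lazy (map f chunk) would be realized later, on the consuming thread, defeating the point of the chunking.

(defn chunked-pmap [f n coll]
  (apply concat
         (pmap (fn [chunk] (doall (map f chunk)))   ;; force the chunk on the worker thread
               (partition-all n coll))))            ;; partition-all keeps the final partial chunk

(chunked-pmap inc 32 (range 512))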

chunked pmap

Compared to map (2 cores). Results are the median of 500 samples, where a test function (duration < 0.005 msec) was pmapped over chunked vectors of 512 values, broken out by chunk size: chunk < 10, 10 < chunk < 50, and chunk > 50.

[Chart: relative performance of chunked pmap vs map for each chunk-size band.]

fork-join parallelism: divide and conquer

deque: double-ended queue

A deque (pronounced "deck") is a double-ended queue, where values can be pushed onto and popped off the front, or taken from the back. The Fork-Join algorithm systematically divides a job into tasks that are pushed onto the deque, resulting in the largest tasks being located at the back, which in turn improves the efficiency of its work-stealing algorithm.

[Animated diagram: the owning worker repeatedly splits a task and pushes the halves onto the front of its deque, then pops small tasks off the front to compute them, while an idle worker steals (takes) the large task sitting at the back.]

fork-join: basic algorithm

fork-join

workers: 4, branching factor: 2, sequential threshold: 256

Fork-Join is a divide-and-conquer algorithm that iteratively divides a job into smaller and smaller tasks, placing them on a deque, until the size of the current task is below a configured threshold.

Once each worker's current task is smaller than the configured threshold, it will begin the intended computation. Once the computation is completed for the current task, the worker will pop another task off of its deque and perform the computation on it. After the computation has been completed for at least two tasks, the results can be combined with a join function.

As one worker is processing its tasks, another worker can steal a task from the back of its deque. If one of the workers falls behind, another worker can take tasks from its deque. And so on, until the job is complete.

[Animated diagram: a 2048-element job is split into 1024- and 512-element tasks pushed onto the worker deques; idle workers take large tasks from the back of busy workers' deques; each worker computes its 256-element tasks, pops the next, and joins completed results, until a single result for the whole job remains. Legend: worker deques, current tasks, completed tasks.]
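A minimal Clojure sketch of this divide-and-conquer shape, assuming Java 7+'s java.util.concurrent.ForkJoinPool (the talk's references point at Doug Lea's jsr166y preview of essentially the same API); the task sums a vector, splitting until the sequential threshold is reached.

(import '[java.util.concurrent ForkJoinPool RecursiveTask])

(def ^ForkJoinPool fj-pool (ForkJoinPool.))

(defn sum-task [v threshold]
  (proxy [RecursiveTask] []
    (compute []
      (if (<= (count v) threshold)
        (reduce + v)                               ;; below the threshold: compute sequentially
        (let [half  (quot (count v) 2)
              left  (sum-task (subvec v 0 half) threshold)
              right (sum-task (subvec v half) threshold)]
          (.fork left)                             ;; push the left half onto this worker's deque
          (+ (.invoke right)                       ;; work on the right half ourselves
             (.join left)))))))                    ;; join the left half (or steal it back)

(.invoke fj-pool (sum-task (vec (range 1 129)) 32))   ;; => 8256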

persistent vector: parallelism with existing data structures

persistent-vector

A PersistentVector contains a root and a tail; the tail is an array that can contain up to 32 Object references, and the root is a Node that can contain up to 32 child Nodes.

Values are added to the tail until the tail is full; then a new Node is created, containing the 32 Object references from the tail, and inserted as a child of the root. Once the root is full, a new root Node is created and the existing one is added as a child, and so on.

[Diagram sequence showing the tree growing: count 0 (shift 5), count 32, count 33, count 1025, count 1057 (shift 10), count 2049, count 2081, count 3073, count 3105, count 4097. Legend: root node, tail nodes, obj refs, selected.]
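The index arithmetic behind this 32-way trie: each level of the tree consumes 5 bits of the index, starting from the vector's current shift. This sketch ignores the tail optimization, and trie-path is a hypothetical helper for illustration, not part of Clojure's API.

(defn trie-path
  "Returns the sequence of 5-bit child indices used to reach element i."
  [i shift]
  (loop [level shift, path []]
    (if (neg? level)
      path
      (recur (- level 5)
             (conj path (bit-and (bit-shift-right i level) 0x1f))))))

(trie-path 1000 5)    ;; => [31 8]    child 31 under the root, slot 8 in that leaf
(trie-path 2050 10)   ;; => [2 0 2]   one more level once the shift grows to 10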

fork-join on persistent vectors: fjvtree, pvmap, and pvreduce

fjvtree

workers: 4, branching factor: 32, sequential threshold: 32

The core Fork-Join algorithm in Clojure is implemented in the fjvtree function, which uses the underlying tree structure of PersistentVector to break jobs into tasks.

fork-join based map and reduce on vectors:

(pvmap f v)
(pvreduce f v)

implemented with:

(fjvtree v combine-fn leaf-fn)
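Semantically, the intent (as the talk presents it) is that these produce the same results as their sequential counterparts; and because leaf results are themselves combined by a further reduce, the function handed to pvreduce should be associative:

(= (pvmap inc (vec (range 1024)))  (vec (map inc (range 1024))))   ;; intended => true
(= (pvreduce + (vec (range 1024))) (reduce + (range 1024)))        ;; intended => true, + is associative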

[Animated diagram: fjvtree pushes the job onto a worker's deque and splits it along the vector's tree structure, first into the root's 1024-element children and then into 32-element leaf nodes, which are pushed onto the worker deques as tasks.]

leaf-fn for pvmap:

(fn [^PersistentVector$Node node]
  (new-node (amap (.array node) i a
                  (f (aget a i)))))          ;; map f across the node's 32-element array into a new Node

[Animated diagram: each 32-element leaf task is computed (comp) by applying the leaf-fn.]

leaf-fn for pvreduce:

(fn [^PersistentVector$Node node]
  (let [a (.array node)]
    (loop [ret (aget a 0) i (int 1)]         ;; seed with the first element of the leaf
      (if (< i 32)
        (recur (f ret (aget a i)) (inc i))   ;; fold f across the remaining 31 elements
        ret))))

[Animated diagram: each leaf task reduces its 32 elements to a single value.]

[Animated diagram, continued: workers take leaf tasks from the back of each other's deques when idle, pop and compute (comp) their own 32-element tasks, push per-leaf results back, and join adjacent completed leaves; this repeats across all of the vector's leaves. Legend: worker deques, current tasks, completed tasks.]

combine-fn for pvmap:

(fn [coll] (new-node (to-array coll)))    ;; join child results into a new parent Node

[Animated diagram: completed leaf results are joined back into parent nodes.]

combine-fn for pvreduce:

(fn [coll] (reduce f coll))               ;; join child results by reducing them with f again

[Animated diagram: completed leaf results are joined by a further reduce.]

[Animated diagram: joins continue up the tree until a single result for the whole job remains.]
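A plain-Clojure sketch of this leaf/combine decomposition, sequential and purely to show the shape; fjvtree would run the leaf step on fork/join workers, and new-node is internal to PersistentVector, so this sketch uses partition-all instead of the trie's Node objects (vtree-map is a name invented for the example):

(defn vtree-map [f v]
  (let [leaves  (partition-all 32 v)                          ;; stand-in for the trie's 32-wide leaves
        leaf-fn (fn [leaf] (mapv f leaf))                     ;; pvmap's per-leaf work
        combine (fn [results] (vec (apply concat results)))]  ;; pvmap's join, flattened back into a vector
    (combine (map leaf-fn leaves))))

(= (vtree-map inc (vec (range 100))) (mapv inc (range 100)))   ;; => true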

pvmap performance characteristics

pvmap

Compared to map (2 cores). Results are the median of 50 samples, where a test function was pvmapped over a vector of 100 values, broken out by the per-element cost of the test function: Tf < 0.05 msec, 0.05 msec < Tf < 0.1 msec, and Tf > 0.1 msec.

[Chart: relative performance of pvmap vs map for each Tf band.]

pvmap vs pmap

Compared to map (2 cores). Results are the median of 50 samples, where a test function was p*mapped over a vector of 100 values, broken out by the per-element cost of the test function: Tf < 0.05 msec, 0.05 msec < Tf < 0.1 msec, and Tf > 0.1 msec.

[Chart: relative performance of pvmap and pmap vs map for each Tf band.]

pvreduce performance characteristics

pvreduce

Compared to reduce (2 cores). Results are the median of 50 samples, where a test function was pvreduced over a vector of 100 values, broken out by the per-element cost of the test function: Tf < 0.05 msec, 0.05 msec < Tf < 0.1 msec, and Tf > 0.1 msec.

[Chart: relative performance of pvreduce vs reduce for each Tf band.]

pvreduce examples: implementing pvfilter

parallel sum, using pvreduce

(pvreduce + [1 2 3 4 5 6 7 8 9 .. 128])

(reduce + [(reduce + [1 2 3 4 5 .. 32])
           (reduce + [33 34 35 .. 64])
           (reduce + [65 66 67 68 .. 96])
           (reduce + [97 98 99 100 .. 128])])

(reduce + [528 1552 2576 3600])

8256
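The same divide-and-combine arithmetic, written sequentially as a quick check of the numbers above:

(reduce + (map #(reduce + %) (partition 32 (range 1 129))))   ;; => 8256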

parallel filter, using pvreduce

(defn pvfilter [pred v]
  (letfn [(filt [v x]
            (if (pred x) (conj v x) v))]
    (pvreduce filt [] v)))

(pvfilter even? [1 2 3 4 5 6 7 8 9 .. 128])

(reduce filt [(reduce filt [] [1 2 3 4 .. 32])
              (reduce filt [] [33 34 35 36 .. 64])
              (reduce filt [] [65 66 67 68 .. 96])
              (reduce filt [] [97 98 99 100 .. 128])])

(reduce filt [[2 4 6 8 .. 32]
              [34 36 38 40 .. 64]
              [66 68 70 72 .. 96]
              [98 100 102 104 .. 128]])

java.lang.ClassCastException: clojure.lang.PersistentVector cannot be cast to java.lang.Number

The combining step hands whole leaf-result vectors to filt, which passes them to the predicate, hence the ClassCastException.

parallel filter, using pvreduce

(defn pvfilter [pred v]
  (letfn [(par-filt [v x]
            (cond (vector? x) (apply reduce conj v x)   ;; x is a leaf result: splice it in
                  (pred x)    (conj v x)                ;; x is an element: keep it if pred holds
                  :else       v))]
    (pvreduce par-filt [] v)))

(pvfilter even? [1 2 3 4 5 6 7 8 9 .. 128])

(reduce par-filt [(reduce par-filt [] [1 2 3 4 .. 32])
                  (reduce par-filt [] [33 34 35 36 .. 64])
                  (reduce par-filt [] [65 66 67 68 .. 96])
                  (reduce par-filt [] [97 98 99 100 .. 128])])

(apply reduce conj [[2 4 6 8 .. 32]
                    [34 36 38 40 .. 64]
                    [66 68 70 72 .. 96]
                    [98 100 102 104 .. 128]])

[2 4 6 8 .. 128]
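A sequential sketch of the same leaf/combine shape that can be run today; it uses into for the combining case (a choice made for this sketch, not necessarily the talk's exact combine step):

(let [par-filt (fn [acc x]
                 (cond (vector? x) (into acc x)   ;; combining a leaf result
                       (even? x)   (conj acc x)   ;; filtering within a leaf
                       :else       acc))]
  (reduce par-filt []
          (map #(reduce par-filt [] %) (partition 32 (range 1 129)))))
;; => [2 4 6 ... 128]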

pvfilter performance characteristics

pvfilter

Compared to filter (2 cores). Results are the median of 50 samples, where a test predicate was pvfiltered over a vector of 100 values, broken out by the per-element cost of the predicate: Tf < 0.05 msec, 0.05 msec < Tf < 0.1 msec, and Tf > 0.1 msec.

[Chart: relative performance of pvfilter vs filter for each Tf band.]

references

1. Brian Goetz Fork/Join talk: http://www.infoq.com/presentations/brian-goetz-concurrent-parallel
2. Fork/Join API: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166ydocs/
3. IBM developerWorks article: http://www.ibm.com/developerworks/java/library/j-jtp11137.html
4. Java Concurrency Wiki: http://artisans-serverintellect-com.si-eioswww6.com/default.asp?W1
5. JVM Summit slides: http://wiki.jvmlangsummit.com/images/f/f0/Lea-fj-jul10.pdf
6. Fork/Join paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.42.1918&rep=rep1&type=pdf

questions?


thank you

parallel frequency, using pvreduce

(defn pvfreq [x]
  (letfn [(par-freq [m x]
            (if (map? x)
              (merge-with #(+ %1 %2) m x)             ;; x is a leaf result: merge the count maps
              (update-in m [x] #(if % (inc %) 1))))]  ;; x is an element: bump its count
    (pvreduce par-freq {} x)))

(pvfreq [:foo :bar :baz :bar .. :foo])

(reduce par-freq [(reduce par-freq {} [:foo :foo .. :bar])
                  (reduce par-freq {} [:bar :foo .. :foo])
                  (reduce par-freq {} [:baz :bar .. :foo])
                  (reduce par-freq {} [:bar :baz .. :baz])])

(reduce par-freq [{:foo 15, :bar 10, :baz 7}
                  {:foo 10, :bar 12, :baz 5}
                  {:foo 7, :bar 14, :baz 11}
                  {:foo 12, :bar 10, :baz 10}])

{:foo 44, :bar 46, :baz 33}
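A sequential check of the same shape against clojure.core/frequencies (the sample data below is made up for the example):

(let [data     (vec (take 500 (cycle [:foo :bar :baz :bar :foo])))
      par-freq (fn [m x]
                 (if (map? x)
                   (merge-with + m x)                ;; combining two leaf count maps
                   (update-in m [x] (fnil inc 0))))] ;; counting an element within a leaf
  (= (reduce par-freq {} (map #(reduce par-freq {} %) (partition-all 32 data)))
     (frequencies data)))
;; => true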