0,3. 0,35. 0,4. 8. 16. 32. 48. 112. 240. Key Size in bytes linear + PF binary + PF .... Pentium 4. Processor. 1Mb. 32. MS VC++ 6.0. Compiler. Cache. Addressing ...
Making CSB+-Trees Processor Conscious Michael L. Samuel Anders U. Pedersen Philippe Bonnet University of Copenhagen
Goal Make CSB+-Trees processor conscious Incorporate our CSB+-Tree variant into MySQL
Related work Rao & Ross, 2000: Making B+-Trees Cache Conscious in Main Memory
Hankins & Patel, 2003: Effect of Node Size on the Performance of CSB+-Trees
Chen, Gibbons & Mowry, 2001: Improving Index Performance through Prefetching
Approach Identify processor sensitive indexparameters Study performance impact of parameters Isolated Inside MySQL Memory storage engine
Construct configuration table for each platform
CSB+-tree parameters Data structure Node size Fill factor Tree height Pointer size
Operations Searching in nodes Compare method Prefetching
Benchmark workload Point query Range scan Insertion
Benchmark parameters Node size Key size Node search method
Optimal node sizes, Itanium 2 1792 1536
linear + PF
1280
binar y + PF
1024
inser t ion ( lin)
768 512 256 0 8
16
32
48
Key size in byt es
112
240
Optimal node sizes, P4
2048 Linear
1792
Binary
1536
Insert(Lin)
1280 1024 768 512 256 0 8
16
32
48
112
240
Node search running times, Itanium 2
0,4 0,35
linear + PF
0,3
binar y + PF
0,25 0,2 0,15 0,1 0,05 0 8
16
32
48
Key Size in byt es
112
240
Node search running times, P4 0,6 Linear 0,5
Binary
0,4 0,3 0,2 0,1 0 8
16
32
48
112
240
Point query Linear+PF
Binary+PF
Linear
Binary
Time in seconds
0,5 0,4 0,3 0,2 0,1 0 128
256
384
512
768
Node size in bytes
1024
Impact of number of keys Optim al node size in bytes
8B-keys
32B-keys
640 512 384 256 128 0 10k keys
100k keys
1M keys
Number of keys
10M keys
Future work Impact of other parameters Profile-guided optimization Benchmark on more platforms Indirect indexing Make it self-configuring and self-tuning Key comparisons and key types in MySQL Compare to existing access methods in MySQL
Wrap up Identified distinct differences between processors/architectures Portable implementation with several adjustable parameters Preliminary MySQL implementation
Benchmark platform: Itanium 2 Addressing
64 bits
Clock frequency
900MHz
Number of processors
2
RAM L1/L2/L3 (total size/line size)
4GB 16KB/64B; 256KB/128B; 1,5MB/128B
OS
Debian GNU/Linux 2.4.25
Compiler
Intel C/C++ 8.0.066
EPIC No rearranging of instructions at run-time High-level prefetching
Point query - decomposed Other stalls
Dcache stalls
B
y, 5
12 ar
,5 Bi n
ar
12
B
B Li
ar
Bi n
ne
y, 3
84 ,3
ar
84
B
B Li
ar
Bi n
ne
y, 2
56
B 56
,2 ar
ne Li
ar
Bi n
ne
ar
,1
y, 1
28
28
B
B
30000 25000 20000 15000 10000 5000 0
Li
Cycles
Busy
Search m ethod and node size in bytes
Point query: response time Itanium 2 0,4 0,35
linear + PF
0,3
binary + PF
0,25 0,2 0,15 0,1 0,05 0 8
16
32
48
Key Size in bytes
112
240
0,25 0,2 0,15 0,1 0,05 0
Scan
,2 40 )
(1 53 6
,1 12 )
(1 15 2
48 ) (5 12 ,
32 ) (3 84 ,
16 ) (3 84 ,
8)
Scan+PF
(1 28 ,
Time in seconds
Range scan: response time
(Node size, key size) in bytes
(2 56 ,1 6) (3 84 ,3 2) (5 12 ,4 8) (7 68 ,1 12 ) (1 79 2, 24 0)
6,000 5,000 4,000 3,000 2,000 1,000 0,000 (1 28 ,8 )
Time in seconds
Insertion: response time
(Node size, Key size) in bytes
Optimal node sizes
Node size in bytes
1792 1536
linear + PF
1280
binary + PF
1024
insertion
768 512 256 0 8
16
32
48
Key size in bytes
112
240
Benchmark platform: Pentium Processor
Pentium 4
Pentium M
Clock frequency
3 Ghz
1,3 Ghz
RAM
1024MB
512MB
Addressing Number of processors
32
OS
Windows XP, SP2
Cache
1Mb
Compiler
MS VC++ 6.0
1
Out-of-order execution logic
Point query
Tim e in s e c onds
100k searches after bulkloading 400k keys of 8B 0,14 0,12 0,1 0,08 0,06 0,04 0,02 0
P4, Lin P4, Bin PM, Lin PM, Bin
128
256
384
512
Node size in bytes
768
1024
Point query: Response time Response time for the optimal key size, node size combinations, Pentium 0,12 T im e in s e c o n d s
0,1 Pentium4, Bin
0,08
PentiumM, Bin
0,06
Pentium4, Lin
0,04
PentiumM, Lin
0,02 0 8
16
32 Key size in bytes
48
112
Initial MySQL results Linear + PF
Binary + PF
Spec-linear+PF
Spec-binary+PF
Time in seconds
25 20 15 10 5 0 128
256
384
512
640
768
Node size in bytes
896
1024
Index structure