Understanding Manycore Scalability of File Systems
Changwoo Min, Sanidhya Kashyap, Steffen Maass, Woonhak Kang, and Taesoo Kim

Applications must parallelize I/O operations

● Death of single-core CPU scaling
  – CPU clock frequency: 3 ~ 3.8 GHz
  – # of physical cores: up to 24 (Xeon E7 v4)
● From mechanical HDD to flash SSD
  – IOPS of a commodity SSD: 900K
  – Non-volatile memory (e.g., 3D XPoint): 1,000x ↑
● But file systems become a scalability bottleneck

Problem: lack of understanding of internal scalability behavior

[Figure: Exim mail server on RAMDISK; messages/sec (up to 14k) vs. #core (0-80) for btrfs, ext4, F2FS, and XFS. Although Exim is an embarrassingly parallel application, throughput either (1) saturates, (2) collapses, or (3) never scales, depending on the file system.]

● Test machine: Intel 80-core machine (8-socket, 10-core Xeon E7-8870), 512GB RAM, 1TB SSD, 7200 RPM HDD

Even on a slower storage medium, the file system becomes a bottleneck

[Figure: Exim email server at 80 cores; messages/sec (up to 12k) on RAMDISK, SSD, and HDD for btrfs, ext4, F2FS, and XFS.]

Outline

● Background
● FxMark design
  – A file system benchmark suite for manycore scalability
● Analysis of five Linux file systems
● Pilot solution
● Related work
● Summary

Research questions

● What file system operations are not scalable?
● Why are they not scalable?
● Is it a problem of implementation or of design?

Technical challenges

● Applications usually get stuck at a few bottlenecks
  → cannot see the next level of bottlenecks before resolving the current ones
  → difficult to understand overall scalability behavior
● How can we systematically stress file systems to understand their scalability behavior?

FxMark: evaluate & analyze manycore scalability of file systems

● FxMark: 19 micro-benchmarks + 3 applications
● File systems: tmpfs (memory FS), ext4 and ext4NJ (journaling FS, with and without journal), XFS (journaling FS), btrfs (CoW FS), F2FS (log FS)
● Storage media: RAMDISK, SSD, HDD
● # of cores: 1, 2, 4, 10, 20, 30, 40, 50, 60, 70, 80
● In total: more than 4,700 test configurations

Microbenchmarks: unveil hidden scalability bottlenecks

● Each operation (e.g., data block read) is exercised at three sharing levels:
  – Low: each process operates on its own private file
  – Medium: processes operate on different blocks of a shared file
  – High: all processes operate on the same block of a shared file
● Stress different file system components with various sharing levels (a minimal sketch of such a benchmark follows)
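To make the sharing levels concrete, here is a minimal user-space sketch (in C) of what a low-sharing data-block-read benchmark boils down to: each process repeatedly reads one block of its own private file, so any contention comes from the file system and kernel rather than from the benchmark itself. The file name, block size, and iteration count are illustrative assumptions, not FxMark's actual parameters.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096
    #define NOPS       1000000L

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <private-file>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        char buf[BLOCK_SIZE];
        long done = 0;

        /* Re-read the first block of a per-process private file; with a
         * warm page cache this stresses the in-memory read path of the
         * file system rather than the storage medium. */
        for (long i = 0; i < NOPS; i++) {
            if (pread(fd, buf, BLOCK_SIZE, 0) != BLOCK_SIZE)
                break;
            done++;
        }

        printf("completed %ld reads\n", done);
        close(fd);
        return 0;
    }

Running one such process per core on per-core private files corresponds to the low sharing level; pointing every process at the same file (different blocks), or at the same block of one file, would correspond to the medium and high levels.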

Evaluation: data block read, low sharing (DRBL)

[Figure: DRBL throughput, M ops/sec (0-250) vs. #core (0-80) for btrfs, ext4, ext4NJ, F2FS, tmpfs, and XFS. All file systems show linear scalability.]

Outline

● Background
● FxMark design
● Analysis of five Linux file systems
  – What are the scalability bottlenecks?
● Pilot solution
● Related work
● Summary

Summary of results: file systems are not scalable

[Figure: throughput vs. #core (0-80) on btrfs, ext4, ext4NJ, F2FS, tmpfs, and XFS for all 19 microbenchmarks (DRBL, DRBM, DRBH, DWOL, DWOM, DWAL, DWTL, DWSL, MRPL, MRPM, MRPH, MRDL, MRDM, MWCL, MWCM, MWUL, MWUM, MWRL, MWRM), four O_DIRECT variants (DRBL, DRBM, DWOL, DWOM), and the three applications (Exim in messages/sec, RocksDB in ops/sec, DBENCH in GB/sec); microbenchmark throughput is in M ops/sec. Almost no workload scales on any file system.]

Data block read

● Low sharing (DRBL): all file systems scale linearly
● Medium sharing (DRBM): XFS shows performance collapse
● High sharing (DRBH): all file systems show performance collapse

[Figure: DRBL, DRBM (0-250 M ops/sec), and DRBH (0-10 M ops/sec) throughput vs. #core (0-80).]

The page cache is maintained for efficient access to file data

● Cache-miss path: (1) a task reads a file block, (2) the kernel looks up the page cache, (3) on a cache miss, (4) it reads the page from disk, and (5) copies the page to the task.

Page cache hit

● Cache-hit path: (1) a task reads a file block, (2) the kernel looks up the page cache, (3) on a cache hit, (4) it copies the cached page to the task without touching the disk.
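As a toy illustration of the two flows above, the following user-space sketch models the lookup with a tiny direct-mapped cache keyed by block number; that structure is an assumption for clarity only, since the real Linux page cache is a per-file radix tree/xarray with locking and reference counting (discussed next).

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define CACHE_SLOTS 8
    #define BLOCK_SIZE  16

    struct cached_page {
        bool valid;
        long block;                /* which file block this slot holds */
        char data[BLOCK_SIZE];
    };

    static struct cached_page cache[CACHE_SLOTS];

    /* Stand-in for step 4, "read a page from disk". */
    static void read_from_disk(long block, char *out)
    {
        snprintf(out, BLOCK_SIZE, "block-%ld", block);
    }

    /* Step 1: read a file block. Step 2: look up the cache. Step 3: on a
     * miss, fetch from "disk" and insert. Final step: copy the page out. */
    static void read_file_block(long block, char *out)
    {
        struct cached_page *slot = &cache[block % CACHE_SLOTS];

        if (!slot->valid || slot->block != block) {   /* cache miss */
            read_from_disk(block, slot->data);
            slot->valid = true;
            slot->block = block;
        }
        memcpy(out, slot->data, BLOCK_SIZE);          /* copy page */
    }

    int main(void)
    {
        char buf[BLOCK_SIZE];

        read_file_block(3, buf);   /* miss: goes to "disk" */
        read_file_block(3, buf);   /* hit: served from the cache */
        printf("%s\n", buf);
        return 0;
    }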

The page cache can be evicted to reclaim free memory …

… but only when a page is not being accessed

● Reference counting is used to track the number of tasks accessing a page:

    access_a_page(...) {
        atomic_inc(&page->_count);   /* pin the page: it cannot be evicted now */
        ...                          /* read / copy the page */
        atomic_dec(&page->_count);   /* unpin: eviction is allowed again */
    }

Reference counting becomes a scalability bottleneck

● In DRBH, every reader accesses the same page, so every read atomically increments and decrements the same page reference counter (page->_count).

[Figure: DRBH throughput (M ops/sec) and CPI (cycles per instruction) vs. #core; as the core count grows, throughput collapses while CPI rises sharply.]

● High contention on a single page reference counter → huge memory stalls
● Many more contended objects: directory entry cache, XFS inode, etc.
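The effect is easy to reproduce in user space. In the sketch below (an illustration only, not kernel code; the thread and iteration counts are arbitrary assumptions), every thread repeatedly "gets" and "puts" one shared counter, just as every reader of a hot cached page does: the counter's cache line bounces between cores, so the cost of each operation grows with the core count.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NTHREADS 8
    #define NITERS   (1L << 22)

    static atomic_long shared_refcount;   /* analogous to page->_count */

    static void *worker(void *arg)
    {
        (void)arg;
        for (long i = 0; i < NITERS; i++) {
            /* "get" and "put" the same page, as every reader of the
             * hot block in DRBH does. */
            atomic_fetch_add(&shared_refcount, 1);
            atomic_fetch_sub(&shared_refcount, 1);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);

        printf("final refcount = %ld\n", atomic_load(&shared_refcount));
        return 0;
    }

Timing this with 1, 2, 4, ... threads (compile with -pthread) shows per-operation cost rising with the thread count even though no real work is shared; per-core or batched reference counting is one general way to avoid a single contended cache line, though the talk does not prescribe a specific fix here.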

Lessons learned

● High locality can cause performance collapse
● Cache hits should be scalable
  → when cache hits dominate, the scalability of the cache-hit path is what matters

Data block overwrite

● Low sharing (DWOL): ext4, F2FS, and btrfs show performance collapse
● Medium sharing (DWOM): all file systems degrade gradually

[Figure: DWOL (0-160 M ops/sec) and DWOM (0-2 M ops/sec) throughput vs. #core (0-80).]

Btrfs is a copy-on-write (CoW) file system

● A write to a block is directed to a new copy of the block
  → the block is never overwritten in place
  → multiple versions of the file system image are maintained
● CoW therefore triggers a disk block allocation for every write
● Disk block allocation becomes a bottleneck
  – Likewise: ext4 → journaling, F2FS → checkpointing
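The toy sketch below illustrates the CoW write path just described, under the assumption of a flat array of "disk" blocks, a bump allocator, and a one-level block index; it is not btrfs code, but it shows why every overwrite implies a block allocation plus an index update.

    #include <stdio.h>
    #include <string.h>

    #define NBLOCKS    16
    #define BLOCK_SIZE 64

    static char disk[NBLOCKS][BLOCK_SIZE];   /* pretend disk */
    static int  next_free;                   /* trivial bump allocator */
    static int  file_index[4];               /* logical block -> disk block */

    static int alloc_block(void)
    {
        return next_free++;                  /* CoW: every write allocates */
    }

    static void cow_write(int logical, const char *data)
    {
        int old = file_index[logical];
        int new = alloc_block();

        memcpy(disk[new], disk[old], BLOCK_SIZE);    /* copy the old block */
        strncpy(disk[new], data, BLOCK_SIZE - 1);    /* apply the write    */
        file_index[logical] = new;                   /* redirect the index */
    }

    int main(void)
    {
        /* Initial version of logical block 0 (time T). */
        file_index[0] = alloc_block();
        strncpy(disk[file_index[0]], "version at time T", BLOCK_SIZE - 1);

        /* The overwrite never happens in place (time T+1). */
        cow_write(0, "version at time T+1");

        printf("logical block 0 now lives in disk block %d: %s\n",
               file_index[0], disk[file_index[0]]);
        return 0;
    }

Because alloc_block() sits on every write path, whatever serializes block allocation (or, analogously, journaling in ext4 and checkpointing in F2FS) serializes all writers.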

Lessons learned

● Overwriting can be as expensive as appending
  → critical in log-structured file systems (F2FS) and CoW file systems (btrfs)
● Consistency-guarantee mechanisms should be scalable
  → scalable journaling
  → scalable CoW index structures
  → parallel log-structured writing

Recall DWOM (medium sharing): all file systems degrade gradually, even though each writer updates a different block of the shared file.

The entire file is locked regardless of the update range

● All tested file systems hold the inode mutex for write operations
  – range-based locking is not implemented

    ***_file_write_iter(...) {
        mutex_lock(&inode->i_mutex);     /* serializes every writer of this file */
        ...                              /* perform the write */
        mutex_unlock(&inode->i_mutex);
    }

Lessons learned

● A file cannot be concurrently updated
  – critical for VMs and DBMSs, which manage large files
● Need to consider techniques used in parallel file systems
  → e.g., range-based locking (a minimal sketch follows)
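As a thought experiment, here is a minimal user-space sketch of the range-locking idea: a naive list of held byte ranges guarded by one mutex and a condition variable. It is only a sketch under that assumption; real parallel file systems use far more scalable structures. The point is that writers touching disjoint ranges of the same file can proceed concurrently instead of serializing on one inode mutex.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdlib.h>
    #include <sys/types.h>

    struct range {
        off_t start, end;              /* locked byte range [start, end) */
        struct range *next;
    };

    struct range_lock {
        pthread_mutex_t lock;          /* protects the list below */
        pthread_cond_t  cond;          /* signaled when a range is released */
        struct range   *held;          /* currently held ranges */
    };

    void range_lock_init(struct range_lock *rl)
    {
        pthread_mutex_init(&rl->lock, NULL);
        pthread_cond_init(&rl->cond, NULL);
        rl->held = NULL;
    }

    static bool overlaps(const struct range *r, off_t start, off_t end)
    {
        return r->start < end && start < r->end;
    }

    void range_lock_acquire(struct range_lock *rl, off_t start, off_t end)
    {
        pthread_mutex_lock(&rl->lock);
    retry:
        for (struct range *r = rl->held; r; r = r->next) {
            if (overlaps(r, start, end)) {
                pthread_cond_wait(&rl->cond, &rl->lock);
                goto retry;            /* rescan after a release */
            }
        }
        struct range *r = malloc(sizeof(*r));
        r->start = start;
        r->end   = end;
        r->next  = rl->held;
        rl->held = r;
        pthread_mutex_unlock(&rl->lock);
    }

    void range_lock_release(struct range_lock *rl, off_t start, off_t end)
    {
        pthread_mutex_lock(&rl->lock);
        for (struct range **pp = &rl->held; *pp; pp = &(*pp)->next) {
            if ((*pp)->start == start && (*pp)->end == end) {
                struct range *victim = *pp;
                *pp = victim->next;
                free(victim);
                break;
            }
        }
        pthread_cond_broadcast(&rl->cond);
        pthread_mutex_unlock(&rl->lock);
    }

Even this naive version still funnels everything through one mutex for the list walk; a truly scalable design would also need a concurrent index over ranges, which is exactly the kind of new challenge the talk points at.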

Summary of findings

● High locality can cause performance collapse
● Overwriting can be as expensive as appending
● A file cannot be concurrently updated
● All directory operations are sequential
● Renaming is system-wide sequential
● Metadata changes are not scalable
● Non-scalability often means wasting CPU cycles
● Scalability is not portable

See our paper for the full set of findings.

● Many of these findings are unexpected and counter-intuitive
  → they stem from contention at the file system level to maintain data dependencies

Outline

● Background
● FxMark design
● Analysis of five Linux file systems
● Pilot solution
  – If we remove contention inside a file system, does it become scalable?
● Related work
● Summary

RocksDB on a 60-partitioned RAMDISK scales better

[Figure: RocksDB throughput (ops/sec, 0-700) vs. #core (0-80) on a single-partitioned RAMDISK and on a 60-partitioned RAMDISK for btrfs, ext4, F2FS, tmpfs, and XFS; tested workload: DB_BENCH overwrite. Partitioning improves peak throughput by about 2.1x.]

● Reduced contention inside the file system helps improve performance and scalability (a small sketch of the partitioning idea follows)
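For reference, the mechanical core of this pilot is tiny. The sketch below is only an illustration of the idea; the mount-point layout and file naming are assumptions, not the actual experimental setup or RocksDB's sharding logic. It hashes a key to one of 60 partitions so that independent workers mostly touch independent file system instances and therefore contend far less inside any single one.

    #include <stdio.h>

    #define NPARTITIONS 60

    /* Map a key to a mount point such as /mnt/part17/<name>, so that work
     * is spread across 60 independently mounted file systems. */
    static void partition_path(unsigned long key, const char *name,
                               char *out, size_t len)
    {
        snprintf(out, len, "/mnt/part%lu/%s", key % NPARTITIONS, name);
    }

    int main(void)
    {
        char path[256];

        partition_path(42, "000042.sst", path, sizeof(path));
        printf("%s\n", path);      /* e.g. /mnt/part42/000042.sst */
        return 0;
    }

The HDD result on the next slide is the caveat: spreading data across 60 partitions also destroys spatial locality, so the same trick that helps on RAMDISK hurts on a seek-bound medium.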

But partitioning makes performance worse on HDD

[Figure: RocksDB throughput (ops/sec, 0-500) vs. #core (0-20) on a single-partitioned HDD and on a 60-partitioned HDD for btrfs, ext4, F2FS, and XFS; tested workload: DB_BENCH overwrite. The annotated gap is about 2.7x.]

● Reduced spatial locality degrades performance
  → medium-specific characteristics (e.g., spatial locality) should be considered

Related work

● Scaling operating systems
  – Mostly use a memory file system to factor out the cost of I/O operations
● Scaling file systems
  – Scalable file system journaling: ScaleFS [MIT:MSThesis'14], SpanFS [ATC'15]
  – Parallel log-structured writing on NVRAM: NOVA [FAST'16]

Summary

● Comprehensive analysis of the manycore scalability of five widely-used file systems using FxMark
● Manycore scalability should be of utmost importance in file system design
● New challenges in scalable file system design
  – minimizing contention, scalable consistency guarantees, spatial locality, etc.
● FxMark is open source: https://github.com/sslab-gatech/fxmark
