Scheduling Algorithms for Modern Disk Drives

Bruce L. Worthington, Gregory R. Ganger, Yale N. Patt
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor 48109-2122
worthing@eecs.umich.edu
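Abstract

Disk subsystem performance can be dramatically improved by dynamically ordering, or scheduling, pending requests. Via strongly validated simulation, we examine the impact of complex logical-to-physical mappings and large prefetching caches on scheduling effectiveness. Using both synthetic workloads and traces captured from six different user environments, we arrive at three main conclusions: (1) Incorporating complex mapping information into the scheduler provides only a marginal (less than 2%) decrease in response times for seek-reducing algorithms. (2) Algorithms which effectively utilize prefetching disk caches provide significant performance improvements for workloads with read sequentiality. The cyclical scan algorithm (C-LOOK), which always schedules requests in ascending logical order, achieves the highest performance among seek-reducing algorithms for such workloads. (3) Algorithms that reduce overall positioning delays produce the highest performance, provided that they recognize and exploit on-board disk caches.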

1 Introduction

As the performance disparity between processors and mechanical disk drives continues to grow, storage subsystems must compensate for the slower components. Disk workloads are often characterized by intense bursts of activity, creating long queues of pending requests. A significant portion of each request's response time is spent in mechanical positioning delays: moving the disk arm to the target cylinder and waiting for the target data to rotate under the read/write head. The disk scheduler can reduce these delays by dynamically ordering the pending requests relative to the current position of the disk arm.

Over the past 25 years, a wide variety of disk scheduling algorithms have been proposed and studied in both academia and industry. Most of these studies used simplistic disk models, lacking several important features of modern disk drives. First, modern drives use complex logical-to-physical data mappings, and a high-level interface allows these mappings to be hidden, making this information unavailable to any external scheduler. Second, modern drives contain on-board buffers and caches that can prefetch data. In addition, much of the published work evaluates algorithms with random synthetic workloads (for example, with uniformly distributed starting locations), which lack the locality and sequentiality of real-world workloads.

In this paper, we investigate how these features affect the performance of disk scheduling algorithms. We use a detailed, well-validated disk simulator driven by both synthetic workloads and extensive traces captured from six different user environments.

Section 2 describes modern disk drives, including their data layouts and on-board caches. Section 3 describes previous work on disk scheduling algorithms. Section 4 describes our experimental methodology, including the disk simulator, its validation, and the workloads. Sections 5 and 6 present results for synthetic workloads and traces, respectively. Section 7 summarizes our conclusions and suggests some avenues for future research. Additional results, descriptions, and validation data, excluded here due to space limitations, can be found in [Wort94].

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

SIGMETRICS 94 - 5/94 Santa Clara, CA, USA
© 1994 ACM 0-89791-659-X/94/0005...$3.50


2 Modern Disk Drives

A disk drive contains a set of platters rotating rapidly on a common axis. Both sides of each platter are coated with magnetic media, and data is written in concentric circular tracks on each surface. Each track is divided into sectors. A sector is the minimum unit of data storage and commonly holds 512 bytes of data plus header/trailer information (e.g., error correction data). A cylinder is the set of tracks, one from each surface, that are equidistant from the center of the disk; for example, the innermost cylinder consists of the innermost track of each surface.

A set of read/write heads, one per surface, is ganged together on the disk arm, so all heads are positioned over the same cylinder at any given time. In most drives, only one head can be active at a time.

Servicing a request involves several mechanical delays. The disk arm must first move the heads to the target cylinder (a seek). The drive must then wait for the target sector to rotate under the active head (rotational latency), after which data is transferred to or from the media. If a request spans multiple tracks or cylinders, it may suffer additional delays, because the drive must switch to another head or move the arm to the next cylinder.
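The delay components above can be combined into a rough per-request service time. The sketch below is a minimal illustration, not the simulator used in this paper: the rotation speed and sectors per track are hypothetical defaults, the seek time is supplied by the caller, and head switches, controller overhead, and bus transfer are ignored.

```python
def service_time_ms(seek_ms: float, head_sector: int, target_sector: int,
                    nsectors: int, rpm: float = 5400.0,
                    sectors_per_track: int = 64) -> float:
    """Estimate seek + rotational latency + media transfer for one request.

    head_sector is the sector passing under the head when the seek completes.
    All drive parameters are hypothetical; head switches and overheads are ignored.
    """
    rev_ms = 60_000.0 / rpm                   # time for one full revolution
    sector_ms = rev_ms / sectors_per_track    # time for one sector to pass the head
    # Sectors that must rotate past before the target arrives (wraps around).
    wait_sectors = (target_sector - head_sector) % sectors_per_track
    rotational_latency_ms = wait_sectors * sector_ms
    transfer_ms = nsectors * sector_ms
    return seek_ms + rotational_latency_ms + transfer_ms


# 6000 RPM gives 10 ms per revolution; 100 sectors/track gives 0.1 ms per sector.
t = service_time_ms(2.0, head_sector=10, target_sector=30, nsectors=8,
                    rpm=6000.0, sectors_per_track=100)
print(round(t, 3))  # seek 2.0 + rotational latency 2.0 + transfer 0.8
```

Even in this simplified form, rotational latency can be as large as the seek, which is why the algorithms in Section 3.2 consider both.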

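2.1 Host Interface

Modern disk drives contain embedded control logic that manages the mechanism and communicates with the host through a high-level interface protocol, such as the Small Computer System Interface (SCSI) or the Intelligent Peripheral Interface (IPI). Such an interface presents the drive to the host as a linear array of logical blocks. The drive is largely self-managed: the control logic, rather than the host, handles the data layout, defect management, and on-board cache. This offloads management overhead from the host, but it also hides details of the drive's configuration that a host-based scheduler could otherwise exploit.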

2.2 Data Layout

The host addresses data by logical block number (LBN), and the drive's control logic maps each LBN to a physical block number (PBN), that is, a physical media location (cylinder, surface, and sector). Several features of modern drives make these mappings highly complex. With zoned recording, the cylinders are grouped into zones, and outer zones contain more sectors per track than inner zones. Track and cylinder skew offset the first logical sector of each track relative to the previous track, so that a sequential transfer can continue after a head switch or a seek to the next cylinder without waiting for a full rotation. Defect management sets aside spare regions, and defective sectors are remapped to spare locations (sparing or reallocation).

Because of this complexity, accurate knowledge of the LBN-to-PBN mapping is difficult to obtain outside the drive, and it is usually unavailable to host-based schedulers.
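A minimal sketch of a zoned LBN-to-PBN translation follows. The zone table and the fill order (track by track within a cylinder, then the next cylinder) are hypothetical assumptions; track/cylinder skew and spare regions, which a real drive's mapping includes, are deliberately left out.

```python
from typing import List, NamedTuple, Tuple


class Zone(NamedTuple):
    cylinders: int           # number of cylinders in the zone
    sectors_per_track: int   # constant within a zone (zoned recording)


def lbn_to_pbn(lbn: int, zones: List[Zone], surfaces: int) -> Tuple[int, int, int]:
    """Map an LBN to (cylinder, surface, sector).

    Hypothetical layout: LBNs fill one track, then the next surface of the
    same cylinder, then the next cylinder. Skew and spare regions are ignored.
    """
    if lbn < 0:
        raise ValueError("negative LBN")
    first_cylinder = 0
    for zone in zones:
        blocks_per_cylinder = zone.sectors_per_track * surfaces
        zone_blocks = zone.cylinders * blocks_per_cylinder
        if lbn < zone_blocks:
            cyl_offset, rest = divmod(lbn, blocks_per_cylinder)
            surface, sector = divmod(rest, zone.sectors_per_track)
            return first_cylinder + cyl_offset, surface, sector
        lbn -= zone_blocks              # skip past this zone
        first_cylinder += zone.cylinders
    raise ValueError("LBN beyond disk capacity")


# Example: a small outer zone with 96 sectors/track and an inner zone with 56.
zones = [Zone(cylinders=2, sectors_per_track=96), Zone(cylinders=3, sectors_per_track=56)]
print(lbn_to_pbn(100, zones, surfaces=2))  # (0, 1, 4)
print(lbn_to_pbn(501, zones, surfaces=2))  # (3, 0, 5)
```

Adding skew or remapped defects would change the returned sector (and occasionally the cylinder) for many LBNs, which is why a host that knows only LBNs can estimate, but not compute exactly, where a request lies.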

2.3 On-Board Cache

Modern disk drives also contain on-board memory, which in some recent drives is several megabytes in size. This memory serves two purposes. First, it acts as a speed-matching buffer between the media and the host bus, which transfer data at different rates. Second, it serves as a cache: after servicing a read, the drive can prefetch the sectors that follow the requested data, so that subsequent sequential reads are satisfied from the cache without any mechanical delays. Some drives divide the cache into multiple segments and control prefetching dynamically. A scheduler that is unaware of the cache contents may make suboptimal decisions, for example by delaying a request whose data is already cached behind requests that must access the media.
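The effect of sequential prefetching on cache hits can be illustrated with a toy model. The sketch assumes a single cache segment and a fixed read-ahead amount (the hypothetical `prefetch_blocks` parameter); real drives use multiple segments and more elaborate, dynamically controlled policies.

```python
class PrefetchCache:
    """Toy single-segment read cache with sequential prefetch.

    Hypothetical model: after any read, the drive reads ahead
    `prefetch_blocks` blocks past the request. A read hits only if all of
    its blocks already lie in the cached range [start, end).
    """

    def __init__(self, prefetch_blocks: int) -> None:
        self.prefetch_blocks = prefetch_blocks
        self.start = 0
        self.end = 0  # empty range

    def read(self, lbn: int, count: int) -> bool:
        hit = count > 0 and self.start <= lbn and lbn + count <= self.end
        new_end = lbn + count + self.prefetch_blocks
        if hit:
            # Served from the cache; read-ahead continues past this request.
            self.end = max(self.end, new_end)
        else:
            # Miss: the media is accessed and the segment is refilled.
            self.start, self.end = lbn, new_end
        return hit


cache = PrefetchCache(prefetch_blocks=16)
print([cache.read(lbn, 8) for lbn in (100, 108, 116, 500, 100)])
# [False, True, True, False, False]
```

The sequential reads at 108 and 116 avoid the media entirely, while the jump to 500 evicts the prefetched data, so the final read of 100 misses again.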

3 Scheduling Algorithms

A disk scheduler selects the next request to service from the queue of pending requests. Previously proposed algorithms can be grouped by the delays they attempt to reduce: seek delays alone, or overall positioning delays (seek plus rotational latency).

3.1 Seek Delay Reduction

First Come First Served (FCFS) is a simple policy that services requests in the order in which they arrive. It makes no attempt to reduce positioning delays.

Over 25 years ago, [Denn67] analyzed the advantages of seek-reducing algorithms. Shortest Seek Time First (SSTF) selects the pending request with the shortest seek time from the current arm position. Because seek time increases with seek distance, the exact seek time is usually approximated by the seek distance in cylinders. Since the exact LBN-to-PBN mapping is difficult to obtain outside the drive, many implementations instead use LBN-based approximations, estimating seek distance from the difference between logical block numbers. SSTF reduces average seek times, but under a heavy workload the arm tends to hover over a subset of the cylinders, starving requests outside that region and resulting in high response time variance.
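To illustrate the difference between FCFS and SSTF, the sketch below computes the SSTF service order for a fixed queue and compares the total arm travel with FCFS. It assumes seek time is approximated by seek distance in cylinders and that no new requests arrive while the queue drains; the cylinder numbers are arbitrary example values.

```python
from typing import List


def sstf_order(head_cyl: int, pending: List[int]) -> List[int]:
    """Service order chosen by SSTF for a fixed set of pending cylinders.

    Seek time is approximated by seek distance; ties go to the request that
    appears first in the queue (FCFS among equals).
    """
    order: List[int] = []
    remaining = list(pending)
    while remaining:
        nxt = min(remaining, key=lambda cyl, h=head_cyl: abs(cyl - h))
        remaining.remove(nxt)
        order.append(nxt)
        head_cyl = nxt  # the arm is now over the serviced request
    return order


def total_seek_distance(head_cyl: int, order: List[int]) -> int:
    """Total cylinders traveled when servicing requests in the given order."""
    distance = 0
    for cyl in order:
        distance += abs(cyl - head_cyl)
        head_cyl = cyl
    return distance


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # FCFS (arrival) order
print(sstf_order(53, queue))                  # [65, 67, 37, 14, 98, 122, 124, 183]
print(total_seek_distance(53, queue))         # 640 (FCFS)
print(total_seek_distance(53, sstf_order(53, queue)))  # 236 (SSTF)
```

For this queue, SSTF cuts arm travel from 640 to 236 cylinders, but the request farthest from the arm (cylinder 183) is serviced last; with a steady stream of new requests near the arm, such a request could wait indefinitely.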


The SCAN algorithm [Denn67], also known as the "elevator" algorithm, resists this starvation. The disk arm shuttles back and forth across the entire range of cylinders, servicing all requests in its path. It only changes direction at the innermost and outermost cylinder. Because the arm passes the middle cylinders at more regular intervals than those at the edges, requests to the center of the disk receive better service. For random workloads, SCAN provides a lower response time variance than SSTF with only a marginal increase in average response time.

LOOK [Mert70] is a variation of SCAN in which the arm changes direction as soon as there are no more pending requests in the current direction of travel, rather than always traveling to the innermost or outermost cylinder.
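A sketch of the LOOK service order for a fixed queue, using the same example cylinders as the SSTF sketch. For a static queue, SCAN services the requests in the same order; it differs only in continuing to the innermost or outermost cylinder before reversing. The function name and direction flag are illustrative.

```python
from typing import List


def look_order(head_cyl: int, pending: List[int], moving_up: bool = True) -> List[int]:
    """LOOK service order for a fixed set of pending cylinders.

    moving_up is the arm's current direction of travel (toward higher cylinders).
    """
    if moving_up:
        ahead = sorted(c for c in pending if c >= head_cyl)
        behind = sorted((c for c in pending if c < head_cyl), reverse=True)
    else:
        ahead = sorted((c for c in pending if c <= head_cyl), reverse=True)
        behind = sorted(c for c in pending if c > head_cyl)
    # Sweep through every request ahead, then reverse at the last one.
    return ahead + behind


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(look_order(53, queue))                   # [65, 67, 98, 122, 124, 183, 37, 14]
print(look_order(53, queue, moving_up=False))  # [37, 14, 65, 67, 98, 122, 124, 183]
```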

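Several other variations of SCAN have been proposed. Cyclical SCAN (C-SCAN) [Seam66] scans in a single direction only. When the arm reaches the last cylinder, it returns to the first cylinder with a full-stroke seek, without servicing requests along the way, and begins the next scan. C-SCAN therefore treats all cylinders equally rather than favoring the center cylinders. C-LOOK combines the two variations: the arm scans in one direction until no pending requests remain in that direction, then returns to the pending request nearest the start of the scan. With LBN-based approximations, C-LOOK services requests in ascending logical block order.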
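C-LOOK can be sketched in a few lines. Because positions only need to be ordered, the same function works on cylinder numbers or on LBNs, the latter giving the ascending logical order described above. The example values are arbitrary.

```python
from typing import List


def clook_order(head_pos: int, pending: List[int]) -> List[int]:
    """C-LOOK service order: ascending from the current position, then wrap
    around to the lowest pending position and continue ascending."""
    ahead = sorted(p for p in pending if p >= head_pos)
    wrapped = sorted(p for p in pending if p < head_pos)
    return ahead + wrapped


print(clook_order(53, [98, 183, 37, 122, 14, 124, 65, 67]))
# [65, 67, 98, 122, 124, 183, 14, 37]
print(clook_order(5000, [12000, 300, 7000, 4999, 5200]))  # LBNs
# [5200, 7000, 12000, 300, 4999]
```

Unlike LOOK, requests behind the arm are serviced in ascending rather than descending order, so sequential LBN runs are always read in the direction a prefetching cache expects.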

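VSCAN(R) [Geis87] creates a continuum of algorithms between SSTF and LOOK. Like SSTF, it selects the request with the smallest effective seek distance, but a request that would require a change in the current direction of travel is penalized by adding R times the total number of cylinders to its seek distance. VSCAN(0.0) is equivalent to SSTF, and VSCAN(1.0) is equivalent to LOOK. [Geis87] suggests that VSCAN(0.2) provides a good balance between average response time and starvation resistance.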
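A minimal sketch of VSCAN(R) selection as described above. It assumes the penalty is R times the total number of cylinders and that the direction of travel is updated after each selection; the example checks the two endpoints of the continuum on the example queue, with 200 cylinders as an arbitrary assumed disk size.

```python
from typing import List


def _vscan_cost(cyl: int, head: int, moving_up: bool, penalty: float) -> float:
    """Seek distance, plus the penalty if servicing cyl reverses the arm."""
    dist = abs(cyl - head)
    ahead = cyl >= head if moving_up else cyl <= head
    return dist if ahead else dist + penalty


def vscan_order(head: int, pending: List[int], r: float, num_cylinders: int,
                moving_up: bool = True) -> List[int]:
    """Service order chosen by VSCAN(R) for a fixed set of pending cylinders."""
    penalty = r * num_cylinders
    order: List[int] = []
    remaining = list(pending)
    while remaining:
        nxt = min(remaining,
                  key=lambda c, h=head, up=moving_up: _vscan_cost(c, h, up, penalty))
        remaining.remove(nxt)
        if nxt != head:
            moving_up = nxt > head  # direction of the seek just performed
        order.append(nxt)
        head = nxt
    return order


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(vscan_order(53, queue, 0.0, 200))  # same as SSTF: [65, 67, 37, 14, 98, 122, 124, 183]
print(vscan_order(53, queue, 1.0, 200))  # same as LOOK: [65, 67, 98, 122, 124, 183, 37, 14]
```

With R = 1.0 the penalty exceeds any possible seek distance, so reversing is chosen only when no request remains ahead, which is exactly LOOK's behavior.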

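3.2 Positioning Delay Reduction

Given complete knowledge of the data layout and the current rotational position of the disk, the scheduler can predict the combined seek and rotational latency (the positioning delay) of each pending request and select the request with the minimum positioning delay. This algorithm was denoted Shortest Time First (STF) in [Selt90] and Shortest Access Time First (SATF) in [Jaco91]. We use the term Shortest Positioning Time First (SPTF) to clarify the focus of the algorithm on positioning delays. SPTF requires exact mapping and rotational position information, which is difficult to obtain outside the drive.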
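The sketch below shows how SPTF can prefer a more distant request when its data arrives under the head sooner. Everything about the drive is assumed for illustration: the seek curve (a settle time plus a square-root term), a 10 ms revolution, 100 sectors per track with sector 0 under the head at time 0, and no zoning or skew.

```python
import math
from typing import List, Tuple

REV_MS = 10.0   # hypothetical revolution time (6000 RPM)
SECTORS = 100   # hypothetical sectors per track (no zoning)


def seek_ms(distance: int) -> float:
    """Hypothetical seek curve: zero for no movement, else a fixed settle
    time plus a term growing with the square root of the distance."""
    return 0.0 if distance == 0 else 2.0 + 0.5 * math.sqrt(distance)


def positioning_ms(head_cyl: int, now_ms: float, req: Tuple[int, int]) -> float:
    """Predicted seek plus rotational latency for req = (cylinder, sector)."""
    cyl, sector = req
    seek = seek_ms(abs(cyl - head_cyl))
    # Fraction of a revolution completed when the seek ends.
    angle = ((now_ms + seek) / REV_MS) % 1.0
    wait = ((sector / SECTORS - angle) % 1.0) * REV_MS
    return seek + wait


def sptf_pick(head_cyl: int, now_ms: float,
              pending: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Select the pending request with the smallest predicted positioning time."""
    return min(pending, key=lambda r: positioning_ms(head_cyl, now_ms, r))


pending = [(1, 20), (100, 80)]
for req in pending:
    print(req, round(positioning_ms(0, 0.0, req), 2))  # 12.0 and 8.0
print(sptf_pick(0, 0.0, pending))  # (100, 80), although SSTF would pick (1, 20)
```

The nearby request misses its sector by a fraction of a revolution and must wait almost a full rotation, so the longer seek wins.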

Like SSTF, SPTF suffers from poor response time variance: a request whose positioning delay is consistently larger than those of newer requests may wait for excessive periods of time. To resist starvation, variants of SPTF give requests that have been pending for a long time a higher priority, for example by aging the predicted positioning times of waiting requests [Selt90, Jaco91].
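A sketch of one simple aging rule follows. It is an assumption for illustration, not necessarily the exact formulation of [Selt90] or [Jaco91]: each request's predicted positioning time is reduced by a weighted waiting time, where the weight (the hypothetical parameter `weight`) trades average response time against starvation resistance.

```python
from typing import Dict, List, Tuple


def aged_sptf_pick(pending: List[Tuple[str, float]], now_ms: float,
                   positioning_ms: Dict[str, float], weight: float) -> str:
    """Pick the request with the smallest effective positioning time.

    pending holds (request_id, arrival_ms); positioning_ms maps request ids to
    predicted seek + rotational delays. effective = predicted - weight * wait.
    weight = 0 is plain SPTF; larger weights favor older requests.
    """
    def effective(item: Tuple[str, float]) -> float:
        req_id, arrival_ms = item
        return positioning_ms[req_id] - weight * (now_ms - arrival_ms)

    return min(pending, key=effective)[0]


pending = [("old", 0.0), ("new", 90.0)]
predicted = {"old": 12.0, "new": 3.0}
print(aged_sptf_pick(pending, 100.0, predicted, weight=0.0))  # new
print(aged_sptf_pick(pending, 100.0, predicted, weight=0.2))  # old
```

With no aging the newer, cheaper request wins; with weight 0.2 the 100 ms wait lowers the old request's effective time from 12 ms to -8 ms, so it is serviced first.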


4 Methodology

We evaluate the scheduling algorithms using trace-driven and synthetic-workload simulation with a detailed disk simulator.

4.1 Disk Simulator

The simulator accurately models zoned recording, spare regions, defect sparing and reallocation, on-board buffers and caches with prefetching, bus and controller communication overheads, and mechanical delays. For this study, it was configured to model the HP C2247 disk drive, using parameters obtained from published documentation and specifications [HP92]. We also determined several parameters, such as the LBN-to-PBN mappings, cache management policies, and communication overheads, by exercising the drive through its SCSI interface [Wort94]. Table 1 lists some basic parameters of the HP C2247.

The simulator was validated by comparing its response time distributions with those measured on an actual HP C2247 for a range of synthetic workloads with varying read/write ratios, request sizes, degrees of sequentiality and locality, and interarrival times. As a measure of model accuracy, we use the demerit figure defined in [Ruem94]: the root mean square horizontal distance between the measured and simulated response time distribution curves. Figure 1 shows the two distributions for a sample validation workload; the curves are barely distinguishable, and the demerit figure is 0.07 ms, or 0.5% of the corresponding average response time. Over all validation runs, the worst-case demerit figure was 1.9% of the average response time.
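The demerit figure can be computed from two samples of response times by comparing their quantiles: at each cumulative fraction, the horizontal distance between the two distribution curves is the difference between the corresponding quantiles. The sketch below uses nearest-rank quantiles at 1000 evenly spaced fractions, an implementation choice assumed here; [Ruem94] may discretize the curves differently.

```python
import math
from typing import List, Sequence


def _quantile(sorted_xs: List[float], p: float) -> float:
    """Nearest-rank empirical quantile of already-sorted data."""
    idx = min(len(sorted_xs) - 1, int(p * len(sorted_xs)))
    return sorted_xs[idx]


def demerit_figure(measured: Sequence[float], simulated: Sequence[float],
                   points: int = 1000) -> float:
    """RMS horizontal distance between two empirical response time CDFs."""
    m, s = sorted(measured), sorted(simulated)
    total = 0.0
    for i in range(points):
        p = (i + 0.5) / points  # evenly spaced cumulative fractions in (0, 1)
        diff = _quantile(m, p) - _quantile(s, p)
        total += diff * diff
    return math.sqrt(total / points)


def demerit_percent(measured: Sequence[float], simulated: Sequence[float]) -> float:
    """Demerit figure as a percentage of the measured average response time."""
    mean = sum(measured) / len(measured)
    return 100.0 * demerit_figure(measured, simulated) / mean


measured = [float(x) for x in range(1, 11)]   # hypothetical response times (ms)
simulated = [x + 2.0 for x in measured]       # every request 2 ms slower
print(demerit_figure(measured, simulated))    # 2.0
print(round(demerit_percent(measured, simulated), 1))
```

A uniform 2 ms shift yields a demerit of exactly 2 ms, which matches the intuition of a horizontal distance between the curves.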

Table 1: HP C2247 Basic Parameters

Parameter        Value
Zones            8
Sectors/Track    56-96
Interface        SCSI-2
Cache            256 KB, 1-4 segments

Figure 1: Validation: response time distributions for the simulated and actual HP C2247 disk on a sample validation workload (10K requests, 30% sequential, 30% local, 8KB mean request size). Axes: response time (ms) vs. fraction of requests.

4.2 Workloads

We use both synthetic workloads and extensive traces. The synthetic workloads are randomly generated, which allows their characteristics (read/write ratio, request size, degree of sequentiality and locality, and interarrival times) to be varied over a broad range. The traces were captured from actual systems in a range of environments. Because they have been described in detail elsewhere [Ruem93, Rama92], we describe them only briefly.

Cello and Snake were traced from 5/30/92 to 6/6/92 [Ruem93]. Cello is from a server at HP Labs used for program development; Snake is from a file server at the University of California, Berkeley. Air-Rsv, Sci-TS, Order, and Report are from VAX/VMS systems [Rama92]. Air-Rsv is from a transaction processing environment in which travel agents access an airline reservation system. Sci-TS is from a scientific time-sharing environment in which analytic modeling software and statistical packages were used. Order and Report are from a commercial data processing environment.

To examine a range of workload intensities, we scale the traced interarrival times. When the scale factor is one, the simulated interarrival times match those traced; when the scale factor is two, the interarrival times are halved (doubling the average arrival rate). This methodology treats the traced system as an open system, in which the request issue rate is independent of request completions. Real systems often include feedback between request completions and subsequent arrivals; developing a methodology that incorporates such feedback is an area for future work.
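The trace-scaling step can be sketched as dividing each traced interarrival gap by the scale factor. The function below keeps the first arrival time unchanged, an assumption made here; consistent with the open-system model, request completions play no role.

```python
from typing import List


def scale_arrivals(arrivals_ms: List[float], scale: float) -> List[float]:
    """Rescale traced arrival times by dividing every interarrival gap by scale.

    scale = 1 reproduces the traced times; scale = 2 halves each gap, doubling
    the average arrival rate. Open-system assumption: arrivals do not depend
    on when earlier requests complete.
    """
    if scale <= 0:
        raise ValueError("scale factor must be positive")
    if not arrivals_ms:
        return []
    scaled = [arrivals_ms[0]]  # assumption: the first arrival is kept as-is
    for prev, cur in zip(arrivals_ms, arrivals_ms[1:]):
        scaled.append(scaled[-1] + (cur - prev) / scale)
    return scaled


print(scale_arrivals([0.0, 10.0, 30.0, 70.0], 2.0))  # [0.0, 5.0, 15.0, 35.0]
```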

4.3 Disks

We chose to simulate HP C2247 drives for all experiments, although the traced systems used other disk drives (including HP C2240 disks). While the results would undoubtedly differ if the original drives were modeled, we believe that the qualitative insights are valid.

4.4 Metrics