scheduling algorithms used simplistic disk models, lacking several important features ... to be hidden by a high-level interface, making this information unavailable ..... its path. It only changes direction at the innermost and outermost cylinder. ..... terarrival rates. When the scale factor is one, the simulated interarrival times.
Scheduling Bruce
Algorithms
for
L. Worthington,
Department
Gregory
of Electrical
University
subsystem by
quests.
performance
dynamically
Via
can
ordering,
strongly
validated
be
dramatically
scheduling,
or
simulation,
im-
pending
of
complex
logic&
caches
synthetic
on
scheduling
workloads
and
effectiveness.
academia
and
the
workload
exact
tational
user
environments,
traces
captured
we arrive
at three
large
Using
both
six
(I)
Incorporating
scheduler
provides
response which
complex
times
always
highest
such
(3)
among
the
and
for
highest
exploit
with
important
been
for
ing
and
subsystems the
storage
acterized
must
growing
replaced
In
components.
by intense
of pending
bursts
requests
for dynamically
A
portion
prised
of
blocks
on and
account
the
the
the
the
various
can
providing
reasonable
Over gorithms
the
past
have
been
position.
entire
with
disk
years,
proposed
times a
for
variety and
of
be
most
of our
date
appear,
and
notice
these
features
disk
drives. of logi-
by a high-level to
affect
any
exter-
buffers
caches.
published Most
In
the
much
work
of these
me
have
this
paper,
performance
uniformly
can
of
“realistic”
utilize
such
We
previous traces
use ranwork,
from
synthetic
ran-
in question
workloads. with
and
that across
using
remains
extensive
traces
detailed,
limited
assume
distributed
be learned
comparison
The
has been
studies
of the results
more
a very
are
highly request
taking
affect
but
various
workloads
well-validated
six
scheduling
traces.
disk
this
can
our
and
work data
disk
and and
are
simulator.
5 and traces,
efforts
6 present
avenues
and our
some 3
the
for
our
results
future due
dis-
origins using
Section
excluded
[Wort
how
4 describes
respectively.
suggests
in
and
Section
Section
validation
descriptions,
be found
drives
accuracy.
research.
Sections
marizes
itations,
modern
scheduling
including
Additional
into
accesses,
7 sumresearch.
to space
lim-
94].
a
while
requests.
in
2 describes
can
workloads
2
Modern
(on
a common
A disk
alboth
Data
mation
Its
prefetching
systems.
random
title
and
of
lacking
mapping
speed-matching
large
While
previous
of the
The
publication
Small
experiments
methodology,
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the the
modern the
unavailable
to allow
to drive
dia.
of
in allow
to be hidden
the validity using
workloads
cusses
com-
scheduling
implemented
to
media
locations
verified
dom
features
requests.
overhead
individual
drive
studies
models,
information
previously
disk.
Section
char-
scheduler
various By
disk
present
workloads.
starting
used
queues
may
which
total positioning
response
disk
pending
the
of disk
previous
process-
long
time
of
associated
the
25
the
delays,
arm
delays
The
service
positions disk
minimize
3].
on
compu-
algorithms.
dom workloads,
that
are often
creating
ordering
positioning relative
current
scheduler
Ruem9
of request
mechanical
dependent
workloads
of activity,
[MciYu86,
is responsible significant
Disk
how
addition,
request
for
to compen-
between
with
scheduling
until
managed
disparity
this entity.
to synthetic
cache.
be carefully
performance
knowledge Most
simplistic
expanded
to physical
making
real-world Disk
is dependent
scheduler’s
read
position-
provided
blocks
various
Introduction
sate
available
used
has
we investigate sig-
achieves
overall
performance
a prefetching
in
algorithms
reduce
merit the
status.
features
logic
scheduling
which
order,
seek-reducing
nal
Algorithms
(C-LOOK), logical
that
the
provide
workloads
the
current
algorithms
interface,
2%) decrease (2)
relative
requests,
differ-
into
caches
algorithm
in ascending
Algorithms
produce
recognize
1
scan
performance
delays
disk
improvements cyclical
than
algorithms.
prefetching
requests
workloads.
they
seek-reducing
The
schedules
(less
and
and
On-board
conclusions:
information
a marginal
utilize
performance
sequentiality.
ing
for
effectively
nificant
the
only
mapping
Their
of disk
the
and
from main
industry.
resources,
cal data ent
Science
umich.edu
several prefetching
Patt
48109-2122
configuration
mappings
N.
re-
we examine
to-physical
Drives
Yale
and Computer
scheduling impact
Disk
Ganger,
Ann Arbor
Abstract Disk
R.
Engineering
of Michigan, worthing@eecs.
proved
Modern
is given
copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
SIGMETRICS 94 - 5/94 Santa Clara, CA, USA. © 1994 ACM 0-89791-659-X/94/0005..$3.50
drive
ten
unit
of data
error
tracks
(one
from
of the
disk.
For
a set
coated
writ
holds (e. g.,
Drives
contains
axis)
are
minimum
commonly
that
Disk
in
storage
data).
surface)
example,
the
plus
rotating with
a sector,
innermost
me-
which
header/trailer
A cylinder
equidistant
platters
magnetic
on each surface.
is usually
of data
correction
sides
tracks
circular
512 bytes
each
of rapidly
on both
from cylinder
infor-
is a set of the
center
contains
the
innermost
media
track
location
tor.
The
disk
arm
be positioned
to
together
that
such
a specific
of
surface.
by
holds
any
a given
given
Therefore,
cylinder,
cylinder. disk
In most
at any
its
heads
The
arm
drives
heads
position only
and
sec-
that
can
The logic
are ganged
corresponds
one
use of memory
has
progressed
age.
head
The
cache
time.
disk
serviced
more
mechanical
(seek)
to the
by
the
delays. target
cylinder.
while
waiting
for
head.
After
these
spans
multiple
the
may
disk
target
delays
suffer
arm
one
must
rotational
the
result
or
that
move
to
data
the
reach
on-board The
be-
Current affect
desired
data
are
the
drives
the
accuracy
result
of of
self-managed. pacity goal
teristics
work
should
to
make
also
factor
into
disks
have
determine
that
can
algorithms.
in
a disk
a major
exactly
scheduling
drive’s
ca-
Host
ing
power
are
to present
details from
number
of the
subsequent
the
agement
self the
host. overhead
the
host
other
have
media,
various
little
the
the
cached
2.2
Data
Disk
to the
of
delays
and
systems
disk
drive
size.
of
of
PBN)
mappings.
highly
accurate
ment
cache if
can the
cache
disk
of the
scheduling is
configured
read/write
by knowledge
length.
The
be
accessed
disk
media.
will
be given
cylinder.
quickly
Requests
therefore
possibly
next
more
most
continue
request,
to the far
head
of the
disk
read
or seeking
could
affects
the
position
heads
than
that
can
higher
be
priority.
Algorithms
data
data
disk
such
LBN-based
proposed
logical More
The
recording,
to
aggressive
knowledge mappings
[Ruem94,
block
of the may
complexity track/cylinder
be
physical
data
unavailable
of the
mappings skew,
and
that
disk
often
scheduling
higher
about
First Come
a simple policy
results
algorithms
performance
individual
in have
by
taking
requests
and
the
subsystem.
Reduction
ago,
[Denn67]
analyzed
the
advantages
of
it-
and
the
request
that
ally
on
the
on
First
request
will
average
increases
with time
distances.
over
for
disk
a wide
starvation
a subset to
of that
range
of the
may
SSTF
reduces
be
workloads. requests
in excessive
workload,
cylinders
region,
is usu-
SSTF
of
resulting a heavy
that
but
pending It
of individual
utilization, Given
algorithm the
delay.
seek
requests
outside
seek
by using time
over
all
smallest
times,
variance.
to hover
This
seek
potential the
policy. by selecting
exact
response the
requests
the
to predict
However,
(SSTF)
to service
incur
approximated
to exhaust
the
Time
next
infeasible
closely
the
Seek
chooses
in
thereby
re-
the an
arm
attempt
starving
any
space.
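The Shortest Seek Time First policy discussed here can be illustrated with a minimal sketch. It assumes that seek time grows with cylinder distance, so the scheduler approximates SSTF by always choosing the pending request whose cylinder is closest to the current arm position. The function name and the static request queue are hypothetical and serve only as an illustration.

```python
def sstf_order(start_cylinder, pending):
    """Return the service order chosen by a distance-based SSTF approximation.

    `pending` is a list of target cylinders; no new requests arrive while the
    queue drains (an assumption made to keep the sketch short).
    """
    remaining = list(pending)
    position = start_cylinder
    order = []
    while remaining:
        # Choose the pending request with the smallest seek distance.
        nearest = min(remaining, key=lambda cyl: abs(cyl - position))
        remaining.remove(nearest)
        order.append(nearest)
        position = nearest
    return order
```

With the arm at cylinder 50 and requests for cylinders 10, 45, 60 and 90, the request for cylinder 10 is serviced last because the arm keeps finding closer work elsewhere. Under a steady stream of arrivals near the arm, this is the hovering behavior that can starve requests outside the favored region.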
(L BN-tomay
require
[Denn67]
As mentioned or
of the
25 years
a Shortest
of seek-
block
layout.
achieve
Seek Delay
Over
On
layout
relies
algorithms
Many
information
state
shown scheduling
mm-
request.
scheduling
that
account
3.1
drive
approximations
LBN-based
have disk
The
storage
cache,
each
studies (FCFS)
performance.
current
hidden the
of the
of the
with
host
electronics.
outside
been into
starting
typically
actual
on-board
associated
use
algorithms.
to obtain.
into
controller of the
most
the
entities
any
are
In(IPI)
Layout
sequentiality
zoned
read
prefetched
been
of a previous
Scheduling
suboptimal
tends Many
above,
request
has
First,
and
end
from
by the
sponse
reducing
a read
For
sequential
on-board
the
data
obtained
First Served
host
System
request
offloads with
the
adapter,
total
access
process-
Interface
in terms
or no knowledge
status
overhead
to
drive,
and
media
scheduling
interface
or intermediate
approach
or controller
and
Computer
drive
associated
hand,
often
host
disk (LBN)
This
logic
Peripheral
of disk
The to the
block
Small
Intelligent
variety
a request
logical
the
a wide
clean
as the
the
manufacturers.
presents
the
requests.
algorithms.
on-board
a relatively
and
by
system
sufficient
protocols
(SCSI)
used
from
possess
Such
terface
read
into
Interface
disks
system.
an
10 cation
Numerous Most
of stor-
data
to service
data
ways.
read/write
satisfied
3
2.1
megabytes
to
subsequent
be determined
beyond
Second, data
charac-
of
aggressively,
switching
One
these
A
if the
several
request
reading
or
impact.
how
20 ms
control
buffers
cache.
necessarily
recent
di-
Some
self-contained
of increasing
lifetime is to
features
scheduling
methods
effective
of our
several
of
efforts
Clever
and
have
2 m
disk
to prefetch
or cylinders.
disk
take
drive
prefetch
sequential
access.
existence
activities
cannot
rectly
might
only
containing
automatically satisfy
media
disk
speed-matching
the
transfer
if the
a disk
take
embedded
small,
caches can
quickly
requires
might
latency
sectors
delays,
mechanical tracks
the Second,
read/write
Additional
drive
First,
is incurred
gins.
disk
within from
logic
to more
example, Requests
Cache
dynamically-controlled
to
read/write
On-Board
2.3
a physical
surface,
a set of read/write
access
cylinder.
is active
each
is defined
very
difficult
is increased defect
rithm, SSTF by
time
manage-
with (for
is named
[Wort94].
across
also
which
the
examined
provides only
the for
a marginal random
the
entire
way range
the
a lower
increase
workloads the
SCAN
response
disk
or
in the studied).
arm
of cylinders,
“elevator”
time
shuttles servicing
algo-
variance
average This
than
response algorithm
back
and
all
requests
forth in
its path.
It only
ermost
cylinder.
region
of the
requests
direction
Because disk
at
to middle
every cylinder a result, lower
changes
the
more
response
time
innermost
passes
both
phases
than
out-
the
center
the
edges,
service.
more
variance)
than
better
starvation
and
over
intervals
receive during
resists
arm
regular
cylinders
is reached
SCAN
at the
disk
However,
of the
effectively
scan.
As
(i.e.,
has
H
SSTF.
Zones
I
56-96
Interface Several posed. the
variations
The
bidirectional
seek
ing
any
der
equally,
the
returns
requests
LOOK
direction
the
than
of
if
way.
no
travel
the
[Mert70]. in
C-SCAN
the
C-LOOK
in
[Geis87],
in
and
Table
4
The
proposed
of algorithms denotes
how
t aining alent
between
the
strongly
that
average
LOOK.
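The scan-based variations named in this section can be compared with a small sketch. It assumes a static queue of pending cylinders and hypothetical function names. LOOK reverses direction once no requests remain ahead of the arm, while C-LOOK always services requests in ascending order and then returns to the lowest pending cylinder.

```python
def look_order(start, pending, ascending=True):
    """Service order for LOOK: sweep in the current direction, then reverse."""
    if ascending:
        first = sorted(c for c in pending if c >= start)
        second = sorted((c for c in pending if c < start), reverse=True)
    else:
        first = sorted((c for c in pending if c <= start), reverse=True)
        second = sorted(c for c in pending if c > start)
    return first + second


def c_look_order(start, pending):
    """Service order for C-LOOK: ascending sweep, then wrap to the lowest request."""
    ahead = sorted(c for c in pending if c >= start)
    wrapped = sorted(c for c in pending if c < start)
    return ahead + wrapped
```

The only difference between the two functions is the order of the requests left behind the arm: LOOK picks them up on the return sweep, whereas C-LOOK skips back and services them in ascending order again.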
the
direction
and
4.1
cur-
LOOK
Disk We
have to
and
algorithms,
starvation
main-
) is equiv-
rates,
Reducing about
the
account
the
relative for
about
the
In
mation,
the
imum
scheduler
positioning
more
of data the
delay
be
complete
the
Given
request
combined
seek
This
algorithm
was denoted
provided
isting
defects.
in [Selt90]
and
Shortest
this
and
The
is and
of the
Time
First
but
(SPTF)
we
use
the
to clarify
the
term
Shortest
exact
focus
Time
SPTF,
given
to
excessive as the requests
like
To
SSTF,
reduce
requests
that
periods request
suffers
response have
of time. ages,
are given
or
a higher
poor
variance,
been
in
The
a time
from
the
priority limit
for several
(SATF)
priority
host
algorithm.
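A positioning-time scheduler of the kind described here can be sketched as follows. The sketch assumes a deliberately simple drive model (linear seek curve, fixed settle time, 10 ms rotation) and a linear aging weight. All names, constants, and the aging formula are illustrative assumptions and are not taken from the paper or from any real drive.

```python
from dataclasses import dataclass


@dataclass
class Request:
    cylinder: int
    angle: float        # target sector position as a fraction of a revolution, in [0, 1)
    arrival_ms: float   # time at which the request entered the queue


def positioning_delay_ms(req, arm_cyl, now_ms,
                         rotation_ms=10.0, seek_ms_per_cyl=0.01, settle_ms=1.0):
    """Estimated seek plus rotational latency under the toy drive model."""
    distance = abs(req.cylinder - arm_cyl)
    seek = 0.0 if distance == 0 else settle_ms + seek_ms_per_cyl * distance
    # Rotational position under the head once the seek has completed.
    head_angle = ((now_ms + seek) / rotation_ms) % 1.0
    latency = ((req.angle - head_angle) % 1.0) * rotation_ms
    return seek + latency


def pick_next(pending, arm_cyl, now_ms, age_weight=0.0):
    """Pick the request with the smallest (optionally age-adjusted) positioning delay."""
    def effective(req):
        waited = now_ms - req.arrival_ms
        return positioning_delay_ms(req, arm_cyl, now_ms) - age_weight * waited
    return min(pending, key=effective)
```

With `age_weight=0.0` the selection considers positioning delay alone, so a request on another cylinder can win over one on the current cylinder when its rotational latency is short. A positive weight gives requests that have waited longer a progressively higher priority, which is one simple way to limit starvation.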
activ-
extracted
a seek
bus
transfer
We also determined several
zoning,
of these
disks,
sparing,
simulator
and
ex-
configuration,
parameters,
pending
can queue
slowly
be set
after
validated of
run
inter
all
in-
is described
arrival
workloads
by exercising
SCSI
through
with
times,
request
sizes,
The
average
response
match to
delays
sight
activity.
the
simulator, This
varying
read/write
and
an HP Each
delays.
in
C2247
traced
using
process
degrees times
[Ruem94]. time
As
most
for
that
with two
square
which
re-
the
ob-
was repeated ratios,
interar-
of sequentialit
of the
horizontal
Jaco91].
ure
for
figure the
Figure
of our
actual
y and
lo-
and
the
disk
disk
the
measured
[Ruem94] between
run
one
can
defines
the
the
two
in
figure
time
and
validation
calibration.
shown
Greater
response
results,
model
Unpredictable
cases.
difference.
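The demerit figure used for validation, the root mean square horizontal distance between two response time distribution curves, can be computed roughly as below. The sketch assumes both runs contain the same number of requests, so the horizontal distance at each cumulative fraction is simply the difference between matching order statistics. The function name is hypothetical.

```python
import math


def demerit_figure(measured_ms, simulated_ms):
    """RMS horizontal distance between two empirical response time distributions."""
    if not measured_ms or len(measured_ms) != len(simulated_ms):
        raise ValueError("both samples must be non-empty and of equal size")
    measured = sorted(measured_ms)
    simulated = sorted(simulated_ms)
    squared = [(m - s) ** 2 for m, s in zip(measured, simulated)]
    return math.sqrt(sum(squared) / len(squared))
```

A value of zero means the two distribution curves coincide. Dividing the result by the mean measured response time expresses it as a relative error.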
for a sample
validation
distance
validation
all
this
I shows
are present.
for
in
for
by comparing
distributions
curves
a demerit
0.870
account
can be achieved
response
increase
within
pmtidly
be
resis-
priority
[Selt90,
from
SCSI
in Time
starvation
may may
model
First
Positioning
time
was
request
butions
tance.
the
We
for
about
some
accurately was obtained
strategies.
relevant
was
traces
stream
served
min-
rotational
First
of the
1 lists
overheads,
validated all
simulator
simulator [Jaco91],
The for
To
disk.
mappings
information
capturing
quest
infor-
with the
as Shortest
Access
particular
BN
overto model
Table
by monitoring
management
values
cality. (STF)
cache
C2247.
communication
LBN-to-P
which
rival latency).
this
and
[HP92].
set of parameters and
d-
communication
[Wort94].
To
media
location
the
by
HP
defect
prefetch
was configured
drives
the
simula-
regions,
various
and
disk
The
information
onto
physical
data.
and exact
cluding
knowledge
requested
known.
choose
(i.e.,
requires
blocks
current
must can
only
between
latency,
mapping
head
the
seek delay
seek distances
addition,
read/write
Reduction
average
rotational actual
necessary. active
Delay
simulator
for
validated
spare
caches,
the of disk
documentation
control
recording, and control
paper, series
generated
strongly algorithms.
and
an extensive
published
resistance.
zoned
buffers
specifications drive,
a detailed,
delays,
C2240
curve,
Positioning
bus
For this
HP
ity
3.2
Parameters
Validation
scheduling
models disk
heads.
this
between
developed
the basic
[Geis87]
balance
and
compare
accurately
R parameter
to LOOK.
a good
Basic
simulator
simulator
management,
is towards
reduces
Drive
can
a continuum
VSCAN(0.0
(0.2) provides
time
The
scheduler
of travel.
VSCAN(1.0)
VSCAN
response
and
biased
current
to SSTF,
suggests
SSTF
Disk
the
the
algorithm.
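The VSCAN(R) continuum between SSTF and LOOK can be sketched with a direction penalty. The sketch assumes the common formulation in which a request that would reverse the arm's direction of travel is charged an extra R times the total number of cylinders. The function and parameter names are hypothetical.

```python
def vscan_next(position, direction, pending, r, total_cylinders):
    """Choose the next cylinder under VSCAN(R).

    `direction` is +1 when the arm is moving toward higher cylinders and -1
    otherwise; `r` is the bias parameter in the range [0, 1].
    """
    def effective_distance(cyl):
        delta = cyl - position
        reverses = delta * direction < 0
        penalty = r * total_cylinders if reverses else 0.0
        return abs(delta) + penalty

    return min(pending, key=effective_distance)
```

With `r=0.0` the penalty vanishes and the choice reduces to SSTF. With `r=1.0` the penalty exceeds any possible seek distance, so the arm reverses only when nothing remains ahead of it, which matches LOOK. Intermediate values such as 0.2 allow a reversal only for requests that are much closer than anything in the current direction.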
creates
1: HP C2247
Methodology
tor VSCAN(R),
H
cylin-
changes exist
Sparing/Reallocation
servic-
each
variation, requests
I
1-4 Segments
travel
without treats
SCSI-2
“ Cache,
Track
n
a full-
center cylinders.
SCAN
256 KB
pro-
replaces
cylinder,
C-SCAN
pending
been
of arm
last
cylinder
favoring
resulting
direction the
first
another
direction
be combined,
the
have
(C-SCAN)
a single reaches
to
along
rather
algorithm
algorithm
with arm
it
algorithm,
scanning rent
scan
When
SCAN
SCAN
Cyclical
[Seam66]. stroke
of the
H
8
1
Sectors/Track
in-
distri-
simulated workload. barely root
see mean
distributions The
as
demerit
1 is 0.07
ms,
figor
4.3
LJlslcs
We chose
drives
for
els the have
storage
Figure 1: Validation Workload Response Time Distribution. Horizontal axis: Response Time (ms). Workload: 10K requests, 30% sequential, 30% local, request size with 8KB mean, uniform.
capacity
0.5% of the
demerit the
figure
average
observed
corresponding
response
over
average
all
validation
response
the
traced Sci-
Air-Rsv)
tive
data
The
worst-case
runs
was
1.9%
in
extensive the in
traces
traces
of
briefly
detail
from
as they
[Ruem93,
a broad
range
random
actual have
systems.
been
Rama92].
workloads
described
The
of environments,
traced
and
each
workshift
(8 hours)
present
in
running tem
of the
[Ruem93].
for program is from ley
used
traces a single
from
a version Cello
server
workloads
The
HP
and
Labs
Air-R
four
(5/30/92
about
vations. in which tistical machine during
travel
used
analytic packages
were
parts
distribution
environment
and
9 for
to the
line
issue rate
is that
this often
and subsequent how
is an
individual
area
work
produce
was
a range
factor
is one,
traced.
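The trace scaling described here can be sketched as a transformation of arrival timestamps. The sketch assumes that each traced interarrival time is divided by the scale factor, so a factor of one reproduces the traced arrivals and a factor of two halves the interarrival times. The function name is hypothetical.

```python
def scale_arrivals(arrival_times_ms, scale_factor):
    """Return arrival times whose interarrival gaps are divided by `scale_factor`."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    if not arrival_times_ms:
        return []
    scaled = [arrival_times_ms[0]]
    for previous, current in zip(arrival_times_ms, arrival_times_ms[1:]):
        # Shrink (or stretch) each gap, keeping the first arrival fixed.
        scaled.append(scaled[-1] + (current - previous) / scale_factor)
    return scaled
```

Scaling in this way treats the trace as an open system: request arrivals do not react to how quickly earlier requests complete, so feedback between completions and subsequent arrivals is not modeled.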
used
incorporate
completions
this
When
times
C2240
disks
would
about
effects
those
ac-
that
HP
the
a methodology
scale
Snake, the
of disks
than
model
in
interarrival
Cello,
platters.
previous
feedback
those
platters
We feel
completions
used
to of
to contain
media.
Developing
systems.
disks
enough
ideal
difficul-
not
real&
for
future
to scale
the
of average
ir-
the the
are halved
is for
simulated scale
factor
(doubling
the
made
VAXT&r [Rama92].
in which
environment
airline
Order
and and
company.
was
problem
(which
with
behaves
undoubtedly using
and
HP
this
as an open
results
with a identity
the
have C2240
type
been
disk
different
drives.
of trace-driven
system),
insights
scaling
but
are
This simula-
we believe
that
valid.
Metrics
4.4
for
system
and
time-sharing
software
representing
Report
were
even
these
data
commercial
processing
used.
would
traced
qualitative
However,
to 6/6/92).
from
a scientific
modeling
hours,
this
a different
traced
large
information
match
workload
system
tion
Snake
While
we report
operating
agents
Sci-TS is from
workload.
a batch
are
VMS™
a transaction
500
daytime
cessing
traces
the
su is from
which
were
other
the
Repor