Fast Retrieval of Cursive Handwriting

Ibrahim Kamel
Matsushita Information Technology Laboratory
Princeton, N.J. 08540
ibrahim@MITL.Research.Panasonic.Com

Abstract

This paper presents an indexing method that can be used to search a large collection of cursive handwriting. The basic idea is to segment each cursive string into a set of strokes. Each stroke can be described with a set of features and, thus, can be stored as points in the feature space. The Karhunen-Loève transform is then used to minimize the number of features used (data dimensionality) and thus the index size. Feature vectors are stored in an R-tree. Similarity search can be performed by executing a few range queries and then applying a simple voting algorithm to the output to select the strings that are most similar to the query. The proposed index can support substring matching as well as all kinds of errors, namely, insertion/deletion and m-n substitution. It is resilient to errors that result from the stroke segmentation process. The proposed index achieves substantial saving in search time over the sequential search, and it improves the matching rate up to 46% over the sequential search.

1 Introduction

Pen-based personal digital assistants (PDA), e.g. Apple Newton, replaced the keyboard with a pen by which data entries and commands can be performed. The pen-based interface allows the user to store data in the form of handwritten notes and to formulate queries based on handwritten examples. For example, one can search a large database which contains one or more handwritten fields (e.g., verifying the signature of the bank accounts). One way to handle the handwritten text is to first translate it into ASCII using pattern recognition techniques and then to store it as ASCII characters.
importance,
text
especially
then
then
the search
algo-
into
a sequence
performs
database.
of
a traditional
Thus,
is an intermediate
has
recently
since
the
inintrcr-
are
cult
even
sive
string.
phabet),
step
that
to
tablet)
levels
identify Moreover,
into
shape etc. the
and for
however, step
the
recogni-
between
Another recognition
storage
recognition
the
the
in-
a problem.
letter
boundaries
letter
It in
of predefined
renders
the
the
cur-
handwrit(al-
as the parthe
to this
in
is dMi-
symbols such
“allograph”,
disadvantage phase
of errors
the
information,
device.
of cursive
number
poses
a sequence
of the
the
by translating
we lose much
ticular style,
and
low,
recognition
ten string
Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee. CIKM 96, Rockville MD USA © 1996 ACM 0-89791-873-8/96/11 ..$3.50
(pen accuracy
writing
Introduction of handwritten
string
and
the the
search.
handling
the
charac-
techniques
Similarly,
is
Moreover,
the The
text.
and
text
ASCII-equivalent
the query
ASCII
handwritten
recognition
translates
Because
1
the
writing
method system
is sen-
sitive
to the A
underlying
more
written type
is
[LT92].
stand
it.
During
not
look
can
not
recreate
appropriate
are
Hence,
would algorithm
“similar”
requirement
for
(or
approximate
be more
suitable
environment
retrieval
and
searching of
search
query
paper
for
a given
algorithm
be
strings
Section
it.
ming
algorithm
each
makes
online
proximity
rest
in of
paper
our
experiments the
proposed for
matching
measuring
rate
of the
the
response
proposed
index.
our
conclusions
and
future
Search
ing done
much
handwritten in
Tomkins searching
indexing [LT94] a very
text,
has not
handwritten proposed long
answer.
to
one
comparison
the
simpler
different
saves
about
strokes
the
is lost.
VUE
the
sym-
and
information
method
reduc-
of 64 alphabet
to inspect
this
query
Although
much
valuable
to the
the
In
whole
each
sub-
algorithm.
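The “edit distance” of [WF74] referred to in this discussion can be illustrated with a minimal sketch. It assumes unit costs for the three operations named in the text (insert, delete, and substitute one symbol); the function name `edit_distance` and the cost parameters are illustrative choices, not taken from the paper, and the symbols may be characters or stroke codes.

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Dynamic-programming edit distance between two symbol sequences.

    The unit costs are an assumption made for illustration; any
    non-negative costs can be passed instead.
    """
    # prev[j] holds the cost of transforming the first i-1 symbols of a
    # into the first j symbols of b; row 0 is built from insertions only.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        # Transforming i symbols of a into the empty sequence needs i deletions.
        curr = [i * del_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                         # delete x
                curr[j - 1] + ins_cost,                     # insert y
                prev[j - 1] + (0 if x == y else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

A sequential search in this style would evaluate such a distance between the query and every stored string, which is the cost an index is meant to avoid.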
Proposed
Design
3 section,
we propose
a new
index
allows
fast
for
cursive
time
handwriting.
This
similar
and
index
retrieval
of
and
Section
strings
can
handle
insertion,
deletion,
5 substitution
errors
and
substring
matching.
work.
(VUE
been much text.
a sequential
cursive
is similar
in order
We call
associated program-
sequentially
with string,
research
cost
4 we show
Algorithm) Although
Each
database
index
deletion
Sequential
a sym-
to search
The 2
insert
necessary
m-n gives
that
vector
the
another.
it becomes
3
Section
In Section
for
using
is
a sequential
handwriting.
index.
other
strings
the cost of the transfor-
as an
between
In this describes
the
we can
two
addition,
string.
a database
the
2 describes
of
the
the
uses a dynamic
substring
nonetheless,
text
two cursive
a symbol,
to minimize
stroke
ASCII
to compare
has a predefined
is reported
prob-
and
symbol
distance”
Any
to support
problem
“Edit
The
alphabet),
into
delete one
the pen
a constant
books).
aligns
string
operations
with
digital
string
cursive
operations: substitute
personal
the
one
A
can be one of
traditional
distance”
bol,
and
the
[WF74]
“edit
following
space,
address
The
The
raising
stroke
(or code
to
string.
is to define
Each
a different
transforms
bols
additional
reduced
cursive
without
idea
types
distance
ing
which
stroke
use the edit
string
The
The strokes.
(with
mation.
string case.
One
need
cursive
for
all
not
time.
we
text.
as follows.
for
and/or
is the
this
in this
string.
response
handwritten
organized
look
will
the
comparison
of these
previously
similarity
pen-based
fast
own
query
the
people a person
match
assistant
In
exact
his
for
and
a challenging
different
a stroke.
is then
of a small
are drawn
is called
strings.
Search-
Moreover,
even
should to
is
two
same.
perfectly
and
matching)
text
more
symbols, etc.
that
alphabet
lem
string
user
non-ASCII
by
the
the
occurrences
64 different
an appropri-
languages,
written
as a
to under-
using
use
data
the query
all the
set of points
hand-
is treated
gives
cursive
exactly
word.
search
strings
other
word
can
a first-class
This
handwritten A
as
process,
he can
the
an attempt
the search
equations,
problem.
handle
string
without
power;
drawings,
to
handwritten
function.
expressive
drawn
it
to database
distance
in
way
treat
pattern
is compared
ing
to
The
pictographic
ate
natural
text
cate
language.
string
done work
in
been
Lopresti
and
algorithm in order
3.1
for
the
search
the
Each
can
that
Basic
like
and
in real
time
be a set the
query
a search of the
query
strings
or
string.
Idea
the cursive
stroke
Given
would
look
insertion
be intermixed
operations.
answer
The
We model
to lo-
in the sense that
operations
substrings
search-
has
is dynamic
string
is described
as a sequence by
a set
of strokes.
of features
and
Root
tree
Cursive Non-leaf
string
the
repository
node
variability
in
that
correspond
tend
to vary
handwriting,
the
to different
feature
instances
vectors
of one stroke
different
leaf node
1:
sponding
Leaf to the
strings
which
thus
can
be
in
choose
the R-tree
search
space
is
dimensional
R-tree A
because
We
of points by the
tuple
coordinates
of the
point
points
ends
and
the x-y
are
a new
features.
to prune
the
The
the
e.g.,
traversed,
and
multi-
that
the
reader
detailed
that
each
tablet
Each
the
point
according
feature
space
is
Sill
A
>i
multi-dimensional
with have
geometric
f1,
t points
the angle
and
length
features
are
alike to some
tend
to
distance
a set
been
the
strokes
S,
stroke
segmentation
of
set
11
described
total
of the
selected similar
functions.
in the
in the
of points in
Each
entry
so that
the
string
vector Due to
have
entries
leaf
that
node
the
of the
an
nodes form
by
R-tree
the
page.
index. Non-leaf
are kept
in main
on the
disk.
A
other
will
be
each
(level 0 in the tree).
in the form
of (word-id,
of a point
P (=stroke)
pictographic
contains
Non-leaf
in
to
node
as
S is represented
are stored
coordinates for
formed
one disk
close
leaf
strokes
multi-dimensional
in number,
are
in the
a word-id
1).
stored
nodes
same
and
Figure
leaf
the
string These
occupies
that
the
P) contains
angle
are
oft
s~ is represented space
The
are small
while
stored
a sequence
stroke
space.
node
which
the
S into
f2, ..., f11.
R-tree
nodes,
of the bounding
have
Right:
11-dimensional
(=strokes)
Each
as local
properties
stroke,
words;
Each
in an
points
in
a string
points
0,
to
a child
node
Rectangle the
(MBR)
child
that
encloses
all
Bounding
the
entries
in
2 shows
illustration
row,
and
drawing
an
answer
set
of
an example
below.”
For
to simplify
of the
the
sake
of
let
with
and
us assume
only
two
that
expensive
this
the
pictographic
the
the
database.
features,
same
each
the
Moreover,
represents the
let
us assume
one letter
representation
dimensional
in the of the
space.
Each
string
points
(equal
string)
in the
two-dimensional
the
The
shaded
each
letter
similar each
(stroke).
way other
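The basic idea of answering a query with a few range queries followed by a voting step can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: a linear scan stands in for the R-tree, every leaf entry is a hypothetical (word-id, point) pair in a two-dimensional feature space, each query stroke gives at most one vote per stored string, and the names `range_query` and `vote` are invented for the example.

```python
from collections import Counter


def range_query(entries, center, radius):
    """Return the word-ids of stored strokes inside the box center +/- radius.

    A linear scan is used here as a stand-in for an R-tree range query
    (assumption made to keep the sketch self-contained).
    """
    return [
        word_id
        for word_id, point in entries
        if all(abs(p - c) <= radius for p, c in zip(point, center))
    ]


def vote(entries, query_strokes, radius, top_k=1):
    """Issue one range query per query stroke and rank strings by votes."""
    votes = Counter()
    for stroke in query_strokes:
        # A stored string gets at most one vote per query stroke, even if
        # several of its strokes fall inside the query box.
        for word_id in set(range_query(entries, stroke, radius)):
            votes[word_id] += 1
    return votes.most_common(top_k)


# Hypothetical data: two stored strings, each segmented into two strokes.
entries = [
    ("w1", (0.10, 0.20)), ("w1", (0.80, 0.90)),
    ("w2", (0.50, 0.50)), ("w2", (0.82, 0.88)),
]
query = [(0.12, 0.22), (0.79, 0.91)]
best = vote(entries, query, radius=0.05)
```

With this toy data the string "w1" collects a vote from both query strokes and "w2" from only one, so "w1" is ranked first. A stored string can still score well when some strokes are missing or extra, which is one way such a scheme could tolerate insertion and deletion errors.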
3.2
in Figure
(e.g.
r, n)
or even
a manner
similar
Q.
word
twice
For
it
identically,
The
scores
of the
are
not any
strings
queries
use of
from use the
Root,
string
Q):
the
For
that
each
Form
string
in
for
each
qi,
1