Fast Retrieval of Cursive Handwriting - CiteSeerX

Pen-based personal digital assistants (PDA), e.g. Apple Newton, replaced the ... signature of the bank accounts.) One way to handle the handwritten text is.
Fast

Retrieval

of Cursive

Ibrahim Matsushita

Kamel

Information

Technology

Princeton,

N.J.

ibrahim@MITL.Research.

presents

an indexing

used to search a large The

Each

in the feature

(data

The

space.

dimensionality)

vectors

are stored

strokes

string

number

and thus in an R-tree.

all

a

transform used

the index

size.

Similarity

search can be

commands

and

allows

the

(PDA),

data

user

handwritten

examples.

database

and

(e.g.,

store

Newton,

a pen

by which

be performed.

data

in

the

one

verifying

form

of

based

on

searching

a

queries

example,

contains

per-

Apple

can

formulate For

which

fields

with

entries

to

notes

Pen-based

e.g.

keyboard

handwritten

written

interface.

assistants entire

large

Feature

pen-based

the

This

as points

of features

digital

replaced

into

Com

of the

sonal

be

can be described

can be stored

Karhunen-Loève the

can

handwriting.

each cursive

and, thus,

used to minimize

that

of cursive

of these

a set of features

is then

method

collection

basic idea is to segment

set of strokes. with

08540

duction

paper

Laboratory

Panasonic.

Abstract This

Handwriting

or

more

signature

hand-

of the

bank

accounts.) performed

by executing

applying select

a simple the

strings

The

proposed

well

as substring

of errors namely,

stroke The

in search it improves sequential

that

and

algorithm

to the

output

are most

similar

can support

matching. result

the

index the

the matching

and m-n achieves

sequential rate

One

to the

way

to

handle

translate

it

first

into

using

pattern

to

query.

to

as

ters

kind

to store

process,

rithm

substitution.

substantial search.

then

queries

segmentation

insertion/deletion

over

to the

similarity

It is resilient

from

proposed time

queries

voting

index

that

a few range

saving

it as ASCII

characters

search

through

up to 46% over

tion

phase

put

device

creased

in

importance,

text

especially

then

then

the search

algo-

into

a sequence

performs

database.

of

a traditional

Thus,

is an intermediate

has

recently

since

the

inintrcr-

are

cult

even

sive

string.

phabet),

step

that

to

tablet)

levels

identify Moreover,

into

shape etc. the

and for

however, step

the

recogni-

between

Another recognition

storage

recognition

the

the

in-

a problem.

letter

boundaries

letter

It in

of predefined

renders

the

the

cur-

handwrit(al-

as the parthe

to this

in

is diffi-

symbols such

“allograph”,

disadvantage phase

of errors

the

information,

device.

of cursive

number

poses

a sequence

of the

the

by translating

we lose much

ticular style,

and

low,

recognition

ten string

Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee. CIKM 96, Rockville MD USA © 1996 ACM 0-89791-873-8/96/11 ..$3.50

(pen accuracy

writing

Introduction of handwritten

string

and

the the

search.

handling

the

charac-

techniques

Similarly,

is

Moreover,

the The

text.

and

text

ASCII-equivalent

the query

ASCII

handwritten

recognition

translates

Because

1

the

writing

method system

is sen-

sitive

to the A

underlying

more

written type

is

[LT92].

stand

it.

During

not

look

can

not

recreate

appropriate

are

Hence,

would algorithm

“similar”

requirement

for

(or

approximate

be more

suitable

environment

retrieval

and

searching of

search

query

paper

for

a given

algorithm

be

strings

Section

it.

ming

algorithm

each

makes

online

proximity

rest

in of

paper

our

experiments the

proposed for

matching

measuring

rate

of the

the

response

proposed

index.

our

conclusions

and

future

Search

ing done

much

handwritten in

Tomkins searching

indexing [LT94] a very

text,

has not

handwritten proposed long

answer.

to

one

comparison

the

simpler

different

saves

about

strokes

the

is lost.

VUE

the

sym-

and

information

method

reduc-

of 64 alphabet

to inspect

this

query

Although

much

valuable

to the

the

In

whole

each

sub-

algorithm.

Proposed

Design

3 section,

we propose

a new

index

allows

fast

for

cursive

time

handwriting.

This

similar

and

index

retrieval

of

and

Section

strings

can

handle

insertion,

deletion,

5 substitution

errors

and

substring

matching.

work.

(VUE

been much text.

a sequential

cursive

is similar

in order

We call

associated program-

sequentially

with string,

research

cost

4 we show

Algorithm) Although

Each

database

index

deletion

Sequential

a sym-

to search

The 2

insert

necessary

m-n gives

that

vector

the

another.

it becomes

3

Section

In Section

for

using

is

a sequential

handwriting.

index.

other

strings

the cost of the transfor-

as an

between

In this describes

the

we can

two

addition,

string.

a database

the

2 describes

of

the

the

uses a dynamic

substring

nonetheless,

text

two cursive

a symbol,

to minimize

stroke

ASCII

to compare

has a predefined

is reported

prob-

and

symbol

distance”

Any

to support

problem

“Edit

The

alphabet),

into

delete one

the pen

a constant

books).

aligns

string

operations

with

digital

string

cursive

operations: substitute

personal

the

one

A

can be one of

traditional

distance”

bol,

and

the

[WF74]

“edit

following

space,

address

The

The

raising

stroke

(or code

to

string.

is to define

Each

a different

transforms

bols

additional

reduced

cursive

without

idea

types

distance

ing

which

stroke

use the edit

string

The

The strokes.

(with

mation.

string case.

One

need

cursive

for

all

not

time.

we

text.

as follows.

for

and/or

is the

this

in this

string.

response

handwritten

organized

look

will

the

comparison

of these

previously

similarity

pen-based

fast

own

query

the

people a person

match

assistant

In

exact

his

for

and

a challenging

different

a stroke.

is then

of a small

are drawn

is called

strings.

Search-

Moreover,

even

should to

is

two

same.

perfectly

and

matching)

text

more

symbols, etc.

that

alphabet

lem

string

user

non-ASCII

by

the

the

occurrences

64 different

an appropri-

languages,

written

as a

to under-

using

use

data

the query

all the

set of points

hand-

is treated

gives

cursive

exactly

word.

search

strings

other

word

can

a first-class

This

handwritten A

as

process,

he can

the

an attempt

the search

equations,

problem.

handle

string

without

power;

drawings,

to

handwritten

function.

expressive

drawn

it

to database

distance

in

way

treat

pattern

is compared

ing

to

The

pictographic

ate

natural

text

cate

language.

string

done work

in

been

Lopresti

and

algorithm in order

3.1

for

the

search

the

Each

can

that

Basic

like

and

in real

time

be a set the

query

a search of the

query

strings

or

string.

Idea

the cursive

stroke

Given

would

look

insertion

be intermixed

operations.

answer

The

We model

to lo-

in the sense that

operations

substrings

search-

has

is dynamic

string

is described

as a sequence by

a set

of strokes.

of features

and

Root

:ree

Cursive Non-leaf

string

the

repository

node

variability

in

that

correspond

tend

to vary

handwriting,

the

to different

feature

instances

vectors

of one stroke

m

different

leaf node

1:

sponding

Leaf to the

strings

which

thus

can

be

in

choose

the R-tree

search

space

is

dimensional

R-tree A

because

We

of points by the

tuple

coordinates

of the

point

points

ends

and

the x-y

are

a new

features.

to prune

the

The

the

e.g.,

traversed,

and

multi-

that

the

reader

detailed

that

each

tablet

Each

the

point

according

feature

space

is

Sill

A

>i

multi-dimensional

with have

geometric

f1,

t points

the angle

and

length

features

are

alike to some

tend

to

distance

a set

been

the

strokes

S,

stroke

segmentation

of

set

11

described

total

of the

selected similar

functions.

in the

in the

of points in

Each

entry

so that

the

string

vector Due to

have

entries

leaf

that

node

the

of the

an

nodes form

by

R-tree

the

page.

index. Non-leaf

are kept

in main

on the

disk.

A

other

will

be

each

(level

0 in

the

tree).

in the form

of (word-id,

of a point

P (=stroke)

pictographic

contains

Non-leaf

in

to

node

as

S is represented

are stored

coordinates for

formed

one disk

close

leaf

strokes

multi-dimensional

in number,

are

in the

a word-id

1).

stored

nodes

same

and

Figure

leaf

the

string These

occupies

that

the

P) contains

angle

are

of t

s~ is represented space

The

are small

while

stored

a sequence

stroke

space.

node

which

the

S into

f2, ..., f11.

R-tree

nodes,

of the bounding

have

Right:

11-dimensional

(=strokes)

Each

as local

properties

stroke,

words;

Each

in an

points

in

a string

points

0,

to

a child

node

Rectangle the

(MBR)

child

that

encloses

all

Bounding

the

entries

in

2 shows

illustration

row,

and

drawing

an

answer

set

of

an example

below.”

For

to simplify

of the

the

sake

of

let

with

and

us assume

only

two

that

expensive

this

the

pictographic

the

the

database.

features,

same

each

the

Moreover,

represents the

let

us assume

one letter

representation

dimensional

in the of the

space.

Each

string

points

(equal

string)

in the

two-dimensional

the

The

shaded

each

letter

similar each

(stroke).

way other

3.2

in Figure

(e.g.

r, n)

or even

a manner

similar

Q.

word

twice

For

it

identically,

The

scores

of the

are

not any

strings

queries

use of

from use the

Root,

string

Q):

the

For

that

each

Form

string

in

for

each

qi,

1
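
The search procedure named in the abstract, a few range queries followed by a simple voting step, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `range_query` and `vote`, the toy two-feature vectors, the fixed query radius, the rule that a word receives at most one vote per query stroke, and above all the linear scan, which merely stands in for the R-tree range query.

```python
from collections import Counter
from math import dist


def range_query(index, center, radius):
    """Return the (word_id, point) entries within `radius` of `center`.

    Assumption: a linear scan replaces the R-tree range query.
    """
    return [(word_id, p) for word_id, p in index if dist(p, center) <= radius]


def vote(index, query_strokes, radius, top_k=3):
    """Rank word ids by how many query strokes have a nearby stored stroke."""
    votes = Counter()
    for q in query_strokes:
        # Assumption: a word gets at most one vote per query stroke.
        hits = {word_id for word_id, _ in range_query(index, q, radius)}
        votes.update(hits)
    return votes.most_common(top_k)


# Hypothetical index: one (word_id, feature vector) entry per stored stroke.
index = [
    ("w1", (0.10, 0.20)), ("w1", (0.50, 0.55)), ("w1", (0.90, 0.10)),
    ("w2", (0.12, 0.22)), ("w2", (0.80, 0.80)),
    ("w3", (0.40, 0.95)),
]

# Hypothetical query string: three strokes, each a point in feature space.
query = [(0.11, 0.21), (0.52, 0.54), (0.88, 0.12)]

ranking = vote(index, query, radius=0.05)
print(ranking)
```

In this toy data the word `w1` has a stored stroke near every query stroke and so ranks first, `w2` matches only the first stroke, and `w3` receives no votes. Because votes are counted per stroke, a single inserted, deleted, or substituted stroke lowers a word's score without eliminating it.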
