Computation of Substring Probabilities in Stochastic Grammars

Ana L. N. Fred
Instituto de Telecomunicações, Instituto Superior Técnico
IST-Torre Norte, Av. Rovisco Pais, 1049-001 Lisboa, Portugal
[email protected]

Abstract. The computation of the probability of string generation according to stochastic grammars, given only some of the symbols that compose the string, underlies pattern recognition problems concerning prediction and/or recognition based on partial observations. This paper presents algorithms for the computation of substring probabilities in stochastic regular languages. Situations covered include prefix, suffix and island probabilities. The computational time complexity of the algorithms is analyzed.

1 Introduction

The computation of the probability of string generation according to stochastic grammars, given only some of the symbols that compose the string, underlies pattern recognition problems such as prediction and recognition of patterns based on partial observations. Examples of this type of problem are illustrated in [2-4], in the context of automatic speech understanding, and in [5, 6], concerning the prediction of a particular physiological state based on the syntactic analysis of electro-encephalographic signals. Another potential application, in the area of image understanding, is the recognition of partially occluded objects based on their string contour descriptions. Expressions for the computation of substring probabilities according to stochastic context-free grammars written in Chomsky Normal Form have been proposed in [1-3]. In this paper, algorithms are described for the computation of substring probabilities in regular-type languages expressed by stochastic grammars of the form

σ → F_i,   F_i → α,   F_i → α F_j,    α ∈ Σ+,  F_i, F_j ∈ V_N

(σ representing the grammar start symbol, and V_N and Σ corresponding to the non-terminal and terminal symbol sets, respectively). This type of grammar arises, for instance, in the process of grammatical inference based on Crespi-Reghizzi's method [7], when structural samples assume the form ...dcfgbecdab]]]] (meaning some sort of temporal alignment of sub-patterns).

The algorithms presented next exploit the particular structure of these grammars, being essentially dynamic programming methods. Situations described include the recognition of fixed-length strings (probability of exact recognition, section 3.1; highest probability interpretation, section 3.2) and arbitrary-length strings (prefix, section 3.3; suffix, section 3.5; and island probabilities, section 3.4). The computational complexity of the methods is analyzed in terms of worst-case time complexity. Minor and obvious modifications of these algorithms enable the computation of probabilities according to the standard form of regular grammars.

2 Notation and Definitions

Let G = (V_N, Σ, R_s, σ) be a stochastic context-free grammar, where V_N is the finite set of non-terminal symbols or syntactical categories; Σ is the set of terminal symbols (vocabulary); R_s is a finite set of productions of the form p_i : A → α, with A ∈ V_N, α ∈ (Σ ∪ V_N)* (the star representing any combination of symbols in the set) and p_i the rule probability; and σ ∈ V_N is the start symbol. When rules take the particular form A → aB or A → a, with A, B ∈ V_N and a ∈ Σ, the grammar is designated finite-state or regular. Throughout the text the symbols A, G and H are used to represent non-terminal symbols. The following definitions are also useful:

w_1 ... w_n := finite sequence of terminal symbols
H ⇒* α := derivation of α from H through the application of an arbitrary number of rules
C_T(α) := number of terminal symbols in α (with repetitions)
C_N(α) := number of non-terminal symbols in α (with repetitions)
n_min(H, G) := min{ n, max{ C_T(α) : (H → α G) ∈ R_s } }

The following graphical notation is used to represent derivation trees: arcs are associated with the direct application of rules (for instance, an arc represents the rule H → w_1 w_2 G), and a triangle represents the set of derivation trees having the top non-terminal symbol as root and yielding the string on the base of the triangle (H ⇒* w_1 ... w_n).
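To make this notation concrete, one possible machine representation is sketched below; the encoding, names, and helper function are illustrative assumptions, not part of the paper.

```python
# Hypothetical encoding of the grammar family above (not from the paper):
# each non-terminal maps to a list of rules (terminals, tail, prob), where
# `terminals` is the terminal string alpha, `tail` is the trailing
# non-terminal F_j (None for rules F_i -> alpha) and `prob` is the rule
# probability.
G = {
    "S": [((), "A", 1.0)],                           # sigma -> F_i
    "A": [(("a",), "A", 0.4), (("b",), None, 0.6)],  # F_i -> alpha F_j | alpha
}

def n_min(grammar, H, tail, n):
    """min{ n, max{ C_T(alpha) : (H -> alpha tail) in R_s } } from the text."""
    lengths = [len(terms) for terms, g, _ in grammar[H] if g == tail]
    return min(n, max(lengths)) if lengths else 0
```

With this toy grammar, n_min(G, "A", "A", 5) evaluates to 1, since the single rule A → aA contributes one terminal symbol.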

3 Algorithms Description

3.1 Probability of Derivation of Exact String - Pr(H ⇒* w_i ... w_{i+n})

Let Pr(H ⇒* w_i ... w_{i+n}) be the probability of all derivation trees having H as root and generating exactly w_i ... w_{i+n}. According to the type of grammars considered, this probability can be computed as follows (see figure 1):

Pr(σ ⇒* w_i ... w_{i+n}) = ∑_G Pr(σ → G) Pr(G ⇒* w_i ... w_{i+n})    (1)

Pr(H ⇒* w_i ... w_{i+n}), H ≠ σ
  = Pr(H → w_i ... w_{i+n})
  + ∑_G ∑_{k=1..n_min(H,G)} Pr(H → w_i ... w_{i+k-1} G) Pr(G ⇒* w_{i+k} ... w_{i+n})    (2)

Fig. 1. Terms of the expression 2.

This can be summarized by the iterative algorithm:

1. For all non-terminal symbols H ∈ V_N − {σ}, determine Pr(H → w_{i+n})

2. For k = 1, ..., n and for all non-terminal symbols H ∈ V_N − {σ}, compute:

Pr(H ⇒* w_{i+n-k} ... w_{i+n})
  = Pr(H → w_{i+n-k} ... w_{i+n})
  + ∑_G ∑_{j=1..n_min(H,G)} Pr(H → w_{i+n-k} ... w_{i+n-k+j-1} G) Pr(G ⇒* w_{i+n-k+j} ... w_{i+n})    (3)

3. Pr(σ ⇒* w_i ... w_{i+n}) = ∑_G Pr(σ → G) Pr(G ⇒* w_i ... w_{i+n})    (4)

This corresponds to filling in the matrix of figure 2, column by column, from right to left. In this figure, each element corresponds to the probability of all derivation trees for the substring at the top of the column, with root in the non-terminal symbol indicated at the left of the row, i.e. Pr(H_i ⇒* w_j ... w_{i+n}).

Fig. 2. Array representing the syntactic recognition of strings according to the algorithm.

Notice that the calculations in step 2 are based only on direct rules and on access to previously computed probabilities in columns to the right of the position under calculation.

Computational complexity analysis. For strings of length n, the computational cycle is repeated n times, all non-terminal symbols being considered. Associating to the terms in equation 3

A: Pr(H → w_{i+n-k} ... w_{i+n})

B: ∑_G ∑_{j=1..n_min(H,G)} Pr(H → w_{i+n-k} ... w_{i+n-k+j-1} G) Pr(G ⇒* w_{i+n-k+j} ... w_{i+n})

the worst-case time complexity is of the order

O( (|V_N| − 1)                                               [step 1]
 + ∑_{k=1..n−1} (|V_N| − 1)(cost of A + cost of B)           [step 2]
 + |V_N| )                                                   [step 3]
= O(|V_N| n)
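The recursion of equations 1-3 can be sketched as a right-to-left dynamic program. The dict-based rule encoding below is an assumption for illustration, not the paper's notation.

```python
from collections import defaultdict

def exact_string_probability(grammar, start, w):
    """Right-to-left DP for Pr(sigma =>* w), sketching eqs. 1-3.

    grammar: nonterminal -> list of rules (terminals, tail, prob), where
    `terminals` is a tuple of terminal symbols, `tail` is the trailing
    nonterminal (None for purely terminal rules F_i -> alpha) and `prob`
    is the rule probability.  This encoding is an illustrative assumption.
    """
    n = len(w)
    P = defaultdict(float)  # P[(H, j)] = Pr(H =>* w_j ... w_{n-1})
    for j in range(n - 1, -1, -1):          # columns, right to left
        for H, rules in grammar.items():
            if H == start:
                continue
            total = 0.0
            for terms, tail, p in rules:
                k = len(terms)
                if tuple(w[j:j + k]) != terms:
                    continue
                if tail is None:
                    if j + k == n:          # rule generates the rest exactly
                        total += p
                elif j + k < n:             # terminals, then tail derives the rest
                    total += p * P[(tail, j + k)]
            P[(H, j)] = total
    # the start symbol only has rules sigma -> G in this grammar family
    return sum(p * P[(tail, 0)] for terms, tail, p in grammar[start]
               if tail is not None)

# toy grammar: S -> A; A -> "a" A (0.4) | "b" (0.6); generates a^k b
G = {
    "S": [((), "A", 1.0)],
    "A": [(("a",), "A", 0.4), (("b",), None, 0.6)],
}
```

For the toy grammar, exact_string_probability(G, "S", "aab") returns 0.4 · 0.4 · 0.6 = 0.096; the double loop over positions and non-terminals mirrors the O(|V_N| n) bound.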

3.2 Maximum Probability of Derivation - Pm(H ⇒* w_i ... w_{i+n})

Let Pm(H ⇒* w_1 ... w_n) denote the highest probability over all derivation trees having H as root and producing exactly w_1 ... w_n, and define the matrix Mu as follows:

Mu[i, j] = Pm(H_i ⇒* w_j ... w_n),  i = 1, ..., |V_N|,  j = 1, ..., n    (5)

Observing that

Pm(σ ⇒* w_1 ... w_n) = max_G { Pr(σ → G) Pm(G ⇒* w_1 ... w_n) }    (6)

and

Pm(H ⇒* w_1 ... w_n), H ≠ σ
  = max{ Pr(H → w_1 ... w_n),  max_{i,G} { Pr(H → w_1 ... w_i G) Pm(G ⇒* w_{i+1} ... w_n) } }    (7)

the following algorithm computes the desired probability:

1. For i = 1, ..., |V_N|, H_i ≠ σ:

Mu[i, n] = Pr(H_i → w_n) if (H_i → w_n) ∈ R_s, and 0 otherwise

2. For j = n − 1, ..., 1 and for i = 1, ..., |V_N|, H_i ≠ σ:

Mu[i, j] = max{ Pr(H_i → w_j ... w_n),  max_{k ≥ j, l} { Pr(H_i → w_j ... w_k H_l) Mu[l, k + 1] } }

3. For i : H_i = σ:

Mu[i, 1] = max_j { Pr(σ → H_j) Mu[j, 1] }

This algorithm corresponds to the computation of an array similar to the one in figure 2, but where the probabilities refer to maximum values over single derivations instead of the total probability of derivation when ambiguous grammars are considered. Based on the similarity between this algorithm and the one developed in section 3.1, it is straightforward to conclude that it runs in O(|V_N| n) time.
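Replacing sums by maxima in the previous dynamic program yields a Viterbi-style sketch of equations 5-7; the dict-based grammar encoding is again a hypothetical illustration.

```python
def max_derivation_probability(grammar, start, w):
    """Sketch of eqs. 5-7: highest probability over single derivations.

    grammar: nonterminal -> list of (terminals, tail, prob), with
    `terminals` a tuple of terminal symbols and `tail` the trailing
    nonterminal or None (assumed encoding, not the paper's notation).
    """
    n = len(w)
    M = {}  # M[(H, j)] = Pm(H =>* w_j ... w_{n-1})
    for j in range(n - 1, -1, -1):
        for H, rules in grammar.items():
            if H == start:
                continue
            best = 0.0
            for terms, tail, p in rules:
                k = len(terms)
                if tuple(w[j:j + k]) != terms:
                    continue
                if tail is None and j + k == n:
                    best = max(best, p)                       # exact terminal rule
                elif tail is not None and j + k < n:
                    best = max(best, p * M.get((tail, j + k), 0.0))
            M[(H, j)] = best
    return max((p * M.get((tail, 0), 0.0)
                for _, tail, p in grammar[start] if tail is not None),
               default=0.0)

# ambiguous toy grammar: "ab" is derived two ways (p = 0.3 and p = 0.7)
G = {
    "S": [((), "A", 1.0)],
    "A": [(("a", "b"), None, 0.3), (("a",), "B", 0.7)],
    "B": [(("b",), None, 1.0)],
}
```

For this ambiguous grammar the total probability of "ab" is 0.3 + 0.7 = 1.0, while the maximum single derivation has probability 0.7, which is what the function returns.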

3.3 Prefix Probability - Pr(H ⇒* w_i ... w_{i+n} Σ*)

The probability of all derivation trees having H as root and generating arbitrary-length strings with the prefix w_i ... w_{i+n}, denoted Pr(H ⇒* w_i ... w_{i+n} Σ*), can be expressed as follows:

Pr(σ ⇒* w_i ... w_{i+n} Σ*) = ∑_G Pr(σ → G) Pr(G ⇒* w_i ... w_{i+n} Σ*)    (8)

Pr(H ⇒* w_i ... w_{i+n} Σ*), H ≠ σ
  = Pr(H → w_i ... w_{i+n})
  + ∑_G ∑_{k=1..n_min(H,G)} Pr(H → w_i ... w_{i+k-1} G) Pr(G ⇒* w_{i+k} ... w_{i+n} Σ*)
  + ∑_G ∑_{k≥0} Pr(H → w_i ... w_{i+n} v_1 ... v_k G)
  + ∑_{k≥1} Pr(H → w_i ... w_{i+n} v_1 ... v_k)    (9)

where the v_l denote arbitrary terminal symbols, implicitly summed over.

Fig. 3. Terms of the expression 9.

The previous expression suggests the following iterative algorithm:

1. For all non-terminal symbols H ∈ V_N − {σ}, determine

Pr(H ⇒* w_{i+n} Σ*) = Pr(H → w_{i+n}) + ∑_{k≥1} Pr(H → w_{i+n} v_1 ... v_k) + ∑_G ∑_{k≥0} Pr(H → w_{i+n} v_1 ... v_k G)

2. For k = 1, ..., n and for all non-terminal symbols H ∈ V_N − {σ}, compute

Pr(H ⇒* w_{i+n-k} ... w_{i+n} Σ*)
  = Pr(H → w_{i+n-k} ... w_{i+n})
  + ∑_G ∑_{j=1..n_min(H,G)} Pr(H → w_{i+n-k} ... w_{i+n-k+j-1} G) Pr(G ⇒* w_{i+n-k+j} ... w_{i+n} Σ*)
  + ∑_G ∑_{j≥0} Pr(H → w_{i+n-k} ... w_{i+n} v_1 ... v_j G)
  + ∑_{j≥1} Pr(H → w_{i+n-k} ... w_{i+n} v_1 ... v_j)    (10)

3. Pr(σ ⇒* w_i ... w_{i+n} Σ*) = ∑_G Pr(σ → G) Pr(G ⇒* w_i ... w_{i+n} Σ*)    (11)

This algorithm, like the one of section 3.1, corresponds to filling in a parsing matrix similar to the one in figure 2. The similarity with the algorithm in section 3.1 leads to the conclusion that this algorithm has O(|V_N| n) time complexity.
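The prefix recursion of equations 8-10 can be sketched in the same dict-based encoding used above. The sketch assumes a consistent grammar, so that Pr(G ⇒* Σ*) = 1 for every non-terminal G; both the encoding and this assumption are illustrative.

```python
def prefix_probability(grammar, start, w):
    """Sketch of eqs. 8-10: Pr(sigma =>* w Sigma*), the probability of
    deriving any string that begins with w.

    Assumes the encoding nonterminal -> [(terminals, tail, prob)] and a
    consistent grammar (Pr(G =>* Sigma*) = 1 for every nonterminal G);
    both are illustrative assumptions, not the paper's notation.
    """
    n = len(w)
    P = {}  # P[(H, j)] = Pr(H =>* w_j ... w_{n-1} Sigma*)
    for j in range(n - 1, -1, -1):
        for H, rules in grammar.items():
            if H == start:
                continue
            rest = tuple(w[j:])
            total = 0.0
            for terms, tail, p in rules:
                k = len(terms)
                if k >= len(rest):
                    # rule's terminal part covers the whole remaining prefix;
                    # whatever follows (if anything) lies in Sigma*
                    if terms[:len(rest)] == rest:
                        total += p
                elif terms == rest[:k] and tail is not None:
                    # consume k symbols; the tail must continue the prefix
                    total += p * P[(tail, j + k)]
            P[(H, j)] = total
    return sum(p * P[(tail, 0)] for terms, tail, p in grammar[start]
               if tail is not None)

# toy grammar as before: S -> A; A -> "a" A (0.4) | "b" (0.6)
G = {
    "S": [((), "A", 1.0)],
    "A": [(("a",), "A", 0.4), (("b",), None, 0.6)],
}
```

Here prefix_probability(G, "S", "aa") returns 0.16, the total mass 0.4 · 0.4 of all strings a a ... that continue the prefix.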

3.4 Island Probability - Pr(H ⇒* Σ* w_i ... w_{i+n} Σ*)

The island probability is Pr(H ⇒* Σ* w_i ... w_{i+n} Σ*), the probability of all derivation trees with root in H that generate arbitrary-length sequences containing the subsequence w_i ... w_{i+n}. Let

P_R(H → G) := ∑_{α ∈ Σ*} Pr(H → α G)    (12)

denote the probability of rewriting H by sequences with the non-terminal symbol G as suffix; more generally, P_R(H → β G) := ∑_{α ∈ Σ*} Pr(H → α β G) for β ∈ Σ*. One can write

Pr(σ ⇒* Σ* w_i ... w_{i+n} Σ*) = ∑_G Pr(σ → G) Pr(G ⇒* Σ* w_i ... w_{i+n} Σ*)    (13)

Pr(H ⇒* Σ* w_i ... w_{i+n} Σ*), H ≠ σ
  = ∑_G P_R(H → G) Pr(G ⇒* Σ* w_i ... w_{i+n} Σ*)
  + ∑_G ∑_{j=1..n_min(H,G)} P_R(H → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n} Σ*)
  + ∑_G P_R(H → w_i ... w_{i+n} G) Pr(G ⇒* Σ*) (= 1)
  + ∑_G ∑_{j,k} Pr(H → v_1 ... v_k w_i ... w_{i+n} z_1 ... z_j G) Pr(G ⇒* Σ*) (= 1)
  + ∑_{j,k} Pr(H → v_1 ... v_k w_i ... w_{i+n} z_1 ... z_j)    (14)

For sufficiently long strings (n > max{ C_T(α) : (G → α) ∈ R_s }) the last three terms do not exist, and therefore we ignore them from now on.

Fig. 4. Terms of the expression 14.

Pr(H ⇒* Σ* w_i ... w_{i+n} Σ*), H ≠ σ
  = ∑_G P_R(H → G) Pr(G ⇒* Σ* w_i ... w_{i+n} Σ*)
  + ∑_G ∑_{j=1..n_min(H,G)} P_R(H → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n} Σ*)    (15)

Recursively applying the above expression and after some manipulation (details can be found in [6]) one obtains:

Pr(H ⇒* Σ* w_i ... w_{i+n} Σ*), H ≠ σ
  = ∑_A Q_R(H ⇒ A) ∑_G ∑_{j≥1} P_R(A → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n} Σ*)
  + ∑_G ∑_{j≥1} P_R(H → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n} Σ*)    (16)

with

Q_R(H ⇒ G) = Pr(H ⇒* Σ* G)
  = P_R(H → G) + ∑_A P_R(H → A) P_R(A → G) + ∑_{A1,A2} P_R(H → A1) P_R(A1 → A2) P_R(A2 → G) + ...    (17)

Q_R(H ⇒ G) obeys the equation

Q_R(H ⇒ G) = ∑_A P_R(H → A) Q_R(A ⇒ G) + P_R(H → G)    (18)

Defining the matrices

P_R[H, G] = P_R(H → G)    (19)
Q_R[H, G] = Q_R(H ⇒ G)    (20)

Q_R is given by [6]:

Q_R = P_R [I − P_R]^{-1}    (21)

The algorithm can thus be described as:

1. Off-line computation of Q_R = P_R [I − P_R]^{-1}.
2. On-line computation of Pr(G ⇒* w_n Σ*), Pr(G ⇒* w_{n-1} w_n Σ*), ..., Pr(G ⇒* w_2 ... w_n Σ*) for all non-terminal symbols G ∈ V_N − {σ}, using the algorithm in section 3.3.
3. For all non-terminal symbols H ∈ V_N − {σ} compute

Pr(H ⇒* Σ* w_i ... w_{i+n} Σ*)
  = ∑_A Q_R(H ⇒ A) ∑_G ∑_{j≥1} P_R(A → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n} Σ*)
  + ∑_G ∑_{j≥1} P_R(H → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n} Σ*)

4. Pr(σ ⇒* Σ* w_i ... w_{i+n} Σ*) = ∑_G Pr(σ → G) Pr(G ⇒* Σ* w_i ... w_{i+n} Σ*)

The required on-line computations have the following time complexity:

O( (|V_N| − 1)(n − 1)    [step 2]
 + (|V_N| − 1)|V_N|      [step 3]
 + |V_N| )               [step 4]
= max( O(|V_N| n), O(|V_N|²) )
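The off-line step Q_R = P_R [I − P_R]^{-1} of equation 21 is a standard matrix computation. The sketch below uses numpy with made-up P_R entries and checks the closed form against the truncated series of equation 17.

```python
import numpy as np

# P_R[i][j] = total probability of rules rewriting H_i with H_j as the
# rightmost symbol; these 2x2 values are made up for illustration.
P_R = np.array([[0.2, 0.5],
                [0.0, 0.3]])

# Q_R = P_R (I - P_R)^{-1}  (eq. 21), i.e. P_R + P_R^2 + P_R^3 + ... (eq. 17)
Q_R = P_R @ np.linalg.inv(np.eye(2) - P_R)

# sanity check against the truncated geometric series
series = sum(np.linalg.matrix_power(P_R, k) for k in range(1, 60))
```

For a proper grammar the spectral radius of P_R is below 1, so the geometric series converges and the closed form agrees with its truncation.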

3.5 Suffix Probability - Pr(H ⇒* Σ* w_i ... w_{i+n})

Let Pr(H ⇒* Σ* w_i ... w_{i+n}) be the probability of all derivation trees with root in H generating arbitrary-length strings having w_i ... w_{i+n} as a suffix. This probability can be derived as follows:

Pr(σ ⇒* Σ* w_i ... w_{i+n}) = ∑_G Pr(σ → G) Pr(G ⇒* Σ* w_i ... w_{i+n})    (22)

Pr(H ⇒* Σ* w_i ... w_{i+n}), H ≠ σ
  = ∑_G P_R(H → G) Pr(G ⇒* Σ* w_i ... w_{i+n})
  + ∑_G ∑_{j=1..n_min(H,G)} P_R(H → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n})
  + P_R(H → w_i ... w_{i+n})    (23)

Fig. 5. Terms of the expression 23.

For sufficiently long strings (n > max{ C_T(α) : (G → α) ∈ R_s }) the third term does not exist, so it will be ignored henceforth. Recursively applying the resulting expression to the second factor of the first term one obtains:

Pr(H ⇒* Σ* w_i ... w_{i+n}), H ≠ σ
  = ∑_{A1} ∑_{j≥1} P_R(H → w_i ... w_{i+j-1} A1) Pr(A1 ⇒* w_{i+j} ... w_{i+n})
  + ∑_{A1,A2} ∑_{j≥1} P_R(H → A1) P_R(A1 → w_i ... w_{i+j-1} A2) Pr(A2 ⇒* w_{i+j} ... w_{i+n})
  + ...
  + ∑_{A1,...,Ak} ∑_{j≥1} P_R(H → A1) P_R(A1 → A2) ... P_R(A_{k-1} → w_i ... w_{i+j-1} A_k) Pr(A_k ⇒* w_{i+j} ... w_{i+n}) + ...    (24)

  = ∑_A Q_R(H ⇒ A) ∑_G ∑_{j≥1} P_R(A → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n})
  + ∑_G ∑_{j≥1} P_R(H → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n})    (25)

The algorithm is then:

1. Off-line computation of Q_R = P_R [I − P_R]^{-1}.

2. On-line computation of Pr(G ⇒* w_n), Pr(G ⇒* w_{n-1} w_n), ..., Pr(G ⇒* w_2 ... w_n) for all non-terminal symbols G ∈ V_N − {σ}, using the algorithm in section 3.1.
3. For all non-terminal symbols H ∈ V_N − {σ} compute

Pr(H ⇒* Σ* w_i ... w_{i+n})
  = ∑_A Q_R(H ⇒ A) ∑_G ∑_{j≥1} P_R(A → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n})
  + ∑_G ∑_{j≥1} P_R(H → w_i ... w_{i+j-1} G) Pr(G ⇒* w_{i+j} ... w_{i+n})

4. Pr(σ ⇒* Σ* w_i ... w_{i+n}) = ∑_G Pr(σ → G) Pr(G ⇒* Σ* w_i ... w_{i+n})

The on-line computations have the time complexity:

O( (|V_N| − 1)(n − 1)    [step 2]
 + (|V_N| − 1)           [step 3]
 + |V_N| )               [step 4]
= O(|V_N| n)

4 Conclusions

This paper described several algorithms for the computation of substring probabilities according to grammars written in the form

σ → F_i,   F_i → α,   F_i → α F_j,    α ∈ Σ+,  F_i, F_j ∈ V_N

Table 1 summarizes the probabilities considered here and the order of complexity of the associated algorithms.

Algorithm                  Expression                          Time complexity
Fixed-length strings       Pr(H ⇒* w_1 ... w_n)                O(|V_N| n)
                           Pm(H ⇒* w_1 ... w_n)                O(|V_N| n)
Arbitrary-length strings   Pr(H ⇒* w_1 ... w_n Σ*)             O(|V_N| n)
                           Pr(H ⇒* Σ* w_1 ... w_n Σ*)          max(O(|V_N| n), O(|V_N|²))
                           Pr(H ⇒* Σ* w_1 ... w_n)             O(|V_N| n)

Table 1. Summary of proposed algorithms for the computation of substring probabilities.

More general algorithms for the computation of substring probabilities according to stochastic context-free grammars, written in Chomsky Normal Form, can be found in [1-3]. However, the latter have O(n³) time complexity [2]. The algorithms proposed herein, exhibiting time complexity linear in the string length, represent a computationally appealing alternative whenever the application at hand can be adequately modeled by the types of grammars described above. Examples of application of the algorithms described in this paper can be found in [5, 6, 8, 9].

References

1. F. Jelinek, J. D. Lafferty and R. L. Mercer. Basic Methods of Probabilistic Context Free Grammars. In Speech Recognition and Understanding. Recent Advances, pages 345-360. Springer-Verlag, 1992.
2. A. Corazza, R. De Mori, R. Gretter, and G. Satta. Computation of Probabilities for an Island-Driven Parser. IEEE Trans. Pattern Anal. Machine Intell., vol. 13, no. 9, pages 936-949, 1991.
3. A. Corazza, R. De Mori, and G. Satta. Computation of Upper-Bounds for Stochastic Context-Free Languages. In Proceedings AAAI-92, pages 344-349, 1992.
4. A. Corazza, R. De Mori, R. Gretter, and G. Satta. Some Recent Results on Stochastic Language Modelling. In Advances in Structural and Syntactic Pattern Recognition, pages 163-183. World-Scientific, 1992.
5. A. L. N. Fred, A. C. Rosa, and J. M. N. Leitão. Predicting REM in sleep EEG using a structural approach. In E. S. Gelsema and L. N. Kanal, editors, Pattern Recognition in Practice IV, pages 107-117. Elsevier Science Publishers, 1994.
6. A. L. N. Fred. Structural Pattern Recognition: Applications in Automatic Sleep Analysis. PhD Thesis, Technical University of Lisbon, 1994.
7. K. S. Fu and T. L. Booth. Grammatical inference: Introduction and survey, parts I and II. IEEE Trans. Pattern Anal. Machine Intell., PAMI-8:343-359, May 1986.
8. A. L. N. Fred and T. Paiva. Sleep Dynamics and Sleep Disorders: a Syntactic Approach to Hypnogram Classification. In 10th Nordic-Baltic Conference on Biomedical Engineering and 1st International Conference on Bioelectromagnetism, pages 395-396, Tampere, Finland, June 1996.
9. A. L. N. Fred, J. S. Marques, and P. M. Jorge. Hidden Markov Models vs Syntactic Modeling in Object Recognition. In Proc. Intl. Conference on Image Processing, ICIP'97, 1997.
