Statistical comparison of images using Gibbs random fields

Bruno Rémillard and Christian Beaudoin
Laboratoire de Recherche en Probabilités et Statistique
Université du Québec à Trois-Rivières
C.P. 500, Trois-Rivières (Québec), Canada, G9A 5H7
bruno [email protected], christian [email protected]

* Work supported in part by the Fonds Institutionnel de Recherche and by the Natural Sciences and Engineering Research Council of Canada, Grant No. OGP0042137.
Abstract

The statistical tools used to compare images are often based on multivariate statistics and use the images as observations. This approach does not take into account the spatial and dependence structures in the images. Here we follow another approach, in the sense that the number of pixels in the images is the number of observations. We also take into account spatial and dependence structures by modelling images with Gibbs random fields with nearest-neighbor interactions. The idea of the proposed method of comparison of images is to model the images by Gibbs random fields with a finite number of parameters and to test equality of parameters between the pooled images of each group. We obtain a test statistic that is easy to calculate, and the limiting distribution of the statistic is a chi-square distribution. Applications to handwritten signatures and pattern recognition are presented.
1 Introduction

In many applications one wants to compare images of the same object. For example, one is interested in comparing fingerprints, handwritten signatures, radar images, etc. The statistical tools used to compare images are often based on multivariate statistics and use the images as observations. This approach does not take into account the spatial and dependence structures in the images. Here we follow another approach, in the sense that the number of pixels in the images is the number of observations. We also take into account spatial and dependence structures by modelling images with the so-called (finite) Gibbs random fields with nearest-neighbor interactions. These models are very popular in statistical physics. They have also been used by many authors for denoising images using a Bayesian approach; see for example [2] or [3].

In what follows we assume that the images are comparable, in the sense that they have the same size and the same orientation (e.g. [1]). The idea of the proposed method of comparison of images is to model the images by Gibbs random fields with a finite number of parameters and to test equality of parameters between the pooled images of each group. In Section 2, image models with an arbitrary number of gray levels are introduced. Section 3 deals with methods for estimating the parameters of these models, and an appropriate test statistic is defined. Section 4 is devoted to examples of applications for binary images, while the Appendix contains the proof of the asymptotic behavior of the proposed test statistic, together with efficient Monte-Carlo Markov Chain methods for generating images.

2 Gibbs random fields

2.1 Notations

The set of pixels will be denoted by $S$; it is assumed that $S$ is a rectangle of $\mathbb{Z}^2$. The color of the pixel at site $s \in S$ is denoted by $x(s)$ and takes values in $E = \{0, 1, \ldots, G-1\}$, where $G$ is the number of gray levels; for black and white (binary) images, $G = 2$, a white pixel having value $0$ and a black pixel value $1$. For the sake of simplicity, every image $x$ is extended so that $x(s) = 0$ whenever $s \notin S$. If one wants to use statistics, one has to define an interesting parameterized family of distributions on the set $\mathcal{X} = E^S$ of all possible images. The desirable features of such random images are:
- relatively few parameters;
- a "Markov property": the law of the color of the pixel at site $s$, given all other pixels, should depend only on the colors of the pixels in a given neighborhood $\mathcal{N}_s$ of $s$;
- "stationarity": the neighborhood structure is translation invariant, that is, $\mathcal{N}_{s+t} = \mathcal{N}_s + t$ for any sites $s$ and $t$;
- each possible image has a positive probability.
One such family is given by Gibbs random fields introduced in the next subsection.
2.2 Gibbs random fields

Throughout the rest of the paper, let $e_1, \ldots, e_m$ be fixed vectors with integer components, let $\alpha = (\alpha_1, \ldots, \alpha_{G-1})$ and $\beta = (\beta_1, \ldots, \beta_m)$, and set $\alpha_0 = 0$. The vectors $e_1, \ldots, e_m$ will serve to define the neighbors of a pixel, while $\alpha$ and $\beta$ will be the parameters of the family of laws for the images. For every $(\alpha, \beta)$, the probability of an image $x \in \mathcal{X}$ is defined by

    P_{\alpha,\beta}(x) = \exp\{-H_{\alpha,\beta}(x)\} / Z_{\alpha,\beta},

where

    H_{\alpha,\beta}(x) = -\sum_{s \in S} \alpha_{x(s)} - \sum_{k=1}^{m} \beta_k \sum_{s \in S} x(s)\, x(s + e_k),

and where $Z_{\alpha,\beta}$ is the normalizing constant

    Z_{\alpha,\beta} = \sum_{y \in \mathcal{X}} \exp\{-H_{\alpha,\beta}(y)\}.

One can then check that, for any finite subset $\Lambda$ of $S$, the conditional law of $(X(s))_{s \in \Lambda}$ given $X(t) = x(t)$ for all $t \notin \Lambda$ has the same exponential form, with a normalizing constant depending only on $\Lambda$ and on the values of $x$ outside $\Lambda$. In particular, taking $\Lambda = \{s\}$, one obtains

    P_{\alpha,\beta,s}[x(s) = c] := P_{\alpha,\beta}[X(s) = c \mid X(t) = x(t), \ t \ne s]
        = \frac{\exp\{\alpha_c + c \sum_{k=1}^{m} \beta_k v(x,s,k)\}}{\sum_{j=0}^{G-1} \exp\{\alpha_j + j \sum_{k=1}^{m} \beta_k v(x,s,k)\}},    (2.1)

where $v(x,s,k) = x(s + e_k) + x(s - e_k)$. The probabilities defined by (2.1) are called the local specifications of the Gibbs random field.

REMARK. The parameter $\beta$ can be interpreted as a dependence parameter, in the sense that if it equals zero, then the pixels at different sites are independent and

    P_{\alpha,0}[X(s) = c] = \frac{e^{\alpha_c}}{\sum_{j=0}^{G-1} e^{\alpha_j}}, \qquad 0 \le c \le G-1.
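To make the local specifications concrete, here is a small numerical sketch of (2.1) for a binary image ($G = 2$) with four neighbor-pair directions. The particular directions, the helper names and the toy parameter values are our own illustrative choices, not part of the model above.

    import numpy as np

    # Assumed neighbor-pair directions e_1, ..., e_4 (right, up-right, up, up-left),
    # giving the eight-neighbor structure used later in Section 4 for binary images.
    PAIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]

    def neighbor_sums(x, i, j):
        """v(x, s, k) = x(s + e_k) + x(s - e_k), with x extended by zeros outside S."""
        n_rows, n_cols = x.shape
        def val(a, b):
            return x[a, b] if 0 <= a < n_rows and 0 <= b < n_cols else 0
        return np.array([val(i + di, j + dj) + val(i - di, j - dj) for di, dj in PAIRS])

    def local_spec(x, i, j, alpha1, beta):
        """Local specification (2.1): P[x(s) = 1 | rest] for a binary image (alpha_0 = 0)."""
        u = alpha1 + np.dot(beta, neighbor_sums(x, i, j))   # exponent for color 1
        return np.exp(u) / (1.0 + np.exp(u))                # color 0 contributes exp(0) = 1

    # Toy usage on a small binary image.
    x = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1]])
    print(local_spec(x, 1, 1, alpha1=0.2, beta=np.array([0.5, 0.1, 0.5, 0.1])))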
3 Comparison of images

Using the Gibbs random field models, we can compare images by first estimating the parameters of the respective groups and then performing a hypothesis test. Let us start with parameter estimation.

3.1 Parameter estimation

There are many methods that can be used to estimate the $G-1+m$ parameters of a statistical model, but the two most efficient ones are the maximum likelihood method and the maximum pseudo-likelihood method. In our setting, the maximum likelihood method consists in finding the values $(\alpha, \beta)$ maximizing the probability $P_{\alpha,\beta}(x)$, where $x$ is the observed image. Unfortunately, since $Z_{\alpha,\beta}$ does not have a closed form except in the independence case, it would take years to compute this constant for images having 10000 pixels, so one cannot apply the maximum likelihood method directly. A similar method is proposed in [8]; to compute the constant term, the author relies on Monte-Carlo Markov Chain methods. It works as follows. Maximizing $P_{\alpha,\beta}(x)$ is the same as maximizing $\log P_{\alpha,\beta}(x)$; hence, using calculus, it amounts to solving the following equations in $(\alpha, \beta)$:

    \frac{1}{|S|} \sum_{s \in S} \mathbf{1}\{x(s) = c\} = \frac{1}{|S|} \sum_{s \in S} P_{\alpha,\beta}[X(s) = c], \qquad 1 \le c \le G-1,

    \frac{1}{|S|} \sum_{s \in S} x(s)\, x(s + e_k) = \frac{1}{|S|} \sum_{s \in S} E_{\alpha,\beta}[X(s)\, X(s + e_k)], \qquad 1 \le k \le m.

Once the probabilities and expectations appearing in these equations are estimated by Monte-Carlo methods (see the Appendix), one can use recursive methods to solve them. However, the calculation time is enormous. To overcome this difficulty, we propose to use the maximum pseudo-likelihood method.
Instead of maximizing $P_{\alpha,\beta}(x)$, one maximizes the product of the local specifications, which is called the pseudo-likelihood, that is,

    \Pi = \Pi_x(\alpha,\beta) = \prod_{s \in S} P_{\alpha,\beta,s}[x(s)].    (3.2)

Note that maximizing $\Pi$ is the same as maximizing

    L = L_x(\alpha,\beta) = \frac{1}{|S|} \log \Pi = \frac{1}{|S|} \sum_{s \in S} \log P_{\alpha,\beta,s}[x(s)].    (3.3)
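As an illustration, the log-pseudo-likelihood (3.3) of a binary image can be evaluated directly from the local specifications (2.1). The sketch below assumes the same binary, four-pair setting as the previous snippet; the vectorized helper and the toy image are ours.

    import numpy as np

    PAIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]  # assumed neighbor-pair directions e_1..e_4

    def pair_sums(x):
        """Return v[k, i, j] = x(s + e_k) + x(s - e_k) for every site s, zero-padding outside S."""
        padded = np.pad(x, 1)
        v = np.empty((len(PAIRS),) + x.shape)
        for k, (di, dj) in enumerate(PAIRS):
            plus = padded[1 + di:1 + di + x.shape[0], 1 + dj:1 + dj + x.shape[1]]
            minus = padded[1 - di:1 - di + x.shape[0], 1 - dj:1 - dj + x.shape[1]]
            v[k] = plus + minus
        return v

    def log_pseudo_likelihood(x, alpha1, beta):
        """L_x(alpha, beta) of (3.3) for a binary image: average of log P_{alpha,beta,s}[x(s)]."""
        u = alpha1 + np.tensordot(beta, pair_sums(x), axes=1)   # exponent for color 1 at each site
        log_p1 = u - np.logaddexp(0.0, u)                       # log P[x(s) = 1 | rest]
        log_p0 = -np.logaddexp(0.0, u)                          # log P[x(s) = 0 | rest]
        return np.mean(np.where(x == 1, log_p1, log_p0))

    x = (np.random.default_rng(0).random((64, 64)) < 0.5).astype(int)
    print(log_pseudo_likelihood(x, alpha1=0.0, beta=np.zeros(4)))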
REMARK. Under the independence hypothesis, that is, $\beta = 0$, the maximum likelihood and maximum pseudo-likelihood estimates of $\alpha$ coincide and are given by $\hat\alpha_c = \log(\hat p_c / \hat p_0)$, where $\hat p_c$ is the proportion of pixels having color $c$, $0 \le c \le G-1$.

Using calculus, it is easy to see that the values of $(\alpha, \beta)$ that maximize the pseudo-likelihood must satisfy the following equations:

    \frac{\partial L}{\partial \alpha_c} = \hat p_c - \frac{1}{|S|} \sum_{s \in S} P_{\alpha,\beta,s}[x(s) = c] = 0, \qquad 1 \le c \le G-1,    (3.4)

    \frac{\partial L}{\partial \beta_k} = m_k - \frac{1}{|S|} \sum_{s \in S} E_{\alpha,\beta,s}[x(s)]\, v(x,s,k) = 0, \qquad 1 \le k \le m,    (3.5)

where $m_k = \frac{1}{|S|} \sum_{s \in S} x(s)\, v(x,s,k)$, $1 \le k \le m$, and $E_{\alpha,\beta,s}$ denotes expectation with respect to the local specification (2.1) at site $s$. Note that the Hessian matrix $H = H_x(\alpha,\beta)$ of $L$ has the block form $H = \begin{pmatrix} H_{11} & H_{12} \\ H_{12}^\top & H_{22} \end{pmatrix}$, where, for instance,

    (H_{11})_{cj} = \frac{\partial^2 L}{\partial \alpha_c \partial \alpha_j}
        = -\frac{1}{|S|} \sum_{s \in S} \big\{ P_{\alpha,\beta,s}[x(s) = c]\, \mathbf{1}\{j = c\} - P_{\alpha,\beta,s}[x(s) = c]\, P_{\alpha,\beta,s}[x(s) = j] \big\}, \qquad 1 \le c, j \le G-1,

and the blocks $H_{12}$ and $H_{22}$ are obtained similarly by differentiating (3.4) and (3.5) with respect to $\beta$.

Let $Y_{x,s}$ be the $(G-1+m)$-dimensional vector defined by

    (Y_{x,s})_j = \mathbf{1}\{x(s) = j\}, \quad 1 \le j \le G-1, \qquad (Y_{x,s})_{G-1+k} = x(s)\, v(x,s,k), \quad 1 \le k \le m,    (3.6)

where $\mathbf{1}_A$ is the indicator function of the set $A$, that is, $\mathbf{1}_A(u) = 1$ if $u \in A$ and $0$ otherwise. One can see that the gradient $\nabla L = (\partial L/\partial \alpha_1, \ldots, \partial L/\partial \alpha_{G-1}, \partial L/\partial \beta_1, \ldots, \partial L/\partial \beta_m)$ of $L$ is given by

    \nabla L = (\hat p, m) - \frac{1}{|S|} \sum_{s \in S} E_{\alpha,\beta,s}(Y_{x,s}),    (3.7)

where $\hat p = (\hat p_1, \ldots, \hat p_{G-1})$ and $m = (m_1, \ldots, m_m)$. Moreover, the Hessian matrix $H$ of $L$ can also be written in the form

    H = -\frac{1}{|S|} \sum_{s \in S} \big\{ E_{\alpha,\beta,s}(Y_{x,s} Y_{x,s}^\top) - E_{\alpha,\beta,s}(Y_{x,s})\, E_{\alpha,\beta,s}(Y_{x,s})^\top \big\}.    (3.8)

It follows from representation (3.8) that $H$ is negative definite provided the linear span generated by the vectors $\{(v(x,s,1), \ldots, v(x,s,m)) : s \in S\}$ has dimension $m$.
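When this condition holds, $L$ is concave and the equations (3.4)-(3.5) can be solved by a simple ascent scheme based on the gradient representation (3.7). The following sketch does this for the binary, four-pair model; the fixed step size and the stopping rule are arbitrary illustrative choices, not the authors' algorithm.

    import numpy as np

    PAIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]  # assumed neighbor-pair directions e_1..e_4

    def pair_sums(x):
        """v[k, i, j] = x(s + e_k) + x(s - e_k), zero-padding outside S."""
        padded = np.pad(x, 1)
        v = np.empty((len(PAIRS),) + x.shape)
        for k, (di, dj) in enumerate(PAIRS):
            v[k] = (padded[1 + di:1 + di + x.shape[0], 1 + dj:1 + dj + x.shape[1]]
                    + padded[1 - di:1 - di + x.shape[0], 1 - dj:1 - dj + x.shape[1]])
        return v

    def fit_pseudo_likelihood(x, n_iter=2000, step=0.1):
        """Gradient ascent on L using (3.7): grad = (p_hat, m_hat) - mean of E_{alpha,beta,s}(Y_{x,s})."""
        v = pair_sums(x)                                 # v[k, i, j] = v(x, s, k)
        p_hat = x.mean()                                 # proportion of pixels with color 1
        m_hat = (x[None, :, :] * v).mean(axis=(1, 2))    # m_k of (3.5)
        theta = np.zeros(1 + len(PAIRS))                 # theta = (alpha_1, beta_1, ..., beta_4)
        for _ in range(n_iter):
            u = theta[0] + np.tensordot(theta[1:], v, axes=1)
            p1 = 1.0 / (1.0 + np.exp(-u))                # P_{alpha,beta,s}[x(s) = 1]
            grad = np.concatenate(([p_hat - p1.mean()],
                                   m_hat - (p1[None, :, :] * v).mean(axis=(1, 2))))
            theta += step * grad
            if np.max(np.abs(grad)) < 1e-8:
                break
        return theta

    x = (np.random.default_rng(1).random((64, 64)) < 0.4).astype(int)
    print(fit_pseudo_likelihood(x))

In practice a Newton step using the Hessian (3.8) would converge faster, at the cost of assembling and inverting the matrix at each iteration.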
3.2 Estimation of parameters for a group of images

If we have $n_1$ independent images $x_1, \ldots, x_{n_1}$ of the same object, the parameters $(\alpha, \beta)$ of the group are estimated by maximizing

    L^{(n_1)}(\alpha,\beta) = \sum_{i=1}^{n_1} L_{x_i}(\alpha,\beta),

where $L_{x_i}$ is the function defined by (3.3) calculated with the image $x_i$.
3.3 Test statistic

Suppose one has $n_1$ images of an object and one observes $n_2$ new images, numbered $n_1 + 1$ to $n_1 + n_2$. To compare the two groups of images, one just has to test the null hypothesis $H_0$ that the parameters of the pooled images of the first group equal the parameters of the pooled images of the second group. The test statistic constructed in the Appendix is a quadratic form in the difference of the maximum pseudo-likelihood estimates of the two groups; under $H_0$, its limiting distribution is chi-square with $D = G - 1 + m$ degrees of freedom. The p-value of a test statistic having value $a$ is thus approximated by $P(\chi^2_D \ge a)$, and the null hypothesis is rejected when that p-value is too small.
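Numerically, the decision step only requires a chi-square tail probability. The snippet below computes the approximate p-value $P(\chi^2_D \ge a)$ for an observed value $a$ of the statistic; the numbers are placeholders, and $D = 5$ corresponds to the binary model with four neighbor pairs used in Section 4.

    from scipy.stats import chi2

    def p_value(a, dof):
        """Approximate p-value P(chi2_dof >= a) for an observed statistic value a."""
        return chi2.sf(a, df=dof)

    # Placeholder numbers: a binary model with four neighbor pairs has D = G - 1 + m = 5.
    a_observed = 12.3
    print(p_value(a_observed, dof=5))   # reject H0 when this falls below the chosen level
    print(chi2.ppf(0.95, df=5))         # 5% critical value, for comparison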
4 Examples of applications

For the binary images treated in this section, we have chosen a model with eight neighbors at every interior site, as shown in Figure 1, so that $m = 4$ and the model has $G - 1 + m = 5$ parameters.

Figure 1: Neighbors of the pixel at site s (left); pairs of neighbors (right).

4.1 Simulated images

The results are given in the table below. The first image of each comparison was always generated with the same parameter vector; the parameters used for the second image are listed in the table, together with the proportion of rejections of the null hypothesis. One can see that the chi-square approximation of the statistic is quite good, since the rejection rate for images generated with the same parameters is very close to the true value of 5%.

Table: parameters of the second image (one row per setting) and the corresponding proportion of rejections.

These results indicate that the test has very good power, since small differences in the parameters are easily detected.
4.2 Comparison of handwritten signatures

For the second application, we considered handwritten signatures from the same person, shown in Figure 2, and compared the first one with each of the other ones. Using a level of 5%, only the last signature was rejected. One can see that this image is somewhat different from the other ones, so it is not surprising to reject the null hypothesis in that case.

Figure 2: Four handwritten signatures of Alain Chalifour.
4.3 Pattern recognition

To illustrate the method once more, let us compare each of the last three letters in Figure 3 with the first one.

Figure 3: Four different fonts for the letter A.

The values of the test statistic are well above the 5% critical value in all three comparisons, so each time the null hypothesis is rejected at the 5% level.

5 Conclusion

We proposed a new statistical method for comparing two groups of images. The idea is to model images using parameterized families of probability distributions and to test equality of the parameters of the respective groups. This method has the advantage of being easy to implement and also has very good discriminating power, as seen from the simulations and the comparisons of real binary images. We believe that our method could be very useful for comparing radar images with true images in geomatics, or for finding fingerprints that match well.

6 Appendix

In this appendix, we first prove that the limiting distribution of the test statistic is chi-square and then we describe some efficient ways of generating Gibbs random fields.

6.1 Asymptotic behavior of the test statistic

Let $\theta = (\alpha, \beta)$ be fixed and let $\mu$ be an ergodic stationary Gibbs probability measure with local specifications given by (2.1). Set $Z_{x,s} = Y_{x,s} - E_{\alpha,\beta,s}(Y_{x,s})$, $s \in S$. Since we used the maximum pseudo-likelihood method for estimating the parameters, it is tempting to use the difference of the log-pseudo-likelihoods as a test statistic, as is done in [6, Theorem, p. 158] for the maximum likelihood method. However, because of the dependence in the models, this does not work. Instead we can use a quadratic form based on the differences of the parameters in the two groups. To make this precise, set
    \hat\Sigma_1 = \frac{1}{n_1 |S|} \sum_{i=1}^{n_1} \sum_{s \in S} Z_{x_i,s}\, Z_{x_i,s}^\top
    \qquad \text{and} \qquad
    \hat\Sigma_2 = \frac{1}{n_2 |S|} \sum_{i=n_1+1}^{n_1+n_2} \sum_{s \in S} Z_{x_i,s}\, Z_{x_i,s}^\top .

It follows from [5] that $|S|^{-1/2} \sum_{s \in S} Z_{x,s}$ converges in law to a centered Gaussian vector $N(0, \Sigma)$ as $|S| \to \infty$. Let $\hat\theta$ denote the maximum pseudo-likelihood estimate computed from the first $n_1$ images and $\hat\theta'$ the estimate computed from the $n_2$ new images. Since $\nabla L(\theta) = \frac{1}{|S|} \sum_{s \in S} Z_{x,s}$ by (3.7), standard arguments give that $\sqrt{n_1 |S|}\,(\hat\theta - \theta)$ and $\sqrt{n_2 |S|}\,(\hat\theta' - \theta)$ converge in law to $N(0, H^{-1} \Sigma H^{-1})$, where $H$ is the Hessian matrix (3.8) evaluated at $\theta$. Since $\hat H$, $\hat\Sigma_1$ and $\hat\Sigma_2$ converge almost surely to $H$ and $\Sigma$ respectively, it follows that, under the null hypothesis, the quadratic form

    T = (\hat\theta' - \hat\theta)^\top \hat V^{-1} (\hat\theta' - \hat\theta),
    \qquad \hat V = \Big(\frac{1}{n_1 |S|} + \frac{1}{n_2 |S|}\Big)\, \hat H^{-1} \hat\Sigma\, \hat H^{-1},

where $\hat\Sigma = (n_1 \hat\Sigma_1 + n_2 \hat\Sigma_2)/(n_1 + n_2)$, converges in law to a random variable having a chi-square distribution with $D = G - 1 + m$ degrees of freedom.
6.2 Monte-Carlo Markov Chain methods

It is sometimes very difficult, or even impossible, to generate observations directly from a given law. The idea behind Monte-Carlo Markov Chain methods is the following: one generates a Markov chain whose stationary measure is exactly the desired law; one then runs the Markov chain for some time and takes the last observation of the chain as the generated observation. A nice introduction to these methods is [7]; see also [4]. We now describe the algorithms we used for the generation of gray-level images.

6.2.1 Metropolis-Hastings algorithm

Let $x_t$ denote the configuration at time $t \ge 0$. The next configuration $x_{t+1}$ at time $t+1$ is determined in the following way:

- choose a pixel $s$ at random in $S$;
- choose a color $c$ in $\{0, 1, \ldots, G-1\}$
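The text above stops in the middle of the update rule; the remainder of a standard single-site Metropolis-Hastings step accepts the proposed color with probability given by the ratio of the target probabilities, which for the Gibbs field reduces to a ratio of the local specifications (2.1). The sketch below spells this out for the binary, four-pair model; the uniform proposal, the burn-in length and the parameter values are our illustrative assumptions, not necessarily the authors' exact implementation.

    import numpy as np

    PAIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]  # assumed neighbor-pair directions e_1..e_4

    def neighbor_sums(x, i, j):
        """v(x, s, k), with zero values outside S."""
        n_rows, n_cols = x.shape
        def val(a, b):
            return x[a, b] if 0 <= a < n_rows and 0 <= b < n_cols else 0
        return np.array([val(i + di, j + dj) + val(i - di, j - dj) for di, dj in PAIRS])

    def metropolis_sweep(x, alpha1, beta, rng):
        """One sweep of single-site Metropolis-Hastings updates for a binary Gibbs field."""
        n_rows, n_cols = x.shape
        for _ in range(n_rows * n_cols):
            i, j = rng.integers(n_rows), rng.integers(n_cols)   # choose a pixel at random in S
            c = rng.integers(2)                                 # choose a color in {0, 1}
            if c == x[i, j]:
                continue
            u = alpha1 + beta @ neighbor_sums(x, i, j)          # exponent of (2.1) for color 1
            # Log acceptance ratio: only the terms involving site s change when x(s) is flipped.
            log_ratio = u if c == 1 else -u
            if np.log(rng.random()) < min(0.0, log_ratio):
                x[i, j] = c
        return x

    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=(32, 32))
    for _ in range(200):                                        # burn-in sweeps (arbitrary length)
        x = metropolis_sweep(x, alpha1=0.0, beta=np.array([0.3, 0.0, 0.3, 0.0]), rng=rng)
    print(x.mean())

A Gibbs sampler that draws $x(s)$ directly from the local specification (2.1) at the chosen site, instead of proposing a flip, is an equally standard alternative.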