Latent Fingerprint Segmentation with Adaptive Total Variation Model
Jiangyang Zhang¹, Rongjie Lai² and C.-C. Jay Kuo¹
¹ Ming Hsieh Department of Electrical Engineering, ² Department of Mathematics
University of Southern California
Abstract

Image segmentation is an important step in automatic fingerprint identification systems. While tremendous progress has been made in rolled and plain fingerprint segmentation, the segmentation of latent fingerprints is still a difficult problem. Features used for rolled and plain fingerprint images fail to work properly on latent images due to the poor quality in ridge information and the presence of multiple types of strong structured noise. In this work, we present an adaptive total variation (TV) model to achieve effective latent fingerprint segmentation. The proposed solution can remove various types of structured noise existing in a single latent image and automatically locate the region of interest (ROI), which contains primarily the latent fingerprint. Then, the following tasks such as fingerprint feature extraction and matching can be conducted in the ROI only. In the proposed TV-based image model, one can adaptively adjust the weight coefficient of the fidelity term in L1-norm depending on the background noise level, which is estimated via TV-based texture analysis. We apply the proposed TV-based segmentation algorithm to the NIST SD27 latent fingerprint database to demonstrate its superior performance.
1. Introduction

Fingerprint segmentation is an important step in an automatic fingerprint identification system. The goal of segmentation is to decompose a fingerprint image into two disjoint regions: foreground and background. The foreground, also called the region of interest (ROI), consists of desired fingerprints while the background contains noisy and irrelevant contents that will be discarded in the following processing steps. Accurate fingerprint segmentation is essential as it affects the reliable extraction of minutiae and singular points, which are key features in fingerprint matching.

Fingerprints generally fall into three categories, namely rolled, plain and latent fingerprints, where each category is collected through a different process. Rolled fingerprints are obtained by rolling the finger from one side to the other in order to capture all the ridge details, while plain fingerprints are acquired by pressing the fingertip onto a flat surface. Latent fingerprints, however, are usually lifted from object surfaces that were inadvertently touched or handled by a person, and they play a crucial role in identifying suspects by law enforcement agencies.

Segmentation of rolled and plain fingerprint images has been widely studied in the literature. In the early work of [14], segmentation was achieved by partitioning the fingerprint image into blocks, followed by block classification based on the gradient and variance information. In [1], fingerprints were segmented using three pixel-level features (i.e., coherence, mean and variance). A linear classifier was trained and morphology was applied to obtain compact clusters. In real-world applications, not all fingerprint images have clear ridges and boundaries. Recently, several methods have been proposed that specifically target low-quality fingerprint images [8, 18]. In [8], four features (i.e., the global mean, the local mean, variance and coherence) were used for segmentation. A new feature, the eccentric moment, was used in [18] to locate the boundaries of low-quality fingerprints.

While a significant amount of effort has been made in segmenting rolled and plain fingerprints, latent fingerprint segmentation remains a difficult problem. The difficulty mainly lies in poor image quality in terms of the clarity of the ridge information as well as the overlap of the ROI and structured noise in the background. To achieve automatic latent identification, it is important to preserve the accuracy of latent matching while minimizing human intervention. Recent studies [7, 13, 23] have made significant progress on the first objective, while the second objective remains a challenging research issue.
As in [23], the ROI and feature points were still manually marked and assumed to be known. Undoubtedly, accurate latent segmentation is an essential step towards achieving automatic latent identification, and it is the main focus of our current research.
978-1-4673-0397-2/12/$31.00 ©2012 IEEE
Total-Variation-based (TV-based) image models have been widely used in the context of image decomposition. Among several well known TV-based models, the model using total variation regularization with an L1 fidelity term, denoted by the TV-L1 model, is especially suited for multiscale image decomposition and feature selection [22, 4, 5]. Furthermore, a modified TV-L1 model was adopted in [5] to extract small-scale facial features for facial recognition under varying illumination. It appears that the TV-based image model with proper adaptation may offer a suitable tool for latent fingerprint segmentation. Being motivated by the above observation, we propose an adaptive TV-based model for efficient latent segmentation in this work. The proposed adaptive TV model has the ability to decompose a single latent image and locate the essential latent area, which is then used for feature extraction and latent matching. The weight coefficient of the fidelity term is adaptively determined according to the background noise level, estimated by TV-based texture analysis. The rest of this paper is organized as follows. Several forms of structured noise in latent fingerprint images are first examined in Sec. 2, which suggests some ideas for their removal. Then, a new adaptive TV-based image model is proposed for effective latent fingerprint segmentation, and its behavior is analyzed in Sec. 3. Experimental results are shown in Sec. 4 to demonstrate the effectiveness of the proposed solution. Finally, concluding remarks are given and future research directions are pointed out in Sec. 5.
2. Structured Noise in Latent Fingerprint Images

The major difference between latent and rolled/plain fingerprints is the presence of structured noise in latent fingerprint images. Based on appearance, we can classify structured noise into the following six categories: arch, line, character, speckle, stain and others (see Fig. 1).

Figure 1. Illustration of six types of structured noise in latent fingerprint images. Panels: Arch, Line, Text, Stain, Speckle, Other.

The challenge of latent fingerprint image segmentation lies in how to separate latent fingerprints, which are a relatively weak signal, from all types of structured noise in the background region. Most often, the structured noise overlaps with latent fingerprints, which makes fingerprint segmentation very difficult.

Previous work on fingerprint image segmentation is mostly feature-based. Features used for segmentation include the mean, variance, contrast, coherence and their variants [18, 8]. Features designed for rolled/plain fingerprints may not work properly for latent fingerprints, since many assumptions made on rolled/plain fingerprints fail to apply to latent fingerprints. For instance, in [1], the mean feature was used since the background was assumed to be bright, and the variance feature was used since the variance of background noise was assumed to be much lower than that of the ridge-valley structure. However, the validity of these assumptions is in question in the context of latent fingerprint images.

We plot the distributions of three segmentation features, namely the mean, variance and coherence, in Fig. 2 to evaluate their effectiveness for fingerprint image segmentation. That is, we first segment a plain and a latent fingerprint image manually into foreground and background regions, and plot the histograms of these features in each region. As shown in the figure, the distributions of these features in foreground and background regions are well separated for plain fingerprints. In contrast, those of latent fingerprints have a significant overlap.

The overlap of these features on latent fingerprint images can be explained by two reasons. First, the structured noise often dominates the captured image, and it has high contrast and highly coherent gradient directions. Second, the quality of some latent fingerprints is so poor that they cannot be well characterized by these features. Therefore, we need to consider a new image model to get more useful features so that latent fingerprint patterns and structured noise can be well separated based on them.

Figure 2. Comparison of distributions of the mean, variance and coherence features in the foreground and background areas of plain and latent fingerprints. Panels: Plain Fingerprint (Mean, Variance, Coherence) and Latent Fingerprint (Mean, Variance, Coherence).
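For concreteness, the three block-level features compared in Fig. 2 can be computed along the lines of the following minimal Python sketch. The block size of 16 pixels, the function name `block_features` and the gradient-covariance form of the coherence are assumptions made for illustration; the exact settings used to produce Fig. 2 are not stated in the text.

```python
import numpy as np

def block_features(img, block=16):
    """Per-block mean, variance and gradient coherence (illustrative sketch).

    `block` and the coherence expression are assumptions, not values from the paper.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)  # derivatives along rows (y) and columns (x)
    h = (img.shape[0] // block) * block
    w = (img.shape[1] // block) * block

    def blocks(a):
        # Shape (rows, block, cols, block): axes 1 and 3 index pixels inside a block.
        return a[:h, :w].reshape(h // block, block, w // block, block)

    mean = blocks(img).mean(axis=(1, 3))
    var = blocks(img).var(axis=(1, 3))
    gxx = blocks(gx * gx).sum(axis=(1, 3))
    gyy = blocks(gy * gy).sum(axis=(1, 3))
    gxy = blocks(gx * gy).sum(axis=(1, 3))
    # Coherence is close to 1 when gradients share one direction, close to 0 otherwise.
    coh = np.sqrt((gxx - gyy) ** 2 + 4.0 * gxy ** 2) / np.maximum(gxx + gyy, 1e-12)
    return mean, var, coh
```

A straight line or an arch produces strongly aligned gradients and therefore a coherence close to 1, which is one way to see why coherence alone cannot separate ridges from such structured noise.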
3. Segmentation with Adaptive TV-based Image Model

In this section, we first introduce the TV-L1 model and explain its capability in multiscale feature selection. Then, we propose an adaptive TV-L1 model and discuss the choice of the key parameter, λ(x). Finally, we show how to perform latent fingerprint segmentation using the decomposition property of the proposed adaptive TV-L1 model.
3.1. TV-L1 Model

TV-based image models have been widely studied to achieve the task of image decomposition. Among many existing TV models, the total variation regularization model with an L1 fidelity term, denoted by TV-L1, is suitable for multiscale image decomposition and feature selection [22, 4, 5]. In the context of facial recognition under varying illumination, a modified TV-L1 model was proposed in [5] to extract small-scale facial features. Similar to other TV-based image models (e.g., the ROF model [16]), the TV-L1 model decomposes an input image, f, into two signal layers:

• cartoon u, which consists of the piecewise-smooth component in f, and
• texture v, which contains the oscillatory or textured component in f.

The decomposition f = u + v is obtained by solving the following variational problem:

$$\min_{u} \int |\nabla u| + \lambda \int |u - f| \, dx, \tag{1}$$

where f, u and v are functions of image gray-scale intensity values in R², ∇u is the gradient of u and λ is a constant weighting parameter. We call ∫|∇u| and ∫|u − f| the total variation of u and the fidelity term, respectively.

The TV-L1 model is difficult to compute due to the nonlinearity and non-differentiability of the total variation term as well as the fidelity term. A gradient descent approach was proposed in [4], which solves for u as a steady-state solution of the Euler-Lagrange equation of (1):

$$\nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) + \lambda \, \frac{f - u}{|f - u|} = 0. \tag{2}$$

Although Eq. (2) is easier to implement, the gradient descent approach is slow due to the small time step imposed by the strict stability constraint. That is, the term (f − u)/|f − u| is non-smooth at f = u, which forces the time step to be very small when the solution is approaching the steady state. In addition, |∇u| in the term ∇u/|∇u| might be zero, and a small positive constant needs to be added to avoid division by zero, which results in an inexact solution.

Many numerical methods have been proposed to improve on this method. One approach is the split Bregman iteration [10, 11, 15, 20], which uses functional splitting and Bregman iteration for constrained optimization. The equivalence of the split Bregman iteration with the alternating direction method of multipliers (ADMM), Douglas-Rachford splitting and the augmented Lagrangian method can be found in [6, 17, 3, 20].

3.2. Multiscale Feature Selection Property of TV-L1

The TV-L1 model distinguishes itself from other TV-based models by its unique capability of intensity-independent multiscale decomposition. It was shown both theoretically [4] and experimentally [22] that the fidelity weight coefficient, λ, in (1) is closely related to the scale of features in texture output v. This relation is supported by the analytic example in [4]. If f is equal to a disk signal, denoted by B_r, which has radius r and unit height, the solution u_λ of (1) is given as

$$u_\lambda(x) = \begin{cases} 0 & \text{if } 0 \le \lambda < 2/r, \\ f(x) & \text{if } \lambda > 2/r, \\ c f(x) & \text{if } \lambda = 2/r, \text{ for any } c \in [0, 1]. \end{cases}$$

In other words, depending on the λ value, the TV-L1 functional is minimized by either 0 or input f. This demonstrates the ability of the TV-L1 model to select geometric features of a given scale. Fig. 3 shows an example of feature selection on the latent fingerprint image.
Figure 3. Feature selection based on the TV-L1 model for a latent fingerprint image: input image f (leftmost) and its TV-L1 decomposed components u and v, with the λ value shown in the subscript. Panels: f; u0.10, u0.30, u0.70; v0.10, v0.30, v0.70.
We see that the numerical results match the analytic results well in Fig. 3. Basically, the λ value can control the selection of features with unknown intensity values. When λ is very small (e.g., λ = 0.10), u captures the inhomogeneous illumination in the background while most fine structures are kept in v. When λ = 0.30, large-scale objects (arch) are separated from structures of smaller scales (characters). As λ continues to increase, only small-scale structures (fingerprint and noise) are left in v and the major content of f appears in u.
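The analytic disk example of Sec. 3.2 can also be checked numerically with a minimal Python sketch. It evaluates a discrete TV-L1 energy only on the one-parameter family u = c·f, so it illustrates the scale-selection threshold and is not a solver for (1). The function names, the forward-difference discretization and the grid of c values are assumptions made for illustration; on a pixel grid the switching point is only approximately 2/r.

```python
import numpy as np

def disk(n, r):
    """Binary disk of radius r (unit height) centred in an n-by-n image."""
    y, x = np.mgrid[:n, :n]
    c = (n - 1) / 2.0
    return ((x - c) ** 2 + (y - c) ** 2 <= r ** 2).astype(float)

def tv_l1_energy(u, f, lam):
    """Discrete version of the energy in Eq. (1), using forward differences."""
    dx = np.diff(u, axis=1, append=u[:, -1:])  # zero difference at the last column
    dy = np.diff(u, axis=0, append=u[-1:, :])  # zero difference at the last row
    total_variation = np.sqrt(dx ** 2 + dy ** 2).sum()
    return total_variation + lam * np.abs(u - f).sum()

def best_scale(f, lam):
    """Return the c in [0, 1] for which u = c * f has the lowest energy."""
    cs = np.linspace(0.0, 1.0, 11)
    energies = [tv_l1_energy(c * f, f, lam) for c in cs]
    return cs[int(np.argmin(energies))]

f = disk(64, 16)
print(best_scale(f, 1.0 / 16))  # lambda well below 2/r: the disk is removed from u
print(best_scale(f, 4.0 / 16))  # lambda well above 2/r: the disk is kept in u
```

Along this family the energy is linear in c, which is why the minimizer jumps between c = 0 and c = 1 as λ crosses the threshold instead of varying gradually.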
Since the fingerprint and structured noise patterns differ in their relative scales, we can separate them using the TV-L1 model with an appropriately chosen λ. However, two problems arise when TV-L1 is applied to latent fingerprint images directly.

1. Some structured noise patterns (e.g., speckle, stain) are of smaller or similar scale compared with the latent fingerprints and, as a result, they will be extracted along with fingerprints in v.
2. A small amount of boundary signal near non-smooth edges will appear in v, as shown in Fig. 4, due to the non-smoothness of the boundary and the use of finite differencing. This was also reported in [5].

To overcome these limitations, we propose an adaptive TV-L1 model with spatially dependent fidelity.
Figure 4. Illustration of the boundary signal problem in TV-L1 decomposition: a small amount of structured-noise edge signal is still kept in texture v (left), and signals along the dashed line depicted in f, u and v (right).
3.3. Adaptive TV-L1 Model

The TV-L1 model with spatially invariant fidelity (1) does not generate the desired output throughout the whole latent fingerprint image. In the fingerprint region, when λ is well matched with the scale of fingerprints, all essential contents can be captured in output texture v. However, in the noisy region, some unwanted signals will also be extracted to v with the same λ. This motivates us to consider an adaptive TV-L1 model with spatially variant fidelity:

$$\min_{u} \int |\nabla u| + \int \lambda(x) \, |u - f| \, dx, \tag{3}$$

where λ(x) is a spatially varying parameter.

The spatially varying parameter, λ(x), can be understood in two ways. First, as analyzed in Sec. 3.2, λ(x) is a scalar that controls the scale of features appearing in v at pixel x. A large λ(x) value enforces most textures to be kept in u, leaving only tiny-scale structures in v. When λ(x) is sufficiently large, u(x) ≈ f(x), and the original content is almost totally blocked from v. Thus, v(x) ≈ 0. Second, parameter λ(x) can also be interpreted as a weighting coefficient that balances the importance between fidelity and smoothness of u. In the fingerprint region, the λ(x) value should be relatively small, since low fidelity ensures the smoothness of u and, thus, more textures can be extracted in v. In regions with structured noise, fidelity becomes important, since a large λ(x) value ensures that all noise components are filtered out from texture v.

We use the augmented Lagrangian method [9, 12, 19, 21] to solve the proposed adaptive TV-L1 model given in (3). The augmented Lagrangian method is both accurate and efficient, as it benefits from the FFT-based fast solver with a closed-form solution. It has been proven that the augmented Lagrangian method is equivalent to the split Bregman iteration and that its convergence is always guaranteed [20]. In the augmented Lagrangian method, two new variables are introduced and the problem is reformulated as the following constrained optimization problem:

$$\min_{u, \vec{p}, v} \int |\vec{p}| + \int \lambda(x) \, |v - f| \, dx, \quad \text{s.t.} \quad \vec{p} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} \partial_x u \\ \partial_y u \end{pmatrix} = \nabla u, \quad v = u. \tag{4}$$

To solve (4), the following augmented Lagrangian functional is defined:

$$\begin{aligned} L(u, \vec{p}, v, \vec{\lambda}_p, \lambda_v) = {} & \int |\vec{p}| + \int \lambda(x) \, |v - f| \, dx \\ & + \int \vec{\lambda}_p \cdot (\vec{p} - \nabla u) + \frac{r_p}{2} \int |\vec{p} - \nabla u|^2 \\ & + \int \lambda_v (v - u) + \frac{r_v}{2} \int (v - u)^2, \end{aligned}$$

where $\vec{\lambda}_p$ and $\lambda_v$ are the Lagrange multipliers and $r_p$ and $r_v$ are positive constants. The augmented Lagrangian method uses an iterative procedure to solve (4), as shown in Algorithm 1. The iterative scheme runs until some stopping condition is satisfied.

Algorithm 1. Augmented Lagrangian method for our proposed adaptive TV-L1 model.

1. Initialization: $u^{k+1,0} = u^k$, $\vec{p}^{\,k+1,0} = \vec{p}^{\,k}$, $v^{k+1,0} = v^k$.
2. For k = 0, 1, 2, ..., compute:

$$(u^k, \vec{p}^{\,k}, v^k) = \arg\min_{(u, \vec{p}, v)} L(u, \vec{p}, v, \vec{\lambda}_p^{\,k}, \lambda_v^k). \tag{5}$$

3. Update:

$$\vec{\lambda}_p^{\,k+1} = \vec{\lambda}_p^{\,k} + r_p (\vec{p}^{\,k} - \nabla u^k), \qquad \lambda_v^{k+1} = \lambda_v^k + r_v (v^k - u^k).$$

Since variables u, $\vec{p}$ and v in $L(u, \vec{p}, v, \vec{\lambda}_p, \lambda_v)$ are coupled together in the minimization problem (5), it is difficult to solve for them simultaneously. Instead, the problem is decomposed into three sub-problems and an alternating minimization process is applied. All three sub-problems can be solved efficiently, as they all have closed-form solutions. They are given as:

$$u^k = \mathcal{F}^{-1} \left\{ \frac{ -r_p \, \mathcal{F}(\mathrm{div}) \cdot \mathcal{F}(\vec{p}) - \mathcal{F}(\mathrm{div}) \cdot \mathcal{F}(\vec{\lambda}_p^{\,k}) + r_v \mathcal{F}(v) + \mathcal{F}(\lambda_v^k) }{ r_v - r_p \, \mathcal{F}(\Delta) } \right\},$$

$$\vec{p}^{\,k}(x) = \max \left\{ 0, \; 1 - \frac{1}{r_p \, |\vec{w}(x)|} \right\} \cdot \vec{w}(x),$$

$$v^k(x) = \max \left\{ 0, \; 1 - \frac{\lambda(x)}{r_v \, |q(x)|} \right\} \cdot q(x),$$

where $\mathcal{F}(u)$ denotes the Fourier transform of u, $\vec{w}(x) = \nabla u(x) - \vec{\lambda}_p^{\,k}(x) / r_p$ and $q(x) = v(x) - f(x) - \lambda_v^k(x) / r_v$.

3.4. Parameter Estimation

Ideally, parameter λ(x) should take larger values in regions with structured noise and be relatively small in fingerprint regions. To differentiate these regions, we study their total variation behavior after low-pass filtering. When an input image, f, is filtered by a low-pass filter $L_\sigma$ with frequency response

$$\hat{L}_\sigma(\xi) = \frac{1}{1 + (2 \pi \sigma |\xi|)^4},$$

its cartoon and texture components, although both blurred to some extent, change differently in terms of local total variation (LTV), which is defined as

$$LTV(f) = G_\sigma * |\nabla f|,$$

where f is the image region and $G_\sigma$ is a Gaussian kernel with standard deviation σ. Cartoon and texture regions were differentiated by their relative LTV reduction rate in [2]. This is based on the property that the LTV of cartoon regions decreases only slightly after low-pass filtering, while the LTV of texture regions decays rapidly under the same condition. Although effective in separating edge regions from texture regions, the algorithm proposed in [2] cannot differentiate textures of different scales (e.g., fingerprints from speckles). To overcome this limitation, we define the differential LTV reduction rate, denoted by $\eta_\sigma$, as

$$\eta_\sigma = \frac{LTV(L_\sigma * f) - LTV(L_{\sigma - 1} * f)}{LTV(f)}. \tag{6}$$

Parameter $\eta_\sigma$ provides useful information about the underlying structure of a given texture region. Intuitively, it describes the local oscillatory behavior of a texture pattern at spatial scale σ. Fig. 5 shows $\eta_\sigma$ at different pixels in several latent fingerprint images. We see that $\eta_\sigma$ in the fingerprint region has a sharp peak located at σ = 2.0.
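As a rough illustration of Eq. (6), the following Python sketch computes the LTV and the differential LTV reduction rate with FFT-based filters. Several choices are assumptions rather than details given in the text: periodic boundary handling, applying the Gaussian kernel in the Fourier domain, reusing the same σ for the Gaussian and the low-pass filter, a scale step of exactly 1 (so σ ≥ 1), the small constant `tiny` guarding the division, and all function names. The sign follows Eq. (6) as written, so the returned value is non-positive wherever extra smoothing lowers the LTV; its magnitude is what indicates the response at scale σ.

```python
import numpy as np

def _freq_radius(shape):
    """Radial frequency |xi| in cycles per pixel for an image of this shape."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)

def lowpass(f, sigma):
    """Apply the filter with response 1 / (1 + (2*pi*sigma*|xi|)^4)."""
    response = 1.0 / (1.0 + (2.0 * np.pi * sigma * _freq_radius(f.shape)) ** 4)
    return np.real(np.fft.ifft2(np.fft.fft2(f) * response))

def ltv(f, sigma):
    """Local total variation: Gaussian-smoothed gradient magnitude."""
    gy, gx = np.gradient(f)
    magnitude = np.hypot(gx, gy)
    # Fourier transform of a Gaussian with standard deviation sigma (assumed periodic).
    gauss = np.exp(-2.0 * (np.pi * sigma * _freq_radius(f.shape)) ** 2)
    return np.real(np.fft.ifft2(np.fft.fft2(magnitude) * gauss))

def diff_ltv_rate(f, sigma, tiny=1e-8):
    """Differential LTV reduction rate of Eq. (6); `tiny` is a hypothetical guard."""
    f = np.asarray(f, dtype=float)
    numerator = ltv(lowpass(f, sigma), sigma) - ltv(lowpass(f, sigma - 1.0), sigma)
    return numerator / (ltv(f, sigma) + tiny)
```

For a pure sinusoidal texture the rate reduces to the difference of the two filter responses at the texture frequency, which makes the dependence on scale explicit: a pattern with a period of a few pixels responds strongly at σ = 2, while a slowly varying region barely responds.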
Figure 5. Plots of ησ (x) for several pixels in different latent fingerprint images. It has a sharp peak located near σ = 2.0 in the fingerprint region while it reaches the maximum at different σ values in other regions.
Finally, we choose the spatially variant coefficient λ(x) in (3) as

$$\lambda(x) = \frac{\kappa}{\eta_c(x) + \epsilon}, \tag{7}$$

where $\eta_c$ is the differential LTV reduction rate at σ = c, which is adjusted to the best response of fingerprint patterns, and κ and ε are positive constants controlling the range of λ(x). In our experiments, we observe that c = 2.0 gives the optimum value for latent fingerprint patterns, while parameters κ and ε are empirically set to 0.5 and 0.01, respectively.
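The mapping in Eq. (7) is simple enough to work through directly. In the minimal sketch below, the function name `fidelity_weight`, the name `eps` for the second constant, and the assumption that the response map passed in is non-negative are all illustrative choices; the default values 0.5 and 0.01 are the empirical settings quoted above.

```python
import numpy as np

def fidelity_weight(eta_c, kappa=0.5, eps=0.01):
    """Map a (non-negative) response eta_c(x) to the fidelity weight of Eq. (7)."""
    eta_c = np.asarray(eta_c, dtype=float)
    return kappa / (eta_c + eps)

# Zero response, moderate response, strong response.
print(fidelity_weight([0.0, 0.09, 0.49]))
```

With these settings a pixel with no response receives λ = 0.5/0.01 = 50, while a response of 0.49 gives λ = 0.5/0.5 = 1, so the weight spans roughly two orders of magnitude between noise-like and fingerprint-like regions.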
3.5. Region-of-Interest Segmentation

The variance value acts as a key segmentation feature for rolled/plain fingerprints [1]. As discussed in Section 2, this feature cannot be directly applied to latent fingerprints due to the presence of structured noise. However, after applying the adaptive TV-L1 model, we can decompose the latent image into two parts: 1) cartoon u, which contains both structured noise and small-scale structures, and 2) texture v, which consists of latent fingerprints and a small amount of noise. In the texture output v, most noise with high variance disappears, allowing us to use the variance feature for segmentation. The probability distributions of the variance feature in the foreground and background regions are shown in Fig. 6.
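A variance-based ROI mask on the texture output v could be formed along the following lines. This is only a sketch: the block size, the mid-range threshold rule and the function name `roi_mask` are hypothetical, since the text does not specify how the variance feature is thresholded.

```python
import numpy as np

def roi_mask(v, block=16, thresh=None):
    """Mark blocks of the texture image v whose variance exceeds a threshold.

    `block` and the default threshold rule are assumptions for illustration.
    """
    v = np.asarray(v, dtype=float)
    h = (v.shape[0] // block) * block
    w = (v.shape[1] // block) * block
    # Shape (rows, block, cols, block): axes 1 and 3 index pixels inside a block.
    tiles = v[:h, :w].reshape(h // block, block, w // block, block)
    var = tiles.var(axis=(1, 3))
    if thresh is None:
        thresh = 0.5 * (var.min() + var.max())  # hypothetical mid-range threshold
    block_mask = var > thresh
    # Expand the block-level decision back to pixel resolution.
    return np.repeat(np.repeat(block_mask, block, axis=0), block, axis=1)
```

Working at block level keeps the mask compact; a finer mask would need a sliding window or a post-processing step, neither of which is described here.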
Figure 6. Distributions of the variance feature for the foreground and background regions in (a) input f and (b) texture v, respectively.

4. Experimental Results

In this section, we present experimental results for the proposed TV-based latent fingerprint segmentation algorithm. The experiments were conducted on the NIST SD 27 database, which contains 258 latent fingerprint images.

In Fig. 7, we compare the texture decomposition result of the non-adaptive TV-L1 model (λ fixed at 0.50) with that of the proposed adaptive TV-L1 scheme. In the proposed scheme, the fidelity weight coefficient λ(x) is automatically adjusted according to the image content, leading to an improved decomposition result. In the fingerprint regions, the proposed model is able to extract all essential textures of fingerprints, despite the presence of varying illumination in the background. In regions consisting of structured noise, boundary noise adjacent to the fingerprint region is clearly observed in texture v obtained by the non-adaptive TV-L1 model. Two zoomed-in images are shown in the bottom part of Fig. 7(a). In contrast, the proposed adaptive model can filter out these boundary signals from texture output v, as shown in the bottom part of Fig. 7(b).

Figure 7. Performance comparison of the non-adaptive TV-L1 scheme and the proposed adaptive model: (a) texture output v by non-adaptive TV-L1 (λ = 0.50), (b) texture output v obtained by the proposed adaptive TV-L1 scheme.

Finally, we present several segmentation results using the proposed adaptive scheme in Fig. 8. Visual inspection shows that the proposed adaptive TV-L1 algorithm provides very satisfactory segmentation results. Although performance benchmarking for latent fingerprint segmentation is desirable, a systematic approach for objective evaluation remains to be developed.

Figure 8. Experimental results of some latent fingerprints from the NIST SD27 database. From left to right: original image f, fidelity weight λ(x), texture output v and the final segmentation result.

5. Conclusion and Future Work

An adaptive TV-L1 image model was proposed for efficient latent fingerprint segmentation in this work. The adaptive TV model has the ability to decompose a single latent image into layers and locate the essential latent area for feature matching. The weight coefficient of the L1-norm fidelity term was adapted to the background noise level, which was automatically estimated using TV-based texture analysis. Some preliminary experimental results were reported to demonstrate the performance of the proposed adaptive TV-based scheme.

Automatic segmentation of latent fingerprints is still a relatively new topic. To the best of our knowledge, no performance benchmarking of various algorithms has been extensively conducted and reported in the open literature. This is an important area for future research. Furthermore, the orientation of the ridge pattern in latent fingerprints can be exploited for their extraction. We are currently looking for efficient ways to use this information to achieve better segmentation results.

References

[1] A. M. Bazen and S. H. Gerez. Segmentation of fingerprint images. In ProRISC 2001 Workshop on Circuits, Systems and Signal Processing, pages 276–280, 2001.
[2] A. Buades, T. M. Le, J.-M. Morel, and L. A. Vese. Fast cartoon + texture image filters. Trans. Img. Proc., 19:1978–1986, August 2010.
[3] J. Cai, S. Osher, and Z. Shen. Split Bregman methods and frame based image restoration. Multiscale Model. Simul., 8, 2009.
[4] T. Chan and S. Esedoglu. Aspects of total variation regularized L1 function approximation. SIAM Journal on Applied Mathematics, 65, 2004.
[5] T. Chen, W. Yin, X. S. Zhou, D. Comaniciu, and T. S. Huang. Total variation models for variable lighting face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:1519–1524, 2006.
[6] E. Esser. Applications of Lagrangian-based alternating direction methods and connections to split Bregman. UCLA CAM Report (09-31), 2009.
[7] J. Feng, S. Yoon, and A. K. Jain. Latent fingerprint matching: Fusion of rolled and plain fingerprints. In Proceedings of the Third International Conference on Advances in Biometrics, ICB '09, pages 695–704, Berlin, Heidelberg, 2009. Springer-Verlag.
[8] H. Fleyeh, D. Jomaa, and M. Dougherty. Segmentation of low quality fingerprint images. International Conference on Multimedia Computing and Information Technology (MCIT 2010), March 2-4, 2010.
[9] R. Glowinski and P. Le Tallec. Augmented Lagrangian and operator-splitting methods in nonlinear mechanics. SIAM, 1989.
[10] T. Goldstein, X. Bresson, and S. Osher. Geometric applications of the split Bregman method: Segmentation and surface reconstruction. Journal of Scientific Computing, 45(1-3), 2009.
[11] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM J. Img. Sci., 2:323–343, April 2009.
[12] M. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4, 1969.
[13] A. K. Jain and J. Feng. Latent fingerprint matching. IEEE Trans. Pattern Anal. Mach. Intell., 33:88–100, January 2011.
[14] B. Mehtre. Segmentation of fingerprint images - a composite method. Pattern Recognition, 22(4):381–385, 1989.
[15] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul., 4, 2005.
[16] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Phys. D, 60:259–268, November 1992.
[17] S. Setzer. Split Bregman algorithm, Douglas-Rachford splitting and frame shrinkage. In Proceedings of the Second International Conference on Scale Space and Variational Methods in Computer Vision, SSVM '09, pages 464–476, Berlin, Heidelberg, 2009. Springer-Verlag.
[18] Z. Shi, Y. Wang, J. Qi, and K. Xu. A new segmentation algorithm for low quality fingerprint image. In Proceedings of the Third International Conference on Image and Graphics, ICIG '04, pages 314–317, Washington, DC, USA, 2004. IEEE Computer Society.
[19] X. Tai, J. Hahn, and G. Chung. A fast algorithm for Euler's elastica model using augmented Lagrangian method. SIAM J. Imaging Science, 4, 2011.
[20] C. Wu and X. Tai. Augmented Lagrangian method, dual methods and split Bregman iterations for ROF model, vectorial TV and higher order models. SIAM J. Imaging Science, 3, 2010.
[21] C. Wu, J. Zhang, and X.-C. Tai. Augmented Lagrangian method for total variation restoration with non-quadratic fidelity. Inverse Problems and Imaging (IPI), 4, 2011.
[22] W. Yin, D. Goldfarb, and S. Osher. The total variation regularized L1 model for multiscale decomposition. Multiscale Model. Simul., 6, 2006.
[23] S. Yoon, J. Feng, and A. K. Jain. On latent fingerprint enhancement. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 7667, Apr. 2010.