IEEE/OSA/IAPR International Conference on Informatics, Electronics & Vision

A Comparative Study of the Signature Recognition Problem Using Statistical Features and Artificial Neural Networks

Mahabub Akram
EECS, North South University, Dhaka, Bangladesh
[email protected]

Romasa Qasim
EECS, North South University, Dhaka, Bangladesh
[email protected]
Abstract- This paper summarizes a research effort on an offline signature recognition system. Neural networks are used to address this problem because their learning and generalization abilities enable them to cope with the diversity and variation of human signatures. Since neural networks have proven performance in other pattern recognition tasks, such as character recognition, they are equally suitable for the task of signature recognition. In this paper we present a comparative study of signature recognition comprising three different neural networks: the feed-forward back-propagation neural network, the competitive neural network, and the probabilistic neural network. We show by experiment and analysis that the probabilistic neural network is best suited to the signature recognition problem, with an average accuracy of 100%.
Keywords- feed-forward back-propagation; entropy; kurtosis; precision; skewness

I. INTRODUCTION

Machines have taken over many tasks performed by humans, from ones as simple as adding a few numbers to ones as complicated as recognizing human signatures. The human signature is a biometric measure of a person's identity. Many sectors, such as banks, official documents, and receipts, use the handwritten signature to secure and identify the concerned person. This is because the probability of exact duplication of a biometric measure is low compared to other security measures such as a password or personal identification number. Banks, for example, deal with hundreds of different signatures daily and make transactions after signature verification. To avoid human error and to maximize the efficiency and speed of the task, researchers have worked on automating the handwritten signature recognition problem. This problem is unique and interesting because each individual has his own signature, different from others, yet a person cannot produce exactly the same signature every time. According to Justino et al. [1], the signature verification problem aims to maximize the interpersonal differences and minimize the intrapersonal differences. There are two popular techniques for the recognition of signatures: online [2], [9] and offline [3]. This paper uses the offline recognition technique, in which only scanned images of signatures are available, because not every area dealing with signatures can afford the facilities for dynamic signatures. This
978-1-4673-1154-0/12/$31.00 ©2012 IEEE
M. Ashraful Amin
SECS, Independent University, Bangladesh, Dhaka, Bangladesh
[email protected]
paper uses different neural networks to recognize 40 individuals' signatures. The main emphasis is to observe the behavior of different neural networks and to determine which neural network can optimally perform this task.

A. Related Work
Signature recognition is a complex classification problem that deals with the variation among same-class signatures and the differences of one signature from another. Researchers have already performed a rich amount of work to solve this problem. Kshitij Sisodia and Mahesh Anand [4] solved this problem using a back-propagation network with 14 extracted features. The neural network was trained with these features for both coarse and fine training sets, and the system was successfully tested with 94.27% efficiency. The moment invariant method is a technique introduced in [5], in which scanned images of 30 individuals (6 samples each) were normalized and digitized, and their moment invariants calculated. Seven moment invariant values along with seven other global variables were presented to a multilayer feed-forward back-propagation network, and a 100% result was achieved. The research in [6] uses global features, grid information features, and texture features; these features are fed to an artificial neural network to verify and recognize images with a one-class-one-network classifier. The effort made in [7] deals with an automated method of verification by extracting features that characterize the signature; four main features (eccentricity, skewness, kurtosis, orientation) are extracted, and the neural network structure is multi-layer feed-forward. An artificial neural network based on the back-propagation algorithm is used in [8] for recognition and verification. That system was tested with 400 test signature samples, including genuine and forged signatures of twenty individuals. The skeletonized image is divided into 96 rectangular segments (12 x 8), and for each segment the area (the sum of foreground pixels) is calculated. A false rejection ratio of less than 0.1 and a false acceptance ratio of 0.2 were achieved in this system.
ICIEV 2012
II. METHODOLOGY

A. Data Collection and Preprocessing
A total of 4000 signature samples were collected from 40 individuals (100 samples from each individual). In the preprocessing step, these signatures are scanned at a resolution of 600 dpi in 8-bit grayscale mode. After scanning, the images are cropped, because each individual filled two A4-sized pages, each with 50 signatures. Cropping of the signature images is followed by resizing: since 600 dpi images take more time to process, each cropped signature is reduced to 266x86 pixels.
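The resizing step can be sketched in pure Python. Nearest-neighbor interpolation and the list-of-rows image representation are assumptions for illustration; the paper does not specify the resampling method:

```python
# Reduce a grayscale signature image to the 266x86 size used in the paper.
TARGET_W, TARGET_H = 266, 86

def resize(img, new_w=TARGET_W, new_h=TARGET_H):
    """Nearest-neighbor resize; img is a list of rows of 0-255 pixel values."""
    h, w = len(img), len(img[0])
    return [
        [img[(y * h) // new_h][(x * w) // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]
```

In practice an imaging library would handle interpolation quality better; this sketch only shows the shape of the operation.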
Variance is the second central moment about the mean; it is used to find the deviation from the expected grayscale pixel value of an image. The second-order moment is calculated using the formula below:

Var(X) = E(X^2) - [E(X)]^2    (2)
• Skewness is an asymmetry measure of the probability distribution of the pixels' grayscale values. Since the distribution of pixel values is not symmetric, it must contain a non-zero value. The mathematical representation of skewness is given below:

skewness = E[(X − μ)^3] / σ^3    (3)
B. Feature Extraction

After the preprocessing of signature images, the following features are extracted from each sample:

• Aspect Ratio is the ratio of signature width to height:

Aspect Ratio = Signature Width / Signature Height    (1)
It is the measure of lopsidedness of the distribution (Fig 3). It is clear from the figure that the skewness of each individual differs from the others.
There exists intrapersonal variation in signatures, which depends on the pen, the paper, time, etc. In spite of these variations, it is observed that the width-to-height ratio of a person's signature is consistent. So this feature provides useful information for training the neural network, because the aspect ratio usually varies from person to person while remaining nearly consistent for a particular person.
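A sketch of computing this feature from the ink bounding box; the binarization threshold of 128 is an assumption, as the paper does not state how the signature extent is detected:

```python
def aspect_ratio(img, threshold=128):
    """Width-to-height ratio of the ink bounding box (Eq. 1).

    img is a list of rows of 0-255 grayscale values; pixels darker
    than `threshold` are treated as ink.
    """
    rows = [y for y, row in enumerate(img) if any(p < threshold for p in row)]
    cols = [x for x in range(len(img[0]))
            if any(row[x] < threshold for row in img)]
    width = max(cols) - min(cols) + 1
    height = max(rows) - min(rows) + 1
    return width / height
```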
Fig 3: Skewness of two different signatures, shown just below each histogram.

Fig 1: The width and height of the green boxes give the value of the aspect ratio; this signature sample is almost 4 times wider than it is high, reflected by the value 3.8525. (Signature Courtesy: Neil Heyes)
• Kurtosis is a measure of whether the distribution is tall and skinny or short and squat compared to a normal distribution of the same variance. Kurtosis can be mathematically defined as:
• Slant Angle is the angle of inclination, with respect to the x-axis, of an imaginary line on which the signature is assumed to rest (Fig 2). It is observed from the collected signature samples that this angle does not deviate much.
To calculate the slant angle, a regression line is fitted to the points near the x-axis. The angle of this fitted regression line with the x-axis is the slant angle.
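The least-squares fit can be sketched as follows; how the baseline points near the x-axis are selected is not specified in the paper, so the function simply takes a list of (x, y) points as given:

```python
import math

def slant_angle(points):
    """Angle (degrees) between a least-squares regression line fitted
    to baseline points and the x-axis."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return math.degrees(math.atan(slope))
```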
k = E[(X − μ)^4] / σ^4    (4)
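Equations (2)-(4) can be sketched together in pure Python, assuming the statistics are computed over all grayscale pixel values of the image:

```python
def statistical_moments(pixels):
    """Variance (Eq. 2), skewness (Eq. 3), and kurtosis (Eq. 4)
    of a flat list of grayscale pixel values."""
    n = len(pixels)
    mu = sum(pixels) / n
    var = sum((p - mu) ** 2 for p in pixels) / n   # E(X^2) - [E(X)]^2
    sigma = var ** 0.5
    skew = sum((p - mu) ** 3 for p in pixels) / (n * sigma ** 3)
    kurt = sum((p - mu) ** 4 for p in pixels) / (n * sigma ** 4)
    return var, skew, kurt
```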
Fig 2: Slant angle of signatures. The box below each signature shows its slant angle. Consider the blue line: the angle between this baseline and the horizontal is the slant angle. It can be seen that the slant angle of this signature is very small, 2.197. (Signature Courtesy: Reimy Booth)

• Horizontal and Vertical Shifts are calculated by slicing the image horizontally into four parts and, in each part, summing the shifts from black to white or from white to black. For vertical shifts, the image is sliced vertically. This information is another distinguishing property of a signature, because the chance of two signatures having exactly the same shift numbers is very low.

• Entropy considered here is Shannon's entropy, used to mathematically quantify the degree of uncertainty of the pixel values [10]:

H(p_1, p_2, ..., p_n) = −K Σ_{i=1}^{n} p_i ln p_i    (5)
K is a constant of Shannon's entropy, equal to 1 in this case, and p is the histogram of pixel values with 256 bins; each bin represents a grayscale value.
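Equation (5) with K = 1 can be sketched as:

```python
import math

def shannon_entropy(img, K=1.0):
    """Shannon entropy (Eq. 5) from a 256-bin histogram of grayscale values."""
    pixels = [p for row in img for p in row]
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    return -K * sum((c / n) * math.log(c / n) for c in hist if c > 0)
```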
• Joint Entropy is the measure of uncertainty that two events occur simultaneously. It can be represented mathematically as:
H(X, Y) = −Σ_x Σ_y p(x, y) ln p(x, y)    (6)
where p(x, y) is the probability of co-occurrence of the pixel values at the same location in two signature samples. Joint entropy usually contains more information than entropy. In this paper, the joint entropy of two samples of the same individual's signature is taken as a feature.

• Mutual Information
can be calculated, given the entropies of two images and their joint entropy, using the equation below:

I(X, Y) = H(X) + H(Y) − H(X, Y)    (7)
where H(X), H(Y) are the entropies of the two images and H(X, Y) is their joint entropy. This feature is inversely proportional to the joint entropy. It can help especially when there is low contrast due to the scanner: if the contrast is low, the joint entropy is low, and hence the overall information is low.
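Equations (6) and (7) can be sketched together; pixel values at the same location in two equal-sized images are paired to form the joint histogram:

```python
import math
from collections import Counter

def _entropy(counts, n):
    """Entropy of a histogram given its bin counts and total count."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def mutual_information(img_x, img_y):
    """I(X, Y) = H(X) + H(Y) - H(X, Y)  (Eq. 7), with the joint entropy
    H(X, Y) (Eq. 6) taken over co-occurring pixel values."""
    xs = [p for row in img_x for p in row]
    ys = [p for row in img_y for p in row]
    n = len(xs)
    h_x = _entropy(Counter(xs).values(), n)
    h_y = _entropy(Counter(ys).values(), n)
    h_xy = _entropy(Counter(zip(xs, ys)).values(), n)
    return h_x + h_y - h_xy
```

For two identical images, I(X, Y) reduces to H(X), matching the inverse relationship with joint entropy described above.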
C. Neural Network as Classifier
All the above features are concatenated into a single column vector for each signature sample, so the input size is 16x3600 for 90 samples from each of the 40 individuals. A neural network is used as the classifier because of its proven classification performance on highly mingled data.
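The assembly of per-sample feature vectors and the per-individual split can be sketched as follows; `extract_features` stands in for the feature pipeline above, and the 90-sample training split per individual follows the text:

```python
def build_dataset(samples, extract_features, train_per_person=90):
    """samples maps person_id -> list of signature images.
    Returns (train, test) lists of (feature_vector, person_id) pairs."""
    train, test = [], []
    for person, sigs in samples.items():
        for i, sig in enumerate(sigs):
            pair = (extract_features(sig), person)
            (train if i < train_per_person else test).append(pair)
    return train, test
```

With 40 individuals and 100 samples each, this yields the 3600-sample training set of 16-dimensional vectors described above.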
Signature classification has the same problem: the clusters of individuals' signatures are not clearly distinguishable (Fig 4). For this kind of input space, neural networks require enough training. Therefore, the input space is divided into two parts: one part is used to train the neural network and the other to test it. This input is fed to three different neural networks to analyze which one is most suitable for the signature recognition problem. Below is a brief discussion of these neural networks:

D. Feed-forward Back-propagation
The back-propagation algorithm is a generalization of the least-mean-squares algorithm; it modifies network weights to minimize the mean squared error between the desired and actual outputs of the network. This neural network is not guaranteed to converge to the solution, because it uses gradient descent, which may converge to a local minimum.

E. Competitive Neural Network
All the neurons compete with each other to find a winner; the neuron with the maximum output wins.

F. Probabilistic Neural Network
This network performs classification where the target vector is categorical. Each neuron considers its cluster's neurons and then decides its category. This network is a type of radial basis network and is guaranteed to classify inputs correctly, provided sufficient training inputs are given. The performance of the neural networks discussed above is measured on the basis of accuracy, precision [11], and the time required to train and test on the signatures. Accuracy and precision are calculated using the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):
Accuracy (%) = (TP + TN) / (TP + TN + FP + FN)    (8)

Precision (%) = TP / (TP + FP)    (9)

Fig 4: 3D scatter plot of features (aspect ratio, mutual information, joint entropy) extracted from signatures, showing the distribution of these features in space.

Where,
TP: number of positive samples correctly predicted
FN: number of positive samples wrongly predicted
FP: number of negative samples wrongly predicted
TN: number of negative samples correctly predicted

The overall accuracy and precision of a network are obtained by averaging the testing accuracies and precisions over the individuals' signatures.
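Equations (8) and (9) translate directly:

```python
def accuracy_pct(tp, tn, fp, fn):
    """Eq. (8): percentage of all samples predicted correctly."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

def precision_pct(tp, fp):
    """Eq. (9): percentage of predicted positives that are truly positive."""
    return 100.0 * tp / (tp + fp)
```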
III. RESULTS AND DISCUSSION

An effort is made to recognize signatures using different neural networks and to find a set of parameters where the performance of each neural network is best. To attain this goal, simulation is performed by varying the parameters of the neural networks discussed above. In the competitive neural network, the only adjustable parameter is the learning rate. There is no fixed rule for adjusting the learning rate, but a rule of thumb is to start with a low value and increase it gradually to find the learning rate where performance is maximal. So the analysis is done by varying the learning rate with a fixed number of iterations, 150 epochs, which is sufficient for the weight vectors to be trained properly. This analysis is summarized in Table I.

TABLE I. PRECISION AND ACCURACY FOR INCREASING LEARNING RATE
Learning Rate | Precision (%) | Accuracy (%)
0.04 | 46.75 | 97.3375
0.06 | 45.00 | 97.25
0.08 | 49.00 | 97.45
0.1  | 50.75 | 97.5375
0.12 | 41.75 | 97.0875
0.14 | 42.75 | 97.1375
0.16 | 40.00 | 97
The number of layers and the number of neurons in each layer can be varied according to requirements and hence are adjustable. The system is tested first with a single hidden layer, increasing the number of neurons. It is observed that network accuracy and precision increase as the number of neurons grows from 20 to 200.
Fig 6: Accuracy and precision of the feed-forward back-propagation neural network with a single hidden layer, plotted against the number of neurons (20 to 200).
The average accuracy of the competitive neural network is about 97%, which seems good, but its precision is very low.
Fig 5: Accuracy and precision of the competitive neural network, plotted against the learning rate.
The time complexity of the competitive network is high: it takes a minimum of 13 minutes to train and test. It is worth mentioning that this time is for just 150 epochs and 40 individuals; the time will increase with the number of individuals and epochs. The feed-forward back-propagation neural network is more flexible than the competitive neural network in terms of adjustable parameters.
Fig 7: Accuracy (a) and precision (b) of the feed-forward back-propagation neural network with two hidden layers, plotted against the number of neurons in the first layer; legends show the number of neurons in the second layer.
Each neuron in this network draws a line and divides the input space into two. With more neurons, the network has more lines with which to set the boundary around each cluster in signature space; that is why performance improves with an increasing number of neurons. This justification also holds for the network with two hidden layers. Adding a hidden layer is not very helpful in this case: with 4 neurons in the second layer and 20 to 140 neurons in the first layer, the accuracy of the network is between 20% and 50%, which is not good. Network performance is far better with 24 neurons in the second layer and 140 neurons in the first layer. So the number of neurons has a significant impact on the performance of the feed-forward back-propagation neural network. The network's precision improves with more neurons, but it is not reliable for this particular problem, because signature recognition requires high precision along with high accuracy. Table II summarizes the experimental results obtained from the probabilistic neural network. In the probabilistic neural network, the spread is a parameter that can be varied. It is observed that increasing the spread beyond 0.1 results in a slight reduction of accuracy. Still, the performance of the probabilistic neural network is far better than that of any other neural network, both in time and in accuracy. The average precision obtained from the PNN is 99 percent, which makes this network reliable for the signature recognition problem.

TABLE II. SUMMARY OF THE PROBABILISTIC NEURAL NETWORK'S PERFORMANCE
Fig 9: Time comparison between the competitive, feed-forward back-propagation (one and two layer), and probabilistic neural networks.

IV. CONCLUSION
A comparative study of offline signature recognition based on geometrical and statistical features using different neural networks is performed in this paper. The features used for this experiment are mostly statistical, and feature selection is made such that the features can distinguish two individuals and train the neural networks well. The signature recognition problem needs an accurate, precise, and fast solution. Experiments show that the probabilistic neural network is the best and most reliable for signature recognition based on these feature sets. The other neural networks, feed-forward back-propagation and competitive, also gave good accuracy, but their accuracy deviates a lot, and they did not prove themselves precise enough to be acceptable. In terms of time, again the probabilistic neural network is the fastest, while the competitive neural network takes a long time to train. That is why these networks do not suit this problem.
Spread | Precision (%) | Accuracy (%)
0.005 | 100 | 100
0.01 | 100 | 100
0.05 | 100 | 100
0.1 | 100 | 100
0.5 | 100 | 100
1 | 99.75 | 99.9875
1.5 | 98 | 99.9

REFERENCES
[1] E. J. Justino et al., "The Interpersonal and Intrapersonal Variability Influences of Off-Line Signature Verification Using HMM," XV Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI 2002), 2002.
[2] R. Plamondon and G. Lorette, "Automatic signature verification and writer identification - the state of the art," Pattern Recognition, vol. 22, no. 2, pp. 107-131, 1989.
[3] R. Sabourin and J. P. Drouhard, "Off-line signature verification using directional PDF and neural networks," Proc. 11th IAPR, vol. II, IEEE Computer Society Press, pp. 321-325, 1992.
Fig 8: Performance measure of the probabilistic neural network (accuracy and precision against spread).

Time comparison analysis is shown in Fig 9. It can be seen from the figure that the time complexity of the competitive neural network is very high; in contrast, the time complexity of the probabilistic neural network is quite low.

[4] Kshitij Sisodia and Mahesh Anand, "Off-line Handwritten Signature Verification Using Artificial Neural Network Classifier," International Journal of Recent Trends in Engineering, vol. 2, no. 2, November 2009.
[5] Cemil Oz et al., "Signature Recognition and Verification with ANN."
[6] Papunzarkos, N. and Baltzakis, H. (1977), "Off-line signature verification using multiple neural network classification structures," IEEE.
[7] Suhail, M. O. and Manal, K. (2011), "Off-line signature verification and recognition: neural network approach," IEEE, vol. II.
[8] Karki, M. V., Indira, K., and Selvi, S. S. (2007), "Off-Line Signature Recognition and Verification using Neural Networks," IEEE International Conference on Computational Intelligence and Multimedia Applications, 2007.
[9] Dariusz Z. Lejtman and Susan E. George, "Online handwritten signature verification using wavelets and backpropagation neural networks," Proceedings, Sixth International Conference on Document Analysis and Recognition, 2001.
[10] Ding, J. and Zhou, A. (2009). Statistical Properties of Deterministic Systems (1st ed.). Springer.
[11] Tan, P. N., Steinbach, M., and Kumar, V. (2006). Introduction to Data Mining. New York, NY: Pearson Education.