AUDIO-VISUAL BASED PERSON AUTHENTICATION SYSTEM

S. Palanivel, Vinod Pathangay and B. Yegnanarayana
Speech and Vision Laboratory
Department of Computer Science and Engineering
Indian Institute of Technology Madras, Chennai-600 036, India.
Email: {spal, vinod, yegna}@cs.iitm.ernet.in

ABSTRACT

This report describes the demonstration of the audio-visual based person authentication system. The system records the audio-visual data and splits it into acoustic and visual streams. Spectral features are derived from the acoustic stream and are represented by Weighted Linear Prediction Cepstral Coefficients (WLPCC). The system uses motion information to detect the face region in the visual stream, and the region is processed in the YCrCb color space to determine the location of the eyes. The system extracts gray level features relative to the location of the eyes. Autoassociative Neural Network (AANN) models are used to capture the distribution of the extracted acoustic and visual features. The proposed system can be used to recognize the identity of a test subject in addition to authentication. The performance of the system is invariant to the size and tilt of the face, and is also not sensitive to variations in natural lighting conditions.

1. INTRODUCTION

Automatic authentication of a human subject is one of the challenging problems in human-computer interaction. An automatic person authentication system has many applications in the areas of biometrics, information security, smart cards, access control and video surveillance. The objective of an audio-visual based person authentication system is to accept or reject the identity claim of a subject using acoustic and visual features. The proposed person authentication system consists of three main modules, namely acoustic feature extraction, visual feature extraction and the Autoassociative Neural Network (AANN) model for authentication. The acoustic and visual feature extraction techniques are described in Sections 2 and 3, respectively. Section 4 describes the AANN model for authentication. The enrollment and authentication procedures are described in Section 5.

2. ACOUSTIC FEATURE EXTRACTION

The acoustic feature extraction module extracts the vocal tract system features from the acoustic signal. The differenced acoustic signal is analyzed by segmenting it into frames of 20 msec, using a Hamming window with a shift of 10 msec. The silence frames are removed using an amplitude threshold. A 10th order Linear Prediction (LP) analysis is used to capture the properties of the signal spectrum. The recursive relation between the predictor coefficients and cepstral coefficients is used to convert the 10 LP coefficients into 19 cepstral coefficients. The LP cepstral coefficients for each frame are linearly weighted to form the Weighted Linear Prediction Cepstral Coefficients (WLPCC). The distribution of the 19-dimensional WLPCC feature vectors in the feature space for a given subject is captured using an AANN model. A separate AANN model is used to capture the distribution of the feature vectors of each subject.
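The following is a minimal sketch of the WLPCC extraction outlined above. The sampling rate, the silence threshold and the linear weight w_n = n applied to the n-th cepstral coefficient are assumptions made for illustration; only the frame size, frame shift, LP order and number of cepstral coefficients are taken from the text.

```python
# Sketch of WLPCC extraction (Section 2). Assumed: 8 kHz sampling rate,
# a simple energy threshold for silence removal, and weight w_n = n.
import numpy as np
from scipy.linalg import solve_toeplitz

FRAME_MS, SHIFT_MS, FS = 20, 10, 8000   # 20 ms frames, 10 ms shift (fs assumed)
LP_ORDER, NUM_CEPS = 10, 19             # 10 LP coefficients -> 19 cepstral coefficients


def lp_coefficients(frame, order=LP_ORDER):
    """Autocorrelation-method LP analysis of one windowed frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return solve_toeplitz(r[:order], r[1:order + 1])  # predictor coefficients a_1..a_p


def lp_cepstrum(a, n_ceps=NUM_CEPS):
    """Recursive conversion of LP coefficients to LP cepstral coefficients."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def wlpcc(signal):
    """Frame the differenced signal, drop silence frames, return WLPCC vectors."""
    x = np.diff(signal.astype(float))                 # differenced acoustic signal
    flen, shift = FS * FRAME_MS // 1000, FS * SHIFT_MS // 1000
    window = np.hamming(flen)
    weights = np.arange(1, NUM_CEPS + 1)              # linear weighting (assumed w_n = n)
    feats = []
    for start in range(0, len(x) - flen, shift):
        frame = x[start:start + flen]
        if np.max(np.abs(frame)) < 0.01 * np.max(np.abs(x)):
            continue                                  # amplitude-threshold silence removal
        feats.append(weights * lp_cepstrum(lp_coefficients(frame * window)))
    return np.array(feats)                            # shape: (num_frames, 19)
```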
3. VISUAL FEATURE EXTRACTION

The accumulated gray level difference image is used to estimate the face region in the image. The estimated face region is processed in the YCrCb color space to determine the location of the eyes for each frame. The visual feature extraction module extracts 73-dimensional gray level features relative to the location of the eyes, and hence the features are invariant to the size and tilt of the face. A detailed description of the visual feature extraction is given in [1]. The 73-dimensional gray level features are normalized to the range [-1, 1]. The normalized feature vector is invariant to the image brightness. The distribution of the normalized feature vectors is captured using an AANN model.
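A rough sketch of the motion-based face-region estimate and the feature normalization is given below. The accumulation threshold, the bounding-box heuristic and the min-max normalization are assumptions; the actual 73-dimensional gray level feature sampling relative to the eyes follows [1] and is not reproduced here.

```python
# Sketch of face-region estimation from accumulated gray level differences and
# normalization of gray level features to [-1, 1] (Section 3). Thresholds and
# the normalization scheme are assumptions.
import numpy as np


def face_region_from_motion(gray_frames, threshold=15):
    """Accumulate inter-frame gray level differences and return the bounding
    box of the dominant moving region (taken here as the face region)."""
    acc = np.zeros_like(gray_frames[0], dtype=float)
    for prev, curr in zip(gray_frames[:-1], gray_frames[1:]):
        acc += np.abs(curr.astype(float) - prev.astype(float))
    rows, cols = np.nonzero(acc > threshold * len(gray_frames))
    if rows.size == 0:
        return None
    return rows.min(), rows.max(), cols.min(), cols.max()  # (top, bottom, left, right)


def normalize_features(gray_values):
    """Map gray level features to [-1, 1] so the vector is insensitive to
    overall image brightness (assumed min-max normalization)."""
    g = gray_values.astype(float)
    lo, hi = g.min(), g.max()
    if hi == lo:
        return np.zeros_like(g)
    return 2.0 * (g - lo) / (hi - lo) - 1.0
```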
4. AUTOASSOCIATIVE NEURAL NETWORK MODEL FOR AUTHENTICATION

Autoassociative neural network models are feedforward neural networks performing an identity mapping of the input space, and are used to capture the distribution of the input data [2]. The five layer autoassociative neural network model shown in Fig. 1 is used to capture the distribution of the feature vectors [3].

[Fig. 1: Five layer AANN model, with input layer, compression layer and output layer.]
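A minimal sketch of a five layer AANN for one enrolled subject follows, assuming the 19-dimensional WLPCC vectors as input. The hidden layer sizes, activation functions, optimizer and the reconstruction-error based confidence measure are assumptions made for illustration; the report does not specify them here.

```python
# Sketch of a five layer AANN (identity mapping) per subject (Section 4).
# Assumed layer sizes 19-38-4-38-19, tanh activations, MSE training and
# confidence c = exp(-e) on the squared reconstruction error e.
import numpy as np
import tensorflow as tf


def build_aann(dim=19, expansion=38, compression=4):
    """Five layer autoassociative network: trained to reproduce its input."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(dim,)),                           # input layer
        tf.keras.layers.Dense(expansion, activation="tanh"),    # expansion layer
        tf.keras.layers.Dense(compression, activation="tanh"),  # compression layer
        tf.keras.layers.Dense(expansion, activation="tanh"),    # expansion layer
        tf.keras.layers.Dense(dim, activation="linear"),        # output layer
    ])


def enroll(feature_vectors, epochs=200):
    """Train one AANN on the feature vectors of a single subject."""
    model = build_aann(dim=feature_vectors.shape[1])
    model.compile(optimizer="adam", loss="mse")
    model.fit(feature_vectors, feature_vectors, epochs=epochs, verbose=0)
    return model


def confidence(model, feature_vectors):
    """Average frame-level confidence exp(-e) for a claimed identity, where e
    is the squared reconstruction error of a test feature vector."""
    recon = model.predict(feature_vectors, verbose=0)
    errors = np.sum((feature_vectors - recon) ** 2, axis=1)
    return float(np.mean(np.exp(-errors)))
```

In this sketch the claim is accepted when the confidence of the claimed subject's model exceeds a decision threshold; for identification, the test features are scored against every enrolled model and the highest-confidence subject is selected.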