TOWARDS THE AUTOMATED IDENTIFICATION OF FinSL SIGNS

Matti Karppa1, Tommi Jantunen2, Markus Koskela3 & Jorma Laaksonen4
1,3,4 Aalto University School of Science and Technology & 2 University of Jyväskylä
E-mail: [email protected], [email protected], [email protected], [email protected]

INTRODUCTION

This poster outlines and demonstrates the first experiment on the automated identification of signs from a video containing continuous Finnish Sign Language (FinSL) signing. The work described here forms a starting point for the new CoBaSiL project (Content-based video analysis and annotation of FinSL), which starts in January 2011 and lasts for four years.

THE CONTENT-BASED ANALYSIS OF FinSL VIDEOS

Our automated sign identification technique is based on the content-based quantitative analysis of signed language motion from digital FinSL videos. The method includes four main phases, illustrated in Figures 1-5.

Independently, Kanade-Lucas-Tomasi tracking is used to track interest points, providing additional measurements. Currently, we are able to calculate the placement of the selected articulators, the amount and direction of their motion, and their velocities and accelerations.

Figure 1: First, the signer's face is detected in the video.
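
As an illustration of this first phase, the following sketch uses OpenCV's implementation of the Viola-Jones detector cited in the references. This is a minimal sketch only: the video filename is a placeholder and the detector parameters are generic defaults, not the settings of our system.

    # Minimal Viola-Jones face detection with OpenCV (illustrative sketch).
    import cv2

    # Pretrained frontal-face Haar cascade shipped with OpenCV.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    cap = cv2.VideoCapture("finsl_signing.mp4")   # placeholder video file
    face_boxes = []                               # one (x, y, w, h) box per frame
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        face_boxes.append(tuple(faces[0]) if len(faces) else None)
    cap.release()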

Figure 2: The second phase identifies skin-coloured regions.
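
Skin-colour segmentation of this kind is commonly implemented by thresholding in a luminance-chrominance colour space. A minimal sketch follows; the fixed Cr/Cb bounds are generic textbook values, not the calibrated thresholds of our system.

    # Skin-colour segmentation in the YCrCb colour space (illustrative sketch).
    import cv2
    import numpy as np

    def skin_mask(frame_bgr):
        ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
        lower = np.array([0, 133, 77], dtype=np.uint8)     # (Y, Cr, Cb) lower bound
        upper = np.array([255, 173, 127], dtype=np.uint8)  # (Y, Cr, Cb) upper bound
        mask = cv2.inRange(ycrcb, lower, upper)
        # Morphological opening removes small spurious detections.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)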

Figure 3: In the third phase, the regions are assigned identities as body parts (i.e., left hand, right hand, or head).
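
One possible heuristic for this step, shown purely as an illustration, labels the three largest skin regions using the detected face box as an anchor; the assignment rules below are assumptions, not our actual procedure.

    # Hypothetical labelling heuristic: the region overlapping the face box is
    # the head; the remaining regions are ordered by horizontal position
    # (in a mirrored camera view the signer's right hand appears leftmost).
    import cv2

    def label_regions(mask, face_box):
        n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
        fx, fy, fw, fh = face_box
        # Sort foreground regions by area, largest first (label 0 is background).
        order = sorted(range(1, n),
                       key=lambda i: stats[i, cv2.CC_STAT_AREA], reverse=True)[:3]
        parts, hands = {}, []
        for i in order:
            cx, cy = centroids[i]
            if fx <= cx <= fx + fw and fy <= cy <= fy + fh:
                parts["head"] = i
            else:
                hands.append((cx, i))
        hands.sort()
        if hands:
            parts["right_hand"] = hands[0][1]
        if len(hands) > 1:
            parts["left_hand"] = hands[1][1]
        return parts, labels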

Figure 4: Once the body-part identities have been established, Point Distribution Models (PDMs) describing the shapes of the body parts are also created.
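
Following Cootes et al. (see references), a PDM is obtained by principal component analysis over a set of aligned landmark shapes. A minimal sketch, assuming the body-part contours have already been aligned and resampled to a fixed number of landmarks:

    # Build a Point Distribution Model by PCA over aligned shapes (sketch).
    # `shapes` is an (N, 2k) array: N training shapes, each a flattened
    # vector of k (x, y) landmarks, already aligned (e.g., by Procrustes).
    import numpy as np

    def build_pdm(shapes, variance_kept=0.95):
        mean = shapes.mean(axis=0)
        centred = shapes - mean
        # SVD of the centred data yields the principal modes of shape variation.
        _, s, vt = np.linalg.svd(centred, full_matrices=False)
        eigvals = s ** 2 / (len(shapes) - 1)
        cum = np.cumsum(eigvals) / eigvals.sum()
        t = int(np.searchsorted(cum, variance_kept)) + 1  # number of modes kept
        return mean, vt[:t].T, eigvals[:t]  # mean shape, modes P, variances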

Figure 5: Finally, the movement of the body parts is tracked by fitting Active Shape Models (ASMs) to the video, using the PDMs as a basis.
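
A central step in ASM fitting is projecting a candidate shape into the PDM subspace and clamping each shape parameter to a plausible range, conventionally about three standard deviations per mode. A sketch of that constraint step, using the model built above:

    # Constrain a candidate shape to the PDM's plausible shape space (sketch).
    import numpy as np

    def constrain_shape(x, mean, P, eigvals, limit=3.0):
        b = P.T @ (x - mean)              # project into shape-parameter space
        bound = limit * np.sqrt(eigvals)  # +/- 3 standard deviations per mode
        b = np.clip(b, -bound, bound)
        return mean + P @ b               # reconstructed, constrained shape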

THE AUTOMATED SIGN IDENTIFICATION PROCESS

Sign-like sequences are isolated from the video by relying on, for example, information on the ASM-based dominant-hand arm orientation, the ASM-based dominant-hand velocity, and the ASM-based calculation of body-part overlap. The preliminary recogniser software is partially integrated with ELAN's video recogniser interface, and the results of the segmentation process are viewable in ELAN as automatically generated annotations (Figures 6 and 7).
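
As a simplified illustration of how such measurements can isolate sign-like sequences, the sketch below thresholds the dominant hand's speed, computed from its tracked positions, and reports sufficiently long high-velocity spans as candidate segments. The threshold, minimum length, and frame rate are illustrative assumptions, not our actual segmentation criteria.

    # Hypothetical velocity-based segmentation of sign-like sequences (sketch).
    import numpy as np

    def candidate_segments(positions, fps=25.0, speed_thresh=50.0, min_frames=5):
        """positions: (T, 2) array of per-frame dominant-hand centroids (pixels)."""
        velocity = np.diff(positions, axis=0) * fps      # pixels per second
        speed = np.linalg.norm(velocity, axis=1)
        active = speed > speed_thresh
        segments, start = [], None
        for t, a in enumerate(active):
            if a and start is None:
                start = t
            elif not a and start is not None:
                if t - start >= min_frames:
                    segments.append((start / fps, t / fps))  # in seconds
                start = None
        if start is not None and len(active) - start >= min_frames:
            segments.append((start / fps, len(active) / fps))
        return segments

Intervals of this kind map naturally onto ELAN annotations, which are time-aligned start/end pairs on a tier.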

Figure 6: Two close-ups showing the video recogniser interface and segment creation options in ELAN.

CONCLUSION

The results of the experiment (see demonstration) are promising, although the recogniser is still at a very early stage of development. In general, the results suggest that an approach combining different types of quantitative information about the motion and placement of the articulators is a good starting point for the development of an automated sign recogniser.

ACKNOWLEDGEMENTS

The authors wish to thank Päivi Mäntylä and Päivi Rainò (the Finnish Association of the Deaf). The financial support of the Academy of Finland is gratefully acknowledged.

Figure 7: Two types of automatically generated annotations (SLMotion) in ELAN, calculated on the basis of ASM values representing the degree of left-hand overlap with the right hand and the head, also visualised in the track panels.
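
The degree of overlap visualised here could be computed, for example, as the fraction of the left-hand region that falls inside another body part's region; the mask-based sketch below is an illustrative assumption about the representation, not the exact measure used.

    # Hypothetical overlap measure between two binary body-part masks (sketch).
    # Returns 1.0 when the left hand lies entirely inside the other region.
    import numpy as np

    def overlap_degree(left_hand_mask, other_mask):
        left = left_hand_mask.astype(bool)
        inter = np.logical_and(left, other_mask.astype(bool)).sum()
        return inter / max(int(left.sum()), 1)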

REFERENCES

Cootes, T. F. & Taylor, C. J. Active Shape Models – 'smart snakes'. Proceedings of the British Machine Vision Conference (1992).
Cootes, T. F., Taylor, C. J., Cooper, D. H. & Graham, J. Training models of shape from sets of examples. Proceedings of the British Machine Vision Conference (1992).
Shi, J. & Tomasi, C. Good features to track. 9th IEEE Conference on Computer Vision and Pattern Recognition (1994).
Tomasi, C. & Kanade, T. Detection and tracking of point features. Carnegie Mellon University Technical Report CMU-CS-91-132, April 1991.
Viola, P. & Jones, M. J. Rapid object detection using a boosted cascade of simple features. IEEE Conference on Computer Vision and Pattern Recognition (2001).