Pattern Recognition and Synthesis for Sign Language Translation System

Masaru Ohki, Hirohiko Sagawa, Tomoko Sakiyama, Eiji Oohira, Hisashi Ikeda and Hiromichi Fujisawa

Central Research Laboratory, Hitachi, Ltd.
1-280, Higashi-koigakubo, Kokubunji-shi, Tokyo 185, Japan

Authors' e-mail addresses: [email protected], hsagawa@crl.hitachi.co.jp, tomoko@crl.hitachi.co.jp, h-ikeda@crl.hitachi.co.jp, ohhira@crl.hitachi.co.jp, fujisawa@crl.hitachi.co.jp
Abstract

Sign language is one means of communication for hearing-impaired people. Words and sentences in sign language are mainly represented by hand gestures. In this report, we show a sign language translation system which we are developing. The system translates Japanese sign language into Japanese and vice versa. In this system, hand shape and position data are inputted using DataGlove. Inputted hand motions are recognized and translated into Japanese sentences. Japanese text is translated into sign language represented as 3-D computer-graphic animation of sign language gestures.

1 Introduction

Sign language is the usual method of communication for hearing-impaired people [1,5,6,7]. It is a natural language for them and is not a burden. Basically, sign language is a way to communicate information through hand gestures and body actions. Japanese sign language (JSL) is different from Japanese in that the means of representation are different and the vocabulary and grammar are different. Because sign language is not commonly known among hearing people, they generally communicate with hearing-impaired people through sign language interpreters. For example, when a hearing-impaired person goes to a hospital, he/she generally asks a sign language interpreter to go with him/her. However, in an emergency, it is often difficult to quickly find a sign language interpreter. Therefore, there is a great need for hearing-impaired people to be able to communicate directly with hearing people without an interpreter. To meet this need, we have been studying a sign language translation system which translates from JSL into Japanese, and vice versa. Our sign language translation system consists of two subsystems as follows:

(1) Sign Language - Japanese Translation Subsystem
This subsystem translates from JSL into Japanese. The subsystem reads hand shapes and motions representing sign language using DataGlove1 [16]. It recognizes inputted hand motions and translates them into Japanese.

(2) Japanese - Sign Language Translation Subsystem
This subsystem translates Japanese into JSL. Sign language is displayed as animated sign language gestures using three-dimensional computer graphics.

1 DataGlove is a trademark of VPL Research, Inc.

Sign language has been researched from many angles [2,3,14,15,17]. Our system is one of the systems which allow two-way communication between sign language and voice language [8,10,11,12,13]. We believe that sign language animation is an excellent medium for presenting information to hearing-impaired people, and it has many potential application domains in addition to the translation system.
Many hearing-impaired people in Japan usually use sign language, would often prefer to receive information in a sign language format, and would be more responsive to that information. Furthermore, it would improve comprehension for some people who use sign language as a first language. This is highly likely for people in Japan who are born hearing-impaired or lose their hearing before the age of nine because, as some interpreters claim, they think in JSL and Japanese is like a foreign language to them. We therefore believe that sign language animation is a medium which will enhance participation in society by hearing-impaired people.

In this paper, we outline the situation for hearing-impaired people in Japan. Then, we present the sign language recognition and synthesis of our sign language translation system.

2 Situation for Hearing-impaired People and Disability Policy in Japan

2.1 Situation for hearing-impaired people

There are about 350,000 hearing-impaired people in Japan, as shown in Table 2.1. Here, the first rank represents the most severe cases [18].

Table 2.1: Number of handicapped people in Japan (in thousands).

  Items                        Total  1st Rank  2nd Rank  3rd Rank  Over 4th Rank
  Deaf and speech impediment     354        28        90        65            171
  Blind                          307       107        66        30            105
  Physical handicap            1,460       186       291       246            735

  *The 1st Rank specifies the most severe disability.

The primary means of communication for hearing-impaired people are reading lips and conversation by writing, in addition to sign language, as shown in Table 2.2 [6]. In the United States, the idea of total communication, using all means and media, was proposed about twenty years ago. We think it is a good idea. If we follow this idea, it is not suitable to restrict communication to any one of the means in Table 2.2; we should use different means according to the situation. Following the idea of total communication, we should use several media, such as characters, images, and sign language, to communicate information to hearing-impaired people, even if communication is only one way. We think that sign language animation can be useful as one information medium for hearing-impaired people.
Table 2.2: Characteristics of communication methods for hearing-impaired people.

  Sign language:
  - Can receive speech in a relaxed way
  - Can immediately understand speech
  - Long-time communication is not tiring
  - Possibility of misunderstanding because there are fewer words
  - Dialect problems
  - Used by many deaf people

  Lip reading:
  - Difficult to read lips because many lip shapes are similar
  - Difficult to read words that are not familiar

  Conversation by writing:
  - Suitable for conveying important speech
  - Necessary to be able to write easily
2.2 Disability policy

In the United States, "Access to Information Technology by Users with Disabilities, Initial Guidelines" was completed in 1987 and was issued as a government ordinance in the following year. In 1990, the "Americans with Disabilities Act (ADA)" aimed at eliminating discrimination against the disabled in employment, transportation, building access, and communication.
Meanwhile, in 1988 in Japan, the guidelines were studied by the Committee for Humanity Electronics of the Japan Electronic Industry Development Association. The committee's guidelines were then published by the Ministry of International Trade and Industry as the "Information Processing Equipment Accessibility Guidelines" [4].

In Japan, sign language has only recently begun being taught in schools for the hearing-impaired. Previously, only lip reading and vocal pronunciation were taught. Sign language is gaining popularity because lip reading and vocal pronunciation are not easy. Sign language is commonly used in welfare settings. In Japan, the authorization system for sign language interpreters was set up in 1988, and sign language testing was conducted in 1989. JSL is now regarded as the primary language of hearing-impaired people.
3 Sign Language Translator

The features of JSL are as follows:

(1) The main idea is transferred with hand motions.

(2) Facial expressions and gestures are linguistic components. Facial expressions and gestures provide important information in sign language. For example, the chin is often moved in addition to hand motions in order to recognize a particular sign language word.

(3) The vocabulary of JSL is different from that of Japanese. For example, in sign language, the word "summer" is equivalent to "hot". These two words are distinctly different in Japanese. On the other hand, the sign language word for "drink", as in "drink a cup of tea", is different from the word for "drink" as in "drink a cup of beer".

(4) One motion can represent more than one idea. A compound idea can be represented all at once by representing separate ideas with the right hand and the left hand. For example, "I see a flying airplane" can be represented by one motion.

(5) No particles are used.2 A typical JSL does not use particles, but uses relations between spatial positions. Japanese uses particles to indicate the cases of words in sentences. To describe "A man gives a book to a woman" in JSL, one first represents "a man" with the right hand (its sign is the hand standing only on the thumb). Next, one represents "a woman" (the hand standing only on the little finger) with the left hand, and represents "a book" (bringing both hands together and opening the hands) at the position of the right hand with which one represented "a man". Then one moves both hands from the right to the left. The movement of "a book" from the right to the left represents the idea that the book moves from "a man" to "a woman".

2 The particles in Japanese are similar to the prepositions in English.

The form of representation and the overall language system in JSL and Japanese are different, as mentioned above. Therefore, the sign language translation system needs to translate the differences between the languages as well as transform the form of representation as a pattern.

Figure 3.1 shows the composition of the sign language translation system currently being developed. The translation system consists of the sign language - Japanese subsystem, which translates JSL into Japanese, and the Japanese - sign language subsystem, which translates Japanese into sign language. The sign language - Japanese translation subsystem consists of the sign language recognition part, which recognizes sign language gestures, and the sign language - Japanese translation part, which translates recognized gestures into Japanese. The sign language recognition part recognizes hand motions inputted from DataGlove. The Japanese - sign language translation subsystem consists of the Japanese - sign language translation part, which translates Japanese into JSL, and the sign language synthesis part, which displays JSL as animation using three-dimensional computer graphics. In the following sections, we describe the sign language recognition and synthesis parts.
[Figure 3.1: Sign language translation system, showing the sign language - Japanese translation subsystem and the Japanese - sign language translation subsystem, with DataGlove input, character and voice output, and an example sign word sequence "Head / very / ache".]

4 Sign Language Recognition

4.1 Subjects in sign language recognition

There are a few problems related to sign language recognition:
(1) Recognition of hand motions.

(2) Recognition of facial expressions and gestures.

(3) Real-time recognition. It is necessary to interpret in real time in sign language translation systems. Therefore, it is necessary to recognize hand motions, facial expressions, and gestures in real time.

(4) Continuous sign language recognition. To recognize continuous sign language, the system needs to automatically recognize pauses between sign language words.

(5) Recognition of an unspecified sign language talker. Unspecified sign language talker recognition means recognizing the sign language of any person, that is, an unspecified talker. It is equivalent to talker-independent recognition of audio signals, i.e., speaker-independent recognition in voice recognition.
4.2 Sign language recognition
Our sign language translation system uses DataGlove to input hand motions. DataGlove is popular for virtual reality. There are two optical fibers per finger in one DataGlove, making a total of ten optical fibers. The angle of a finger is determined by detecting the change in the ratio of light going through the fiber due to a bend in the fiber. The position of a hand is detected by a magnetic sensor. The data obtained from DataGlove include the bending of the fingers, the position of the hand, and the direction of the hand: sixteen values per hand, or a total of 32 for both hands. These data are inputted 30 times per second as an operation pattern.

There is a problem, however, in that it is difficult to recognize hand motions in a short time because the quantity of data coming from DataGlove is too large. To solve this, we extract the features of hand motions before comparing input patterns to standard patterns. We speed up the process of pattern matching by reducing the amount of data through feature extraction.

(1) Feature extraction. We analyze the patterns inputted from DataGlove and extract the features of hand motions in sign language. By obtaining featured patterns of the hand motions, we can compress the data of the input pattern.

(2) Pattern matching. We match the input patterns to standard patterns, which were registered in advance. We use DP (Dynamic Programming) matching, which is usually used in the field of speech recognition [9]. We chose this method because the pattern of data from DataGlove is similar to speech patterns. We compress the standard patterns, in addition to the input patterns, by feature extraction.
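To make the matching step concrete, the following is a minimal Python sketch of DP matching over sequences of feature vectors. It is our illustration, not the authors' implementation: the Euclidean distance, the length normalization, and all names are assumptions; only the vector layout (16 values per hand) follows the description above.

    import math

    def distance(a, b):
        """Euclidean distance between two DataGlove feature vectors
        (e.g., 10 finger-bend values plus hand position and direction)."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def dp_match(input_pattern, standard_pattern):
        """DP (DTW-style) matching: accumulated distance of the best
        time alignment between two sequences; lower is better."""
        n, m = len(input_pattern), len(standard_pattern)
        INF = float("inf")
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = distance(input_pattern[i - 1], standard_pattern[j - 1])
                # an input point may align with, skip, or repeat a standard point
                cost[i][j] = d + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
        return cost[n][m] / (n + m)   # normalize by path length

    def recognize(input_pattern, dictionary):
        """Pick the registered word whose standard pattern matches best."""
        return min(dictionary, key=lambda w: dp_match(input_pattern, dictionary[w]))

Because both sequences are first compressed to a few feature points, the quadratic DP table stays small, which is where the speed-up reported below comes from.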
4.3 Feature extraction

We can classify hand motions in sign language into the following types:
(1) Static patterns, like finger characters
(2) Dynamic patterns

Sign language patterns are expressed by the length of time between feature points. We select the following features as the features of dynamic patterns:
(1) The position at minimum velocity
(2) The position where the change in vector direction exceeds a threshold

Let us examine the method of feature extraction. Figure 4.1 shows a sign in which a circle is formed in front of the chest. This is the sign for "body condition".

[Figure 4.1: Japanese sign language for "body condition" (the hand motion, from start to end, forms a circle in front of the chest).]

DataGlove changes the hand motion, as in Figure 4.2, into the points X, Y, and Z. Data relating to hand position and shape are sent 30 times a second from DataGlove to the computer. The feature points based on the positions at minimum velocity are the minimum values in the position co-ordinates. Using this method, we can extract the features of basic hand motions, as shown by the black points in Figure 4.3.

However, with this method we cannot extract the features of hand motions drawing arcs. Therefore, we select the position where the change in vector direction exceeds a threshold as another feature point. Figure 4.4 shows these feature points, specified as t1 and t3, in the case of a circular arc.
[Figure 4.2: Motions of right hand before feature extraction (X, Y, Z positions, directions, and shapes of the right hand over time). Figure 4.3: Motions of right hand after feature extraction.]
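A minimal sketch of this two-criterion feature extraction follows, assuming one 3-D position sample per frame at 30 Hz; the threshold value and all function names are our own choices, not taken from the paper.

    import math

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def turn_angle(u, v):
        """Angle in radians between two velocity vectors."""
        nu, nv = norm(u), norm(v)
        if nu == 0.0 or nv == 0.0:
            return 0.0
        c = sum(a * b for a, b in zip(u, v)) / (nu * nv)
        return math.acos(max(-1.0, min(1.0, c)))

    def extract_feature_points(positions, turn_threshold=0.5):
        """positions: list of (x, y, z) hand positions sampled at 30 Hz.
        Returns the frame indices kept as feature points:
        (1) local minima of speed, and
        (2) points where the motion direction changes by more than
            turn_threshold radians (this catches arc-shaped motions)."""
        vel = [tuple(b[k] - a[k] for k in range(3))
               for a, b in zip(positions, positions[1:])]
        speed = [norm(v) for v in vel]
        keep = {0, len(positions) - 1}          # always keep the endpoints
        for i in range(1, len(speed) - 1):
            if speed[i] <= speed[i - 1] and speed[i] <= speed[i + 1]:
                keep.add(i)                     # (1) position at minimum velocity
        for i in range(1, len(vel)):
            if turn_angle(vel[i - 1], vel[i]) > turn_threshold:
                keep.add(i)                     # (2) direction-change point
        return sorted(keep)

Only the frames at these indices are kept in the compressed pattern that goes into the DP matching step, so a word sampled as dozens of frames typically shrinks to a handful of feature points.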
[Figure 4.4: Result of feature extraction (feature points t1 and t3 on the motion of a hand drawing an arc).]

The pattern matching process uses the patterns obtained from standard patterns and input patterns by extracting features, as shown in Figure 4.5. Few points are needed to match the patterns because both patterns are compressed, which reduces the recognition time.

[Figure 4.5: Pattern matching using feature points.]

Figure 4.6 shows the effects of feature extraction. Without feature extraction, the recognition ratio is 100% for 17 sign language words, and the recognition time is 1.23 seconds per word on an HP9000/720 UNIX workstation. With feature extraction, the recognition ratio is 97.3%, and the recognition time is 0.076 seconds per word.

[Figure 4.6: Recognition ratio (and recognition time in seconds, with and without feature extraction).]

To date, we have developed two sign language - Japanese translation subsystems. One translates continuous sign language sentences into Japanese sentences. However, its vocabulary is only seventeen words, and it translates only very simple sentences, for example, "I have a stomach ache" or "I wish to marry". The other subsystem translates sequences of individual sign language words and ignores sentence structure, but it has a larger vocabulary of 100 words. This latter subsystem can be used in situations where the syntax is easily understood, such as at a ticket counter.
5 Sign Language Synthesis

5.1 Sign language synthesis

The sign language synthesis part is one component of the Japanese - sign language translation subsystem, which translates Japanese into JSL. The sign language is displayed as sign language animation using three-dimensional CG (computer graphics). Figure 5.1 shows the Japanese - sign language translation subsystem.

Important problems related to sign language synthesis in our approach are as follows:
(1) It is necessary to make three-dimensional CG data to display a sign language word as animation.
(2) It is necessary to connect the intervals between sign language words to smoothly display the animation as one sign language sentence.

To solve the first problem, we use DataGlove. The data to move the three-dimensional CG model are registered by DataGlove, thus making the animation of sign language move naturally. We solve the second problem by automatically interpolating the animation between sign language words. One sign language word is animated by using the CG patterns for that word.
[Figure 5.1: Japanese - Sign Language Translation Subsystem (Japanese text is translated into a sign language word sequence, and sign language animation is generated from sign language CG patterns registered with DataGlove).]

We interpolate the interval between sign language words by automatically generating the animation from the ending position of the hand in the previous word to the starting position of the hand in the next word. Consequently, the sign language sentence is displayed as a smooth, serial animation. One advantage of our sign language synthesis method is that it displays one sign language sentence as animation by simply combining the CG patterns of sign language words. Therefore, it is easy to create and change sign language animation.

We can use computer graphics or video to display sign language. We think that video is more realistic and gives a warmer impression than computer graphics. However, if we use video, it is difficult to display free, continuous sign language as a smooth video in the sign language translation system. Moreover, it is generally difficult to edit and modify video, and as time passes, it becomes increasingly difficult to modify because the appearance of even the same person may change. For these reasons, we use computer graphics.

The system can translate very simple Japanese sentences. For example, when it gets "What is your name?" or "We need your name.", it generates sign language animation which has the same meaning as the inputted sentence.
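As an illustration of this interpolation, the sketch below linearly blends from the last pose of one word's CG pattern to the first pose of the next. The pose representation, frame count, and function names are our assumptions; the actual system drives a full three-dimensional CG model rather than flat pose vectors.

    def transition(end_pose, start_pose, n_frames=8):
        """Intermediate poses from the ending hand position of the previous
        word to the starting hand position of the next word. A pose is a
        flat tuple of joint/position values."""
        return [tuple(a + (f / (n_frames + 1)) * (b - a)
                      for a, b in zip(end_pose, start_pose))
                for f in range(1, n_frames + 1)]

    def build_sentence_animation(word_patterns):
        """Concatenate per-word CG patterns, inserting interpolated
        transition frames between consecutive words so the sentence
        plays as one smooth animation."""
        frames = []
        for i, pattern in enumerate(word_patterns):
            if i > 0:
                frames.extend(transition(word_patterns[i - 1][-1], pattern[0]))
            frames.extend(pattern)
        return frames

This is what makes the approach easy to extend: adding a word to the vocabulary only requires registering its CG pattern, and sentence animation falls out of concatenation plus interpolation.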
5.2 Application of sign language synthesis

As mentioned before, people who were born hearing-impaired or who lost their hearing before about nine years of age may think in sign language. Therefore, we think that we should use sign language animation, over and above the original purpose of sign language translation, as a means of transferring information to hearing-impaired people.

We have built a prototype system which explains the operations of the automatic resident card delivery machine3 with sign language animation to hearing-impaired people. The prototype system displays sign language animation, in addition to text, to explain the operations. We think that we can explain more clearly by using both text and sign language animation. There are many situations in which we need to transfer information to hearing-impaired people in public institutions. In such situations, we can expect sign language animation to be useful in spreading information quickly and easily to hearing-impaired people.

The prototype system has 11 sign language sentences. Four hearing-impaired persons and two interpreters evaluated these sign language sentences. Everybody managed to comprehend all of the sentences, though some were difficult to understand. Two factors contributed to this problem: when the hands move forward, the words are not easily read, and sometimes the facial expressions were poor. Clearly, we need to improve our animation system. Nevertheless, it is in a sufficient state of development to prove the usefulness of sign language animation as an information medium.

This Japanese - sign language translation subsystem cannot translate many of the Japanese sentences explaining how to operate the machine into sign language sentences, because the explanations are too complicated for the subsystem. The current subsystem can translate only simple sentences in which there is a single verb.
3 It is similar to an ATM, and can be used to obtain resident cards.
[Figure 5.2: Example of sign language animation.]

However, many sentences in the explanations are long and have more than two verbs. To generate the sequences of sign language words, we need human interpreters. However, even without automatic translation, the system is a practical tool for creating sign language animation for hearing-impaired people.
6 Conclusion
We have described the basic situation for hearing-impaired people in Japan as background for the introduction of our sign language translation system. We are now working on increasing the sign recognition speed for large vocabularies. We are also investigating the application of sign language animation as an information medium for hearing-impaired people.
References

[1] S. Ishihara, "Sign Language for Everyone", NHK series, 1990 (in Japanese).
[2] J. Lee and T. L. Kunii, "Generation and Recognition of Sign Language Using Graphic Models", Proc. of IISF/ACM Japan International Symposium, pp. 96-103, 1994.
[3] T. Matsumoto and K. Kamata, "Basic Study on Constructing Sign Word Processor", Technical Report of IEICE, HC93-10, 1993 (in Japanese).
[4] T. Miura, "Answering the Challenge to Develop Accessible Information System for the Disabled", Proc. of IISF/ACM Japan International Symposium, pp. 112-119, 1994.
[5] Y. Nakano, "A Study of Sign Language", Fukumura Publisher, 1981 (in Japanese).
[6] K. Nozawa, "Case Work II for Hearing-impaired People", 1991.
[7] J. Ogawa and K. Kanda, "Basics of Sign Language Translation", Daiichi-Hoki, 1992 (in Japanese).
[8] M. Ohki, H. Sagawa, T. Sakiyama, E. Oohira, and M. Fujisawa, "Pattern Recognition and Synthesis for Sign Language Translation System", Technical Report of Information Media of IPSJ, 15-6, 1994 (in Japanese).
[9] R. Oka, "Continuous Words Recognition by Use of Continuous Dynamic Programming for Pattern Matching", Acoust. Soc. J., SIGS, S78-20, pp. 145-152, 1978 (in Japanese).
[10] E. Oohira, T. Sakiyama, M. Abe, and H. Sagawa, "Study on Sign Language Generation System", 46th IPSJ Annual Conference, Vol. 1, pp. 309-310, 1993 (in Japanese).
[11] H. Sagawa and M. Abe, "Enlargement of Vocabulary on Sign Language Translation System", 46th IPSJ Annual Conference, Vol. 1, pp. 307-308, 1993.
[12] H. Sagawa, T. Sakiyama, E. Oohira, H. Sakou, and M. Abe, "Prototype Sign Language Translation System", Proc. of IISF/ACM Japan International Symposium, pp. 152-153, 1994.
[13] T. Sakiyama, E. Oohira, H. Sagawa, M. Abe, and K. Arai, "Sign Language Generation using Computer Animation", 46th IPSJ Annual Conference, Vol. 1, pp. 311-312, 1993 (in Japanese).
[14] T. Takahashi and F. Kishino, "A Hand Gesture Recognition Method and Its Application", The Transactions of IEICE, Vol. J73-D-II, No. 12, pp. 1985-1992, 1990 (in Japanese).
[15] M. Terauchi, Y. Nagashima, H. Mihara, H. Nagashima and G. Ohwa, "The Examination which is Basic is Done by the animated Induction System", Technical Report of Human Interface of IPSJ, 41-7, 1992 (in Japanese).
[16] VPL Research, "DATAGLOVE MODEL 2 Operation Manual", VPL Research, Inc., 1989.
[17] S. Watanabe, T. Izuchi, E. Fujishige and T. Kurokawa, "Technical Aspects of Automatic Translation of Japanese to Sign Language", Human Interface, Vol. 8, pp. 363-370, Society of Instrument and Control Engineers, 1993 (in Japanese).
[18] Welfare Ministry, "Welfare White Paper", Welfare Ministry, 1989 (in Japanese).