Effective Animation of Sign Language with Prosodic Elements for Annotation of Digital Educational Content Nicoletta Adamo-Villani, Kyle Hayward, Jason Lestina, Ronnie Wilbur, Purdue University*

1. Introduction

Computer animation of American Sign Language (ASL) has the potential to remove many educational barriers for deaf students, because it provides a low-cost, effective means of adding sign language translation to any type of digital content. Several research groups [1-3] have investigated the benefits of rendering ASL in 3D animation. Although the quality of animated ASL has improved in the past few years and shows strong potential for revolutionizing accessibility to digital media, its effectiveness and widespread use are still limited by two main problems: (a) the low realism of the signing characters, which results in limited legibility of animated signs and low appeal of virtual signers, and (b) the lack of easy-to-use, public-domain authoring systems that allow educators to create educational materials annotated with animated ASL. The general goal of our research is to overcome both limitations. Specifically, the objective of the work reported in this paper was to research and develop a software system for annotating math/science digital educational content for grades 1-3 with expressive ASL animation that includes prosodic elements. The system provides educators of the Deaf with an effective means of creating and adding grammatically correct, life-like sign language translation to learning materials such as interactive activities, texts, images, slide presentations, and videos.

2. The ASL Authoring System

The system has been developed iteratively with continuous feedback from teachers and students at the Indiana School for the Deaf (ISD). It includes three components:

3D Model Support Component. This component allows importing 3D character models and background 3D scenes.

Animation Support Component. This component enables the user to (a) import signs from a sign database, (b) create new signs, (c) create facial articulations, (d) smoothly link signs and facial articulations into continuous ASL discourse, and (e) type an ASL script in the script editor and automatically generate the corresponding ASL animation. (a) The system includes an initial database of animated mathematics signs for grades 1-2; more signs can be added to the library. (b) If a needed sign is not available in the database, it can be created by defining hand, limb, and body poses for the character. (c) Facial articulations are created by combining morph targets in a variety of ways and applying them to the character. (d) The animation support module computes realistic transitions between consecutive poses and signs (a sketch of such a transition appears below). (e) The system includes the ASL Script Editor, a tool that understands ASL script syntax (which is very similar to ASL gloss). The Script Editor enables a user with knowledge of ASL gloss to type an ASL script that includes both ASL gloss and mathematical equations; the script is then automatically converted to the correct animations with prosodic elements.

Rendering Support Component. This component implements advanced rendering effects such as ambient occlusion, motion blur, and depth of field to enhance visual comprehension of signs, and exports the final ASL sequences to various movie formats.

_________________________________________________
*email: {nadamovi, khayward, jlestina, wilbur}@purdue.edu
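To illustrate point (d), the following minimal Python sketch shows one way a transition between the final pose of one sign and the initial pose of the next could be computed with ease-in/ease-out blending. It is not the system's actual implementation; the pose representation (per-joint Euler rotations) and all function names are our own illustrative assumptions.

from typing import Dict, List, Tuple

Pose = Dict[str, Tuple[float, float, float]]  # joint name -> (x, y, z) rotation


def ease_in_out(t: float) -> float:
    """Smoothstep easing so transitions start and end with zero velocity."""
    return t * t * (3.0 - 2.0 * t)


def blend_poses(a: Pose, b: Pose, t: float) -> Pose:
    """Interpolate every joint of pose `a` toward pose `b` at parameter t in [0, 1]."""
    w = ease_in_out(t)
    return {joint: tuple((1.0 - w) * ax + w * bx for ax, bx in zip(a[joint], b[joint]))
            for joint in a}


def transition_frames(end_of_sign: Pose, start_of_next: Pose, n_frames: int) -> List[Pose]:
    """Generate the in-between frames linking two consecutive signs."""
    assert n_frames >= 2
    return [blend_poses(end_of_sign, start_of_next, i / (n_frames - 1))
            for i in range(n_frames)]

In practice, the number of in-between frames would presumably depend on the distance the hands must travel and on the signing rate; a fixed frame count is used here only to keep the sketch short.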

3. ASL Animation with Prosodic Elements

Although various attempts at animating ASL for deaf education and entertainment currently exist, they all fail to provide the regular, linguistically appropriate grammatical markers that are made with the hands, face, head, and body, producing animation that is stilted and difficult to process (as an analogy, imagine someone speaking with no intonation). That is, they lack what linguists call 'prosody'. Prosodic markers (e.g., head nod, hand clasp, body lean, mouth gestures, shoulder raise) and prosodic modifiers (e.g., sign lengthening, jerk, pauses) are used in ASL to convey and clarify the syntactic structure of the signed discourse [4]. Research has identified over 20 complex prosodic markers/modifiers and has measured frequencies of up to 7 prosodic markers/modifiers in a two-second span [5]. Adding such a number and variety of prosodic elements by hand through a graphical user interface (GUI) is prohibitively slow.

Our system includes a novel algorithm that automates the process of enhancing ASL animation with prosodic elements. The algorithm interprets the ASL script entered in the Script Editor (described in Section 2) and identifies the signs and the prosodic markers/modifiers needed to animate the input sequence. The prosodic elements are added automatically from ASL prosody rules. Example rules automated by our algorithm are:

- High-level sentence structure. Appropriate prosodic modifiers are added to mark the beginning (e.g., a blink before the hands move) and the end (e.g., lengthening of the last sign) of the sentence. Periods and commas are automatically translated to their respective prosodic modifiers (longer and shorter pauses, respectively).

- Sentence type. Interrogative, imperative, and conditional sentences are detected from punctuation marks (?, !) and key words (e.g., "wh-words", "if", "whether"), and appropriate prosodic markers (e.g., raised eyebrows) are added.

The final animation is assembled by retrieving the required signs from the sign database and by translating the identified prosodic elements into corresponding animation markers/modifiers. A multitrack animation timeline is populated with the animated signs and the animation markers/modifiers. Most prosody markers are layered on top of the animated signs; some, such as the hand clasp, are inserted between signs. Prosody modifiers are layered on top of the signs they modify. The supplementary video includes examples of ASL animated sequences enhanced with algorithmically generated prosodic elements.
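To make the rule-based step concrete, the following Python sketch shows how the two example rules above (sentence structure and sentence type) might be organized. All names, data structures, and timing values are illustrative assumptions rather than the system's actual code; comma pauses and imperative detection are omitted for brevity.

import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProsodicEvent:
    kind: str           # e.g. "blink", "lengthen_final_sign", "pause", "brow_raise"
    position: str       # "start" or "end" of the sentence in this sketch
    value: float = 0.0  # e.g. pause duration in seconds (illustrative values only)


@dataclass
class AnnotatedSentence:
    tokens: List[str]
    events: List[ProsodicEvent] = field(default_factory=list)


WH_WORDS = {"WHO", "WHAT", "WHEN", "WHERE", "WHY", "HOW"}


def annotate_sentence(sentence: str) -> AnnotatedSentence:
    tokens = sentence.strip().rstrip(".?!").split()
    out = AnnotatedSentence(tokens=tokens)

    # Rule 1: high-level sentence structure -- blink before the hands move,
    # lengthen the last sign, and insert a pause (longer for a period).
    out.events.append(ProsodicEvent("blink", "start"))
    out.events.append(ProsodicEvent("lengthen_final_sign", "end"))
    out.events.append(ProsodicEvent("pause", "end",
                                    value=0.45 if sentence.endswith(".") else 0.25))

    # Rule 2: sentence type -- brow markers for questions and conditionals.
    if sentence.endswith("?"):
        kind = "brow_lower" if any(t in WH_WORDS for t in tokens) else "brow_raise"
        out.events.append(ProsodicEvent(kind, "start"))
    if "IF" in tokens or "WHETHER" in tokens:
        out.events.append(ProsodicEvent("brow_raise", "start"))
    return out


def annotate_script(script: str) -> List[AnnotatedSentence]:
    # Split the script at sentence-final punctuation, keeping the delimiter.
    return [annotate_sentence(s) for s in re.findall(r"[^.?!]+[.?!]", script)]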

4. Discussion and Conclusion

The system described in this paper is the first and only animation-based sign language program that produces fluid ASL animation enhanced with automatically generated prosodic elements. Decisive progress in Deaf education requires automating the process of producing high-quality ASL animation, and scalability to all age groups and disciplines can only be achieved if teachers with no computer animation expertise can easily annotate educational content with life-like, grammatically correct ASL animation. Our system provides a solution to this problem: it enables users with no technical background to create high-quality ASL animation by simply typing an ASL script.

Effective Animation of Sign Language with Prosodic Elements for Annotation of Digital Educational Content Nicoletta Adamo-Villani, Kyle Hayward, Jason Lestina, Ronnie Wilbur, Purdue University

Supplementary Document 


References (for the 1-page abstract)

1. ADAMO-VILLANI, N. and WILBUR, R. 2008. Two Novel Technologies for Accessible Math and Science Education. IEEE Multimedia, Special Issue on Accessibility, October-December 2008, 38-46.
2. THE DEPAUL UNIVERSITY SIGN LANGUAGE PROJECT. Available at: http://asl.cs.depaul.edu/
3. VCOM3D, Inc. Available at: http://www.vcom3d.com/
4. BRENTARI, D. 1998. A Prosodic Model of Sign Language Phonology. Cambridge, MA: MIT Press.
5. NICODEMUS, B. 2007. The use of prosodic markers to indicate utterance boundaries in American Sign Language interpretation. Doctoral dissertation, University of New Mexico.

_____________________________________________________________________________________


THE ASL SCRIPT EDITOR - DETAILS
There is no generally accepted writing system for ASL, and it is not possible to translate English into ASL word-by-word. Therefore, to write ASL signs and sentences, linguists and educators use glossing. In ASL gloss, every sign is written in CAPITAL LETTERS, e.g., PRO.1 LIKE APPLE "I like apples". Gestures that are not signs are written in lower-case letters between quote marks, e.g., "go there". Proper names, technical concepts, and other items with no obvious translation into ASL may be fingerspelled, which is glossed either as fs-MAGNETIC or m-a-g-n-e-t-i-c. Upper body, head, and facial articulations associated with syntactic constituents are shown above the signs with which they co-occur, with a line indicating the start and end of the articulation; for example, in

  ____wh-q____          ______wh-q______
  YOUR NAME?     or     YOUR NAME WHAT?     'What is your name?'

'wh-q' indicates a facial articulation with lowered eyebrows (Weast 2008). In our ASL system, to animate ASL sentences the user types the corresponding ASL script in the Script Editor. The script is interpreted and automatically converted to the correct series of sign animations and facial expressions, which include clearly identifiable ASL prosodic elements. The ASL script is similar, but not identical, to ASL gloss; Table 1 shows the differences and similarities. We anticipate that ASL users familiar with ASL gloss will learn ASL script quickly and easily.


English:     See the bowl there? It has 5 apples. Now, you click and drag to remove 3 apples.

ASL gloss:   SEE BOWL IXa HAVE 5 APPLE. NOW, YOU CLICK-DRAG REMOVE 3 APPLE
             (with nonmanual markers written on a line above the signs they accompany:
              br = brow raise, bl = blink, hn = head nod, lf = lean forward)

ASL script:  SEE BOWL PT-h2? HAVE 5 APPLE. NOW, YOU CLICKDRAG CDREMOVE 3 APPLE.
Table 1. In this example, the ASL script differs from ASL gloss: there are no lines above the sign names; the pointing to the signer’s non-dominant hand after signing BOWL is indicated in the script as PT-h2 (point to hand2); a question mark is used to trigger brow raise and phrase final lengthening of the final sign; a period is used to trigger a blink and phrase final lengthening; the comma after NOW will trigger the head nod; the sign YOU triggers the start of a lean forward; the name of the sign CDREMOVE (computer term: click and drag remove) calls a different sign from the lexicon than REMOVE.
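As a concrete illustration of the triggers described in this caption, the short Python sketch below maps script tokens to lexicon lookups and prosodic triggers. The token names (PT-h2, CDREMOVE) come from Table 1; the lexicon contents, trigger names, and function are hypothetical.

# Hypothetical mapping from ASL-script tokens to sign lookups and prosodic
# triggers, following the conventions described in the Table 1 caption.
SIGN_LEXICON = {"SEE", "BOWL", "PT-h2", "HAVE", "5", "APPLE",
                "NOW", "YOU", "CLICKDRAG", "CDREMOVE", "REMOVE", "3"}


def interpret_token(token: str) -> dict:
    """Return the sign to retrieve and any prosodic triggers for one script token."""
    triggers = []
    if token.endswith("?"):
        triggers += ["brow_raise", "phrase_final_lengthening"]   # '?' trigger
    elif token.endswith("."):
        triggers += ["blink", "phrase_final_lengthening"]        # '.' trigger
    elif token.endswith(","):
        triggers += ["head_nod"]                                 # ',' after NOW in Table 1

    name = token.rstrip("?.,")
    if name == "YOU":
        triggers.append("lean_forward_start")   # YOU starts the forward lean
    if name not in SIGN_LEXICON:
        raise KeyError(f"sign {name!r} not in the sign database")
    return {"sign": name, "triggers": triggers}


script = "SEE BOWL PT-h2? HAVE 5 APPLE. NOW, YOU CLICKDRAG CDREMOVE 3 APPLE."
plan = [interpret_token(t) for t in script.split()]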

_____________________________________________________________________________________


NOTES ON ASL PROSODY
What is prosody and why it is important

To understand prosody, we begin with a brief discussion of speech, because it is much better studied and can serve as a reference for ASL prosody studies. Spoken language has a hierarchical structure of sounds (the smallest units), syllables (groups of sounds), words (made from syllables), phrases (groups of words), and discourse utterances (groups of adjacent phrases). Prosodic markers indicate which units are grouped together and serve as cues for parsing the signal and aiding comprehension. As a general notion, prosody includes the relative prominence, rhythm, and timing of articulations in the signal. For speech, the manipulated variables are pitch (fundamental frequency), intensity (amplitude), and duration. For example, a word at the end of a phrase will have a longer duration than the same word in the middle of a phrase (Phrase Final Lengthening). While there is a good deal of correspondence between syntactic breaks and prosodic breaks, syntax by itself does not uniquely predict prosodic structure [1,2]. The Prosodic Hierarchy ([3]; from smallest to largest) is: Syllable < Prosodic Word < Prosodic Phrase < Intonational Phrase. How Prosodic Phrases and Intonational Phrases are constructed from Prosodic Words may depend on information status in the phrase (old vs. new), stress assignment (often affected by information status), speaking rate, and situation formality/register (articulation distinctness), among other factors.

From this brief introduction to prosody, we draw a number of lessons for application to ASL. First, the manipulated variables in the signed signal are displacement, time, and velocity (v = d/t), and derivatives thereof [4]. Second, like speech, ASL has Phrase Final Lengthening [5,6,7,8]. Third, like speech, there is a good deal of correspondence between syntactic breaks and prosodic breaks [9,10]. Fourth, like speech, syntax by itself does not predict all prosodic domains [11,12,13,14,15], with information structure [16,17] and signing rate being strong influences. Fifth, like speech, the Prosodic Hierarchy holds for ASL [18]. To date, Syllables, Prosodic Words, and Intonational Phrases are well understood [11,19,20,21,22,23]; Prosodic Phrases are just now being investigated.

Speech is made with several articulators (vocal cords, velum, tongue, teeth, lips), of which only a few are visible (lips, teeth, occasionally tongue), and except for speechreading, what is visible is not relevant. In contrast, the entire body is visible when signers are signing, and also when they are not. Thus, the signal must contain ways to indicate to the viewer that linguistic information is being transmitted. Also, signers can use one or both hands while signing. In addition, there are 14 potential articulators besides the hands (the nonmanuals): body (leans), head (turn, nod, shake), eyebrows (up, down, neutral), eyelids (open, droop, closed), eye gaze (up, down, left, right, each of the four corners), nose (crinkle), cheeks (puffed), lips (upper, lower, both, lip corners up/down/stretched), tongue (out, touch teeth, in cheek, flap, wiggle), teeth (touch lip/tongue, clench), and chin (thrust) [23]. The visual signed signal may contain several simultaneous nonmanuals. Finally, the non-signing hand may be used as a phrasal prosodic marker. The ASL linguistic system has evolved so that articulations do not interfere with each other, either in production or in perception [24].
ASL nonmanuals are layered and can be subdivided into groups: those on the upper face and head [20,21,23] occur with syntactic constituents (clauses, sentences), while those on the lower face carry adverbial/adjectival information and mark Prosodic Words [11,23,8].
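A minimal sketch, in Python, of how this layering might be represented: independent nonmanual tracks composited over the manual sign track, with upper-face/head spans scoped to syntactic constituents and lower-face spans scoped to Prosodic Words. The track and pose names are illustrative assumptions, not the system's actual data model.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Upper-face/head tracks carry markers scoped to syntactic constituents;
# lower-face tracks carry adverbial/adjectival markers scoped to Prosodic Words.
UPPER_TRACKS = ["brows", "eyelids", "gaze", "head", "body"]
LOWER_TRACKS = ["mouth", "cheeks", "tongue"]


@dataclass
class Timeline:
    # Each track holds (start_frame, end_frame, pose_name) spans layered over the signs.
    tracks: Dict[str, List[Tuple[int, int, str]]] = field(
        default_factory=lambda: {t: [] for t in UPPER_TRACKS + LOWER_TRACKS})

    def add_span(self, track: str, start: int, end: int, pose: str) -> None:
        self.tracks[track].append((start, end, pose))


# Example: a brow raise spanning a yes/no question clause (upper face) and a
# mouth adverbial on a single sign within it (lower face).
tl = Timeline()
tl.add_span("brows", 0, 42, "raised")
tl.add_span("mouth", 10, 18, "mm")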

From this summary of ASL prosody, it should now be clear why linguistic prosody is missing from existing animated signing. The state of the art is that animators do not have most of this information available to them. Worse, if they wanted to add it into their animations, they would have to modify the code for each sign individually for each nonmanual articulator and each change in articulation movement, by hand. Thus, our project represents a major leap forward by combining known prosodic characteristics with an animation algorithm for easily adding predictable prosody to animated signing. Figure 1 shows the difference between a real signer and an avatar without prosody.

Figure 1. The top row shows a Deaf signer signing with prosody, and the bottom row shows an animated signing avatar that represents the state of the art in ASL animation. The frames were extracted from the ASL translation of "See the bowl right there? It has 5 apples. Now you click and drag to remove 3 apples." In the neutral position before the start of signing, note that the signer blinks and the avatar does not. For SEE, the signer's mouth indicates that she is mouthing the English word 'see', as does the avatar. However, the signer also leans slightly forward, including the viewer in the conversation, and raises her eyebrows as part of a questioning face. The signer signs BOWL and says 'bowl', whereas the avatar only makes the sign. On 'point to BOWL', the signer maintains her left hand in the shape of the bowl ('non-dominant hand spread' is a marker of signs being in the same phrase), continues to lean forward, keeps her brows raised, gazes at the camera to continue the contact with the viewer, and nods her head to emphasize 'the bowl there' and to indicate the end of the syntactic clause. Producing YOU, the signer's head has straightened up, indicating a new clause; she leans slightly forward to include the viewer, gazes at the camera, has her brows in neutral position, and says the English word 'you'. On REMOVE, she leans slightly back, moves her head in the direction that her hand moves, closes her eyes, and has a facial expression indicating 'dismissal', a form of negation. Note also that her non-signing hand has been kept at waist height, not dropped to neutral position (non-dominant hand spread again). During this entire sequence, the avatar has not used the non-signing hand, has not changed eye gaze, has not shifted body position, has not turned or nodded his head, and has not blinked.


Prosody in animated speech generation
The importance of facial expression and gestures as prosodic elements for enhancing speech communication is well established. The first challenge in producing computer animation of a speaking character enhanced with rich prosody is to understand the various types of prosodic elements. Whereas the semantics of facial expressions accompanying speech are obvious, there is no clear taxonomy for gestures. McNeill identifies four types of gestures: iconics (concrete objects or actions), metaphorics (abstract ideas), deictics (object location), and beats (emphasis) [25], but later suggests that the original taxonomy is too rigid [26]. More recently, [27] suggests new ways of approaching gesture analysis. The second challenge is to determine when and where prosodic elements are needed. One approach is automatic processing from a variety of inputs. Examples include natural language processing of text [28], processing of video [29], and pitch, intensity, and duration analysis of speech to derive body language [30] or head motion prosody [31]. A second approach is to rely on the user to encode the need for and location of prosodic elements. Several prosody annotation schemes have been proposed [32,33,34], but a standard has yet to emerge. The third challenge is the actual rendering of the desired prosodic elements through animation. The approaches explored include rule-based methods [28], data-driven methods based on Hidden Markov Models [31], hybrid rule-based and data-driven methods [35], video morphing methods [36], and physical muscle simulation methods [37].

ASL prosody animation shares some of these challenges: a taxonomy and textual annotation methods for prosodic elements are not yet fully established. ASL animation does not benefit from input in audio form, so any automatic detection of prosodic elements has to rely on video. Our project targets ASL annotation of educational materials, so real-time interpretation of ASL is not needed. One advantage specific to ASL is that part of the prosody has a linguistic function, and such prosodic elements can be added automatically to text based on simple rules. Finally, whereas for applications such as ASL storytelling data-driven animation of prosodic elements should be preferred, in order to capture and convey the talent of ASL artists, in the context of our project the more suitable choice is robust and low-cost rule-based animation.

ASL prosodic elements we predict with certainty
Prosodic constituency and phrasing. Grosjean & Collins [38] reported that for English, pauses with durations greater than 445 ms occur at sentence boundaries; pauses between 245 and 445 ms occur between conjoined sentences, between noun phrases and verb phrases, and between a complement and the following noun phrase; and pauses of less than 245 ms occur within phrasal constituents. Grosjean & Lane's [9] findings for ASL were that pauses between sentences had a mean duration of 229 ms, pauses between conjoined sentences had a mean duration of 134 ms, pauses between NP and VP had a mean duration of 106 ms, pauses within the NP had a mean duration of 6 ms, and pauses within the VP had a mean duration of 11 ms. These results support the relevance of the Prosodic Hierarchy to ASL. Recent research [8] shows that pause duration and other prosodic markers depend on signing rate. Phrase Final Lengthening (holding or lengthening the duration of signs) is also well documented [5,6,8]. Our animation algorithm ensures that as signing rate is adjusted, prosodic indicators are adjusted accordingly. The highest prosodic grouping, the Intonational Phrase (IP), is marked with changes in head and body position, facial expression, and periodic blinks [1,21]. IPs can be determined by two algorithms: Wilbur [21] used Selkirk's [14,15] derivation of IPs from syntactic constituents, whereas Sandler & Lillo-Martin [13] used Nespor and Vogel's [3] Prosodic Hierarchy. Both reflect word groupings, so the choice seems to make no difference. The one remaining prosodic level that has been well studied is the Prosodic Word. Brentari & Crossley [11] reported that changes in lower face tension (cheeks, mouth) can separate Prosodic Words from each other.

Stress. As indicated, syllables are marked by hand movement. ASL stress is marked by modifying the hand movement; in particular, peak velocity (measured with 3D motion capture) is the primary kinematic marker of stress [22], along with signs being raised in the vertical signing space. Generally, every sentence has one stressed sign (most signs are single syllables), and our own research shows that stress in ASL falls predictably at the end of the clause [16,22]. For those few signs that have more than one syllable, we have worked out the stress system and can predict which syllable receives stress [22,39].

The notion of intonation for sign languages. Intonation in sign languages depends on which nonmanual articulators take which poses in connection with which signs. For example, in addition to, or even instead of, a negative sign in a sentence (NOT, NEVER, NONE, etc.), a negative headshake starts at the beginning of the sentence part that is negated and continues until its end. For us, this means that our algorithm need only find a negative sign in the input, and it can automatically generate the negative headshake starting and ending at the right time. (We do, however, have to investigate the correct turning rate of the head to get the right natural look.) Another example is the brow lowering that occurs with content questions, those with 'wh-words' (who, what, when, where, why, how). In this case, the brow lowers at the beginning of the clause containing the wh-sign and continues to the end, even if the wh-sign is itself at the end of the clause (in English the wh-word always moves to the beginning, but this is not true in some languages).
A third example is brow raising. This brow position occurs with a wide variety of structures in ASL: topic phrases, yes/no questions, conditional ('if') clauses, and relative clauses, among others. In each case, the brow raises at the beginning and lowers at the end of the clause. Unlike negative headshakes and brow lowering, it is possible to have more than one brow raise in a sentence; for this reason, the input to our computational algorithms will include commas to separate clauses. The comma serves other purposes as well: a sentence can start with a brow raise on a conditional clause ("if it rains tomorrow"), and then lower for a content question ("what would you like to do instead?") or return to neutral for a statement ("I think I'll stay home and read."). To predict the proper patterns, we have to signal the 'on' and 'off' of various poses for various articulators other than the hands.

A fourth example is the use of eye blinks. Baker and Padden [40] first brought eyeblinks to the attention of sign language researchers as one of four components contributing to the conditional nonmanual marker (the others were contraction of the distance between the lips and nose, brow raise, and the initiation of head nodding). Stern and Dunham [41] distinguish three main types of blinks: startle reflex blinks, involuntary periodic blinks (for wetting the eye), and voluntary blinks. Both periodic blinks and voluntary blinks serve specific linguistic functions in ASL. Periodic blinks (short, quick) are a primary marker of the end of IPs in ASL [21]. In contrast, voluntary blinks (long, slow) occur with specific signs to show emphasis (not a predictable prosodic function, but a semantic/pragmatic one).

One further study warrants attention. Weast [19] measured brow height differences at the pixel level for ASL statements and questions produced with five different emotions. She observed that eyebrow height shows a clear declination across statements and, to a lesser extent, before sentence-final position in questions, parallel to intonation (pitch patterns) in languages like English. Weast also shows that eyebrow height differentiates questions from statements, and yes/no questions from wh-questions, performing a syntactic function. Furthermore, maximum eyebrow heights differed significantly by emotion (sad and angry lower than neutral, happy, and surprised). The syntactic uses of eyebrow height are constrained by emotional eyebrow height, illustrating the simultaneity of multiple messages on the face and the interaction between information channels [42].

To recap, we can predict where pausing and sign duration lengthening should occur, along with changes in the poses of the brows, head, and body, eye blinks, eye gaze, and cheek and general mouth pose (although we have not given examples for all). Now we can see the brilliance of the ASL prosodic system: syllables are marked by hand movement; Prosodic Words are marked by lower-face behaviors; and Intonational Phrases are marked by upper-face (blinks, eye gaze, brows changing pose), head (nods), and body (leans) positions. Emotions affect the range of movement of the articulators. Everything is visible.
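The predictable elements recapped above lend themselves to simple span-level rules. The Python sketch below illustrates, under deliberately naive clause detection (comma/period delimited), how a negative sign could trigger a headshake over its clause, a wh-sign brow lowering, comma- or question-delimited clauses a brow raise, and sentence-final punctuation a periodic blink at the IP boundary, with boundary pauses scaled by signing rate using the mean durations from [9]. All names are illustrative assumptions; this is not the project's actual algorithm.

from typing import List, Tuple

NEGATIVE_SIGNS = {"NOT", "NEVER", "NONE"}
WH_SIGNS = {"WHO", "WHAT", "WHEN", "WHERE", "WHY", "HOW"}

# Mean ASL boundary pause durations (ms) reported by Grosjean & Lane [9],
# scaled here by a signing-rate factor as a simplification of the rate
# effects reported in [8].
BOUNDARY_PAUSE_MS = {"sentence": 229, "conjoined": 134, "NP_VP": 106}

Span = Tuple[str, int, int]  # (marker, first token index, last token index)


def pause_ms(boundary: str, rate_factor: float = 1.0) -> float:
    """Slower signing (rate_factor > 1) stretches boundary pauses proportionally."""
    return BOUNDARY_PAUSE_MS[boundary] * rate_factor


def clause_boundaries(tokens: List[str]) -> List[Tuple[int, int]]:
    """Split token indices into clauses at commas and sentence-final punctuation."""
    clauses, start = [], 0
    for i, tok in enumerate(tokens):
        if tok.endswith((",", ".", "?", "!")):
            clauses.append((start, i))
            start = i + 1
    if start < len(tokens):
        clauses.append((start, len(tokens) - 1))
    return clauses


def nonmanual_spans(tokens: List[str]) -> List[Span]:
    spans: List[Span] = []
    for first, last in clause_boundaries(tokens):
        words = [t.rstrip(",.?!") for t in tokens[first:last + 1]]
        if any(w in NEGATIVE_SIGNS for w in words):
            spans.append(("headshake", first, last))   # negation spans its clause
        if any(w in WH_SIGNS for w in words):
            spans.append(("brow_lower", first, last))  # content question
        elif tokens[last].endswith(("?", ",")):
            spans.append(("brow_raise", first, last))  # y/n question, conditional, topic
        if tokens[last].endswith((".", "?", "!")):
            spans.append(("blink", last, last))        # short periodic blink at IP end
    return spans


# Example: a conditional clause followed by a content question.
tokens = "IF RAIN TOMORROW, WHAT YOU DO?".split()
assert nonmanual_spans(tokens) == [("brow_raise", 0, 2),
                                   ("brow_lower", 3, 5),
                                   ("blink", 5, 5)]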

REFERENCES (for Notes on ASL Prosody)

1. Sandler, W. & Lillo-Martin, D. (2006). Sign Language and Linguistic Universals. Cambridge: Cambridge University Press.
2. Wilbur, R. B. & Patschke, C. (1999). Syntactic correlates of brow raise in ASL. Sign Language & Linguistics 2: 3-40.
3. Nespor, M. & Vogel, I. (1986). Prosodic Phonology. Dordrecht: Foris.
4. Wilbur, R. B. & Martínez, A. (2002). Physical correlates of prosodic structure in American Sign Language. In M. Andronis, E. Debenport, A. Pycha & K. Yoshimura (eds.), CLS 38: 693-704.
5. Liddell, S. K. (1978). Non manual signals and relative clauses in ASL. In P. Siple (ed.), Understanding Language through Sign Language Research, pp. 59-90. New York: Academic Press.
6. Liddell, S. K. (1980). American Sign Language Syntax. The Hague: Mouton.
7. Wilbur, R. B. (2008). Success with deaf children: How to prevent educational failure. In D. J. Napoli, D. DeLuca & K. Lindgren (eds.), Signs and Voices, pp. 119-140. Washington, DC: Gallaudet University Press.
8. Wilbur, R. B. (2009). Effects of varying rate of signing on ASL manual signs and nonmanual markers. Language and Speech 52(2/3): 245-285.
9. Grosjean, F. & Lane, H. (1977). Pauses and syntax in American Sign Language. Cognition 5: 101-117.
10. Wilbur, R. B. (1994). Eyeblinks and ASL phrase structure. Sign Language Studies 84: 221-240.
11. Brentari, D. & Crossley, L. (2002). Prosody on the hands and face: Evidence from American Sign Language. Sign Language and Linguistics 5(2): 105-130.
12. Sandler, W. (1999). Prosody in Israeli Sign Language. Language and Speech 42(2-3): 127-142.
13. Sandler, W. & Lillo-Martin, D. (2006). Sign Language and Linguistic Universals. Cambridge: Cambridge University Press.
14. Selkirk, E. (1986). On derived domains in sentence phonology. Phonology 3: 371-405.
15. Selkirk, E. O. (1995). Sentence prosody: Intonation, stress, and phrasing. In J. Goldsmith (ed.), The Handbook of Phonological Theory, pp. 550-569. Cambridge, MA: Blackwell Publishers.
16. Wilbur, R. B. (1997). A prosodic/pragmatic explanation for word order variation in ASL with typological implications. In K. Lee, E. Sweetser & M. Verspoor (eds.), Lexical and Syntactic Constructions and the Construction of Meaning, Vol. 1, pp. 89-104. Philadelphia: John Benjamins.
17. Wilbur, R. B. (2006). Discourse and pragmatics in sign language. In The Encyclopedia of Language and Linguistics, 2nd Edition (ELL2), 11: 303-307. Oxford, England: Elsevier.
18. Brentari, D. (1998). A Prosodic Model of Sign Language Phonology. Cambridge, MA: MIT Press.
19. Weast, T. (2008). Questions in American Sign Language: A quantitative analysis of raised and lowered eyebrows. Doctoral dissertation, University of Texas at Arlington.
20. Wilbur, R. B. (1991). Intonation and focus in American Sign Language. In Y. No & M. Libucha (eds.), ESCOL '90: Eastern States Conference on Linguistics, pp. 320-331. Columbus, OH: Ohio State University Press.
21. Wilbur, R. B. (1994). Eyeblinks and ASL phrase structure. Sign Language Studies 84: 221-240.
22. Wilbur, R. B. (1999). Stress in ASL: Empirical evidence and linguistic issues. Language & Speech 42: 229-250.
23. Wilbur, R. B. (2000). Phonological and prosodic layering of non-manuals in American Sign Language. In H. Lane & K. Emmorey (eds.), The Signs of Language Revisited: Festschrift for Ursula Bellugi and Edward Klima, pp. 213-241. Hillsdale, NJ: Lawrence Erlbaum.
24. Siple, P. (1978). Visual constraints for sign language communication. Sign Language Studies 19: 95-110.
25. McNeill, D. (1992). Hand and Mind: What Gestures Reveal about Thought. Chicago: University of Chicago Press.
26. McNeill, D. (2005). Gesture and Thought. Chicago: University of Chicago Press.
27. Wilbur, R. B. & Malaia, E. (2008). Contributions of sign language research to gesture understanding: What can multimodal computational systems learn from sign language research. International Journal of Semantic Computing 2(1): 5-19.
28. Cassell, J., Vilhjalmsson, H. H. & Bickmore, T. (2001). BEAT: The Behavior Expression Animation Toolkit. In Proc. ACM SIGGRAPH, pp. 477-486.
29. Neff, M., Kipp, M., Albrecht, I. & Seidel, H.-P. (2008). Gesture modeling and animation based on a probabilistic re-creation of speaker style. ACM Transactions on Graphics 27(1): 1-24.
30. Levine, S., Theobalt, C. & Koltun, V. (2009). Real-time prosody-driven synthesis of body language. ACM Transactions on Graphics, SIGGRAPH Asia.
31. Busso, C., Deng, Z., Neumann, U. & Narayanan, S. (2005). Natural head motion synthesis driven by acoustic prosodic features. Computer Animation and Virtual Worlds 16(3-4): 283-290.
32. Hartmann, B., Mancini, M. & Pelachaud, C. (2002). Formational parameters and adaptive prototype instantiation for MPEG-4 compliant gesture synthesis. In Proc. Computer Animation, IEEE Computer Society, Washington, DC, p. 111.
33. Kipp, M., Neff, M. & Albrecht, I. (2007). An annotation scheme for conversational gestures: How to economically capture timing and form. Language Resources and Evaluation 41(3/4): 325-339.
34. Kopp, S. & Wachsmuth, I. (2004). Synthesizing multimodal utterances for conversational agents. Computer Animation and Virtual Worlds 15(1): 39-52.
35. Beskow, J. (2003). Talking Heads: Models and Applications for Multimodal Speech Synthesis. PhD thesis, KTH, Stockholm.
36. Ezzat, T., Geiger, G. & Poggio, T. (2002). Trainable videorealistic speech animation. In SIGGRAPH '02: ACM SIGGRAPH 2002 Papers, pp. 388-398. New York: ACM.
37. Sifakis, E., Selle, A., Robinson-Mosher, A. & Fedkiw, R. (2006). Simulating speech with a physics-based facial muscle model. In Proc. ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 261-270.
38. Grosjean, F. & Collins, M. (1979). Breathing, pausing, and reading. Phonetica 36: 98-114.
39. Wilbur, R. B. (in press). Sign syllables. In M. van Oostendorp (ed.), Companion to Phonology. New York/Oxford: Wiley-Blackwell.
40. Baker, C. & Padden, C. (1978). Focusing on the nonmanual components of ASL. In P. Siple (ed.), Understanding Language through Sign Language Research, pp. 27-57. New York: Academic Press.
41. Stern, J. & Dunham, D. (1990). The ocular system. In J. T. Cacioppo & L. G. Tassinary (eds.), Principles of Psychophysiology: Physical, Social, and Inferential Elements, pp. 513-553. Cambridge, England: Cambridge University Press.
42. Ladd, D. R., Jr. (1996). Intonational Phonology. Cambridge, England: Cambridge University Press.
