An NLP-based 3D Scene Generation System for Children with Autism or Mental Retardation

Yılmaz Kılıçaslan, Özlem Uçar, and Edip Serdar Güner

Department of Computer Engineering, University of Trakya, Ahmet Karadeniz Yerleşkesi, 22030 Edirne, Turkey
{yilmazkilicaslan,ozlemucar,eserdarguner}@trakya.edu.tr
http://tbbt.trakya.edu.tr/

Abstract. It is well known that people with autism or mental retardation experience crucial problems in thinking and communicating by means of linguistic structures. Thus, we foresee text-to-image conversion systems letting such people establish a bridge between linguistic expressions and the concepts these expressions refer to via relevant images. S2S is such a system: it converts Turkish sentences into representative 3D scenes via the mediation of an HPSG-based NLP module. A precursor to S2S, a non-3D version, has been tested with a group of students with autism and mental retardation in a special education center and has provided promising results motivating the work presented in this paper.

Key words: Sentence-to-Scene Conversion, Educational Technology, NLP, HPSG.

1 Introduction

The use of educational technologies for supporting the education of disabled children continues to increase in both quantity and quality. Particularly with personal computers, which have advanced by leaps and bounds and become cheap enough to be ubiquitous over the last thirty years, software technology offers great opportunities for disabled people to communicate and socialize (see [12] for a comprehensive and detailed account of computer-based technologies that can serve to include people with disabilities in the mainstream of society). Griswold et al. [6] list 'poor comprehension of abstract concepts' among other cognitive problems exhibited by individuals with autism. It is also observed that some people with mental retardation experience the same sort of problems: as they have difficulties in grasping abstract concepts, they tend to think in terms of concrete visual images rather than linguistic expressions [13]. A crucial claim bearing particular relevance to the matter under discussion is that computer-based technologies have the potential to significantly alleviate the educational, social and/or communicative handicaps that individuals with mental retardation or autism face ([7], [14]). To this effect, we have developed a series of software programs, as part of a research project supported by the University of Trakya Scientific Research Project Office under grant TUBAP-760, to assist the education and training of children with autism and mental retardation (the precursory work was disseminated in a special issue of the IEEE Journal of Pervasive Computing dedicated to works in progress in healthcare systems and other applications; Droes et al. (2007)). In the first and second cycles of the project, two modules were consecutively developed to map words to images and sentences to 2D pictures ([10]). Several experiments were performed using these modules with 88 children in Yağmur Çocuklar Psychological Counseling and Special Education Center, Istanbul. An improvement of 20 to 25% was observed in the learning performances of the children when they were assisted with the modules (see [14] for a detailed discussion of the experimental results). These results and suggestions by the trainers encouraged us to move to the third cycle of the project in order to incorporate a 3D scene generator into the system. This latest version of the software has yet to be tested with children.

Except for two previous works of ours, to the best of our knowledge, there is no work based on a notion of conversion from natural language sentences to scenes developed with the aim of assisting the education of children suffering from autism or mental retardation. However, there is an abundance of work on scene generation based on natural language. Even though it did not have a graphics component, the SHRDLU system [15] was one of the early AI programs that successfully used natural language to move various objects around in a closed virtual world. The system by Adorni et al. [1] was intended to imagine a static scene described by means of a sequence of simple phrases, focusing particularly on the principles of equilibrium, support and object positioning. The Put system by Clay & Wilhelms [3] allowed spatial arrangements of existing objects on the basis of an artificial subset of English consisting of expressions of the form Put(X P Y), where X and Y are objects and P is a spatial preposition. The WordsEye system by Coyne & Sproat [4], which is currently under active development, is a major improvement over the restrictions of the preceding systems: it is not confined to a closed world but provides a blank slate where the user can paint a picture with a text describing not only spatial relations but also actions performed by objects in the scene. S2S comes close to Put in limiting object configuration to spatial relations, and it is like WordsEye in allowing entirely natural linguistic expressions as input. However, S2S generates considerably less complicated scenes than WordsEye. This is partially a shortcoming of a system still under development, but it is also a requirement imposed by the current field of application, namely assisting the education of disabled children who are not capable of grasping complicated configurations.

2 Architectural Design

S2S is composed of four components: a Lexical Preprocessor, a Natural Language Processor, a Scene Generator, and a Renderer. The following diagram shows the interaction of these components in terms of data flow between them:

Fig. 1. The architectural design of S2S.

The Lexical Preprocessor transforms the given natural language sentence into a list containing the words of the sentence, with any capital letters and punctuation removed. As the aspects of interpretation encoded by capitalization and punctuation fall outside the scope of this work, the input to the Natural Language Processor needs to be free of these orthographic elements. The output of the Natural Language Processor is a semantic representation encoding the meaning of the input sentence in a structured way. The Scene Generator decodes a semantic representation into a scene description formulated as a set of parameters with appropriate values. The Renderer translates a scene description into a 3D image. To give an example, for the Turkish equivalent of the sentence The chair is on the table, the system yields the semantic representation partially shown in Fig. 2, which is in turn translated into the 3D scene shown in Fig. 3. In sum, the whole process consists of an analysis phase, which extracts the semantic content of a natural language sentence, and a synthesis phase, which builds up a scene from individual semantic objects.
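To make this data flow concrete, the following minimal Python sketch mirrors the four-stage pipeline. All function names are hypothetical stand-ins (the actual parser is ALE and the renderer a 3D engine); only the Lexical Preprocessor is spelled out.

import string

def preprocess(sentence):
    # Lexical Preprocessor: strip capital letters and punctuation,
    # returning the list of remaining words.
    cleaned = sentence.lower().translate(str.maketrans('', '', string.punctuation))
    return cleaned.split()

def s2s(sentence, parse, generate_scene, render):
    # End-to-end pipeline; parse, generate_scene and render stand for
    # the Natural Language Processor, Scene Generator and Renderer.
    words = preprocess(sentence)             # e.g. ['the', 'chair', ...]
    semantics = parse(words)                 # structured semantic representation
    description = generate_scene(semantics)  # parameters set to values
    return render(description)               # a 3D image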

Fig. 2. Semantic representation for The chair is on the table.

Fig. 3. The scene generated for The chair is on the table.

3 Linguistic Analysis

The main burden of the task of linguistic analysis, namely parsing natural language expressions in order to extract their semantic content, is carried by the Natural Language Processor. The internal structure of this component is shown in Fig. 4.

The Natural Language Processor consists of two main parts: a parser and a grammar. The sub-component responsible for the parsing process is the Attribute Logic Engine (ALE, version 3.2.1), an integrated phrase-structure parsing and definite clause logic programming system in which the data structures are typed feature structures [2]. Feature structures serve as the main representational device in the framework adopted in this study. A feature structure consists of two pieces of information: a type (which every feature structure must have) and a finite set of feature-value pairs (which may be empty); see Fig. 2. Feature-value pairs are defined recursively: a value is itself a feature structure, possibly an atomic object. An important operation defined over pairs of feature structures is unification, which has gained widespread recognition as a general tool in computational linguistics since Kay's [8] seminal work. If two feature structures are consistent, unifying them results in a feature structure subsuming the information contained in both of them; otherwise, the operation fails (see [2] for a detailed discussion of strong typing, feature structures and unification as implemented in ALE).
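By way of illustration only (the system itself uses ALE's typed feature structures), unification over simple, untyped feature structures represented as nested Python dictionaries might be sketched as follows:

def unify(fs1, fs2):
    # Unify two feature structures encoded as nested dicts; atomic
    # values are plain strings. Returns a structure subsuming the
    # information in both inputs, or None on failure. Typing and
    # structure sharing, which ALE provides, are omitted here.
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feature, value in fs2.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None    # inconsistent values: unification fails
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return fs1 if fs1 == fs2 else None

print(unify({'CAT': {'HEAD': 'noun'}}, {'CAT': {'NUM': 'sg'}}))
# {'CAT': {'HEAD': 'noun', 'NUM': 'sg'}}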


Fig. 4. The internal structure of the Natural Language Processor.

The parsing process is driven by a Head-driven Phrase Structure Grammar (HPSG) [11] which we have designed and implemented in order to handle a fairly large fragment of Turkish (see [9] for a semantico-pragmatically oriented grammar of a fragment of Turkish developed within a modified version of the HPSG formalism). An HPSG grammar can be split into three units: an ontology, a lexicon and a set of principles. The ontology is a hierarchically organized inventory of universally available types of linguistic entities, together with a specification of their appropriate features and value types. The lexicon is a system of lexical entries and lexical rules. The principles include the universal and language-specific constraints which every linguistic structure generated by the system must obey. Fig. 5 shows a description of the feature structure which the system assigns to the sentence The chair is on the table:

Fig. 5. The feature structure assigned to The chair is on the table.

The SYNSEM (SYNTAX-SEMANTICS) feature includes a complex of syntactic and semantic information about the modeled linguistic sign. The CAT (CATEGORY) feature encodes syntactic information, whereas the CONT (CONTENT) value constitutes the sign's contribution to the context-independent aspects of semantic interpretation. Every semantic value must be of a type subsumed by sem_obj (semantic object). A fragment of the semantic type hierarchy utilized in our grammar is given in Fig. 6.

Fig. 6. A fragment of our ontology.
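As a toy illustration of how such a hierarchy constrains semantic values, consider the following subsumption check; the subtype names below are invented for the example and need not match the grammar's actual inventory:

# Hypothetical fragment of a type hierarchy: each type maps to its supertype.
SUPERTYPE = {
    'sem_obj': None,         # top of the semantic hierarchy
    'relation': 'sem_obj',   # illustrative subtype for relational content
    'index': 'sem_obj',      # illustrative subtype for entity indices
}

def subsumed_by(t, ancestor):
    # True if type t is (transitively) a subtype of ancestor.
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE[t]
    return False

assert subsumed_by('relation', 'sem_obj')  # every semantic value is a sem_obj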

4 Scene Synthesis

The Scene Generator extracts from the CONT value of a sentence a configuration of possibly underspecified 3D entities, together with the attributes they bear and the relations they stand in to each other. Each entity is looked up in a database of stereotypical 3D objects indexed by name. 3D objects are stored in the database with default values assigned to their features. Each 3D object is associated with four features:

– Size: Each object has a default size, determined in accordance with its stereotypical size. The room, which serves as the scene for all situations described in the system, is 1/50th the size of a stereotypical room, and the default size of each object is scaled accordingly. In addition to the default size, each object is also associated with the ratios by which it is to be scaled when its size is specified with adjectives such as big and small. However, a lower bound is placed on the size of each object in order to preclude counterintuitive cases that might, for instance, arise with the use of comparative adjectives like smaller and much smaller.


– Color: Each object comes with a default color, encoded in RGB format. Objects are allowed to change color depending on relevant specifications in input sentences. The color feature can take any value available in the conventionally used list of colors (e.g. red, blue, yellow, lilac).
– Texture: Each object is covered with a default texture. However, like the other features, the texture of an object can be re-specified in the input expression (e.g. tartan ball, striped ball).
– Spatial tags: The exact depiction of spatial relations requires the shapes of the objects in question to be known. To this end, all objects in the database are associated with spatial tags such as canopy area, top surface, back surface, front surface, base, and cup. Each tag is further associated with a size feature encoding its dimensional measures.

The default values of objects are kept unchanged if not specified in the input sentence; otherwise, they are overridden in favor of the specified values. Figs. 7 and 8 show the scenes corresponding to the sentences The chair is to the left of the table and The small chair is to the left of the table, respectively.

Fig. 7. The chair is to the left of the table.
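A plausible shape for such database entries, and for the overriding of defaults by sentence-specified values, is sketched below; the field values are invented for illustration and do not reproduce the system's actual database.

from dataclasses import dataclass, field, replace

@dataclass
class Object3D:
    # A stereotypical 3D object as stored in the database.
    name: str
    size: float                # default size, scaled to the 1/50 room
    min_size: float            # lower bound against counterintuitive scaling
    color: tuple               # default color as an RGB triple
    texture: str               # default texture
    spatial_tags: dict = field(default_factory=dict)  # tag -> dimensions

DATABASE = {
    'chair': Object3D('chair', size=1.0, min_size=0.3,
                      color=(139, 69, 19), texture='wood',
                      spatial_tags={'top surface': (0.5, 0.5)}),
}

def instantiate(name, **specified):
    # Fetch the stereotypical object and override whatever features
    # (size, color, texture) the input sentence specifies.
    return replace(DATABASE[name], **specified)

small_chair = instantiate('chair', size=0.5)  # "the small chair" (cf. Fig. 8)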

The configuration of objects is restricted to spatial relations, namely those denoted by prepositions like in, on, under, to the left of, to the right of, in front of, behind, and next to (the Turkish equivalents of English prepositions are right-adjoined to their associated noun phrases, i.e. they are postpositions). In fact, as discussed in [5], some reverse spatial relations can be defined in one integral relationship function. The on-under, left-right and in front of-behind pairs are defined as such relations in our framework.


Fig. 8. The small chair is to the left of the table.
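Following this idea, a reverse pair such as on-under can be served by one placement function whose arguments are swapped for the reverse reading. A minimal sketch, with objects represented as coordinate dictionaries purely for illustration:

def stack(lower, upper):
    # Place `upper` so that its base rests on `lower`'s top surface;
    # a single function serves both members of the on-under pair.
    x, y, z = lower['pos']
    upper['pos'] = (x, y + lower['height'], z)

def apply_relation(relation, figure, ground):
    # "A on B" and "B under A" describe the same configuration.
    if relation == 'on':
        stack(ground, figure)
    elif relation == 'under':
        stack(figure, ground)

table = {'pos': (0, 0, 0), 'height': 1.0}
ball = {'pos': (0, 0, 0), 'height': 0.3}
apply_relation('on', ball, table)  # the ball now sits at height 1.0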

Another important fact is that the exact positioning of objects relative to each other requires their shapes and surfaces to be taken into consideration. For instance, the ball in the scene referred to by the sentence The ball is on the chair will occupy the top surface area of the chair rather than the top of its back (see Fig. 9).

Fig. 9. The ball is on the chair.
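One way to picture this step: the interpretation of on consults the ground object's spatial tags and picks the designated supporting surface. The preference order below is an assumption made purely for illustration:

def supporting_surface(spatial_tags):
    # Choose where an object placed 'on' the ground object should sit.
    for tag in ('top surface', 'cup', 'base'):
        if tag in spatial_tags:
            return tag
    return None

chair_tags = {'top surface': (0.5, 0.5), 'back surface': (0.5, 0.9)}
assert supporting_surface(chair_tags) == 'top surface'  # the seat, not the back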

Moreover, the Scene Generator is flexible enough to resize an object if it does not fit in a specified area. For example, if a chair is to be placed in the canopy area of a table (as in the intended reading of the sentence The chair is under the table), an appropriately sized version of the chair is employed (see Fig. 10).

Fig. 10. The chair is under the table.

However, resizing objects in this way should not lead to scenes that conflict with a common-sense understanding of the world. As the system is intended to assist the education and training of children with autism or mental retardation, equipping it with a capacity to generate non-common-sensical situations (e.g. the situation described by the sentence The house is under the table) would not be in agreement with this intention. Therefore, when the size of an object falls below the lower bound specified in the database for this object, the system prompts an appropriate warning message rather than producing the scene.

The outcome of the process described above is configuration data ready to be rendered. The Renderer (which runs in 3D Developer Studio v.8.0 by 3DSTATE) depicts object configurations starting from a default position in a room environment.
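The resizing-with-lower-bound behaviour described above might be sketched as follows; the names, dimensions and warning text are all hypothetical:

def fit_into(obj, area_width, area_depth):
    # Scale `obj` down until it fits the target area (e.g. a table's
    # canopy area), refusing to go below the database lower bound.
    scale = min(1.0, area_width / obj['width'], area_depth / obj['depth'])
    new_size = obj['size'] * scale
    if new_size < obj['min_size']:
        raise ValueError('object cannot be fitted without producing '
                         'a non-common-sensical scene')  # warn, do not render
    obj['size'] = new_size
    obj['width'] *= scale
    obj['depth'] *= scale
    return obj

chair = {'size': 1.0, 'width': 0.5, 'depth': 0.5, 'min_size': 0.3}
fit_into(chair, 0.4, 0.4)  # "The chair is under the table": chair is shrunk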

5 Conclusion

Many children with autism or mental retardation experience difficulties in thinking with abstract concepts. It is widely acknowledged that, with the aid of visual education methods, the abstractness of linguistic structures can be circumvented, and learning and understanding with all the senses can be promoted. We believe S2S is a new and promising approach to assisting the thinking and learning processes of autistic or mentally retarded children with visually concrete representations. It should be emphasized that S2S is not intended to replace existing 3D software tools, but rather to augment such technology in two dimensions: 1) by incorporating Turkish as the natural language to be processed, and 2) by applying the software in the field of special education.

6 Acknowledgments

We are grateful to the University of Trakya Scientific Research Project Office for providing all sorts of support for the realization of this work. We are indebted to Algı Special Education and Rehabilitation Center, Yağmur Çocuklar Psychological Counseling and Special Education Center and Armağan Dönertaş Education, Rehabilitation and Research Center for Disabled Children for very valuable comments and co-operation at different stages of this work. We are also thankful to the Scientific and Technological Research Council of Turkey (TUBITAK) for supporting one of the authors through a scholarship while he conducted the MSc research covering the NLP part of the work.

References

1. Adorni, G., Di Manzo, M., Giunchiglia, F.: Natural Language Driven Image Generation. In: Proceedings of COLING 84, pp. 495-500 (1984)
2. Carpenter, B., Penn, G.: The Attribute Logic Engine User's Guide, Version 3.2.1. University of Toronto (2001), www.cs.toronto.edu/~gpenn/ale.html
3. Clay, S.R., Wilhelms, J.: Put: Language-Based Interactive Manipulation of Objects. IEEE Computer Graphics and Applications, 31-39 (1996)
4. Coyne, B., Sproat, R.: WordsEye: An Automatic Text-to-Scene Conversion System. In: SIGGRAPH Proceedings (2001)
5. Durupınar, F., Kahramankaptan, U., Cicekli, I.: Intelligent Indexing, Retrieval and Construction of Crime Scene Photographs. In: Proceedings of the 13th Turkish Symposium on Artificial Intelligence and Neural Networks, pp. 297-306 (2004)
6. Griswold, D.E., Barnhill, G.P., Myles, B.S., Hagiwara, T., Simpson, R.L.: Asperger Syndrome and Academic Achievement. Focus on Autism and Other Developmental Disabilities 17(2), 94-102 (2002)
7. Jacklin, A., Farr, W.: The Computer in the Classroom: How Useful is the Computer as a Medium for Enhancing Social Interaction with Young People with Autistic Spectrum Disorders? British Journal of Special Education 32(4), 209-217 (2005)
8. Kay, M.: Functional Grammar. In: Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society, pp. 142-158 (1979)
9. Kılıçaslan, Y.: A Form-Meaning Interface for Turkish. Ph.D. Dissertation, University of Edinburgh (1998)
10. Kılıçaslan, Y., Uçar, Ö., Güner, E.S., Bal, K.: An NLP-Based Assistive Tool for Autistic and Mentally Retarded Children: An Initial Attempt. Trakya University Journal of Science 7(2), 101-108 (2006)
11. Pollard, C., Sag, I.A.: Head-Driven Phrase Structure Grammar. University of Chicago Press and CSLI Publications, Chicago and London (1994)
12. Poole, J.B., Sky-McIlvain, E., Jackson, L., Singer, Y.: Education for an Information Age: Teaching in the Computerized Classroom, 5th edn. (2005)
13. Turkish Foundation of Support and Education for Autistics (TODEV) Periodical, vol. 2 (2002)
14. Uçar, Ö.: Engelli Çocuklar İçin Yapay Zeka Tabanlı Eğitim-Destek Araçları Geliştirilmesi (Development of Artificial Intelligence-Based Assistive Tools for the Education of Disabled Children). Unpublished PhD Thesis, University of Trakya (2007)
15. Winograd, T.: Understanding Natural Language. PhD Thesis, Massachusetts Institute of Technology (1972)
