Int J CARS DOI 10.1007/s11548-008-0251-4
ORIGINAL ARTICLE
An interactive 3D framework for anatomical education

Pere-Pau Vázquez · Timo Götzelmann · Knut Hartmann · Andreas Nürnberger
Received: 10 January 2008 / Accepted: 17 June 2008 © CARS 2008
Abstract

Object This paper presents a 3D framework for anatomy teaching. We are mainly concerned with the proper understanding of human anatomical 3D structures.

Materials and methods The main idea of our approach is to take an electronic book, such as Henry Gray's Anatomy of the human body, together with a set of properly labeled 3D models, and to construct the links that allow users to perform mutual searches between both media.

Results We implemented a system where learners can interactively explore textual descriptions and 3D visualizations.

Conclusion Our approach easily supports two search tasks: first, the user may select a text region and get a view showing the objects that contain the selected structures; second, through the interactive exploration of a 3D model, the user may automatically search for the textual description of the structures visible in the current view.

Keywords Computer-assisted instruction (LB 1028.5) · Medical educational materials (W 18.2) · Computer graphics (T 385)

P.-P. Vázquez (B)
Modeling, Visualization and Graphics Interaction Group, Universitat Politècnica de Catalunya, Campus Nord, Edifici Omega, C/ Jordi Girona 1-3, 08034 Barcelona, Spain
e-mail: [email protected]

T. Götzelmann
Graphics and Interactive Systems Group, University of Magdeburg, Magdeburg, Germany
e-mail: [email protected]

K. Hartmann
Graphics and Interactive Systems Group, University of Applied Sciences, Flensburg, Germany
e-mail: [email protected]

A. Nürnberger
Information Retrieval Group, University of Magdeburg, Magdeburg, Germany
e-mail: [email protected]
Introduction

There are many fields in medicine where understanding 3D spatial relationships is very important. Textbooks usually provide information in the form of illustrations and text, because the two media provide complementary information: while complicated processes and contextual information are predominantly described textually, illustrations are more suitable for conveying the visual and spatial attributes of complex-shaped objects. However, providing a large number of illustrations is very costly, and illustrators usually try to minimize the number of pictures by packing a lot of information into each of them. This makes it quite difficult to understand the actual 3D positions of the structures. On the other hand, the simple exploration of a 3D model does not provide enough information on the structures being observed. Our proposal tries to combine the strengths of both media: text and 3D models. On the one hand, we have a textual description of the anatomy of certain parts of the body, and on the other a 3D model representing that part. In order to facilitate the exploration and to aid the learning process, we link the information provided by the 3D model and the text through the use of labels. Our application allows querying the 3D model through the selection of a certain text, and querying the text through the selection of a certain view of the 3D model. As a result, when the student is reading the textbook, the selection of a word or paragraph that
contains elements he or she finds interesting automatically loads the corresponding 3D model that contains the structures of interest, which can then be inspected interactively. Conversely, a 3D view of a model may be chosen by the user and, on request, the system looks up the paragraphs where the structures visible from the selected view are described. This paper is an extension of [2] and [3], where the techniques for correlating text and images were presented; here we also present the application features that may be interesting for learning anatomy concepts. Figure 1 shows an overview of the application, which has two main regions: on the left we have the textual information, and on the right a 3D model that can be interactively explored with the mouse. In the Results section, we give further details on the architecture of the application and the different ways to use it.

Fig. 1 Overview of the application
Illustrations and labels

The key issue of our approach is the use of secondary elements in illustrations, such as textual annotations or labels, as they help to establish links between both media. Annotations support two search tasks: readers may use them in order to determine the corresponding visual object for unknown terms in the text; but the content of textual annotations also provides links to more elaborate descriptions of visual objects in corresponding text passages (see Fig. 2). To focus the attention of the user on salient objects, experienced illustrators also utilize many graphical abstraction techniques. The creation of expressive illustrations that carefully reflect the subject matter of the corresponding text is very expensive. In order to limit the printing costs, illustrators aim at minimizing the number of illustrations. Nonetheless, this often results in a high number of labels to be inserted, and the limited cognitive capacity of readers makes both search tasks mentioned in the last paragraph more complicated. Furthermore, learners often have to understand complicated spatial configurations—but illustrations depict only 2D projections of 3D objects. Therefore, online versions of these tutoring materials offer new possibilities to support learners: efficient retrieval mechanisms ease the access to sections containing information required for user-specific tasks in large documents, while interactive 3D visualizations support the mental reconstruction of complex spatial configurations. Moreover, a dynamic layout of secondary elements can make illustrations more effective.

Fig. 2 Media coordination in tutoring material: the most relevant objects for a learning task are emphasized in descriptive texts and illustrations (Source: Tortora [1])

Objectives

This paper proposes a novel technique to coordinate the content of visual and textual elements in online tutoring systems. As the majority of the application domains define a standard terminology, unified retrieval methods are applied to a large text corpus and annotated 3D models. User interactions on both media—text and graphics—initiate queries to an information retrieval system; the retrieval results are then presented to the learner. Our system addresses both main search tasks of text ↔ image relations. Users can select text passages to obtain corresponding renditions of 3D models. The relevance of terms and their corresponding visual objects is considered to determine good views on 3D models. As a single view might not sufficiently correspond to the content of text paragraphs, these 3D visualizations can be explored interactively. User-selected points of 3D views are considered as queries; corresponding text passages of the underlying text corpus are automatically obtained and presented to the learner.
Related work

This section presents related work on text illustration and viewpoint selection for medical models. Finally, we present our 3D framework for anatomy learning.

Text illustration

The automatic generation of technical documentation with coordinated content in several media (e.g., text, images, and animation) was pioneered by the COMET [4] and WIP [5] systems. These research prototypes aim at adapting the content of online documents to user-specific tasks, preferences, or contextual requirements. Moreover, coherent multilingual variants should be generated from a single source. Therefore, a multitude of planning mechanisms is used in the automatic content selection and in media-specific content generators. However, the large number of required formal representations, as well as the need to approve all document variants according to legal aspects, prevented the application of this powerful technique. Based on the observation that authors of scientific textbooks, technical documentation, or maintenance manuals reuse and improve large text databases with a standardized domain-specific terminology, several researchers developed (semi-)automatic systems to illustrate online texts. In order to select appropriate 3D models and to control render parameters for expressive visualizations, the TextIllustrator [6] employs a database of annotated 3D models. This system extracts exact matches between terms in the document and media descriptors and highlights, in a 3D visualization, those objects that are mentioned in the visible text portion. This illustrative browsing metaphor enables quick access to large scientific texts but fails to detect morphological and syntactical variants of terms, synonyms, and implicit semantic associations between domain objects. Therefore, the Agile system [7] integrates a morphological analysis and shallow syntactic text parsing. Moreover, the Agile system incorporates a formal domain representation with semantic networks and thesauri for media- and language-specific realizations of formal concepts. In contrast to multimedia generation systems, Agile only requires partial formal representations. Moreover, for some application domains, like human anatomy, the number of domain concepts is rather limited. Semantic associations between domain objects can be inferred without an in-depth analysis of the syntactic or semantic structure of the text. To implement this, inference mechanisms based on graph structures were applied to the same underlying semantic network: Schlechtweg and Strothotte [8] employed degree-of-interest functions [9], while Hartmann et al. [7] used spreading activation [10]. The Open Mind Common Sense project by Lieberman et al. [11] proved the wealth of these inference mechanisms on large semantic networks that have been automatically extracted from corpora. Recently, Götze et al. [12] presented a flexible authoring tool which employs online search engines for images and 3D models. In this interactive system the domain experts select those illustrations which are appropriate in the current context. In order to adjust the retrieved results to another context, additional textual or visual elements often have to be integrated into the final illustrations. Therefore, Götzelmann et al. [13] integrated layout algorithms to enhance 2D images and 3D visualizations with secondary elements. Moreover, textual labels are considered an implicit description of the content of visual elements and might even reflect their intended usage. The novelty of our approach is twofold: (1) it is fully automatic, and no user intervention is required for the indexing process, and (2) to our knowledge it is the first approach that uses arbitrary views, selected by the user, to query for textual information.
Good view selection

There are two main approaches to good viewpoint selection for medical models: uninformed and informed methods. Uninformed methods determine good views using only topological properties of the geometric object or by evaluating visible or occluded geometric features. The proposed measures usually work on either the projected areas of the visible faces or the number of faces visible from a viewpoint. One of the proposed measures is due to Vázquez et al. [14], where the authors define the viewpoint entropy H(V) as a measure of the amount of geometric information captured from a view. They build this metric using the Shannon entropy as a basis. It is defined as the following sum over all visible faces:

H(V) = -\sum_{i=0}^{N_f} \frac{A_i}{A_t} \log \frac{A_i}{A_t}    (1)

where A_i is the relative size (solid angle) of face i, A_t the total area of the sphere of directions around the viewpoint, and the sum is computed over all N_f visible faces including the background. The maximum entropy is obtained when all visible faces are seen under the same solid angle. By convention, and for continuity, 0 log 0 is taken to be 0. Informed approaches take advantage of knowledge about the scene. Viola et al. [15], for example, used context-dependent relevance values to determine appropriate views on volumetric data sets. Recently, Mühler et al. [16] presented a system which employs an exhaustive pre-process to determine several view-dependent quality measures and to weight their influence at run-time. All these systems, as well as ours, incorporate the viewpoint entropy as one parameter in a complex view metric. As these methods were not tailored to link textual and visual information, none of them is able to perform text-to-image and image-to-text queries in the way we present in this paper.
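As a concrete illustration (not part of the original system), the following Python sketch computes the viewpoint entropy of Eq. (1) from the projected sizes of the visible faces, using pixel counts from an item-buffer rendering as an approximation of the solid angles; the function name and the logarithm base are our own choices.

import numpy as np

def viewpoint_entropy(face_pixel_counts):
    # Viewpoint entropy of a single view, Eq. (1).
    # face_pixel_counts: projected size of every visible face (background
    # included), e.g. pixel counts from an item-buffer rendering; after
    # normalization they play the role of A_i / A_t.
    counts = np.asarray(face_pixel_counts, dtype=float)
    p = counts / counts.sum()        # relative projected areas
    p = p[p > 0.0]                   # convention: 0 log 0 = 0
    return float(-(p * np.log2(p)).sum())

# A view in which four faces share the image equally is more informative
# than one dominated by a single face:
print(viewpoint_entropy([100, 100, 100, 100]))   # 2.0 bits
print(viewpoint_entropy([370, 10, 10, 10]))      # approx. 0.5 bits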
Materials and methods

In this section, we present the different features of our framework.

Framework

We have implemented an interactive application where users can interact with elements of both media (text and graphics). These interactions are transformed into search tasks. When the user selects a region of text, the words contained in the selection are transformed into a query to the model database. As a result, the rendering window shows the model of the database that contains the elements (or a subset of them) mentioned in the text. Moreover, the rendered view of the model is the one that best shows the structures marked in the text. In the same way, once the user has finished exploring a 3D model, he can ask the application to provide textual information on the structures appearing in the rendering window. The application then queries the textual database and produces a result list of paragraphs where the structures appearing in the rendered illustration are explained, that is, the paragraphs where the significance of these structures is highest.

Application layout

In this section, we present the work space of the application. In Fig. 1 we already presented a snapshot of our application, and now we explain its main parts. The work space is divided into three subparts: the rendering window (right), the text window (left), and the menu toolbar (bottom). The rendering window shows the 3D model currently being analyzed and permits its interactive exploration with the mouse. The user can rotate, zoom, and translate the 3D model easily (see Fig. 3). Labels indicating the parts of the model may be enabled or disabled in order to facilitate the understanding of the underlying 3D geometry. The text window shows the educational textbook. In our case, the application loads the electronic version of the well-known Henry Gray's Anatomy of the human body [17]. The user may freely navigate through the book, read the descriptions, and select the parts of the text he is interested in seeing in 3D (see Fig. 4). The toolbar menu (Fig. 5) is used to perform the additional operations of our framework. It allows loading new models and indexing them in order to make them available for the search process.

Fig. 3 Rendering window. The user may freely explore the 3D model

Fig. 4 The electronic version of the educational book. The user may read it and eventually select the concepts he or she wants to see in 3D

Fig. 5 Menu toolbar: it allows loading new models or documents and indexing them. It is also used to perform document searches

Mutual search and data inspection

Once the application has been started, it automatically opens Gray's Anatomy textbook and a 3D model. At this moment, the interactive learning support starts. There are two ways of using our application: the first one consists of reading the book and looking at a 3D view of the concepts mentioned in the text, and the second consists of looking for textual descriptions of certain structures being visualized.

Text to model search

In the first case, we have a process similar to the classical learning steps a student would carry out. However, there is an important difference: instead of having the limited number of pictures that usually appear in the book, we have a set of 3D models that can be inspected
from arbitrary points. This means that we are not restricted to a set of pictures that will usually emphasize just a few elements. When students want to see the placement of a certain structure, they only have to select it in the text. Our application automatically shows an adequate 3D model that contains the chosen structure. Moreover, the model is rendered in such a way that the structure chosen by the user is shown properly. The following example shows how this process works. While reading a chapter devoted to the aorta, the students may decide that they would like to know where a structure is placed and what it looks like. They only need to select the text segment they are interested in with the mouse (see Fig. 5). In this case, the user selected Pulmonary artery. If a match is found, the application automatically determines the corresponding 3D models from the previously indexed files. Subsequently, it loads the model and shows the best view that reveals the information related to the concept that the user highlighted in the text (see Fig. 6). In order to help the student obtain the maximum information contained in our database, the query generated through the selection process is matched against the complete set of index files. If we have a large number of models, the same structure may appear in more than one of them. In this case, the application provides the user not only with a view of the object that optimally matches the search, but also with a list of three alternatives that represent the selected text (as shown in Fig. 6). On the left we see two snapshots of the heart model, together with a snapshot of a vascular system model. Note that our application searches not only for a model containing the chosen information, but also for the correct view that optimally shows this information (see the next section). This is the reason why the search result may contain multiple views of the same model. These three snapshots can be used to select one of the alternative models or views. If the user clicks on one of them, the model is loaded and the view is shown. On the top left of the rendering window we see a colored sphere. This sphere shows a temperature map that indicates which parts of the model are most related to the search performed. Warm colors (yellow) indicate viewing regions that show the information looked for in a better way, while cold colors (blue) indicate the regions from which the model does not show information about the structure indicated by the user. This means that if we place the camera in any of the directions indicated by warm colors, we will be able to see part of the information indicated in the text.

Fig. 6 Text selection of a concept of interest

This can be seen in Fig. 6, where the user selected “pulmonary
artery”, and the system automatically generates a new view (see Fig. 7) where the selected information is shown and a label indicates the position of the artery. Moreover, the rendering window in Fig. 7 also provides a temperature sphere that indicates with warmer colors the viewing directions that better show the structures selected in the text.

Model to text search

The second way of searching for information is the following: when inspecting a 3D model, the student may be interested in the structures being shown. The only thing the student has to do is click on the Find corresponding paragraphs button. The application then builds a descriptor from the elements that are visible in the current view and compares this vector with the vectors that describe the paragraphs of the documents (see the following section). The paragraphs are ranked in descending order of similarity and the result is shown on the left side with a formatting similar to that of Google search results, that is, a link to the paragraph followed by some lines of text of the referenced paragraph (see Fig. 8).

Query management

The key to our approach is the novel use of information retrieval methods and view selection algorithms.

Information retrieval

In contrast to query languages that are designed for large structured databases (e.g., SQL), information retrieval (IR) has to cope with partial matches between the terms in a query and those used within the documents of a text database. Therefore, text retrieval techniques are based on similarity measures for documents and employ different strategies to rank the retrieved results. For an efficient representation of a large corpus of documents, the standard vector space model [18] is usually employed. In the vector space model, both the query and the documents are transformed into a vector representation. Based on this vector representation, IR systems can also be used to determine similarities between the query and the documents in the database. While these systems efficiently handle terms in natural language, the search for semantic concepts is still very limited. In order to reduce the dimension of the vector space, some standard preprocessing methods can be applied: very frequent words (stop words) are not considered, and simple
transformation rules aiming at normalizing morphological variants of words are applied to the remaining terms.

Fig. 7 Result of the searching process

Text indexing

Before search engines can retrieve search results, the source documents have to be indexed. We use a popular text retrieval engine, Lucene, which processes text documents. In our approach, we extract textual descriptors for the source documents (the text of books and the 3D models) which specify them. We break up the documents into a finer granularity: text documents are segmented into paragraphs, and views of 3D models are sampled from a defined number of camera positions on their bounding sphere. The extraction of the paragraph descriptors is performed as follows (see also Fig. 9): first, a simple text parser (1) breaks up the documents into single paragraphs. Subsequently, each of these paragraphs is (2) processed by another text parser, which removes unnecessary stop words (e.g., 'a', 'and', 'with') and writes the remaining terms into the paragraph descriptor (Fig. 9). Finally, we sort the set of remaining terms and obtain an index (or dictionary) defining a linear order over the set of terms used in the text documents. Then, based on the obtained dictionaries, we compute the document vectors for each document. Each document vector d specifies a weight w_{dt} for every term t in the dictionary for a given document d. In order to obtain descriptive weight vectors, we apply a standard measure [19] that considers both the frequency tf_{dt} of the term t in the current document d and its frequency in the respective database, where N denotes the size of the document collection C and n_t refers to the number of documents in C that contain term t:

w_{dt} = tf_{dt} \cdot \log(N / n_t)    (2)
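To make the indexing pipeline concrete, the following Python sketch builds paragraph descriptors and tf-idf vectors along the lines of Eq. (2) and compares them with the cosine similarity of the vector space model. It is only an illustration: the actual system delegates this work to Lucene, and the stop-word list and function names are our own.

import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "with", "of", "in", "is"}  # toy list

def terms(paragraph):
    # Tokenize a paragraph and drop stop words (step 2 of Fig. 9).
    return [t for t in re.findall(r"[a-z]+", paragraph.lower())
            if t not in STOP_WORDS]

def build_vectors(paragraphs):
    # tf-idf document vectors following Eq. (2).
    tfs = [Counter(terms(p)) for p in paragraphs]
    n = len(paragraphs)                               # N
    df = Counter(t for tf in tfs for t in tf)         # n_t per term
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: f * idf[t] for t, f in tf.items()} for tf in tfs]

def cosine(q, d):
    # Similarity used to rank paragraph vectors against a query vector.
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

Ranking a user selection then amounts to building a query vector from the selected words in the same way and sorting the paragraph (or view) vectors by descending cosine similarity.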
These document vectors are created for each document in the databases containing the text documents D as well as for the 3D models M and, if necessary, for a given query vector q.

Fig. 8 Results of the model to text search. The matching paragraphs are ranked in descending priority order, and the user may view the whole paragraph and the part of the book where it is placed by clicking on the title of the paragraph

Fig. 9 The extraction of paragraph descriptors

Fig. 10 The extraction of view descriptors

In contrast to the standard model, we additionally integrate a boost function, which enables us to increase or decrease the importance of a specific term during user interaction by modifying the document weight for a term t:

boost(t) < 1 → term t de-emphasized
boost(t) = 1 → normal weight
boost(t) > 1 → term t emphasized    (3)

The boost function is fed with the results of the parser that detects emphasized words in the paragraphs and marks these words in the paragraph descriptor with a higher importance value, which is recognized by the text retrieval indexer. Thus, the original equation of [19] is extended to:

w_{dt} = boost(t) \cdot tf_{dt} \cdot \log(N / n_t)    (4)
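As a small illustration of Eq. (4) — again a sketch rather than the system's Lucene-based implementation — the plain tf-idf weight can simply be scaled by a per-term boost; the boost values below are hypothetical and would in practice come from the parser that detects emphasized words.

import math

def boosted_weight(tf_dt, n_docs, n_t, boost=1.0):
    # Boosted term weight, Eq. (4): w_dt = boost(t) * tf_dt * log(N / n_t)
    return boost * tf_dt * math.log(n_docs / n_t)

# A term printed in bold in the textbook could, for instance, be emphasized:
w_plain  = boosted_weight(tf_dt=3, n_docs=1200, n_t=40)            # boost(t) = 1
w_strong = boosted_weight(tf_dt=3, n_docs=1200, n_t=40, boost=2.0) # boost(t) > 1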
This boost function allows us to describe the content of both the text documents and the 3D models more precisely by their descriptors.

3D model indexing

The same procedure is applied to the textual annotations associated with geometric objects in order to construct the index for 3D models. In this case, the different levels of granularity are model, view, and view descriptor. In order to build the textual description of a model, a set of views around the bounding sphere of the model is analyzed, and the descriptors of the structures visible from each view are extracted and stored. This can easily be achieved by rendering a color-coded view, where each of the labeled structures is assigned a different color, and analyzing the view afterwards. Our system allows the direct manipulation of textual annotations (add, modify, etc.) for geometric objects within the 3D visualization of 3D models. This information is stored in an annotation table that links unique color codes for geometric components to technical terms. For a given 3D model, a number of sample views located on an orbit are analyzed with color-coded renditions. We determine the visibility of their components as well as their size and position in the projection. Finally, we construct a view descriptor that contains reference terms for all visible objects, specifies their relative size, and includes a measure of how centered these objects are on the projection area (see Fig. 10).

The search process

Our system allows learners to explore comprehensive tutoring materials stored in multi-modal databases in an interactive browser. In order to coordinate the content presented in text and graphics, user interactions are transformed into queries to an information retrieval system. Two pre-computed data structures enable us to find corresponding text descriptors and view descriptors in real time. Subsequently, a text browser and a 3D browser are used to present the retrieved results (see Fig. 11). This section highlights the need for adaptive and coordinated multimodal presentations in our application domain. As in many scientific or technical areas, students of human anatomy have to learn a large number of technical terms. Moreover, anatomy textbooks focus on descriptions of geometric properties: chapters on Osteology, for example, contain descriptions of characteristic features of complex-shaped bones, chapters on Myology employ the features of these bones as landmarks to describe the course of muscles, and Syndesmology explains the direction of movements in joints. These examples illustrate (1) the wealth of annotated illustrations needed to learn a domain-specific terminology, (2) the relevance of basic learning tasks in anatomy for almost all scientific or technical domains, and (3) the need to complement textual descriptions with expressive illustrations. The integration of a real-time label layout system and the automatic
selection of appropriate 3D models and views from a database in our system aims at supporting each of these learning tasks.

Fig. 11 Search tasks: Text ↔ Illustration

Text → view queries

Medical students can select text segments in descriptive texts or textual labels in the 3D visualization where they need additional background information (see Fig. 9). These interactions raise the demand to search for corresponding textual descriptions and related illustrations. The content of user-selected text fragments is transformed into query vectors. The way we build the descriptors from paragraphs has no novelty compared to the information retrieval literature, and is performed as presented in the previous paragraphs. The view index allows retrieving views on 3D models that correspond to the most relevant terms in the query. The best-fitting 3D model in its best view is loaded and presented immediately; the remaining good views of other 3D models are presented in small overview windows on the side of the 3D screen; they are loaded if they are selected by the learner. To support the interactive exploration of a 3D model, the quality of views with respect to the current context defined by the query terms is presented on a colored sphere. Moreover, the label layout within these computer-generated illustrations should reflect this new context. Hence, only relevant graphical objects are annotated. Therefore, the 3D visualization component of our system integrates a real-time layout for secondary elements that aims at minimizing the visual changes during user interactions (see [20]). Finally, the renderer employs transparency to de-emphasize unimportant objects.

View → paragraph queries

As many learners prefer visualizations to explore complex spatial configurations, our system also supports 'visual' queries. Contrary to the text indexing process, building textual descriptors from 3D renderings is a less explored path. We use the information on visible objects, their relative size, and the viewpoint entropy to construct a query vector as described next.
Our approach uses a set of views placed around an object and analyzes the structures visible in them (see Fig. 12) by rendering each structure in a different color. For each structure, we need an importance factor that behaves similarly to the boosting functions in text analysis; that is, in each view descriptor we must give higher importance to the objects that are better captured and to the overall information revealed by that image. This is carried out in two different ways: first, the position and size of each structure in a certain view are determined, and we give higher importance to objects that are placed near the center of the view and have a larger size (see Fig. 13). Second, we measure the average amount of information shown in the view by means of the viewpoint entropy measure [14].
Fig. 12 Evaluation of the relative size and centricity of the different structures
Fig. 13 3D model analysis
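The view analysis described above can be sketched as follows. This is a simplified illustration, assuming the color-coded rendering has already been resolved into an integer ID buffer; combining relative size and centricity into a single weighted score mirrors the roughly 5:1 weighting later reported in the Results section, and all names are our own.

import numpy as np

def view_descriptor(id_buffer, labels, size_weight=5.0, center_weight=1.0):
    # Per-structure importance of one sampled view.
    # id_buffer : 2D array of structure ids per pixel (0 = background),
    #             obtained from a color-coded rendering of the model.
    # labels    : maps a structure id to its technical term.
    # Returns   : {term: importance}, combining relative visible size and
    #             centricity (size weighted more strongly).
    h, w = id_buffer.shape
    total = id_buffer.size
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_dist = np.hypot(cy, cx)
    descriptor = {}
    for obj_id, term in labels.items():
        ys, xs = np.nonzero(id_buffer == obj_id)
        if ys.size == 0:
            continue                        # structure not visible in this view
        rel_size = ys.size / total          # fraction of the image it covers
        dist = np.hypot(ys.mean() - cy, xs.mean() - cx)
        centricity = 1.0 - dist / max_dist  # 1 = centered, 0 = at a corner
        descriptor[term] = size_weight * rel_size + center_weight * centricity
    return descriptor

The viewpoint entropy of the same ID buffer (see the earlier sketch) can be stored alongside these per-structure scores as the overall information content of the view.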
Using this view descriptor, the paragraph index is used to determine appropriate text segments for user-defined views. The search result is sorted according to relevance and presented to the user. User-selected paragraphs are then presented in the original context of the chosen document. In order to validate the usefulness of our importance measurement method, we performed a user study.

Results

We have built the software presented in this paper and shown it to different users and to medical doctors. As mentioned, the text retrieval system has no novelty compared to the current literature, but the way we index views is original. Therefore, we studied the validity of the approach by performing a user study. The details are provided next.

Subjects and design

We designed the test in three different languages (Catalan, English, and German); the user study was carried out with 115 subjects. The subjects (47 females, 68 males) were subdivided into one test group g1 (76 participants) without prior knowledge of 3D software (e.g., 3D games, 3D modeling software), and one test group g2 (39 participants) with knowledge in that field. Both test groups had to solve the same blind tests.

Materials and apparatus

The user study consisted of 12 single tests, where the subjects had to choose one out of three alternatives. The test was presented in a printable document, but could also be performed on a computer monitor. The users were also asked if they had problems in understanding the test or recognizing the images; the results of those who did were excluded from the study to remove distortion by uncertainty. The first part p1 of the study consisted of six single tests (t1–t6). In order to avoid distortions the tests were extremely simplified, presenting a cubical arrangement of spheres. This part evaluated which of the criteria (relative size or centricity) humans consider as superior for a 'good view' and to which extent. To determine the three alternatives, we used the measures of visible relative size and centricity, as described previously. The users had to select the view they preferred in terms of visibility of the green spheres (see Fig. 14). The second part p2 consisted of six tests (t7–t12) to investigate text-to-image correlations with a well-known 3D object. Since the approach is designed for tutoring purposes in general (i.e., in the fields of medicine, natural science, engineering, etc.), we chose a model of a BMW Z3 car, as most subjects know its interior parts by their names. The tests t7–t9 assessed whether our approach algorithmically determines the same views the users would select for a given text (see Fig. 15). In the tests t10–t12, the users were presented with a specific view of the car and had to select one out of three alternative textual descriptions, in order to find out if the proposed algorithm correlates with the users' selections (see Fig. 15).

Study results

In tests 1–6, we calculated the value of both measures (visible relative size and centricity). Since the users chose one favored alternative, we were able to determine if the results of both measures corresponded to the preferences of the users. In tests 7–9 we indexed 100 views of the car model. Then, the system was queried with 2–3 technical terms of well-known car components (e.g., steering wheel, leather seats). The top-scored result, together with two lower-scored results, was presented to the user in random order. In tests 10–12, we indexed three different descriptions (each with 2–3 technical terms of well-known car components) and queried the system with a specific view of the car, in order to find out which of the descriptions optimally fits the image. The top-scored description, as well as two lower-scored descriptions, was presented to the user in random order.
Fig. 14 User study question. The users have to decide which of the distributions of green spheres is better
Fig. 15 User study question. The users had to select which of the cars shows the information asked for in a better way
Hypothesis 1: Subjects with knowledge of 3D software select different images than those without.

Since the approach deals with 3D objects, we first determined whether participants with knowledge of 3D software (3D modeling, 3D games) have different preferences in the selection of 3D views than those without. The majority of the subjects of both groups g1 and g2 chose the same alternatives. We also applied Student's t test to check whether the percentages differ significantly; differences are not considered statistically significant for P ≥ 0.05. The t test determined a significant difference neither for p1 (F = 0.062, P = 0.808) nor for p2 (F = 0.278, P = 0.610). In the following, we therefore merged the groups g1 and g2 into a single group g.

Hypothesis 2: Relative visible size is more important than centricity.

Part p1 aimed at determining the influence of both measures: relative visible size m1 and centricity m2. We used the χ² test; since 115 subjects contributed to the user study and each of the tests had three alternatives, the expected frequency of choosing an alternative was 38.333. For statistical significance χ² has to be ≥ 5.99 (α = 0.05). Figure 16 shows the answers to the first group of questions. In order to test m1, tests t1 and t2 kept m2 of the important objects equal, while m1 differed. There was a strong significance for m1 (t1: χ² = 169.793 and t2: χ² = 184.976). In t1, 90.43% chose the alternative with the highest value for m1, while 93.04% did so in t2.

Fig. 16 The answers given by the users to the first six questions (t1–t6)

For testing m2, the tests t3 and t4 had equal m1 of the important objects, while m2 differed. There was a significance for m2 (t3: χ² = 17.965 and t4: χ² = 40.192). In t3 only 48.7% chose the alternative with the highest value for m2, while 59.13% did in t4. Thus, m2 is significant, but seems to be much less important than m1. Finally, t5 and t6 tested m1 and m2 against each other. m1 had higher support (t5: 73.04% and t6: 88.70%) than m2 (t5: 19.13% and t6: 9.57%), while the remaining subjects chose the alternative where neither m1 nor m2 was maximized (t5: 7.83% and t6: 1.74%). As we assumed, relative visible size is clearly more important than centricity. Thus, the following tests were carried out with an unequal weighting of the two measures m1 and m2 (5:1).

Hypothesis 3: The approach presents desired views for a given paragraph.

In the tests t7–t9 we presented a set of common technical terms of the car and showed three alternative views of the car. The user had to select the one where the named objects were optimally visible. The higher the relevance computed by our algorithm, the more users preferred the view. Most subjects chose the highest-ranked alternative (t7: 86.09%, t8: 84.35%, t9: 89.57%). At a relevance of less than 0.8 only a few subjects preferred the view, but above 0.8 the preference increased exponentially. Thus, this seems to be a rough measure for a cutoff
of retrieval results. The results of these questions are shown in Fig. 17.

Fig. 17 Distribution of answers to questions t7–t12

Hypothesis 4: The approach presents desired paragraphs for a given view.

In the tests t10–t12 we presented an image of the car and showed three alternatives with multiple common technical terms of the car. The subjects had to choose the alternative that optimally describes the image. As in the previous tests, the higher the relevance computed by our algorithm, the more users preferred that alternative. Most subjects chose the highest-ranked alternative (t10: 100.00%, t11: 95.65%, t12: 95.65%). Again, a relevance of 0.8 seems to be a reasonable cutoff threshold for retrieval results.

First, the study showed that there is no difference in the preference of 3D views between subjects with and without prior 3D computer knowledge. Both the tests for relative visible size and those for centricity were significant, while relative visible size was much preferred over centricity. In the tests for text-to-image queries as well as image-to-text queries, the relevance determined by our algorithm correlated with user preference. In each of the tests more than 80% of the subjects chose the top-ranked view. Since with a relevance value of less than 0.8 only a few subjects chose the view, this value seems to be a valid cutoff for the retrieval results.
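For completeness, the χ² statistic used above can be reproduced with a few lines of Python; the observed answer counts below are hypothetical and serve only to show the computation against the uniform expectation of 115 subjects over three alternatives.

def chi_square_uniform(observed):
    # χ² statistic against a uniform expectation: every alternative is
    # equally likely (115 subjects / 3 alternatives ≈ 38.33 per choice).
    n = sum(observed)
    expected = n / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical counts for one test (not the study's raw data):
chi2 = chi_square_uniform([104, 7, 4])
print(round(chi2, 2), chi2 >= 5.99)   # significant at alpha = 0.05 for df = 2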
Conclusions and future work

In this paper, we presented a framework for the support of anatomy teaching. In our application, the textual information and the 3D information are linked through labels, and therefore the user may query a 3D model database in order to see and explore the anatomical structures described in the text. Moreover, the educational textbook can also be queried by means of the 3D model. This permits a better understanding of the 3D relationships of the anatomical structures and eases the learning process by giving the user the freedom to jump from one structure description to another following what appears on screen. The text processing presents no novelty compared to the current literature, but using 3D views to perform searches in text is one of our contributions. Therefore, we carried out an experiment showing that the decisions taken in order to build the text descriptors of 3D views are sound. We have shown the application to users and, in particular, we demonstrated it to a couple of medical doctors, who found it very interesting for educational purposes. Our application currently uses the electronic version of Gray's Anatomy and the 3D models of the Viewpoint library. We want to extend it by adding labeled versions of other parts of the body, extracted from recently captured medical data or from the Visible Human Project. Moreover, some extensions are possible: first, we would like to improve the text processing by detecting synonyms or noun constructions (i.e., nouns and adjectives) that may have the same meaning. Although some work would be needed to improve the text processing, the required knowledge is already present in the information retrieval and automatic summarization literature. Furthermore, the current version only supports 3D models as visual information. If we had a database of images, we could also use them to illustrate queries. The only requirement would be the adequate annotation of such images.

References
1. Tortora GJ (1997) Introduction to the human body: the essentials of anatomy and physiology. Benjamin Cummings, Menlo Park
2. Götzelmann T, Vázquez P, Hartmann K, Germer T, Nürnberger A, Strothotte T et al (2007) In: Proceedings of the 23rd spring conference on computer graphics (SCCG)
3. Götzelmann T, Vázquez P, Hartmann K, Nürnberger A, Strothotte T (2007) Correlating text and images: concept and evaluation. In: Lecture notes in computer science: 7th international conference on smart graphics. Springer, Heidelberg
4. Feiner SK, McKeown KR (1993) Automating the generation of coordinated multimedia explanations. In: Maybury MT (ed) Intelligent multimedia interfaces. AAAI Press, Menlo Park, pp 117–138
5. Wahlster W, André E, Finkler W, Profitlich HJ, Rist T (1993) Plan-based integration of natural language and graphics generation. Artif Intell 63:387–427. doi:10.1016/0004-3702(93)90022-4
6. Schlechtweg S, Strothotte T (1999) Illustrative browsing: a new method of browsing in long on-line texts. In: International conference on human–computer interaction, pp 466–473
7. Hartmann K, Strothotte T (2002) A spreading activation approach to text illustration. In: 2nd international symposium on smart graphics, pp 39–46
8. Schlechtweg S, Strothotte T (2000) Generating scientific illustrations in digital books. In: AAAI spring symposium on smart graphics, pp 8–15
9. Furnas GW (1986) Generalized fisheye views. In: Conference on human factors in computing systems, pp 16–23
10. Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychol Rev 82(6):407–428. doi:10.1037/0033-295X.82.6.407
11. Lieberman H, Liu H, Singh P, Barry B (2005) Beating common sense into interactive applications. AI Mag 25(4):63–76
12. Götze M, Neumann P, Isenberg T (2005) User-supported interactive illustration of text. In: Simulation und Visualisierung, pp 195–206
13. Götzelmann T, Götze M, Ali K, Hartmann K, Strothotte T (2007) Annotating images through adaptation: an integrated text authoring and illustration framework. J WSCG 15:1
14. Vázquez P, Feixas M, Sbert M, Heidrich W (2001) Viewpoint selection using viewpoint entropy. In: Vision, modeling, and visualization conference, pp 273–280
15. Viola I, Feixas M, Sbert M, Gröller M (2006) Importance-driven focus of attention. IEEE Trans Vis Comput Graph 12(5):933–940. doi:10.1109/TVCG.2006.152
16. Mühler K, Neugebauer M, Tietjen C, Preim B (2007) Viewpoint selection for intervention planning. In: EG/IEEE-VGTC symposium on visualization
17. Gray H (1918) Anatomy of the human body, 20th edn. Lea & Febiger, Philadelphia
18. Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620. doi:10.1145/361219.361220
19. Salton G, Allan J, Buckley C, Singhal A (1994) Automatic analysis, theme generation, and summarization of machine-readable texts. Science 264:1421–1426. doi:10.1126/science.264.5164.1421
20. Götzelmann T, Hartmann K, Strothotte T (2006) Agent-based annotation of interactive 3D visualizations. In: 6th international symposium on smart graphics, pp 24–35