Shape-Based Retrieval of 3D Models in Scene Synthesis

Ioannis Andreou
Technology Education and Digital Systems Dpt.
University of Piraeus, Greece

Nikitas-Marinos Sgouros
Technology Education and Digital Systems Dpt.
University of Piraeus, Greece

Abstract - This article describes SR-3DEditor, a complete system for creating, editing and displaying 3D scenes that incorporates original technologies for fast 3D scene synthesis. The system allows the user to draw outlines of 3D shapes on the computer screen using the mouse cursor. At any time during this process the user can query a 3D model database for models with similar outlines and then select the ones s/he thinks are more relevant. The system can then automatically calculate the parameters that position the selected model in the scene, so that its outline fits the shape sketched by the user. Evaluation results demonstrate that the users are satisfied by the usability and efficiency that is added by the automatic selection facility. SR-3DEditor is freely available for download and experimentation.

Keywords: Shape Similarity, 3D Model Retrieval

1 Introduction

Recently there has been a growing need for visual information retrieval technologies, i.e. systems that provide query and retrieval facilities for visual media, based on visual content. These systems need to allow the user to sketch the major features of an image and submit the resulting drawing as a query for similar images in an image database. This paper describes SR-3DEditor, a complete system for fast 3D scene synthesis that utilizes Shape-Based Retrieval technologies. The system allows the user to draw outlines of 3D shapes on the computer screen using the mouse cursor. At any time during this process the user can query a 3D model database for models with similar outlines and then select the ones s/he thinks are more relevant. The system can then automatically calculate the parameters that position the selected model in the scene, so that its outline fits the shape sketched by the user. There are many advantages associated with the use of SR-3DEditor:

a) The user does not have to search through a database of possibly thousands of 3D models to select the models s/he is interested in.
b) Creating, storing and loading a 3D scene is very simple, as the file format for storing scenes is XML and can therefore be easily processed, e.g. with an XSLT script.
c) The creation of a 3D scene that refers to a particular theme can be sped up and made more robust by using a 3D model database that was specifically created for this theme.
d) SR-3DEditor can create its own model databases or generate them from previous databases or 3D scenes, enabling reuse and exploitation of previous work.
e) Automatic positioning of 3D models in the scene simplifies the editing and management of 3D scenes.
f) The most important advantage of this retrieval method is that retrieval by content allows for automatic creation of a sufficient database, i.e. no human intervention is required to manage the database contents (e.g. to provide metadata elements), since multimedia data are now automatically analyzable.

SR-3DEditor employs a shape matching and retrieval engine that retrieves relevant 3D models (for human-drawn sketches or imported sketches) using a content-based shape matching method. Whenever the user wants to add a new 3D model to a scene, s/he draws on the screen a polygon that matches the outline of a view of the desired model. The engine is then queried, and the retrieved results consist of polygons together with the 3D models from which they had previously been extracted. The system can place a retrieved 3D model in front of the user, using appropriate rotation and translation transformations, so that the outline of that model is properly aligned with the polygon sketched by the user. SR-3DEditor is self-sufficient in terms of shape/model database management, as it provides its own facilities for extracting 2D shapes out of 3D models and for producing and managing the database files.

The rest of the paper is organized as follows. Section 2 presents related work, while section 3 describes the process used to extract 2D polygons by rendering a 3D model. Section 4 describes the 3D scene architecture and section 5 presents the implementation of the tool. Section 6 presents a user evaluation. Finally, section 7 draws overall conclusions.

2 Related Work

During the last few years there has been great progress in the area of content-based multimedia retrieval. Some of the new technologies for retrieval by content are described in [2, 3, 4]. Some applications have also included sketch-based retrieval of media [6, 7, 8, 9]. QBIC [10] exploits techniques for shape-based multimedia retrieval. A technique for retrieving 3D models using shape distributions is described in [11]. The current period in content-based retrieval research is characterized by applications that actually put these techniques to creative and productive use.

Applications for (web-based) 3D model retrieval and multimedia authoring exploiting shape-based retrieval have emerged [12, 13, 14]. The system presented here extends its 2D equivalent, SR-Sketch [5]. SR-Sketch allowed users to create 2D sketches and correct them using shapes proposed by a previous version of the shape-search engine. In contrast to SR-Sketch, the actual shapes are not used by themselves or as part of an image; instead, a relation to another multimedia element indicates the source of the information and how it should be used. To implement this relation and make the storage of data as application-transparent as possible, all the data formats introduced by this new framework are based on XML.

3 Retrieval of Shapes and 3D Models

The goal of SR-3DEditor is to act as a front-end to a visual 3D model retrieval system by allowing the user to draw the types of models s/he is interested in. The user simply draws a polygon on the screen and the engine is queried with it to retrieve relevant shapes and models. This section describes the individual processes that make up the retrieval process, from the moment the user draws a polygon on-screen until the moment the retrieved shapes and models appear.

3.1 Preparing the query

As the user draws a shape on the screen using mouse clicks, a vector of integer-valued 2D point coordinates is filled. This vector represents the query shape in window client-area pixel coordinates. In order to be passed as a query to the shape-matching engine, the shape must be closed and have at least three vertices, as it represents the outline of a shape. If this criterion is satisfied, the user-drawn shape goes through a preparation stage that creates a sampled version, which in turn is fed into the engine. The sampled version of the drawn shape also appears on the screen. This phase consists of the following steps:

a) Convolve the polygon with a Gaussian matrix.
b) Sample the polygon into N points, equally spaced on the initial contour. N will be referred to as the resolution from now on.

3.2 Core 2D shape matching procedure

The 3D model matching method compares a 2D model outline drawn by the user with 2D outlines associated with 3D models, i.e. its main component is a 2D shape-based retrieval engine. The GCV library site [15], created by the first author, provides an implementation of this engine. The shape-matching engine consists mainly of a list of shape-matching methods, referred to as matchers, that are able to calculate a similarity factor between two shapes. The engine uses the matchers to retrieve the shapes that are most similar to one given shape, the query shape, according to their overall similarity scores. The proposed system extends the 2D engine with a many-to-one association between 2D shapes and 3D models, so that when a 2D shape is retrieved, the associated 3D model is retrieved, too. A shape-matching algorithm (matcher) is designed to:

a) Produce a descriptor for a 2D shape, i.e. the matcher's internal representation of that shape.
b) Compare two different descriptors and produce a comparison result.
c) Produce, from a comparison result, a matching score in the [0, 1] range that denotes 'how similar' the original shapes, from which the descriptors were produced, are.

For each matcher in the list, the matching engine maintains associated parameters, consisting of a weight and a minimum score, minScore. When the engine is triggered with a query shape, the shape descriptors for that query are constructed for each matcher in the matchers list. Then, for each shape in the database (the current shape), the matchers list is iterated, so that for every matcher the descriptor of the query is compared against the descriptor of the current shape (the descriptors for the database shapes already exist) using the matcher's internal algorithm. The comparison of the two descriptors returns a real-valued score in the [0, 1] range. This score represents the similarity of the query shape to the current database shape, as similarity is perceived by each matcher's internal algorithm. If this score is below the matcher's minScore, the process stops at this matcher and the final similarity score for this database shape is set to a negative value, to designate that the match between the query and the current shape has failed. A more orthogonal design would let the iteration of the matchers continue and compute the final score regardless of the partial score returned by an individual matcher, instead of interrupting the process and producing no result for a shape, because with early termination the returned result scores are not straightforwardly comparable.
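The iterate-and-eliminate procedure of this section can be illustrated with a minimal Python sketch. It is not the GCV implementation: the names `Matcher`, `match_score` and `retrieve` are hypothetical, the database is assumed to hold one precomputed descriptor per matcher for every shape, and the final score is the weighted average, clamped to [0, 1], that this section attributes to SR-3DEditor.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass
class Matcher:
    """Hypothetical matcher record: descriptor builder, comparator and parameters."""
    name: str
    describe: Callable[[Any], Any]          # shape -> descriptor
    compare: Callable[[Any, Any], float]    # two descriptors -> score in [0, 1]
    weight: float = 1.0
    min_score: float = 0.0                  # the minScore threshold of the text


def match_score(matchers: Sequence[Matcher],
                query_descs: Sequence[Any],
                shape_descs: Sequence[Any]) -> float:
    """Weighted average of partial scores, or -1.0 if any matcher rejects the shape."""
    total, weight_sum = 0.0, 0.0
    for matcher, q_desc, s_desc in zip(matchers, query_descs, shape_descs):
        score = matcher.compare(q_desc, s_desc)
        if score < matcher.min_score:
            return -1.0                     # negative value marks a failed match
        total += matcher.weight * score
        weight_sum += matcher.weight
    if weight_sum == 0.0:
        return -1.0                         # no usable matcher: treat as failure
    return min(1.0, max(0.0, total / weight_sum))


def retrieve(matchers: Sequence[Matcher],
             query_shape: Any,
             database: Dict[str, Sequence[Any]],
             top_k: int = 5) -> List[Tuple[float, str]]:
    """Return up to top_k (score, shape_id) pairs, best first."""
    # Query descriptors are built once, one per matcher.
    query_descs = [m.describe(query_shape) for m in matchers]
    results = []
    for shape_id, shape_descs in database.items():
        score = match_score(matchers, query_descs, shape_descs)
        if score >= 0.0:
            results.append((score, shape_id))
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top_k]
```
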
Yet the proposed iteration process allows the engine to be optimized: by placing the fastest matchers first, it is possible to prune the search space to a much smaller subset during the first iteration steps, at only a small computation cost. Thus, by sorting the matchers in order of their time complexity for descriptor comparison, we achieve a significant time optimization. In any case, when minScore is set to zero for all matchers the engine prunes no shapes, which makes the overall scores directly comparable. The final score for a database shape, assuming that the shape has gone through all the matchers in the list without any failures, is calculated as a weighted combination of the individual matchers' partial scores. SR-3DEditor simply uses the weighted average of the partial scores, clamped to the [0, 1] range.

3.3 The Turning Function Difference Algorithm

One of the matchers used inside the shape retrieval engine is based on the Turning Function Difference (TFD) algorithm. TFD is a novel algorithm for the retrieval of similar shapes and the explanation of the retrieved results, based on the comparison of polygonal curves. TFD is capable of fully describing the outline of a shape in a translation, rotation and scale invariant manner. The basic similarity criterion resembles the Turning Function [16, 17], yet it is translation, scaling and rotation invariant. It is also suitable for matching open curves and very robust in dealing with noise. The extra matching information produced by the TFD method lets us produce explanatory visualizations of the match results. A brief description of the workings of TFD follows.

The basic metric of TFD on an input polygon (at a vertex, in a polygon traversal) is defined as the angle between the edge following the vertex and the edge before it. Consequently, this metric is the Difference in the Turning Function between successive vertices. This new definition embeds rotation invariance in the metric itself. For a traditionally defined turning function, a rotation corresponds to a selection of a different starting point. Practically, this includes at least one unnecessary subtraction of a reference value to compare to a rotated version of the turning function. Thus, for each input polygon, the descriptor of TFD is an array of N values representing the Difference in Turning Function. In order to minimize noise, the input polygon has to be smoothed and sampled before creating the TFD descriptor.

The TFD matching process accepts as input two arrays of TFD values (of size N, since each polygon is closed and all shapes have the same number of vertices, due to the sampling process) that represent the outlines of the shapes to be compared. The output of the method consists of (i) the correspondence between regions of the polygons that are considered similar and (ii) the matching score. The matching score is defined as a number in the range [0, 1] that represents the proportion of the number of vertices that were matched to the total number of vertices in the shape's outline.
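As a rough illustration of how such a descriptor could be built, the Python sketch below resamples a closed polygon into N points equally spaced along its contour (step b of Section 3.1) and then computes, at each sampled vertex, the angle between the outgoing and the incoming edge. It makes several assumptions that the paper does not state: the Gaussian smoothing step is omitted, the angle is taken as signed and wrapped into [-pi, pi), and the function names are hypothetical. The region-correspondence matching itself is not described in enough detail here to be reproduced, so it is not attempted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_closed(polygon: List[Point], n: int) -> List[Point]:
    """Sample n points equally spaced along the closed contour of `polygon`."""
    edges = list(zip(polygon, polygon[1:] + polygon[:1]))
    lengths = [math.dist(a, b) for a, b in edges]
    step = sum(lengths) / n
    samples = []
    edge_idx, edge_start = 0, 0.0   # edge_start: arc length where the current edge begins
    for k in range(n):
        target = k * step
        # Advance to the edge that contains the target arc length.
        while edge_idx < len(edges) - 1 and edge_start + lengths[edge_idx] <= target:
            edge_start += lengths[edge_idx]
            edge_idx += 1
        (ax, ay), (bx, by) = edges[edge_idx]
        length = lengths[edge_idx]
        t = (target - edge_start) / length if length > 0 else 0.0
        samples.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return samples


def turning_differences(points: List[Point]) -> List[float]:
    """Signed angle between the outgoing and incoming edge at every vertex."""
    n = len(points)
    descriptor = []
    for i in range(n):
        px, py = points[i - 1]
        cx, cy = points[i]
        nx, ny = points[(i + 1) % n]
        angle_in = math.atan2(cy - py, cx - px)
        angle_out = math.atan2(ny - cy, nx - cx)
        diff = angle_out - angle_in
        # Wrap into [-pi, pi) so that the value does not depend on absolute orientation.
        diff = (diff + math.pi) % (2.0 * math.pi) - math.pi
        descriptor.append(diff)
    return descriptor
```

Because every value is a difference of two edge directions, rotating or translating the input leaves the array unchanged, and uniform scaling does not affect angles. This matches the invariance properties claimed for the TFD descriptor.
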
Typical values for minScore, which is the score threshold, are 1/2 for strict selection of similar shapes, and 1/3 for retrieval of elements that simply contain similar regions.

3.4 Configuring the Shape Matching Engine

The 2D shape matching process is the most time-critical part of the system, as it is the most time-consuming interactive part of the application. The efficiency and speed of the matching process depend on the matchers, as well as on the way they are used. SR-3DEditor configures the shape-matching engine to use four different matchers, in the following order: Circularity, Convexity, Eccentricity [18] and Turning Function Difference. We chose to name the first three matchers feature matchers, as each considers only a simple feature of a shape. The shape descriptor of a feature matcher consists of a single real value, so its demands in storage space and in processing time during comparison are very low.
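To make the idea of a single-value descriptor concrete, the following Python sketch computes two such features for a polygon. The paper defers the definitions to [18], so the formulas used here are common textbook choices and should be read as assumptions: circularity as 4πA/P² and convexity as the ratio of the convex hull perimeter to the shape perimeter. Eccentricity is left out, and `feature_similarity` is a hypothetical comparison rule for features that lie in [0, 1].

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def perimeter(poly: List[Point]) -> float:
    """Length of the closed contour."""
    return sum(math.dist(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly)))


def area(poly: List[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def circularity(poly: List[Point]) -> float:
    """Assumed definition: 4*pi*A / P^2, which equals 1 for a circle."""
    p = perimeter(poly)
    return 4.0 * math.pi * area(poly) / (p * p)


def convexity(poly: List[Point]) -> float:
    """Assumed definition: hull perimeter / shape perimeter, 1 for convex shapes."""
    return perimeter(convex_hull(poly)) / perimeter(poly)


def feature_similarity(a: float, b: float) -> float:
    """Hypothetical comparison of two single-value descriptors in [0, 1]."""
    return max(0.0, 1.0 - abs(a - b))
```

Comparing two such descriptors is a single subtraction, which is why matchers of this kind are cheap enough to run on every database shape.
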

Thus these feature matchers are ideal for use in any engine based on iteration-and-elimination, such as the one proposed in this paper, especially at the beginning of the matchers list. On the other hand, TFD is slower during comparison and more expensive in descriptor storage than the feature matchers, but it is much more robust and accurate than the feature matchers for the retrieval of similar shapes (see [15]). Thus, by configuring the engine to use the above list, we optimize it for speed, robustness and accuracy. This configuration provides results for medium-sized model databases (approximately 2000 models) in a matter of seconds.

3.5 Retrieval of 3D models using their 2D outlines

Each 2D shape of the search space is associated with the 3D model from which it was extracted. When the most similar shapes are retrieved, the associated models are retrieved with them. SR-3DEditor needs the 2D shapes at hand, so they are always loaded into memory, while the 3D models, due to their memory demands, are only stored on disk and are loaded when necessary, i.e. when they are retrieved by the engine or when they are added to the 3D world. Thus, after the most relevant 2D shapes are retrieved, the necessary 3D models are loaded (each model is loaded only once) and are presented to the user, paired with the retrieved shapes and sorted by shape similarity score. The association of every 2D shape of the database with a 3D model is a direct product of the database production procedure. All the 2D shapes that are included in the search space were produced through a render-and-extract process, which renders each 3D model on the OpenGL buffer under different viewing transformations, produces memory-resident images that contain the results of the renderings, and then uses shape extraction processes to extract the outlines of the 3D model for each of the views. Thus a new set of 2D shapes is produced and assigned to the current 3D model.
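The many-to-one association and the load-once policy described above can be sketched in Python as follows. `ModelStore`, `present_results` and the `loader` callback are hypothetical names introduced for illustration, and the mapping from shape identifiers to model sources is assumed to come from the database files.

```python
from typing import Any, Callable, Dict, List, Tuple


class ModelStore:
    """Loads each 3D model from its source at most once and caches it."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader                 # e.g. a function that reads a model file
        self._cache: Dict[str, Any] = {}

    def get(self, source: str) -> Any:
        if source not in self._cache:
            self._cache[source] = self._loader(source)
        return self._cache[source]


def present_results(scored_shapes: List[Tuple[float, str]],
                    shape_to_model: Dict[str, str],
                    store: ModelStore) -> List[Tuple[float, str, Any]]:
    """Pair every retrieved shape with its model, sorted by descending score."""
    ranked = sorted(scored_shapes, key=lambda item: item[0], reverse=True)
    return [(score, shape_id, store.get(shape_to_model[shape_id]))
            for score, shape_id in ranked]
```

Several views of the same model may appear in one result list; because the store is keyed by model source, they all share a single loaded instance.
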

4

4.1 The 3D Scene Architecture

The 3D world rendered by SR-3DEditor is based on a tree structure, where nodes contain no data themselves and each of a node's children contains: a) Placement parameters, i.e. translation, rotation and scaling parameters. b) A renderable object, i.e. an object to be rendered, in this case a 3D model. c) Other miscellaneous information, such as selection state and render state, used by the rendering engine.

Figure 1. Architecture of a 3D scene

This architecture has the following advantages:

A) It supports grouping of objects in a tree-like structure. The renderable object at a child node could be either a model, making it a leaf in the tree, or another tree node. This way it is possible to place many objects together and apply the same transformations to them by editing only the properties of a common predecessor node.
B) There is no duplication of information. Each model that is loaded from a specific file path or URL exists only once inside the scene tree. The system manages the models in a special way: the models are actually stored separately, their sources (URLs or paths) are unique, and a reference counting scheme is used for their disposal.

For simplicity, only one level exists in the 3D scene used by SR-3DEditor; there is only one node in the tree (the root node), the children of which contain references to models as renderable objects. The root node itself depends on a set of rotation, translation and scaling parameters that represent the camera view of the user. Each time an object that renders a model is added to the scene, a child is added to the root node, with appropriate rotation, translation and scaling parameters, and with its renderable object referring to that model. If a model with the same source as the model to be added already exists, there is no need to reload it, and the existing one is used in the new child node. Otherwise a new model is loaded from disk into the scene. Note that the model data itself remains constant in memory during its lifetime inside the scene, to maintain the scene's integrity.

The following simple rule defines the interaction of the user with the 3D scene when the translation, rotation, etc. keys are used: when some scene objects are selected by the user (selection is denoted by a cube enclosing a 3D model), the transformation designated by a key is applied to the selected items and only to them, i.e. the translation or rotation parameters at their nodes are affected. If no items are selected, the user's view of the world is transformed instead, thus affecting the rendering of every viewable object.
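The one-level scene described in this section, with shared reference-counted models and the selection rule for key-driven transformations, could be modelled roughly as in the Python sketch below. All class and method names are hypothetical, only translation is shown, and `loader` stands in for whatever routine reads a model from its path or URL.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Placement:
    """Translation, rotation and scaling parameters of a node."""
    translation: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    rotation: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    scale: List[float] = field(default_factory=lambda: [1.0, 1.0, 1.0])


@dataclass(eq=False)   # children are compared by identity, not by field values
class Child:
    source: str                                   # path or URL of the referenced model
    placement: Placement = field(default_factory=Placement)
    selected: bool = False


class Scene:
    """Single root node whose parameters act as the user's camera view."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self.camera = Placement()
        self.children: List[Child] = []
        self.models: Dict[str, Any] = {}          # one entry per unique source
        self.ref_counts: Dict[str, int] = {}

    def add(self, source: str) -> Child:
        if source not in self.models:             # reuse an already loaded model
            self.models[source] = self._loader(source)
            self.ref_counts[source] = 0
        self.ref_counts[source] += 1
        child = Child(source)
        self.children.append(child)
        return child

    def remove(self, child: Child) -> None:
        self.children.remove(child)
        self.ref_counts[child.source] -= 1
        if self.ref_counts[child.source] == 0:    # last reference gone: dispose
            del self.models[child.source]
            del self.ref_counts[child.source]

    def translate(self, dx: float, dy: float, dz: float) -> None:
        """Move the selected children, or the camera if nothing is selected."""
        selected = [c.placement for c in self.children if c.selected]
        for placement in (selected or [self.camera]):
            x, y, z = placement.translation
            placement.translation = [x + dx, y + dy, z + dz]
```

Keeping the placement on the child and the geometry in a shared table is what lets the same model appear several times in a scene while being loaded only once.
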

5 Implementation

5.1 Using SR-3DEditor

The 3D scene author navigates inside the Editor's 3D space by using the keyboard. Any type of movement along the X, Y and Z axes, as well as rotations around them, is activated through keystrokes. The same keys that move the camera are used to apply the corresponding transformations to the placement and rotation parameters of the models inside the scene, whenever at least one model is selected. Whenever a mouse click touches a 3D model, its selection state is toggled. These simple controls allow objects to be placed at any position or rotation in space, so the standard features of a 3D scene editor are implemented.

The model retrieval facilities of the program are available through the right mouse button. By consecutively clicking the right button, the user draws a polygon, which will be used as the query to the shape retrieval engine. Pressing 'M' starts the matching process, using the polygon that is currently drawn on the window as the query. The retrieved results are then presented in the lower viewport of the window, as pairs of models and their extracted outlines that matched the drawn shape. The user scrolls through the results using the left and right arrow keys to select the model of his/her interest. Pressing 'return' adds the selected model to the scene, exactly in front of the camera. If the TFD method was used to match the selected outline, an additional rotation around the Z axis (the viewing axis), calculated by the TFD method, is applied to the model, so that its 2D projection on the viewport approaches that of the drawn shape.

5.2 Development and Distribution

SR-3DEditor was developed using only the C/C++ languages and runs on Windows platforms, from Windows NT 4.0 onward. The libxml2 library [19] was used for XML file input and OpenGL for 3D rendering. The application was implemented using a modular design. The interface and the underlying retrieval engine are separated, so that the implementation of the former does not affect the latter. Appropriate 'glue' code was used to extend the engine with 3D model information and to provide the stubs that the main application and the engine use to communicate with each other. One of the advantages of using XML is that it is open for editing and management. Although the database and scene files are usually produced by SR-3DEditor, it is possible to create or modify them by hand, with simple XSLT scripts or with other XML-related tools. The application is provided as a package that includes documentation and example files from [1].

Figure 2. Screenshot of SR-3DEditor. The upper viewport displays the current scene, while the lower viewport displays retrieved items. Selected scene items are denoted by an enclosing cube. The user has just queried the engine with a shape that looks like a car.

6 User Evaluation

One of the most important aspects for the evaluation of SR-3DEditor is its effectiveness in retrieving similar models. However, this property cannot be evaluated using any of the standard evaluation measures, such as precision or recall, since a) the retrieved elements (models) are compared against a different type of object, i.e. a 2D shape, and b) the sketched shape of an outline is very subjective to the person that creates it. Thus, to evaluate the retrieval performance, a user evaluation was conducted. A group of ten users experimented with SR-3DEditor and their reactions and comments were recorded. The users were experienced in computer usage, but none of them had any experience with a Content-Based Retrieval system. Each user had to complete a questionnaire. Completion of the questionnaire was followed by a short discussion with the system designers. This questionnaire is attached, along with a summary of the results, at the end of this paper. The objective of the user trial was to find out which features the users found most useful and to pinpoint any problems that they encountered in using the integrated system. The following conclusions were drawn from the testing process:

a) The users were pleased with the ease of use of the program. Although some users had to spend some time learning the key-based interface, all of the users understood that a key-based interface was necessary to speed up working with the 3D scene.
b) The fact that the users were able to view the results of their work by browsing an XML tree allowed them to better understand the system. Some users were tempted to edit the XML files using an XML editor.
c) The users found the idea of retrieving models automatically (rather than selecting them from a database) very useful and innovative. In most cases SR-3DEditor was able to retrieve several elements of the search space (from a pre-loaded database) that resembled the sketched shape.
d) Although none of the users had experience with such a visual retrieval system, all of them immediately grasped the concept and were able to utilize it. One of the reasons that allowed inexperienced users to work with the system efficiently after only a short training period was the fact that the user does not need to configure the shape-matching engine; the user just accesses its functionality through a visual interface.
f) Drawing a sketch that approaches the outline of a 3D object is a subjective task. Every individual draws his/her own interpretation of the outline of e.g. a car or an airplane. Yet the users that spent more time using the sketching interface were able to improve their performance in sketching outlines within 5 or 6 sketches. No artistic talent is required to use the sketch interface adequately for retrieval, and the required experience is obtained in minutes. After using the sketching environment, many users would find a browse-and-select or a text-based search interface insufficient for visual information retrieval.

Acknowledgements This research was supported by the HERAKLEITOS initiative of the Hellenic Ministry of Education.

7 Conclusions and Future Work

This paper presented SR-3DEditor, a complete system for creating, editing and displaying 3D scenes consisting of 3D model components. SR-3DEditor enhances the selection of 3D models: instead of the browse-and-select process of most environments, it uses a more intuitive interface; the user simply draws a 2D representation of how a 3D model looks and the system tries to retrieve models that have such a 2D view. A retrieved model can be added to the 3D scene and be edited like the rest of the 3D scene objects. The 3D models are stored in and retrieved from an XML-based database, and the 3D scene file format is also XML. The system users are able to easily create their own databases using the same application. The simplicity and ease of use of the system have been greatly appreciated by its users. Additional evaluation tests are currently being conducted in order to improve 2D contour matching tolerance for different sketching techniques. This application is one of the first complete systems that incorporate 3D Graphics and Shape-Based Retrieval, and surely more applications will follow.

References

[1] I. Andreou, "SR-3Deditor: 3D Scene Synthesis Page", http://thalis.cs.unipi.gr/~gandreou/sr_3deditor
[2] D. S. Zhang and G. Lu, "A Comparative Study on Shape Retrieval Using Fourier Descriptors with Different Shape Signatures", Intelligent Multimedia and Distance Education (ICIMADE01), pp. 1-9, Fargo, ND, USA, June 2001.
[3] P. W. H. Kwan, K. Kameyama and K. Toraichi, "On a Relaxation-Labeling Algorithm for Real-time Contour-based Image Similarity Retrieval", Image and Vision Computing, Vol. 21, No. 3, pp. 285-294, 2003.
[4] S. Belongie, J. Malik and J. Puzicha, "Shape Matching and Object Recognition Using Shape Contexts", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 4, pp. 509-522, April 2002.
[5] I. Andreou and N. M. Sgouros, "Sketch Creation utilizing Shape Matching Techniques", IEEE International Conference on Multimedia and Expo, Baltimore, USA, July 2003.
[6] P. Agouris and A. Stefanidis, "Sketch-Based Image Retrieval in an Integrated GIS Environment", International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part IV, pp. 597-604, September 1998.
[7] E. Di Sciascio and M. Mongiello, "Query by Sketch and Relevance Feedback for Content-Based Image Retrieval over the Web", Journal of Visual Languages and Computing, special issue on Distributed Multimedia Systems, Vol. 10, No. 6, 1999.
[8] A. D. Bimbo and P. Pala, "Visual image retrieval by elastic matching of user sketches", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 2, 1997.
[9] J. Barton and D. Love, "Retrieving Designs from a Sketch using an Automated GT Coding & Classification System", IEPM2003, Porto, 2003.
[10] W. Niblack et al., "The QBIC project: Querying images by content using color, texture, and shape", in Storage and Retrieval for Image and Video Databases, SPIE, Vol. 1908, pp. 173-182, 1993.
[11] R. Osada, T. Funkhouser, B. Chazelle and D. Dobkin, "Matching 3D models with shape distributions", International Conference on Shape Modeling and Applications, pp. 154-166, May 2001.
[12] P. Shilane, P. Min, M. Kazhdan and T. Funkhouser, "The Princeton Shape Benchmark", Shape Modeling International, Genova, Italy, June 2004.
[13] D.-Y. Chen, X.-P. Tian, Yu-Te Shen and M. Ouhyoung, "On Visual Similarity Based 3D Model Retrieval", EUROGRAPHICS, Vol. 22, No. 3, 2003.
[14] A. Henrich and G. Robbert, "MARS: A Retrieval Service for Multimedia Authoring Environments", ADBIS-DASFAA Symposium, pp.99-98, 2000.
[15] I. Andreou, "G Computer Vision library", http://thalis.cs.unipi.gr/~gandreou/gcv/
[16] W. Niblack and J. Yin, "A Pseudo-Distance measure for 2-D Shapes Based on Turning Angle", Image Processing, Washington DC, 1995.
[17] E. M. Arkin et al., "An Efficiently Computable Metric for Comparing Polygonal Shapes", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991.
[18] "CVonline: Vision Geometry and Mathematics", http://homepages.inf.ed.ac.uk/rbf/CVonline/geom.htm
[19] "Libxml2 library", http://www.xmlsoft.org/

APPENDICES

1. User Questionnaire

Q1) How easily did you learn and use the system?
Q2) Considering the options available from the selected database, how often do you think the system retrieves the models you consider more likely to the drawn shape?
Q3) How much does Automatic Shape Retrieval serve your goal of synthesizing a 3D scene?
Q4) Does Automatic Model Retrieval make Scene Synthesis faster?
Q5) Does Automatic Model Retrieval make Scene Synthesis easier?

2. Questionnaire results
