Extending MPEG-7 For Efficient Annotation of Complex Web 3D Scenes

Patti Spala, Athanasios G. Malamos
Dept. of Applied Informatics and Multimedia, Technological Educational Institute of Crete
Email: [email protected], [email protected]

Anastasios Doulamis
Decision Support Lab, Technical University of Crete
Email: [email protected]

George Mamakis
Faculty of Advanced Technology, University of Glamorgan, and Dept. of Applied Informatics and Multimedia, Technological Educational Institute of Crete
Email: [email protected]
Abstract. In this paper, we propose an annotation scheme for web-3D scenes based on the MPEG-7 standard. We focus on the annotation of 3D scenes that are encoded with the X3D modeling language, which is the descendant of VRML. X3D has been adopted by the web service industry as the appropriate framework for developing internet-friendly and flexible 3D visualization applications. We introduce MPEG-7 extensions that are necessary in order to fulfill the requirements of the X3D scene structure and we adapt the MPEG-7 schema encoding accordingly. In the annotation scheme, we consider animation and interactivity issues along with geometrical and appearance characteristics of the 3D content, providing a more efficient description of the scene. Thus, the extensions proposed in this paper cover all the information required for a complete and efficient description of the position and relative size of 3D objects, and of specific characteristics such as object type, curvature properties and available textures, combined with the objects' innate animation properties and their interactions with other objects in the scene or with the end user. The extensions are MPEG-7 Visual and Metadata Descriptors, which fully conform to the standardization restrictions, and we also provide the modifications to the corresponding schema of the ISO 15938 standard that are essential for validating against the proposed MPEG-7 implementation.
Keywords: MPEG-7, ISO 15938, web-3D, X3D, 3D annotation
1 Introduction

Knowledge of the semantic description of the elements represented in a 3D scene provides a vast range of advanced uses of the 3D worlds described. For instance, Internet search engines could, in the future, perform customized requests through a descriptive language to extract possible shapes from 3D scenes. A potential matching procedure may refer to matching and retrieving using geometrical and structural characteristics of the scenes [1][2][3]. However, files containing complex 3D scenes are difficult to handle due to their enormous size. Thus, shape and structure matching may be inefficient in terms of computational effort and response time [4]. Alternatively, the use of textual information in order to retrieve semantic references will fail in most cases. The annotation performed by authors depends on subjective factors, such as language, culture, etc. Therefore, results based on textual matching are even more degraded than those of structural matching [5]. Modeling tools allow some semantic and metadata information to be included during the authoring procedure, but there are specific application limitations. As a consequence, an efficient and complete description mechanism is considered mandatory to unify structural characteristics with human- or application-inserted textual descriptions in order to improve the annotation effectiveness of 3D models. With the use of this description language, the semantic information of the objects can be extracted separately into semantic libraries, allowing for easier access and examination of the individual shapes. Such descriptions can find uses in other environments apart from 3D worlds, such as science or engineering [6][7][8][9][10]. Moreover, annotation may be further utilized for customized search in order to improve 3D model reusability, which is one of the MPEG-4 objectives [11].

In this paper, we focus on the annotation of 3D scenes that are encoded with the X3D modeling language. X3D stands for "eXtensible 3D" and is an open standard for delivering 3D content over the Internet, developed by the Web3D Consortium and designed to replace and extend the existing VRML97 standard [12]. Its intention is to provide a new file format specification combining all necessary requirements used to efficiently display 3D interactive worlds on the web. X3D supports XML encoding, thus providing seamless integration with web services architectures and distributed networks, facilitating cross-platform 3D applications ranging from mobile telephony to supercomputers. This open standard offers added flexibility by supporting 2D and 3D graphics, creation and implementation of user-defined objects, and animation and physical simulation, through a 3D core runtime delivery engine. Since X3D is considered to be the descendant of VRML, it has been smoothly adopted by the web service industry as the appropriate framework for developing internet-friendly 3D worlds and flexible visualization applications [13][14][15]. As a standard, X3D specifies sets of extensions and capabilities for various applications, known as profiles, increasing the functionality and flexibility across custom environments while enhancing user interactivity.
In an effort to establish cross-platform support with the MPEG-4 multimedia standard, the Web3D Consortium has launched the MPEG-4 Interactive profile, specifically designed to facilitate the needs of transparent network broadcast, PDAs and mobile phones, whilst enabling user-specific interactions and navigation of animated objects across diverse networks and improving the quality of such services. X3D scenes are intended to accurately provide the geometric and environmental aspects relating to the individual objects contained within the scene, with additional information about the scene itself. Conversely, semantic information relative to the objects contained within the scene is not standardized, as semantics can vary depending on the application in which they are used.

The Moving Pictures Expert Group (MPEG) [16] has defined a set of standards for encoding and describing multimedia data. The most applicable standards in relation to 3D multimedia content are MPEG-4 [11] and MPEG-7 [17]. MPEG-4 specifies multimedia files by a set of methods defining compression of audio and visual (AV) digital data as multimedia objects, specifically Audio and Visual Objects (AVOs), enabling multimedia data broadcast on the web (streaming media) and the user's ability to interact with the generated audio-visual scene. MPEG-4 is built in parts, allowing individual developers to decide on how to implement them. These parts are known as "profiles", each of which enables additional features and capabilities to be defined according to the implemented application. As such, MPEG-4 extends the VRML/X3D standards, providing added support for 3D rendering and user interaction over the web. Recently, additions to the MPEG-4 standard have been made in the form of XMT-A, an XML-based format containing smaller sets of X3D XML nodes, but it still does not allow independent content description.
MPEG-4 contains profiles to implement additional features in the X3D content but does not provide any high-level semantic description capabilities as the MPEG-7 standard does. MPEG-7 is a multimedia content description standard (defined as ISO/IEC 15938) aimed at providing additional functionality to the previous MPEG standards by describing the content of multimedia files in a unified format [17]. Its main objective is to allow the description of audiovisual content, such as still or moving images, audio, speech, graphics and 3D models, in conjunction with semantic information on the combination of such elements in multimedia environments. Therefore, it does not deal with issues such as the encoding of the content itself like MPEG-4, but focuses on the semantic description of the associated media, allowing fast and efficient search and retrieval of the user-requested material. It does not provide standardization for extracting audiovisual features, nor does it constrain search engines or applications making use of MPEG-7 descriptions. Consequently, MPEG-7 is independent of the other MPEG standards, separating the description from the content representation itself. MPEG-7 provides well-defined description tools and mechanisms for the annotation of media data with high-level description information and ontologies [18][19][20][21]. The semantic information can then be stored separately from the data, creating reusable resources, enhancing customized search queries and indexing the related content through the content description without processing or modifying the actual entity files. Thus, MPEG-7 is a prime candidate for the semantic annotation of 3D objects within a scene as described by the MPEG-4 Interactive Profile for X3D [11][12].

In this paper we propose an annotation scheme for X3D scenes based on the MPEG-7 standard. We introduce MPEG-7 extensions that are necessary in order to fulfill the requirements of the X3D scene structure and we adapt the MPEG-7 schema encoding accordingly. In the annotation scheme we present, we consider animation and interactivity issues along with geometrical and appearance characteristics of the 3D content, providing a more efficient description of the scene. Thus, the extensions proposed in this paper cover all the information required for a complete and efficient description of the position and relative size of 3D objects, and of specific characteristics such as object type, curvature properties and available textures, combined with the objects' innate animation properties and their interactions with other objects in the scene or with the end user. The extensions are MPEG-7 Visual and Metadata Descriptors, which fully conform to the standardization restrictions, and we also provide the modifications to the corresponding schema of the ISO 15938 standard that are essential for validating against the proposed MPEG-7 implementation.

The rest of this paper is structured as follows. In section 2 we present the background of our research and we introduce the significance of annotating X3D with MPEG-7. In sections 3 and 4 we present the extensions to the MPEG-7 descriptors in detail and we provide some basic examples of their application to 3D characteristics. In section 5 we refer to some technical issues and we provide an illustrative example of the application of MPEG-7 to a representative X3D sample. In section 6, we conclude our work, providing a brief description of the application prospects of the adoption of the standard extensions we propose.
2 Background and Motivation

MPEG-7 provides limited description tools for an efficient 3D content description, especially for XML-based encoded 3D models, as in the case of X3D. This is due to the fact that MPEG-7 provides definitions for describing 3D models analyzed into low-level complex meshes or spectrums, whereas X3D provides simplified primitives and volumetric representations of the contained objects via the XML encoding. Meanwhile, a complete description of the objects contained within an X3D scene does not solely consist of the geometry and shape characterization. In order to efficiently describe X3D scenes, the contained objects' animation and texture properties as well as interactivity issues with other objects must also be defined. In an effort to exploit the full potential of the MPEG-7 and X3D standards to annotate the semantic information of an X3D scene efficiently and independently, several description extensions to the MPEG-7 standard are required.

Currently, the integration of X3D content within the MPEG-7 standard is limited as far as content descriptions are concerned, highlighted by the fact that very few examples of applications using MPEG-7 for X3D semantic description exist. Previous related research literature on the semantic annotation of X3D content with MPEG-7 focuses mainly on defining methods for
extending MPEG-7 Description Schemes to provide multiple 3D model locators for indexing and retrieval purposes [22][23][24] or on presenting generalized methods to describe specific characteristics of a 3D model, such as object interactivity [25][26].

Bilasco et al. [22][23][24] use MPEG-7 to index 3D content within X3D files with the use of localization descriptors. As stated in their research, due to the fact that an X3D object can be stored over numerous file entries, or even a larger X3D scene may be split over several smaller files, multiple MPEG-7 MediaLocator descriptors are necessary to describe one entity by referencing all corresponding locations. For this reason, Bilasco et al. propose new description tools, namely the Structural Locator and the 3D Region Locator. StructuralLocatorType aims to support the localization of objects situated over various file entries by allowing multiple URIs in the descriptor. The research then presents the 3D Annotation Framework (3DAF), built on an extensible 3D SEmantic Annotation Model (3DSEAM) developed to add semantic information to the X3D geometric modeling of the scene, allowing XML queries to be made for content retrieval based on object structure and location. While the work described above facilitates the indexing and retrieval of X3D objects, allowing the reuse of the models which are indexed in the content repository, it falls short of generating a complete semantic description profile of X3D models, as issues such as animation, textures and interactivity are not addressed.

In [25][26] Chmielewski presents the Multimedia Interaction Model, designed to provide a solution for describing object interactions. It utilizes an Interaction Interface concept based on the fact that 3D objects have some common properties that can be grouped together through the interface. The Interface itself allows the Multimedia Interaction Model not to be limited to a particular domain of 3D graphics representation, but to provide unified methods for supporting the description of interactions of 3D content which can be implemented easily with the appropriate modifications. Again, even though this work is considered important, it provides a generalized method for describing 3D object interactions, whereas the X3D XML format already provides direct estimation of object interactivity through the XML definition itself, making the semantic annotation process faster and more efficient when generated directly from an X3D document.

The association of X3D documents to an external domain-specific ontology approach [27] provides an interesting case of describing real and virtual semantic objects [28] in X3D worlds. The authors propose the solution of inserting metadata nodes into the existing X3D XML representation. With this method, sets of MetadataSet nodes are associated with the WorldInfo node, with every MetadataSet containing the semantic information on a specific object from the corresponding scene. The semantic annotations are then associated with an external RDF Schema based ontology to provide a scene-independent semantic description. Again, while the solution provides an external ontology for the semantics, at the same time it requires modifications to be made to the existing X3D files by inserting additional metadata information. In [3], Papaleo et al. introduce a framework for the segmentation of X3D complex scenes into primitives that may be annotated manually by using a semantic graph.
Despite the efforts made to create external or internal RDF ontologies and semantic graphs, MPEG-7 offers far better support for the description of multimedia content and combines wide acceptance with the capability of integration with other multimedia platforms (such as MPEG-4 and MPEG-21), which is especially useful for cross-platform applications. To the best of our knowledge, no research to date has aimed to solve the integration issues between X3D content and MPEG-7 content management description in an efficient and independent manner. The research effort of this paper is conducted to "bridge the gap" between X3D and MPEG-7 by enhancing the MPEG-7 Schema with a set of descriptor extensions and modifications and facilitating the complete semantic annotation of X3D content in MPEG-7, enriching the standard.

The provision of an efficient and complete description mechanism will unify and improve universal annotation efficiency of 3D models in the potential applications that demand it, as 3D models are widely used in a broad variety of application domains. In the commercial industry [6], end users in remote locations are able to view, customize and compare products by viewing their 3D model representations from different angles and perspectives. They can choose from a variety
of products stored in model repositories to compare and evaluate similarities and characteristics before making a decision to purchase a certain product. Additional metadata and feature information on the product history and design is also included and updated dynamically for the user's interests. This feature information is attached to the product representation through virtual annotations which can be shared by the authors or even other users in real-time. Therefore, a universal MPEG-7 description will assist in the similarity matching and customized retrieval of available products on the web regardless of their local storage location. At the same time, a unified MPEG-7 annotation description can be implemented in a variety of diverse commercial applications using 3D product representations; the same 3D content can also be used in different commercial applications with the provision of the corresponding product's MPEG-7 description, thus solving reusability and interoperability issues between cross-platform web applications.

In the medical industry, certain internal organs or even smaller cells can be graphically represented through 3D models, assisting in medical diagnosis and therapy. Knowledge gained through visual representations on how each organ works and responds to certain treatment can assist in creating new therapeutic methods for diseases without testing on human cells. At the same time, the biology field has isolated and sequenced many genomes responsible for the characterization of genes and proteins [7]. The use of visual representations of the protein structure and creation process in conjunction with semantic annotation can assist in automated prediction methods and calculations on gene behavior, providing knowledge on human disorders and inherited characteristics. Universal MPEG-7 semantic annotation on the various protein structures can resolve reusability issues of the 3D content representations, as similar proteins and molecules can be reused and multiplied graphically through their MPEG-7 descriptions within cells depending on the biomedical tests and calculations. In addition, as 3D protein structures contain large amounts of data, they can be segmented into smaller representations and stored separately in distributed network filesystems, allowing their corresponding MPEG-7 semantic description to provide annotation information exchange universally without the need to transfer the actual 3D representation structures.

In education, collaboration environments are built to assist interactive education [29][30]. These environments may employ MPEG-7 to improve reusability and exchange of the educational content.

In the cultural heritage domain [8], historical monuments can be restored and preserved through scientifically authenticated 3D models stored in large repositories. The combination of the model itself along with the annotation of feature and semantic information allows the development of mechanisms designed to preserve and review monuments of historical or cultural value. These mechanisms can also be used for peer review, publication, updating and dissemination of the 3D model representations. As such, historical monuments and artifacts can be segmented into smaller significant models and stored over different locations with their semantic information.
MPEG-7 descriptions of the individual models can then be used to provide easier access and reusability of the content stored in the repositories, whilst the universal format of MPEG-7 enables the models to be independently integrated over diverse cross-platform web applications of historical or archaeological content. For instance, consider a piece of pottery of historical value; the same pottery model can be used in 3D representation applications of monuments or buildings of the same era, in a virtual museum guide web application or even in an online 3D encyclopedia with details of origin and usage. All of these applications utilize the MPEG-7 semantic description of the pottery piece universally without any additional cost of transforming the object to other representation formats.

Furthermore, the engineering industry uses large 3D models to represent designs of new devices, automotive vehicles and other engine structures, and even architectural building and landscape demonstrations. Engineers generally use CAD models which may be very large and stored in proprietary formats [9]. In a similar fashion, the computer aided manufacturing (CAM) industry employs the ISO 10303 standard for computer aided engineering in a variety of domains ranging from interior design to automotive engines. Additional functionalities and semantic annotation to enrich 3D models for similarity-based content search and retrieval over diverse industrial databases in a universal format need to be addressed. In order to access and retrieve data in an efficient form without loss of detail, the use of a unified data format and annotation information is required. MPEG-7 descriptions can solve interoperability issues by providing a unified description format regardless of the 3D content, as in the case of computer aided design (CAD) and computer aided manufacturing (CAM) formats. As a result, CAD or CAM models can be exchanged and reused seamlessly over different domains and applications, especially on the
web, and stored as a universal MPEG-7 description format in large engineering and manufacturing repositories.
3 MPEG-7 annotation of X3D

Officially called the Multimedia Content Description Interface, MPEG-7 defines a set of tools for describing complex and customized metadata structures through the XML-based Description Definition Language (DDL). Additionally, DDL allows modifications and extensions to be made to particular Description Schemes, thus providing extensibility to the standard. In order to generate a valid MPEG-7 description, it must conform to the constraints imposed by the DDL Schema. MPEG-7 also provides a set of Descriptors (D), in order to define the syntax and semantics of the entities involved, and a set of Description Schemes (DS). Description Schemes specify the structure and corresponding semantics of the relations between such entities, including metadata concerning semantic elements (shapes, colors, objects, motion), catalogue elements (copyright rules, user and parental access, title, date, etc.) and even structural elements (technical statistics). Furthermore, they are used to group several Descriptors and other Description Schemes together. MPEG-7 Description Schemes are categorized as Visual Description Schemes (VDS), Audio Description Schemes (ADS), and Multimedia Description Schemes (MDS). The latter refer to specific MPEG-7 metadata structures that are used to describe and annotate multimedia data, facilitating the searching, indexing and filtering of special features of the multimedia content.
3.1 MPEG-7 and 3D content
The latest trend in 3D multimedia aims at representing all 3D content in generic XML representations regardless of its origin and complexity, due to the fact that XML-based representations provide flexibility and can be efficiently broadcast over cross-platform internet applications. A related effort has been made through COLLADA, which defines an XML database schema enabling digital asset exchange between 3D authoring applications without any loss of information, thus allowing authors to combine 3D multimedia content from any authoring package into unified data entities [31]. In a similar concept, the X3D standard provides an XML-based schema encoding, offering a lighter representation of the 3D content and thus enhancing its use over internet applications, with optimized XML data transfers. At the same time, as X3D represents graphics in an abstract format, it can be rendered anywhere as long as the local operating system and hardware support it.

MPEG-7 defines several descriptors to facilitate the description of 3D entities involving geometries, textures, animations and content metadata. However, the majority of 3D descriptors are centered on the generic representation of 3D objects, lacking the ability to provide optimized semantic descriptions based on XML-encoded 3D objects. This section presents the current descriptors available for producing 3D content descriptions, according to the MPEG-7 standard. Section 4.3 presents the MPEG-7 descriptors that are applicable to the semantic annotation of X3D scenes and identifies the necessary extensions to the current MPEG-7 Schema to provide an efficient content description of X3D scenes.
Geometry Descriptors

MPEG-7 Part 3, Visual, defines two shape descriptors for 3D objects: the Shape3D Spectrum descriptor and the MultipleView Descriptor Container. The Shape3D descriptor provides an intrinsic shape characterization of 3D mesh models [32]. The combination of contour-based and region-based 2D shape Descriptors with the MultipleView Descriptor Container facilitates providing the 3D properties of an object. The MultipleView representation is convenient when the 3D mesh model of the object is not known or when support for queries by 2D views of the 3D object is required [32].
Motion Descriptors
MPEG-7 Part 3, Visual, defines three descriptors to describe various motion aspects of 3D objects. The Camera Motion descriptor defines basic camera operations such as panning and tilting [32]. The Motion Trajectory descriptor specifies the motion trajectory of a moving object, by defining the consecutive motion values of a key point (pixel) of a moving object or region [32]. Finally, the Motion Activity descriptor captures the viewer's perspective of an object's motion within a sequence [32].
Texture Descriptors

In order to describe textures, MPEG-7 Part 3, Visual, provides three texture descriptors: the TextureBrowsing Descriptor for browsing purposes [32], and the HomogeneousTexture Descriptor and EdgeHistogram Descriptor for similarity comparison and retrieval purposes, using the texture of a still or moving image region.
3.2 MPEG-7 and X3D content
In contrast to other 3D multimedia formats, the X3D XML-based encoding format represents 3D environments in an abstract definition. Therefore, the scene is already analyzed into simple structures, providing a fully annotated XML representation by default. As a result, several basic MPEG-7 descriptors that provide complex and generic descriptions are considered inefficient for describing the lightweight dynamic content representations of X3D worlds, especially when most semantic information can be derived directly from the X3D document.
Annotating Geometries

The X3D XML format specifies a simplified structure for the definition of geometry nodes by declaring the primitives (Box, Cone, Sphere, etc.) that construct more complex objects. Therefore, annotation information on geometries can be directly extracted from the XML representation, minimizing the need and effort to characterize objects through mesh and spectrum entities or to extract multiple 2D views from the simplified shape. Consequently, the Shape3D Spectrum descriptor can be useful for the indexing and retrieval of elementary 3D shapes with known mesh representations [9], such as X3D IndexedFaceSet or IndexedLineSet nodes, but presents a significant drawback in providing semantic descriptions of X3D primitive geometries, especially regarding performance issues [4]. X3D primitive geometries are represented and defined according to their shape and volumetric properties. In order to acquire a complete semantic description concerning the rendering and allocation specifications of an X3D object, it is mandatory to specify which types of primitive geometries compose the complex elementary object, together with its volumetric properties, such as its position and approximate size within the scene. Geometry descriptions for primitives in X3D scenes can be applied with the provision of two new extensions, respectively BoundingBox3D and Geometry3D, as defined in section 4.1.
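To illustrate how this information sits directly in the XML encoding, consider the following minimal X3D fragment (an assumed, illustrative scene, not one of the paper's test files): the primitive type, its DEF name and the grouping node's bounding-box fields are all available as plain XML attributes and can be copied into the proposed descriptors without any geometric processing.

  <Transform DEF="TableGroup" bboxCenter="0 0.5 0" bboxSize="2 1 2">
    <Shape DEF="TableTop">
      <!-- the primitive type and its dimensions are explicit in the markup -->
      <Box size="2 0.1 2"/>
      <Appearance>
        <ImageTexture url='"wood.jpg"'/>
      </Appearance>
    </Shape>
  </Transform>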
Annotating Interactivity and animation

In X3D scenes, continuous object animations are identified through a combination of Timers and Interpolators. Timers define discrete time and cycle interval values, segmenting the animation over time. Interpolators apply the translational values of the object according to their corresponding Timers' discrete values. Hence, the MotionTrajectory Descriptor in MPEG-7 Visual is considered functional for describing Interpolator values from X3D, by providing the keypoint values of the animated model within the scene. Knowledge of an object's animation trajectory assists in determining the required space and motion path of animated objects, providing customized limitations in content retrieval, for instance excluding large objects with a wide motion range from content searches targeting comparatively smaller scenes.
User interactivity and process triggers are considered a vital component of the semantic description of an X3D scene. Current MPEG-7 motion descriptors cannot provide accurate descriptions for interactivity node connectors (ROUTEs) or descriptions concerning a triggered animation event. As a result, additional extensions are proposed to the current MPEG-7 Schema to implement interaction descriptions, namely the Interactivity3D Descriptor (see section 4.1), to express descriptions of X3D interactivity nodes and their corresponding trigger source.

In 3D environments, interactive 3D objects can be categorized into two types. The first type corresponds to objects that require an internal process to modify their respective properties to create animation. In this case, the interactions are handled externally, usually by the end user, but the object is considered static to the original scene. The second type corresponds to objects that contain interaction implementations by default, which are again handled by the interaction process. In this case the objects are considered dynamic to the environment. In the case of X3D, the processes that control interactions are usually Script nodes.
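As a concrete, purely illustrative sketch of such trigger chains (node and field names follow the X3D specification, but the scene itself is assumed), a user-triggered door rotation is typically wired from a TouchSensor to a TimeSensor and then to an OrientationInterpolator through ROUTE statements:

  <Transform DEF="Door">
    <Shape>
      <Box size="1 2 0.05"/>
    </Shape>
    <!-- sibling sensor: touching the door geometry fires touchTime -->
    <TouchSensor DEF="DoorTouch"/>
  </Transform>
  <TimeSensor DEF="DoorTimer" cycleInterval="2"/>
  <OrientationInterpolator DEF="DoorSwing" key="0 1"
                           keyValue="0 1 0 0  0 1 0 1.57"/>

  <!-- user trigger: the touch event starts the timer -->
  <ROUTE fromNode="DoorTouch" fromField="touchTime" toNode="DoorTimer" toField="startTime"/>
  <!-- internal triggers: the timer drives the interpolator, which rotates the door -->
  <ROUTE fromNode="DoorTimer" fromField="fraction_changed" toNode="DoorSwing" toField="set_fraction"/>
  <ROUTE fromNode="DoorSwing" fromField="value_changed" toNode="Door" toField="set_rotation"/>

The first ROUTE corresponds to a user trigger, while the remaining two are internal triggers; this distinction is exactly what the proposed Interactivity3D extension records (section 4.1).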
Annotating Textures

Current MPEG-7 visual texture descriptors (as mentioned above) facilitate browsing and similarity retrieval using the texture feature in image and video databases. As such, their implementation in X3D scene descriptions is considered dysfunctional. X3D textures, however, are defined within a scene by their filename. Consequently, each set of available object textures can be defined in an MPEG-7 description as a collection of MultimediaType Contents by using MediaLocator elements, as sketched below.
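As a minimal sketch of such a texture collection, assuming the standard MPEG-7 MediaLocator/MediaUri elements and hypothetical texture file names, the description could take the following form:

  <Description xsi:type="ContentEntityType">
    <!-- one MultimediaContent entry per texture file referenced by the X3D Appearance nodes -->
    <MultimediaContent xsi:type="MultimediaType">
      <Multimedia>
        <MediaLocator>
          <MediaUri>textures/wood.jpg</MediaUri>
        </MediaLocator>
      </Multimedia>
    </MultimediaContent>
    <MultimediaContent xsi:type="MultimediaType">
      <Multimedia>
        <MediaLocator>
          <MediaUri>textures/glass.png</MediaUri>
        </MediaLocator>
      </Multimedia>
    </MultimediaContent>
  </Description>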
Additional Metadata elements

X3D XML encoded scenes provide metadata information concerning specific properties of the scene. Each scene is accompanied by a certain profile providing additional functionality depending on the profile type. When describing X3D content it is imperative to include the profile type in order to gain information on any additional requirements concerning model structures and rendering processes, as different local renderers may be incompatible with certain profiles. At the same time, X3D Script nodes contain scripting functions that process object interactions and animations. These scripts can be included internally within the X3D Script node or referenced externally through Java or Jscript classes. It is therefore deemed valuable to include metadata information describing the scripting resources necessary for providing interactivity to 3D objects contained within X3D scenes. As a result, since no such MPEG-7 descriptions exist, further extensions to the DescriptionMetadata [34] are required to efficiently include X3D profile and scripting metadata in the MPEG-7 DescriptionMetadataType (section 4.2), by including the Profile3DType (section 4.2) and Script3DType (section 4.2) descriptions. The following table provides a concise usage description of each MPEG-7 extension created for the efficient annotation of X3D scenes, as proposed in this paper.
MPEG-7 Part 3 Visual

BoundingBox3D: Specifies the position and size of a complex 3D object in a scene, by providing the volumetric coordinates of the group of shape nodes that composite the complex model.

Geometry3D: Describes the types of primitive or complex geometries contained in the X3D scene along with curvature details.

Metadata3D: Specifies any additional metadata information provided with an X3D node (such as MetadataFloat, MetadataInteger, MetadataSet).

MPEG-7 Part 5 MDS

Interactivity3D: Describes how an X3D object interacts with other objects in the scene or with the end user, by extracting interaction details from the corresponding X3D ROUTE elements.

Profile3D: Describes the X3D profile in use for the file described in order to gain all additional functionality and specify browser and rendering requirements.

Script3D: If the X3D file described contains scripting nodes controlling interactions and animations, this datatype specifies the script class location (internally in the content file or in an external file location) and the scripting language used, in order to define all necessary rendering and control requirements.

Table 1 Brief usage description of the proposed MPEG-7 extensions
4 MPEG-7 proposed extensions in detail¹

This research proposes a set of extensions based on the current MPEG-7 standard, in an effort to efficiently describe an X3D scene. Section 4.1 contains the schema extensions for the proposed descriptors classified in MPEG-7 Part 3: Visual. Section 4.2 contains additional schema extensions and modifications as declared in MPEG-7 Part 5: Multimedia Description Schemes (MDS).
4.1 Extensions in MPEG-7 Part 3 – Visual
MPEG-7 Part 3 – Visual specifies tools for the description of visual content, including still images, video and 3D models. These tools are defined by their syntax in DDL and binary representations, and by the semantics associated with the syntactic elements. They enable the description of visual features of the visual material, such as color, texture, shape and motion, as well as the localization of the described objects in the image or video sequence [32]. This research has followed the same definition syntax in the proposed visual extensions for X3D, with the exception of the binary representation, in accordance with the standard. The following clauses specify the proposed MPEG-7 visual descriptor extensions. Each clause presents the extended descriptor schema definition and the semantic description of the syntactic elements, and identifies the extended descriptor's relation to the X3D standard. Informative examples are provided for each definition for clarification purposes.
The BoundingBox3D descriptor

The BoundingBox3D descriptor is introduced as an extension to the MPEG-7 visual descriptors in order to support the volumetric boundaries of the 3D children node shapes contained within an X3D Bounding Box node, by providing the size and position coordinates extracted from the related X3D node. As defined in the X3D standard, an X3D BoundingBox type node is used by grouping nodes that enclose several geometry shapes together to provide information to the X3D browser on the corresponding group's approximate size, enabling rendering optimization. A Bounding Box Descriptor was originally used in the Proposal for a New MPEG-7 Description Definition Language Grammar in 1999 as an example of describing parameterized 2D and 3D objects, but was never implemented in the official MPEG-7 standard. Meanwhile, the proposed MPEG-7 extension BoundingBox3DType centers exclusively on describing the volumetric characteristics of 3D objects.
¹ In order to establish the most efficient description of the proposed MPEG-7 extensions, this section adopts the formal ISO standards representation format.
A volumetric process of modeling consists of volume elements (voxels), each representing a value on a regular grid in three-dimensional space. Therefore, the volumetric modeling of 3D models does not suffer from polygon stretching when there is not a sufficient amount of polygons in a region to achieve formation. A new topology is created on the group once the individual models are formed and details are inserted. Hence, integrating the X3D Bounding Box surrounding the contained shapes is essential in order to provide knowledge of the estimated three-dimensional size and location of a model within a scene, prior to gathering any additional information on the model itself. In this way, even without knowing the individual types of models contained within the Bounding Box or any other details, the MPEG-7 BoundingBox3D descriptor extension provides the essential information for describing the volume properties needed to pre-define the necessary size and position of the objects in a scene. The use of BoundingBox3D descriptors also facilitates accelerating the indexing and retrieval processes by pre-defining a rough estimate of the allocation needs of models within X3D rendered scenes.
DDL representation syntax
Figure 1 MPEG-7 XSD Extension of BoundingBox3DType
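The schema listing of Figure 1 is not reproduced in this copy. Purely as an illustrative sketch, and assuming the component and attribute names used in the semantics below, such a DDL (XML Schema) extension of the MPEG-7 visual descriptor base type could be outlined as follows; the normative definition remains the one given in the figure.

  <!-- Sketch only: component and attribute names follow the semantics table below -->
  <complexType name="BoundingBox3DType" final="#all">
    <complexContent>
      <extension base="mpeg7:VisualDType">
        <sequence>
          <element name="BoundingBox3DSize">
            <complexType>
              <attribute name="BoxWidth"  type="float" default="-1"/>
              <attribute name="BoxHeight" type="float" default="-1"/>
              <attribute name="BoxDepth"  type="float" default="-1"/>
            </complexType>
          </element>
          <element name="BoundingBox3DCenter">
            <complexType>
              <attribute name="BoxCenterW" type="float" default="0"/>
              <attribute name="BoxCenterH" type="float" default="0"/>
              <attribute name="BoxCenterD" type="float" default="0"/>
            </complexType>
          </element>
        </sequence>
      </extension>
    </complexContent>
  </complexType>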
Descriptor components semantics
BoundingBox3DType: Specifies the volumetric coordinates as obtained from the X3D BoundingBox element, binding a set of children node shapes together.

BoundingBox3DSize: Specifies the Width, Height and Length coordinates of the BoundingBox. If the X3D grouping node contains a BoundingBox, then each of the three values contained in bboxSize is correspondingly passed into the BoxWidth, BoxHeight and BoxDepth attributes. If no values are present, then the default values of (-1, -1, -1) are assumed, as defined in the X3D specification.

BoundingBox3DCenter: Specifies the Center coordinates of the BoundingBox. If the X3D grouping node contains a BoundingBox, then each of the three values contained in bboxCenter is correspondingly passed into the BoxCenterW, BoxCenterH and BoxCenterD attributes. If no values are present, then the default values of (0, 0, 0) are assumed, as defined in the X3D standard.
Descriptor components example
Example 1 BoundingBox3D Description transformation, (a) X3D XML Bounding Box node, (b) Corresponding MPEG-7 BoundingBox3D Descriptor
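The listings of Example 1 are not preserved in this copy; the following is a hedged reconstruction with hypothetical values, illustrating the mapping described above from the bboxSize and bboxCenter attributes of an X3D grouping node to the size and center components of the BoundingBox3D descriptor.

  <!-- (a) X3D grouping node with an explicit bounding box (illustrative values) -->
  <Group DEF="DeskGroup" bboxCenter="0 0.75 0" bboxSize="1.6 1.5 0.8">
    <Shape>
      <Box size="1.6 0.1 0.8"/>
    </Shape>
  </Group>

  <!-- (b) corresponding MPEG-7 BoundingBox3D description (sketch; component names as in the semantics above) -->
  <VisualDescriptor xsi:type="BoundingBox3DType">
    <BoundingBox3DSize BoxWidth="1.6" BoxHeight="1.5" BoxDepth="0.8"/>
    <BoundingBox3DCenter BoxCenterW="0" BoxCenterH="0.75" BoxCenterD="0"/>
  </VisualDescriptor>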
The Geometry3D descriptor

The Geometry3DType descriptor corresponds to a geometry node in an X3D XML-encoded scene. X3D uses primitive modeling for the objects contained in a scene. Primitive modeling is a procedure that considers geometric primitives like Spheres, Cylinders, Cones or Cubes as building blocks for more complex models. As a result, complex objects are constructed quickly and easily, and the forms are mathematically defined and thus absolutely precise. Additionally, the definition language is much simpler, as is the case with XML. As such, geometry nodes in X3D scenes are fully described by obtaining their primitive object type and corresponding object name.

The Geometry3D element incorporates two attributes to define an X3D geometry, ObjectType and DEF. ObjectType contains the geometric primitive type (e.g. Box, Sphere, Cone, Text, IndexedFaceSet) and the DEF attribute contains the associated name (DEF/USE). Two optional attributes are also presented in the Geometry3D element, namely convex and creaseAngle, which are used to efficiently describe complex X3D shape representations, as in the case of IndexedFaceSet and Extrusion. The convex attribute provides a Boolean value indicating whether the shape described contains convex faces, as specified in the X3D common geometry fields' specification clause. The creaseAngle attribute specifies a float value determining the angle at which a crease appears between faces. Additional metadata elements accompanying the geometry node can be incorporated via the Metadata3D element, as defined in section 4.1.

DDL representation syntax
Figure 2 MPEG-7 XSD Extension of Geometry3DType
Descriptor component semantics
Geometry3DType: Describes the types of primitive or complex shapes contained in an X3D file.

Geometry3D: Each Geometry3D element describes one shape from the corresponding X3D file. The type of each primitive geometry node is specified in the ObjectType attribute and its name definition is specified in the DEF attribute. In the case of complex shapes defined by sets of polygons, the convex and creaseAngle attributes can be used to efficiently describe the object type. The Boolean attribute convex specifies the curvature of the shape (true for planar surfaces and false for non-planar or intersecting polygon shapes). The creaseAngle attribute provides a float value determining at what angle a crease appears between faces. These two attribute values can be directly extracted from the X3DGeometryNode for complex geometry nodes. If the USE attribute is used in the X3D file for the definition of a geometry node, it can be described with the Geometry3D element as well, by describing the geometry node that the USE attribute refers to.

Metadata3D: Specifies the additional metadata values that an X3D geometry node may carry, such as MetadataDouble, MetadataFloat, MetadataSet as defined in the X3D standard. Metadata3D is defined separately as an extension in section 4.1, Metadata3DType.
Descriptor component example
Example 2 Geometry3D Description transformation, (a) X3D XML Geometry node, (b) Corresponding MPEG-7 Geometry3D Descriptor
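Since the listings of Example 2 are not preserved here either, the following hedged sketch with hypothetical values illustrates the intended mapping: the primitive type and DEF name of each X3D geometry node are copied into the ObjectType and DEF attributes of a Geometry3D element, and convex/creaseAngle are carried over for polygon-based shapes.

  <!-- (a) X3D geometry nodes (illustrative values) -->
  <Shape>
    <Cylinder DEF="Column" radius="0.3" height="2"/>
  </Shape>
  <Shape>
    <IndexedFaceSet DEF="Roof" convex="false" creaseAngle="0.5" coordIndex="0 1 2 -1">
      <Coordinate point="0 0 0  1 0 0  0 1 0"/>
    </IndexedFaceSet>
  </Shape>

  <!-- (b) corresponding MPEG-7 Geometry3D description (sketch) -->
  <VisualDescriptor xsi:type="Geometry3DType">
    <Geometry3D ObjectType="Cylinder" DEF="Column"/>
    <Geometry3D ObjectType="IndexedFaceSet" DEF="Roof" convex="false" creaseAngle="0.5"/>
  </VisualDescriptor>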
The Metadata3D descriptor

Geometry nodes in X3D scenes can contain metadata elements, necessary for the complete geometric content definition. As a result, the formation of a Metadata3D descriptor in conjunction with the Geometry3D descriptor assists in fully capturing all geometry semantics from an X3D document. All elements contained within the MPEG-7 Schema Extension correspond to their equivalent elements in the X3DMetadataObject node as specified in X3D, with the exception of the element Ref (depicted in Figure 3), which corresponds to the XPath location of the metadata described. X3D MetadataSet elements, which contain a set of simple metadata nodes such as MetadataDouble, MetadataInteger, MetadataFloat, can also be described by reusing the value attribute to include the internal metadata nodes.
DDL representation syntax
Figure 3 MPEG-7 XSD Extension of Metadata3DType
Descriptor components semantics
Metadata3DType: Specifies any additional metadata values contained within an X3D node, such as MetadataFloat, MetadataInteger, MetadataSet.

name: Specifies the name of the metadata type being described, as derived from the corresponding name attribute of the X3D metadata node.

type: Specifies the type of metadata being described, such as MetadataFloat, MetadataInteger, MetadataSet.

reference: Contains the value of the reference field from the X3D Metadata node. As defined in the X3D standard, the specification of the reference field is optional. If provided, it identifies the metadata standard or other specification that defines the name field. If the reference field is not provided or is empty, the meaning of the name field is considered implicit to the characters in the string.

value: Contains all metadata values as derived from the corresponding X3D metadata node. In the case of the MetadataSet element, the MPEG-7 value element contains all internal metadata elements and values bound by the MetadataSet in the X3D file.

Ref: Provides the XPath location, from the X3D XML graph, of the metadata element being described, for indexing and retrieval purposes.
Descriptor components example
(Example listing fragments: Ref /X3D/Scene/…/Shape[1]/Cylinder[1]; MetadataSet containing MetadataDouble 1.0, MetadataFloat and MetadataString entries and an internal MetadataSet; Ref /X3D/Scene/…/Cylinder[1]/MetadataSet[1]/MetadataSet[1])
Example 3 Geometry3D with Metadata3D Description transformation, (a) X3D XML Geometry node with attached Metadata elements, (b) Corresponding MPEG-7 Geometry3D and Metadata3D Descriptor
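The listings of Example 3 survive only as fragments, so the following hedged sketch uses hypothetical values; whether the Metadata3D components are serialized as attributes or as child elements follows the schema of Figure 3, and they are shown as child elements purely for illustration.

  <!-- (a) X3D geometry node carrying metadata (illustrative values) -->
  <Shape>
    <Cylinder DEF="Column">
      <MetadataSet name="ColumnInfo">
        <!-- nested metadata nodes use containerField="value" inside a MetadataSet -->
        <MetadataDouble containerField="value" name="radius" value="1.0"/>
        <MetadataString containerField="value" name="material" value='"marble"'/>
      </MetadataSet>
    </Cylinder>
  </Shape>

  <!-- (b) corresponding MPEG-7 Metadata3D description (sketch) -->
  <Metadata3D>
    <name>ColumnInfo</name>
    <type>MetadataSet</type>
    <value>radius 1.0 marble</value>
    <Ref>/X3D/Scene/Transform[1]/Shape[1]/Cylinder[1]/MetadataSet[1]</Ref>
  </Metadata3D>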
The Interactivity3D Descriptor

The Interactivity3D Descriptor facilitates defining the source of an interaction event along with the ROUTE values from the X3D file. X3D object interactions are triggered events, handled by the user (i.e. Sensor, Trigger nodes) or internally through scripting nodes (Script) and internal Triggers, dependent on Timers. Events are passed between nodes using the ROUTE statement, connecting the output from the event of one node to the input of the related node's event. Knowledge of the source of the event assists in providing information on the type of trigger that enables the interaction in order to create an animation on an object. The Route element is necessary in order to gain information on which types of objects are involved in an animation and how they are connected to other object animations, by providing their event relation through input and output fields. Moreover, the ROUTE's XPath location in the scene is extracted for faster indexing and retrieval of the connected objects and animations.
DDL representation syntax
Figure 4 MPEG-7 XSD Extension of Interactivity3DType
Descriptor components semantics
Interactivity3DType: Describes the interactions that occur within an X3D file, specifically from the X3D ROUTE elements.

TriggerSource: Specifies the source that triggers a specific animation. It depends on the ROUTE node. If the fromNode attribute of an X3D ROUTE node refers to a node that requires the user to trigger it in order to animate, then it is considered a UserTrigger. If the node is triggered internally depending on other nodes or script functions, the TriggerSource is defined as an InternalTrigger.

Route: Describes the corresponding X3D ROUTE node. Specifies the nodes it connects together and their equivalent node types.
  fromNode: Specifies the node that produces the event output. It corresponds to the X3D fromNode attribute in the ROUTE element.
  fromNodeType: Specifies the type of node that produces the event output.
  toNode: Specifies the node that requires the event output produced by the fromNode as an event input parameter. It corresponds to the X3D toNode attribute in the ROUTE element.
  toNodeType: Specifies the type of node that requires the event output produced by the fromNode as an event input parameter.
The Route element provides the XPath location, from the X3D XML graph, of the ROUTE element being described, for indexing and retrieval purposes of the associated animation.
Descriptor components example
(Example listing fragments: TriggerSource UserTrigger with Route /X3D/Scene/ROUTE[1]; TriggerSource InternalTrigger with Route /X3D/Scene/ROUTE[2])
Example 4 Interactivity3D Description transformation, (a) X3D XML ROUTE node, (b) Corresponding MPEG-7 Interactivity3D Descriptor
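Only fragments of Example 4 survive, so the following hedged sketch (reusing the illustrative door wiring sketched earlier) shows the intended mapping from X3D ROUTE elements to Interactivity3D descriptions; whether the Route details are carried as attributes or as child elements follows the schema of Figure 4, and they are shown here as attributes with the XPath as element content purely for illustration.

  <!-- (a) X3D ROUTE elements (illustrative) -->
  <ROUTE fromNode="DoorTouch" fromField="touchTime" toNode="DoorTimer" toField="startTime"/>
  <ROUTE fromNode="DoorTimer" fromField="fraction_changed" toNode="DoorSwing" toField="set_fraction"/>

  <!-- (b) corresponding MPEG-7 Interactivity3D descriptions (sketch) -->
  <Interactivity3D>
    <TriggerSource>UserTrigger</TriggerSource>
    <Route fromNode="DoorTouch" fromNodeType="TouchSensor"
           toNode="DoorTimer" toNodeType="TimeSensor">/X3D/Scene/ROUTE[1]</Route>
  </Interactivity3D>
  <Interactivity3D>
    <TriggerSource>InternalTrigger</TriggerSource>
    <Route fromNode="DoorTimer" fromNodeType="TimeSensor"
           toNode="DoorSwing" toNodeType="OrientationInterpolator">/X3D/Scene/ROUTE[2]</Route>
  </Interactivity3D>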
4.2 MPEG-7 Part 5 – Multimedia Description Schemes
The root element of the MPEG-7 description allows multiple instances of complete description top-level types in a single description. In order to achieve a complete multimedia description, metadata related to the complete top-level type description must be included. Description metadata describes the metadata elements for the descriptions contained within the instance of the complete description top-level type [34]. The DescriptionMetadata Descriptor has been extended in this proposal to include two necessary metadata elements, namely Profile3D and Script3D.
The DescriptionMetadata header

The DescriptionMetadata Header describes metadata concerning a description, such as information identifying the description (privately or publicly) and describing the creation and version of the description [34]. Metadata descriptions concerning X3D metadata information have been implemented through the additional Profile3D and Script3D datatypes.
Profile3D needs to be included in the metadata description in order to specify the additional functionalities provided with the multimedia content in the X3D file. The Script3D Description Scheme incorporates the scripting class locations in order to achieve a complete description of the X3D Scene, by providing knowledge on whether external classes are needed to specify the various interactions that occur.
Extended DescriptionMetadata header syntax
Figure 5 MPEG-7 XSD Extension of DescriptionMetadataType
Extended DescriptionMetadata header components semantics
DescriptionMetadataType: Header describing metadata concerning a description, including information identifying the description (privately or publicly), and describing the creation of the description, the version of the description, and the rights associated with the description. DescriptionMetadataType extends HeaderType.

Profile3D: Specifies the profile of the X3D file to which the description metadata is attached, as defined in section 4.2.

Script3D: Specifies the Script type and Script access information of the X3D file to which the description metadata is attached, as defined in section 4.2.
Extended DescriptionMetadata header syntax example
(Example listing fragments: version 3.1, profile Immersive, JavaScript)
Example 5 Extended DescriptionMetadata transformation, (a) X3D XML root node and Script node, (b) Corresponding MPEG-7 DescriptionMetadata header description
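The listings of Example 5 survive only as fragments, so the following hedged sketch with assumed values illustrates how the X3D profile and scripting information could appear in the extended DescriptionMetadata header; the exact serialization follows the schema of Figure 5.

  <!-- (a) X3D root element and an inline Script node (illustrative) -->
  <X3D profile="Immersive" version="3.1">
    <Scene>
      <Script DEF="DoorLogic">
        <![CDATA[ecmascript: function initialize() { }]]>
      </Script>
    </Scene>
  </X3D>

  <!-- (b) corresponding MPEG-7 DescriptionMetadata header (sketch) -->
  <DescriptionMetadata>
    <Profile3D>Immersive</Profile3D>
    <Script3D>
      <internalScript>JavaScript</internalScript>
    </Script3D>
  </DescriptionMetadata>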
The Profile3D type

Profile3DType corresponds to the value of the profile attribute in the X3D root element as defined in an XML-encoded X3D file. X3D files require the definition of the profile in use, in order to gain all of the additional functionality associated with the scene and content. Consequently, it is essential to integrate the profile type in the MPEG-7 description as well, informing an X3D browser of additional requirements.
Definition of Profile3D datatype
Figure 6 MPEG-7 XSD Extension definition of Profile3D
Profile3D datatype semantics
Name: Profile3DType
Definition: Indicates the profile type as defined in the corresponding X3D file. The profile types are defined as follows.
• Core – defines all of the data types and the basic structure of X3D.
• Interchange – is the basic profile for communicating between applications. It supports geometry, texturing, basic lighting, and animation. There is no run-time model for rendering, making it very easy to use and integrate into any application.
• Interactive – enables basic interaction with a 3D environment by adding various sensor nodes for user navigation and interaction (e.g. PlaneSensor, TouchSensor), enhanced timing, and additional lighting (SpotLight, PointLight).
• MPEG_interactive – is a small-footprint version of the Interactive profile targeted at broadcast, handheld devices and mobile phones, and is intended for use by the MPEG-4 specification.
• Immersive – enables full 3D graphics and interaction, including audio support, collision, fog, and scripting. It is the closest match to VRML97.
• Full – includes all nodes defined in the specification, including the NURBS, H-Anim and GeoSpatial components.
• CADInterchange – corresponds to the CDF (CAD Distillation Format) profile, which is being developed to enable translation of CAD data into an open format for publishing and interactive media.
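One possible way to encode these values is as a simple XSD enumeration over the profile names listed above; the sketch below is an illustrative assumption and not necessarily the exact encoding adopted in Figure 6.

<simpleType name="Profile3DType">
  <restriction base="string">
    <!-- profile names as listed in the semantics above -->
    <enumeration value="Core"/>
    <enumeration value="Interchange"/>
    <enumeration value="Interactive"/>
    <enumeration value="MPEG_interactive"/>
    <enumeration value="Immersive"/>
    <enumeration value="Full"/>
    <enumeration value="CADInterchange"/>
  </restriction>
</simpleType>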
Script3D
X3D files may contain Script elements that define interactions and animation of objects within the scene. Scripts can be defined either internally, as part of the X3D Script node (JavaScript), or externally through separate Java or Jscript classes. The proposed Script3D datatype extension specifies whether the script class is contained internally in the described X3D content file or resides at an external file location. Apart from the location, the datatype records the type of scripting language involved. Knowledge of the location of the script is an essential metadata element, as it assists in indexing all data associated with the X3D multimedia file, especially external script classes, which may be stored separately from their respective X3D content. Additionally, the Script3D element provides information on the compilation requirements that an X3D browser must satisfy, depending on the scripting language.
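To illustrate the two cases, the fragments below show an internal ECMAScript/JavaScript Script node and an external Script node whose url field points to a compiled Java class; the DEF names, field names and class file name are hypothetical.

<!-- internal script: the source is embedded in the Script node itself -->
<Script DEF="DoorToggle">
  <field name="touchTime" type="SFTime" accessType="inputOnly"/>
  <field name="startTime" type="SFTime" accessType="outputOnly"/>
  <![CDATA[
    ecmascript:
    function touchTime(value) { startTime = value; } // start the door animation
  ]]>
</Script>

<!-- external script: the url field references a separately located class -->
<Script DEF="DoorLogic" url='"DoorLogic.class"'/>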
Definition of Script3D datatype
Figure 7 MPEG-7 XSD Extension definition of Script3D
Script3D datatype semantics
Name: Script3DType
Definition: Describes the scripting type of the X3D Script node. Script3DType extends DSType.
Name: externalScript
Definition: Specifies the external scripting class that performs the corresponding script functionalities on the described X3D file. When an X3D Script node contains a url attribute referencing an externally located class, this element specifies the type of scripting class described. External scripting classes can be Java classes (.class) or Jscript classes (.js).

Name: internalScript
Definition: Specifies that the Script node contains internal script functions and does not contain an outer scripting class reference. The internal script type is JavaScript by default, as specified by the X3D standard.
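A minimal sketch of a Script3DType definition built from the two elements above is given below; the attribute names (type, href) and the use of a choice group are assumptions for illustration, not necessarily the exact definition of Figure 7.

<complexType name="Script3DType">
  <complexContent>
    <extension base="mpeg7:DSType">
      <choice>
        <!-- script class located outside the X3D file (.class or .js) -->
        <element name="externalScript">
          <complexType>
            <attribute name="type" type="string" use="required"/>
            <attribute name="href" type="anyURI" use="required"/>
          </complexType>
        </element>
        <!-- script functions embedded in the X3D Script node (JavaScript) -->
        <element name="internalScript">
          <complexType>
            <attribute name="type" type="string" default="JavaScript"/>
          </complexType>
        </element>
      </choice>
    </extension>
  </complexContent>
</complexType>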
Script3D datatype example
(a) [X3D Script node listing elided; visible fragment: "JavaScript"]
(b) [Corresponding MPEG-7 Script3D description listing elided; visible fragment: "Java"]
Example 6 Extended Script3D transformations, (a) X3D XML Script node, (b) Corresponding MPEG-7 Script3D Description
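An illustrative Script3D description instance of the kind produced in (b), following the sketch above and with purely hypothetical values, could take the following form.

<Script3D>
  <!-- an externally located Java class referenced by the X3D Script node's url field -->
  <externalScript type="Java" href="DoorLogic.class"/>
</Script3D>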
5 Implementation issues and results

Concluding this research, an XSD (XML Schema Definition) of the MPEG-7 extensions for X3D has been created, based on the proposals defined in the previous section. Additionally, an application performing the automatic transformation from X3D to MPEG-7 XML files has been developed according to the extended MPEG-7 XSD schema. The application uses an XSLT (Extensible Stylesheet Language Transformations) stylesheet to transform the XML-encoded X3D file into the equivalent MPEG-7 XML description file (a minimal template sketch is given after the example walkthrough below). The generated XML file is subsequently validated against the extended MPEG-7 XSD schema through this application, ensuring that it conforms to the extended MPEG-7 standard.

A complete example of the MPEG-7 annotation of an X3D scene follows. The scene consists of two doors attached to a wall; both doors are activated through a TouchSensor node without the use of Script function controls, and textures are applied to the wall and door shapes. The scene contains three shape instances built from two geometry definitions: one IndexedFaceSet node creates the wall structure and another IndexedFaceSet node creates the door shape. The door shape is then applied twice, once in the left-door transformation, where the original shape structure is defined (DEF), and once in the right door, which references the grouping node "Door" where the shape resides (USE).

The corresponding MPEG-7 geometry description provides a DescriptorCollection for each Grouping or Transform node that contains geometry properties. For each such X3D node, the MPEG-7 DescriptorCollection contains one BoundingBox3D descriptor enclosing the volumetric characteristics of the elementary shape, as well as one Geometry3D descriptor per geometry node found within the Grouping node. For instance, the X3D Shape node defining the wall consists of one IndexedFaceSet without any additional X3D bounding box declared. Therefore, the corresponding MPEG-7 geometry DescriptorCollection for the wall consists of one BoundingBox3D descriptor carrying the default values (X3D bboxSize [-1, -1, -1], X3D bboxCenter [0, 0, 0]) and of one Geometry3D descriptor in which the IndexedFaceSet is given as the ObjectType along with its name reference, "Wall" (DEF).

With a similar technique, MPEG-7 DescriptorCollections are created for each shape that has interactivity properties. The wall shape, for instance, is immobile and offers no interaction, so no DescriptorCollection is created for it. Both doors, on the other hand, contain separate TouchSensors; when a door is clicked it opens, resulting in an animation. As seen in the MPEG-7 annotation example, two DescriptorCollections are created, one for each door. The "Interaction_DoorLeft" DescriptorCollection, for instance, contains one MotionTrajectory descriptor annotating the X3D OrientationInterpolator that describes the animation occurring once the door is clicked: the OrientationInterpolator rotates the door open, with the hinges as the center of rotation. The MotionTrajectory descriptor consists of the key and keyValue attributes obtained from the interpolator itself, annotated respectively as the MotionTrajectory MediaRelIncrTimePoint and KeyValues elements. To complete the DescriptorCollection, all X3D ROUTE nodes are annotated via the Interactivity3D descriptor. In the example, the first ROUTE node found after the Group "Door" connects the TouchSensor click event to a BooleanFilter. As the TouchSensor is the first event involved in the ROUTE node, the corresponding MPEG-7 Interactivity3D descriptor contains "UserDefined" as the TriggerSource description, because TouchSensors are activated when end-users click on them. At the same time, the Interactivity3D descriptor records the XPath location of the X3D ROUTE in the annotated Route element. The Interactivity3D Route element also contains the names and types of the nodes involved in this particular interaction (namely the TouchSensor and the BooleanFilter).

In order to allow efficient content retrieval, all geometries are indexed through their corresponding XPath location in the X3D scene by creating MPEG-7 ContentCollections for each X3D Grouping-type node. Each ContentCollection created for a Grouping-type node contains the XPath locations of all child nodes that contain any type of Shape node. In a similar manner, MPEG-7 ContentCollections are formed to index all textures involved in each X3D shape appearance. These MPEG-7 Collection descriptions list all available textures for a certain shape by providing a MediaLocator descriptor with each corresponding texture URL or filename as found in the X3D Appearance node. MPEG-7 ContentCollections serve the purpose of faster indexing and retrieval of X3D content and support customized retrieval queries. For instance, after a certain shape is chosen by the end-user, the applicable textures can be referenced through the MPEG-7 Collection descriptor containing the set of available textures for that particular shape.
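As referenced above, the following is a minimal sketch of the kind of XSLT template used in the transformation step; for brevity it handles only Transform nodes, and the output element names other than BoundingBox3D, Geometry3D and ObjectType, as well as the namespace handling, are simplifying assumptions rather than the actual stylesheet.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:mpeg7="urn:mpeg:mpeg7:schema:2001">

  <!-- one geometry DescriptorCollection per Transform node that contains a Shape -->
  <xsl:template match="Transform[descendant::Shape]">
    <mpeg7:DescriptorCollection id="Geometry_{@DEF}">
      <!-- BoundingBox3D: copy the X3D attributes or fall back to the X3D defaults -->
      <mpeg7:Descriptor xsi:type="mpeg7:BoundingBox3DType">
        <mpeg7:Center>
          <xsl:choose>
            <xsl:when test="@bboxCenter"><xsl:value-of select="@bboxCenter"/></xsl:when>
            <xsl:otherwise>0 0 0</xsl:otherwise>
          </xsl:choose>
        </mpeg7:Center>
        <mpeg7:Size>
          <xsl:choose>
            <xsl:when test="@bboxSize"><xsl:value-of select="@bboxSize"/></xsl:when>
            <xsl:otherwise>-1 -1 -1</xsl:otherwise>
          </xsl:choose>
        </mpeg7:Size>
      </mpeg7:Descriptor>
      <!-- one Geometry3D descriptor per geometry node found inside the Transform -->
      <xsl:for-each select="descendant::Shape/IndexedFaceSet">
        <mpeg7:Descriptor xsi:type="mpeg7:Geometry3DType">
          <mpeg7:ObjectType>IndexedFaceSet</mpeg7:ObjectType>
          <mpeg7:ObjectName><xsl:value-of select="@DEF"/></mpeg7:ObjectName>
        </mpeg7:Descriptor>
      </xsl:for-each>
    </mpeg7:DescriptorCollection>
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>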
Picture 1 Two doors activated by different fields from a TouchSensor node
... ...
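For orientation, a hedged fragment of the "Interaction_DoorLeft" DescriptorCollection discussed above is sketched below; namespace declarations are omitted, and the key/keyValue figures, node names and attribute names not introduced in Section 4 are purely illustrative assumptions.

<DescriptorCollection id="Interaction_DoorLeft">
  <!-- MotionTrajectory: key / keyValue of the OrientationInterpolator that opens the door -->
  <Descriptor xsi:type="MotionTrajectoryType">
    <MediaRelIncrTimePoint>0 0.5 1</MediaRelIncrTimePoint>
    <KeyValues>0 1 0 0, 0 1 0 0.8, 0 1 0 1.57</KeyValues>
  </Descriptor>
  <!-- Interactivity3D: the ROUTE connecting the TouchSensor to the BooleanFilter -->
  <Descriptor xsi:type="Interactivity3DType">
    <TriggerSource>UserDefined</TriggerSource>
    <Route location="/X3D/Scene/Group[1]/ROUTE[1]"
           fromNode="TouchLeft" fromNodeType="TouchSensor"
           toNode="FilterLeft" toNodeType="BooleanFilter"/>
  </Descriptor>
</DescriptorCollection>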