Using a Mobile Phone for 6 DOF Mesh Editing

Anders Henrysson
VITA, Linköping University
ITN, Campus Norrköping
601 74 Norrköping, Sweden

Mark Billinghurst
HIT Lab NZ, University of Canterbury
Ilam Road
Christchurch, New Zealand

ABSTRACT

This paper describes how a mobile phone can be used as a six degree of freedom interaction device for 3D mesh editing. Using a video see-through Augmented Reality approach, the mobile phone meets several design guidelines for a natural, easy to learn, 3D human computer interaction device. We have developed a system that allows a user to select one or more vertices in an arbitrarily sized polygon mesh and freely translate and rotate them by translating and rotating the device itself. The mesh is registered in 3D and viewed through the device, so the system provides a unified perception-action space. We present the implementation details and discuss the possible advantages and disadvantages of this approach.

Author Keywords

Mobile Phone Augmented Reality, content creation, 3D interfaces, mobile computer graphics

ACM Classification Keywords

H5.2. User Interfaces: Input devices and strategies, Interaction styles.

INTRODUCTION

The desktop metaphor in human computer interaction has been a great success. The vast majority of computing platforms use this metaphor for their user interface and hence input devices have been optimized to support 2D interaction. Being a 2D plane, the desktop only requires the input device to support two degrees of freedom (DOF) corresponding to the two dimensions. This is true for both 2D navigation, i.e. establishing the location and identity of structures in 2D space, and 2D manipulation, i.e. creating, modifying and positioning objects in 2D space.

With more advanced graphics hardware, computers became capable of representing data in three dimensions. Since this development reached mainstream computer systems after the establishment of the desktop metaphor, existing 2 DOF interaction devices were used for 3D input as well. However, interaction with 3D data requires that the interaction device supports 6 DOF, as 3D manipulation includes both positioning and rotation, each with 3 DOF. The inappropriateness of existing desktop interfaces for 3D data interaction has driven researchers to look at other types of interaction devices supporting more degrees of freedom. One of the primary reasons for this is that the more degrees of freedom of interaction, the more advanced the sensor technology required to obtain data about each degree.

One family of devices that has not really been considered for 6 DOF input is the mobile phone. However, the combination of a built-in camera, computer vision software, and a reference coordinate system allows a mobile phone to be tracked in three dimensions and used as a 6 DOF input device. In addition to the camera, some mobile phones are equipped with accelerometers, gyros and digital compasses. This potentially allows the device to be tracked with higher fidelity and over a larger range than with cameras alone. With the migration of Augmented Reality (AR) technology to the mobile phone domain, a mobile phone can be tracked in 3D and turned into a magic lens through which the user sees 3D graphics superimposed over the real world.

In this paper we look at how a camera equipped mobile phone can be used for the complex task of editing the geometry of a 3D polygonal mesh. This is the first paper that reports on the use of a mobile phone for geometry editing, and the work is significant because it describes interaction tools that could become common on phones as they are used for more 3D graphics, AR applications and gaming. In this paper we first review related work and 3D interaction guidelines. We then introduce the system we have developed and the mesh editing application details. Next we discuss impressions from using the interface, important areas for future work, and our conclusions.

RELATED WORK

Our application involves a tight integration between computer vision and 3D graphics on a mobile phone platform. There have been a number of earlier examples of computer vision based input on mobile phones that we can draw from in this research.


Two of the best known examples of camera-based interaction with mobile phones are the games "Mosquito Hunt" [12] and "Marble Revolution" [9]. In "Mosquito Hunt", virtual mosquitoes are superimposed over a live video image from the camera and simple motion flow techniques are used to allow the user to shoot the mosquitoes by moving the phone. Similarly, in the "Marble Revolution" game the player can steer a marble through a maze by moving the phone and using motion flow techniques. Some applications use more complex interaction techniques. The virtual soccer game KickReal [13] allows people to see a virtual ball superimposed over video of the real world and kick it with their feet, but there is no 3D object manipulation. The "Symball" application [4] allows users to hit balls at each other, although with limited 3D tracking. On their phone screen players see a table tennis table and a virtual paddle. They select a real color that they would like their phone to track, and as they move the phone relative to this color the paddle moves in the x-y direction on the screen. Although these examples are interesting, they do not support true six degree of freedom input.

In contrast, there are several examples of handheld devices tracked in a global reference frame with 6 DOF input. For example, in Rekimoto's Transvision interface [14] two users sit across a table and see shared AR content shown on handheld LCD panels attached to fixed desktop PCs. They can select objects by ray casting; once selected, objects are fixed relative to the LCD and can be moved. In this case the LCD panel has a magnetic tracker attached to it to provide 6 DOF input. The ARPAD interface [11] is similar, but it adds a handheld controller to the LCD panel and uses computer vision input. Selection is performed by positioning virtual cross hairs over the object and hitting a controller button. Once selected, the object is fixed in space relative to the LCD panel and so can be moved by moving the panel. The object can also be rotated using a trackball input device, thus ARPAD decouples translation and rotation. Both of these interfaces show 3D interaction methods that are useful for handheld displays. However, neither was a stand-alone system, as the LCD panels were connected to desktop PCs.

More recently there are examples of self-contained, stand-alone handheld applications that combine graphics and computer vision. For example, the Invisible Train [17] uses a PDA to view and interact with augmented reality content. In this case users can select virtual models directly by clicking on them with a stylus and drive a virtual train model around. Hachet [3] has developed a 3 DOF bimanual camera-based interface for interaction both on the device itself and for using a PDA as a 3D mouse. The approach is similar to ours in that it establishes the position and orientation of the device by analyzing the video stream captured by the camera. Rohs' Visual Codes [15] is an example of mobile phone barcode reading. By recognizing and tracking a pattern, the phone movements can be estimated and used as input.

The pattern can also be associated with phone functions and act as a menu item. Hansen's Mixed Interaction Spaces [5] uses a similar approach by tracking a circle. By visually tracking real objects, the camera phone can be used for 6 DOF input.

The first researchers to explore self-contained AR on mobile phones were Moehring and Henrysson. Moehring [10] used 3D markers on which a coordinate system was printed. Henrysson ported ARToolKit (www.hitl.washington.edu/artoolkit/) to the Symbian operating system and created an application that augmented a map with the current tram positions derived from a timetable [6]. A first step towards interaction with 3D data using an AR enabled mobile phone was AR Tennis [7], where two players sitting face-to-face played tennis using their mobile phones as racquets. The interaction was limited to the collision between the device and a virtual ball in the marker space between the players. Henrysson also conducted a user study [8] comparing different interaction techniques for translation and rotation of 3D objects using a mobile phone with AR. In this case the user selected a virtual object by positioning cross hairs over it and pressing a button on the keypad. Once selected, the object was locked relative to the device. Translation and rotation could then be performed by translating and rotating the device respectively. Translation and rotation could also be performed using the keypad, with each axis mapped to two buttons for decrementing and incrementing the transformation. This work showed how atomic actions (selection, positioning and rotation) could be implemented on a mobile phone using camera tracking to create a full 6 DOF interface. With this system it was possible to manipulate virtual objects and assemble a scene consisting of several distinct objects. However, it was not possible to select individual vertices to actually edit the objects.

The goal of our current research is to apply these earlier methods to mesh editing on a mobile phone. To do this a dynamic mesh needs to be implemented and the system must support selection of more than 256 elements. Further, multiple selection needs to be supported for uniform transformation of several vertices. To avoid sharp corners the application must also support mesh smoothing, where the effect of an action applied to one vertex is distributed over its neighbors. Manipulating individual vertices is more challenging than merely moving a handful of individual objects as in earlier applications.

3D INTERACTION GUIDELINES

Constructing a device that is tracked with 6 DOF is a non-trivial engineering task. Such a device must be able to sense both rotation and translation in 3D in order to be useful. Even though it is possible to construct such devices, there are no established standards for 3D input devices, since there is limited knowledge about what properties a good 6 DOF input device should have. Zhai [18] lists six aspects of the usability of 6 DOF input devices:

• Speed
• Accuracy
• Ease of learning
• Fatigue
• Coordination
• Device persistence and acquisition

Coordination can be measured by the ratio between the actual trajectory and the most efficient trajectory. To maximize coordination a device must be able to alter all degrees of freedom at the same time and at the same pace. Device persistence is the extent to which the device stays in position when released. We can see that using a 2D mouse for 3D interaction is suboptimal, since it would score low on coordination due to it only supporting two degrees of freedom at the same time. Swapping between the DOF pairs prohibits a coordinated movement in 6 DOF space. Even with sophisticated mappings such as the ArcBall technique [16] the mouse is still inferior to integrated 6 DOF devices.
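
As a rough formalization of this measure (our shorthand reading of the description above, not necessarily Zhai's exact definition):

```latex
\text{coordination ratio} \;=\; \frac{L_{\text{actual}}}{L_{\text{optimal}}} \;\ge\; 1
```

where L_actual is the length of the trajectory actually travelled through 6 DOF space and L_optimal is the length of the most efficient trajectory between the start and goal poses; a value close to 1 indicates well-coordinated movement.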

The naturalness of an interaction device can be judged by comparing it to how we interact with the real world. In general a device meets the requirement for naturalness if the atomic actions provided by the device match the atomic actions required to perform a task. If the device fails to provide the atomic actions, e.g. it requires the user to switch between pairs of DOF, it is considered unnatural. In [1] Aliakseyeu outlines five design guidelines for natural interaction devices:

• Two-handed interaction is preferred to one-handed interaction.
• Visual feedback is important for creating a feeling of spatial awareness.
• The action and perception space should coincide.
• Minimal use of intrusive devices, such as head-mounted displays, should be preferred.
• Wireless props are preferred to wired ones.

From this we can see that a camera tracked mobile phone using AR should achieve a high degree of naturalness. It is a wireless device operated by one hand. It gives visual feedback, and by using AR the action and perception spaces coincide. Since only one hand is needed to operate the device, the other can be used for interaction as well, so it does support two-handed interaction. In our scenario we have a physical marker, usually a piece of paper, that can be manipulated by the other hand. Typically the dominant hand is used to operate the phone (micrometric function) while the non-dominant hand is used for the stabilizing action of moving the paper marker (macrometric function).

SYSTEM OVERVIEW

Our setup consists of a mobile phone (Nokia 6680) and a paper marker. The camera is used to track the marker with 6 DOF. For tracking we use the Symbian port of the ARToolKit library, which uses computer vision techniques to detect squares in the image and match the interior pattern with known templates to establish the main orientation and identity of the marker. It then uses the corner points of the square to iteratively estimate the camera matrix relative to the coordinate system defined by the marker. The marker paper is thus the x-y plane of the 3D application running on the phone. The camera image is converted to a texture and rendered as the background of the scene. This provides reference points from the real world when manipulating virtual content. It also helps the user to stay within range, since at least one marker needs to be visible at all times. Failing to frame an entire square will cause the application to render only the camera image, giving the user as much feedback as possible to help her get back within tracking range. There is no limit to how many markers can be used to extend the tracking range. Currently we use three markers (8 by 8 cm) on an A4 paper.

The system supports three atomic actions: selection, positioning and rotation. The selection and positioning of individual vertices are fundamental requirements for mesh editing. There is no point in rotating individual vertices since they are nothing but points. However, when multiple vertices are selected and the user wants to keep the relationship between them, rotating the entire selection is an important feature. Rotation adds 3 DOF to the 3 DOF required for positioning. This 6 DOF requirement rules out simpler tracking approaches for mesh editing.

User Interface

When the application is started and the camera frames an entire square, the user sees a white mesh overlaying the marker in a 160 by 120 pixel window (see Figure 1).

Figure 1. The mesh editing application on a mobile phone

Individual vertices are represented by small red cubes that can be selected for manipulation. Selection is made by positioning the phone so that the blue square at the center of the screen intersects or includes the vertex to be selected, and then pressing the joypad button. When successfully selected, the vertex turns white. The vertex is now locked in a fixed relation to the phone, defined by their relation when selected.


Positioning is performed by moving the phone itself while keeping the joypad button pressed. The resulting effect on the mesh is perceived in real time as it is deformed. Shading the mesh provides important 3D cues, since the mesh would otherwise be rendered with a uniform color, making it difficult to see the applied vertex displacements. When released, the vertex turns red again. Several vertices can be moved one after the other to create complicated mesh deformations (Figure 2).

Figure 2. A more complex mesh deformation.

To select multiple vertices at the same time the user uses the '2' key in a similar way. The selected vertices turn yellow, and the user can "paint" the mesh by holding down the '2' key while moving the device relative to the mesh. When the joypad button is pressed, all selected vertices turn white and are locked relative to the camera. The phone now acts as a true 6 DOF device, allowing the user to freely translate and rotate the selection of vertices while maintaining their spatial relation. Figure 3 shows the result of several mesh nodes being selected and moved.

Currently only vertices can be selected; neither the mesh itself nor individual polygons or edges can be selected. In the future it would be useful to be able to manipulate the mesh as an object and change its position and orientation relative to the global coordinate system defined by the marker. This could easily be achieved by adding a transformation node for the object. However, selecting the entire mesh object could be problematic if all the vertices needed to be selected, and selecting a thin edge would be even harder. To do this the user must be able to explicitly choose a selection mode to avoid selecting the wrong element.

IMPLEMENTATION DETAILS

The system consists of a tracking module to calculate the camera matrix and a graphics module to represent and render the mesh and vertex glyphs. In addition to this we need to solve how to select objects and to discriminate between large numbers of them.

Figure 3. Moving several mesh nodes at the same time.

Tracking

For tracking we use a port of ARToolKit to Symbian. As stated above, ARToolKit is a computer vision based tracking system that establishes the 6 DOF camera pose relative to a 2D marker. It was chosen for being easy to deploy while providing accurate 3D tracking. To port it we wrote a C++ wrapper class in order to get rid of global variables, which are prohibited by Symbian.

However, the mobile phones we are targeting lack a floating point unit, making floating-point arithmetic orders of magnitude slower than integer arithmetic. To overcome this, we wrote our own fixed-point library featuring variable precision. We did extensive performance tests to select the algorithms that ran fastest on the mobile phone. The average speed-up compared to the corresponding floating-point functions was about 20 times. The resulting port runs several times faster (at 10 fps on a Nokia 6680) than the original port, which used only double precision floating point. Some accuracy was lost when converting to fixed point, but this was perceived as acceptable. The estimated camera matrix is then used as the modelview matrix in the 3D graphics pipeline.
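
As an illustration, the core of such a fixed-point library might look like the following minimal 16.16 sketch (our own type and function names, not the actual library, which supports variable precision rather than a fixed 16-bit fraction):

```cpp
#include <cstdint>

typedef int32_t fixed16;                      // 16 bits integer part, 16 bits fraction

inline fixed16 to_fixed(float f)   { return (fixed16)(f * 65536.0f); }
inline float   to_float(fixed16 x) { return x / 65536.0f; }

// Multiply: widen to 64 bits so the intermediate product cannot overflow,
// then shift back down by the number of fractional bits.
inline fixed16 fx_mul(fixed16 a, fixed16 b)
{
    return (fixed16)(((int64_t)a * (int64_t)b) >> 16);
}

// Divide: pre-shift the numerator up by the number of fractional bits.
inline fixed16 fx_div(fixed16 a, fixed16 b)
{
    return (fixed16)(((int64_t)a << 16) / b);
}
```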

Mesh Representation

The Nokia 6680 ships with a software implementation of OpenGL ES. In comparison to desktop OpenGL, memory and processor demanding functions such as 3D texturing and double precision floating point values have been removed. A 16:16 fixed-point data type has been added to increase performance while retaining some floating-point precision. The most noticeable difference is the removal of the immediate mode in favor of vertex arrays. To edit a mesh it must be dynamic, allowing the user to reposition individual vertices. Since Symbian does not permit any global variables, the vertex and normal arrays are usually declared constant, which limits the dynamic properties of objects. The solution is to calculate the mesh data for each frame from an array of vertex objects.
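
A minimal sketch of this per-frame rebuild (illustrative code with our own type names, not the actual application source), converting an array of editable vertex objects into the fixed-point vertex array that OpenGL ES consumes:

```cpp
#include <vector>
#include <cstddef>

typedef int GLfixed_t;              // stand-in for the OpenGL ES GLfixed (16.16) type

// An editable vertex kept by the application; its position changes as the user drags.
// Positions are stored directly in 16.16 fixed point to avoid floating-point work.
struct EditableVertex {
    unsigned int id;                // unique id, later reused for color-buffer picking
    GLfixed_t    x, y, z;
};

// Rebuild the flat vertex array handed to glVertexPointer() every frame,
// so edits made to the EditableVertex objects show up immediately.
void rebuildVertexArray(const std::vector<EditableVertex>& verts,
                        std::vector<GLfixed_t>& out)
{
    out.resize(verts.size() * 3);
    for (std::size_t i = 0; i < verts.size(); ++i) {
        out[3 * i + 0] = verts[i].x;
        out[3 * i + 1] = verts[i].y;
        out[3 * i + 2] = verts[i].z;
    }
    // The caller would then bind it with, e.g.:
    // glVertexPointer(3, GL_FIXED, 0, &out[0]);
}
```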

So far we have only edited meshes created, for experimental purposes, by the application itself. To do this we declare the width and height of the planar mesh. A procedural algorithm distributes the vertices uniformly in the x-y plane, forming a 2D grid, and provides each vertex with a unique id. To be able to render the mesh spanned by the vertices we need to create a vertex list and a list of indices indicating which three vertices belong to the same triangle. The vertex list is easily calculated by traversing the array of vertex objects and copying their positions. The indices are calculated two triangles at a time, the two triangles being part of the quad for which the current vertex is the upper left corner. Traversing the vertex list makes it easy to produce the indices.
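
A sketch of such a procedural grid and its triangle index list, reusing the illustrative EditableVertex and GLfixed_t types from the sketch above (variable names are ours):

```cpp
#include <vector>

// Build a width x height grid of vertices in the x-y plane and the index list
// that splits each grid quad into two triangles. 'spacing' is the grid step
// in 16.16 fixed point.
void buildGridMesh(int width, int height, GLfixed_t spacing,
                   std::vector<EditableVertex>& verts,
                   std::vector<unsigned short>& indices)
{
    verts.clear();
    indices.clear();

    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            EditableVertex v;
            v.id = (unsigned int)(row * width + col);   // unique id per vertex
            v.x  = col * spacing;
            v.y  = row * spacing;
            v.z  = 0;
            verts.push_back(v);
        }
    }

    // For every vertex that is the upper-left corner of a quad, emit two triangles.
    for (int row = 0; row + 1 < height; ++row) {
        for (int col = 0; col + 1 < width; ++col) {
            unsigned short i0 = (unsigned short)(row * width + col);
            unsigned short i1 = (unsigned short)(i0 + 1);                    // right
            unsigned short i2 = (unsigned short)((row + 1) * width + col);   // below
            unsigned short i3 = (unsigned short)(i2 + 1);                    // below-right
            indices.push_back(i0); indices.push_back(i2); indices.push_back(i1);
            indices.push_back(i1); indices.push_back(i2); indices.push_back(i3);
        }
    }
}
```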

To provide shading we need to calculate normal vectors. In OpenGL ES normals are defined per vertex and not per polygon. To compute them we first calculate the normal for each polygon using the cross product of its edge vectors, calculated from the indices and the vertex list. We then traverse the vertex list and create a vertex normal by averaging the normal vectors of the polygons containing the vertex. We verified the resulting normalized vectors by rendering them as lines (Figure 4).
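
The averaging step can be sketched as follows (illustrative code; float vectors are used for brevity even though the actual implementation works in fixed point):

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b)   { Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static Vec3 cross(Vec3 a, Vec3 b) { Vec3 r = { a.y * b.z - a.z * b.y,
                                               a.z * b.x - a.x * b.z,
                                               a.x * b.y - a.y * b.x }; return r; }
static Vec3 normalize(Vec3 v)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len > 0.0f) { v.x /= len; v.y /= len; v.z /= len; }
    return v;
}

// positions: one Vec3 per vertex; indices: three entries per triangle.
// Returns one averaged, normalized normal per vertex.
std::vector<Vec3> computeVertexNormals(const std::vector<Vec3>& positions,
                                       const std::vector<unsigned short>& indices)
{
    Vec3 zero = { 0.0f, 0.0f, 0.0f };
    std::vector<Vec3> normals(positions.size(), zero);

    for (std::size_t t = 0; t + 2 < indices.size(); t += 3) {
        unsigned short a = indices[t], b = indices[t + 1], c = indices[t + 2];
        // Face normal from the cross product of two edge vectors.
        Vec3 n = cross(sub(positions[b], positions[a]), sub(positions[c], positions[a]));
        // Accumulate the face normal on each of the triangle's vertices;
        // normalizing the sum afterwards gives the per-vertex normal.
        normals[a].x += n.x; normals[a].y += n.y; normals[a].z += n.z;
        normals[b].x += n.x; normals[b].y += n.y; normals[b].z += n.z;
        normals[c].x += n.x; normals[c].y += n.y; normals[c].z += n.z;
    }
    for (std::size_t i = 0; i < normals.size(); ++i)
        normals[i] = normalize(normals[i]);
    return normals;
}
```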

Figure 4. Vertex normals shown as vertical lines.

Selection

To be able to select an individual object, it needs to have a unique property that can be obtained when performing selection. In desktop OpenGL applications a ray is typically cast and checked for intersection with objects in 3D. This is not easily implemented in OpenGL ES. Alternatively, selection can be performed in the 2D color buffer by assigning a unique color to each object and sampling the color value at the pixel coinciding with the pointing device. In the previous work this approach was taken and a unique alpha value was assigned to each object. This works fine for scenes without transparency and with a limited number of objects. The OpenGL ES glReadPixels() function can be used to obtain the color value for a certain pixel, normally returning a 32 bit integer of which 8 bits are used for alpha, which limits the number of distinct objects to 256. This is too few to represent a polygon mesh, as software rendering on mobile phones is capable of rendering thousands of polygons per second.

The proposed workaround is to use the entire RGBA vector for discrimination. This allows discrimination of 2^32 - 1 objects. Storing nothing but the positions of that many objects would require more than 50 GB of memory, and if each vertex occupied one pixel on a Nokia 6680 (screen resolution: 176 x 208), less than 1/100000th of such a maximal mesh would be visible in any single frame. We therefore consider this approach to discriminating between objects sufficient. The pixel value is defined by an unsigned 32 bit integer, while the color vector consists of four fixed-point values, each represented by a 32 bit integer where 16 bits are used for precision. To create a color vector from a 32 bit id we mask out four 8 bit integers and shift them to the integer position of the fixed-point data type.
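
A sketch of this id-to-color packing and of unpacking a sampled pixel back into an id (illustrative code; the function names are ours, and how the renderer scales the fixed-point color components into its 0..1 color range is left to the surrounding setup code):

```cpp
#include <cstdint>

// Pack a 32 bit object id into four fixed-point color components, one byte per
// channel shifted into the integer part of a 16.16 value, mirroring the
// masking-and-shifting scheme described in the text.
void idToFixedColor(uint32_t id, int32_t rgba[4])
{
    for (int i = 0; i < 4; ++i) {
        uint32_t byteVal = (id >> (8 * i)) & 0xFFu;  // mask out one 8 bit chunk
        rgba[i] = (int32_t)(byteVal << 16);          // place it in the integer part
    }
}

// Rebuild the id from an RGBA pixel read back with
// glReadPixels(x, y, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixel).
uint32_t pixelToId(const uint8_t pixel[4])
{
    return (uint32_t)pixel[0]
         | ((uint32_t)pixel[1] << 8)
         | ((uint32_t)pixel[2] << 16)
         | ((uint32_t)pixel[3] << 24);
}
```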

To keep the integrity of the id when it is used as a color, no transparency or shading can be applied. We use a two pass rendering stage to work around this. In the first pass, shading and transparency are turned off and the vertex glyphs (cubes) are rendered into the color buffer. We then sample the pixel values at the locations of interest. In the second pass, the glyphs visible to the user are rendered with optional shading and transparency and with a uniform color; this is done by simply scaling up the cubes and rendering them on top of the color-coded ones. As mentioned before, it is not enough to sample only one pixel, since the vertices are relatively small and the combination of human movement difficulties and tracking errors makes it hard to intersect a vertex occupying only a few pixels with a cross hair. In a 10 by 10 pixel square at the center of the screen we sample five pixels in the five-spot pattern found on a dice. Ideally the background texture is rendered after the color-coded cubes so that no object gets selected if any of the sampled pixels contains a color value identical to one of the vertices. The list of objects is traversed to see if any of the sampled color values match an object id; if there is a match, the corresponding vertex is selected. Selecting two neighboring vertices, both inside the square, must be avoided, so the sampling points are ranked, with the central pixel being the most important. Selecting multiple objects works in the same way.
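
A sketch of the five-spot sampling and ranked matching (illustrative; glReadPixels is standard OpenGL ES, the remaining names and the exact offsets are our assumptions):

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>
#include <GLES/gl.h>

// Sample five pixels in a dice-like five-spot pattern inside a 10x10 pixel
// square centered on (cx, cy), ranked so the central sample is tried first.
// Returns the id of the first sample that matches a known vertex id,
// or 0xFFFFFFFF if nothing is hit.
uint32_t pickVertex(int cx, int cy, const std::vector<uint32_t>& vertexIds)
{
    const int offsets[5][2] = { {0, 0}, {-4, -4}, {4, -4}, {-4, 4}, {4, 4} };

    for (int s = 0; s < 5; ++s) {
        GLubyte pixel[4];
        glReadPixels(cx + offsets[s][0], cy + offsets[s][1], 1, 1,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixel);

        uint32_t id = (uint32_t)pixel[0]
                    | ((uint32_t)pixel[1] << 8)
                    | ((uint32_t)pixel[2] << 16)
                    | ((uint32_t)pixel[3] << 24);

        // Linear search over the known ids; a match means that glyph was hit.
        for (std::size_t i = 0; i < vertexIds.size(); ++i)
            if (vertexIds[i] == id)
                return id;
    }
    return 0xFFFFFFFFu;   // nothing selected
}
```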

The selection routine described above is carried out only when the user presses either the joypad button or the '2' key. Thus there might be a one frame delay between pressing the selection button and sampling the color buffer. If the tracking is unstable this might compromise the selection precision. To prevent this we have implemented a frame-to-frame coherency threshold to make the tracking more stable and faster: if the aggregated length of the displacement vectors of the marker corner points is below the threshold, the camera pose from the previous frame is reused in the rendering pipeline.
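
A sketch of that coherency test (illustrative names; corner positions are in screen space):

```cpp
#include <cmath>

struct Corner { float x, y; };

// Reuse the previous camera pose when the four marker corners have barely moved,
// which suppresses jitter from the vision-based tracker.
bool shouldReusePreviousPose(const Corner prev[4], const Corner curr[4],
                             float threshold)
{
    float aggregated = 0.0f;
    for (int i = 0; i < 4; ++i) {
        float dx = curr[i].x - prev[i].x;
        float dy = curr[i].y - prev[i].y;
        aggregated += std::sqrt(dx * dx + dy * dy);   // this corner's displacement length
    }
    return aggregated < threshold;
}
```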

Manipulation

It is important to have direct visual feedback from the performed actions. In the previous work the selected object was only transformed back from camera space to marker space when it was released. During interaction it was only represented in camera space and could not interact with objects in marker space. Since the resulting mesh is in marker space, we must transform the selected vertices back into marker space at each frame to see the effect of the displacement during the last frame. To calculate the final transformation of each selected vertex, the transformation at the selection event (or at the previous frame) must be stored for comparison.

To keep the selected object locked to the camera we cannot perform a genuine release event when calculating the effect of the ongoing transformation, since the tracking error would introduce a slight displacement and make the selected object jitter. This would diminish the feeling of having total isomorphic control of the selected object. Thus we need yet another representation of the object's position. Storing this information for each object increases the memory requirements and is one of the current bottlenecks in our implementation.
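
One way to write the per-frame update implied here (our notation, a sketch rather than the exact implementation): let C_sel be the camera transform estimated from the marker at the moment of selection, C_t the transform estimated at frame t, and p the vertex's marker-space position when it was selected. Keeping the vertex fixed in camera space then gives its marker-space position at frame t as

```latex
p_t \;=\; C_t^{-1}\, C_{sel}\, p
```

i.e. the stored camera-space point C_sel p is mapped back through the inverse of the current camera transform every frame.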

Mesh Smoothing

Editing one vertex at a time is a tedious task. If the user wants several vertices to be moved while their spatial relationship remains constant, the multi-selection option described above is ideal. Another scenario in mesh editing is that the user wants the displacement of one vertex to proportionally affect the vertices in a neighborhood defined by some distance metric, for example to make sure the mesh does not contain any sharp, unnatural corners. Using parametric surfaces would solve this problem but is too demanding for current mobile phone CPUs. We find the adjacent vertices in a way similar to the one used when calculating the normal for each vertex. The current displacement vector of a selected vertex is scaled down and added to the adjacent vertices, making the mesh appear smoother. We currently only affect the closest neighbors (d=1) and apply half the displacement to them, but it would be possible to iterate the algorithm to spread the effect over a larger area and also to use a non-linear function such as a Gaussian. When multiple vertices are selected, only vertices that are not themselves selected are affected.
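
A sketch of this neighbor smoothing for the d=1 case (illustrative; the adjacency lists are assumed to have been built from the triangle indices, and the names are ours):

```cpp
#include <vector>
#include <set>
#include <cstddef>

struct Vec3f { float x, y, z; };

// Apply half of a selected vertex's displacement to its immediate (d = 1)
// neighbors, skipping neighbors that are themselves selected.
void smoothNeighbors(std::size_t movedVertex,
                     const Vec3f& displacement,
                     const std::vector<std::vector<std::size_t> >& adjacency,
                     const std::set<std::size_t>& selected,
                     std::vector<Vec3f>& positions)
{
    const float falloff = 0.5f;   // half the displacement for d = 1 neighbors
    const std::vector<std::size_t>& neighbors = adjacency[movedVertex];
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        std::size_t n = neighbors[i];
        if (selected.count(n))    // selected vertices already move rigidly with the phone
            continue;
        positions[n].x += falloff * displacement.x;
        positions[n].y += falloff * displacement.y;
        positions[n].z += falloff * displacement.z;
    }
}
```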

EXPERIMENTS AND RESULTS

We have not yet carried out formal user studies, but we have tested the application informally to get a feeling for how it performs and where the bottlenecks are. We have tested the following configurations:

• Selection of individual vertices
• Translation of individual vertices
• Translation of individual vertices with smoothing
• Selection of multiple vertices
• 6 DOF transformation of multiple vertices
• 6 DOF transformation of multiple vertices with smoothing

From testing with these configurations we found that it was sometimes difficult to perceive depth with a uniformly colored mesh, even with shading. To overcome this problem we decided to also render the edges of the polygons using a darker shade of gray (figure 5). This improved the understanding of the mesh geometry but also made the display more cluttered and made it harder to perceive when a vertex was selected in the case of multiple selections.


Figure 5. Polygon edges being shown.

In desktop animation packages the user has the option to make a high quality rendering to get a better understanding of the surface appearance. Such an option makes little sense on a limited device such as the mobile phone. To produce a clutter free rendering we added the option to see only the mesh without vertices or edges. The combination of these options made it easier to understand the surface geometry.

Most tests were performed with a small mesh consisting of 256 vertices (16 by 16). When the number of vertices was increased to 1024 (32 by 32) it was hard to separate distant vertices from edges. The tracking range also proved to be too restrictive and the frame rate dropped below an acceptable level (less than 5 fps). The slightly unstable computer vision tracking combined with human error sometimes made it very hard to select distant vertices.

Selection worked well but not flawlessly. More than five samples within the square might be needed to avoid missing the intended target. Selecting multiple vertices also worked well, but was time consuming when many adjacent vertices were to be selected. Selecting a row of vertices was easy: the non-dominant hand was used to orient the paper marker so that one coordinate axis was parallel to the user's line of sight, and the phone was then tilted so that the camera z-axis swept one row of vertices while the selection key was pressed.

Manipulation worked as intended, and while the isomorphic interaction made translation very fast and intuitive, it was still hard to perform rotation due to the user's limited range of arm motion. Traditional clutching was not possible, since all objects are unselected once the joypad button is released. The user also needs to be able to view the display at a reasonable viewing angle in order to get the necessary visual feedback, and so cannot rotate the display far away. This is a drawback of a unified perception-action space on a handheld device and a constraint on the interaction range.

DISCUSSION AND FUTURE WORK

From the design guidelines we could see that a camera tracked mobile phone using AR has the potential to be a 6 DOF interaction device with a high level of naturalness. This impression was confirmed when testing our system for mesh editing. Users found it a fast and intuitive interaction device where the mixing of real and virtual information provides useful cues. However, as an interaction device for mesh editing it has its limitations, which should be remedied with faster processors, better tracking algorithms, and dedicated graphics hardware.

When it comes to tracking, there are stability issues with computer vision based tracking that can become a severe problem when selecting objects that occupy a small area on the screen. These issues might be addressed using filtering, higher camera resolution and the adoption of accelerometers. The tracking range is also a bottleneck that might be removed by constructing nested markers and by combining marker-based tracking with motion estimation or accelerometers.

The rendering quality will no doubt increase with higher screen resolutions (geometric and photometric) and a GPU with vertex and pixel shaders. The current frame rate is about 10 fps, and interaction will benefit from smoother animation and fewer delays.

Though it is not the limiting factor, the vertex representation should also be improved. The only data necessary for each vertex is its position and identity, and the identity could in fact be computed from the position in the vertex array. This will bring down the memory requirements for large meshes. It is also unnecessary to re-triangulate the entire mesh each frame when it is not being edited, or is only partially modified.

We need to perform a formal user study to evaluate the mobile phone as an interface tool and compare it to current solutions for 3D interaction such as the PC mouse and the Phantom pen. However, it would not be fair to compare it to more mature technologies before the tracking and graphics issues have been addressed. Since there is little point in implementing anything but simple modeling applications on the phone itself, there must be a way to connect the phone to a PC so that it can share data with animation packages and act as an additional input device, perhaps for manipulating control points defining a NURBS surface. One such platform is CMAR [2], which shares a scene graph between devices connected using Bluetooth. One CMAR client can run on a PC to fully exploit its rendering performance and ability to output to large displays, while another can run on an AR-enabled mobile phone acting as a 6 DOF interaction device.

Besides potentially high rendering quality and a 6 DOF interface, a camera-equipped phone is interesting due to the tight coupling between the camera and the CPU, both of which are steadily improving. In the future this will enable the exploration of new research areas, such as live facial capture and model reconstruction, using the phone to control external resources in a ubiquitous computing scenario, and mobile outdoor AR applications.

CONCLUSION

In this paper we have demonstrated a system for editing a 3D polygon mesh using a mobile phone that acts as a 6 DOF isomorphic interaction device. This application shows that it is indeed possible to run complex graphics applications on consumer level mobile phones, and that interaction techniques developed for handheld interfaces can be successfully applied.

Our initial experiments have shown that a camera phone can be used as a 6 DOF interaction device; however, there are improvements to be made. We identify the tracking range, rendering quality and tracking stability as bottlenecks for this type of interaction metaphor. Resolving these issues for mesh editing would also benefit scene assembly and lay the foundation for future mobile phone AR games and applications.

Our preliminary tests have been promising and in the near future we will conduct formal user studies to evaluate our selection and manipulation techniques and improve them.


REFERENCES

1. Aliakseyeu, D., Subramanian, S., Martens, J.-B., Rauterberg, M. 2002. Interaction techniques for navigation through and manipulation of 2D and 3D data. In EGVE '02: Proceedings of the Workshop on Virtual Environments 2002, Eurographics Association, Aire-la-Ville, Switzerland, 179–188.
2. Andel, M., Petrovski, A., Henrysson, A., Ollila, M. 2006. Interactive collaborative scene assembly using AR on mobile phones. In Proceedings of ICAT 2006 (to appear).
3. Hachet, M., Pouderoux, J., Guitton, P. 2005. A camera-based interface for interaction with mobile handheld computers. In SI3D '05: Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, ACM Press, 65–72.
4. Hakkarainen, M., Woodward, C. 2005. SymBall: Camera driven table tennis for mobile phones. In ACM SIGCHI International Conference on Advances in Computer Entertainment Technology (ACE 2005).
5. Hansen, T. R., Eriksson, E., Lykke-Olesen, A. 2005. Mixed interaction space: designing for camera based interaction with mobile devices. In CHI '05 Extended Abstracts on Human Factors in Computing Systems, ACM Press, New York, NY, USA, 1933–1936.

6. Henrysson, A., Ollila, M. 2004. UMAR: Ubiquitous mobile augmented reality. In MUM '04: Proceedings of the 3rd International Conference on Mobile and Ubiquitous Multimedia, ACM Press, New York, NY, USA, 41–45.
7. Henrysson, A., Billinghurst, M., Ollila, M. 2005. Face to face collaborative AR on mobile phones. In ISMAR '05: Proceedings of the Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality, IEEE Computer Society, Washington, DC, USA, 80–89.
8. Henrysson, A., Billinghurst, M., Ollila, M. 2005. Virtual object manipulation using a mobile phone. In ICAT '05: Proceedings of the 2005 International Conference on Augmented Tele-existence, ACM Press, New York, NY, USA, 164–171.
9. Marble Revolution website. http://www.bitside.com/entertainment/MOBILE%20GAMES/Marble

10. Moehring, M., Lessig, C., Bimber, O. 2004. Video See-Through AR on Consumer Cell Phones. In International Symposium on Mixed and Augmented Reality (ISMAR '04), 252–253.
11. Mogilev, D., Kiyokawa, K., Billinghurst, M., Pair, J. 2002. AR Pad: an interface for face-to-face AR collaboration. In CHI '02 Extended Abstracts on Human Factors in Computing Systems, ACM Press, New York, NY, USA, 654–655.
12. Mosquito Hunt website. http://w4.siemens.de/en2/html/press/newsdesk_archive/2003/foe03111.html

13. Paelke, V., Reimann, C., Stichling, D. 2004. Foot-based mobile interaction with games. In ACE '04: Proceedings of the 2004 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, ACM Press, New York, NY, USA, 321–324.
14. Rekimoto, J. 1996. Transvision: A Hand-held Augmented Reality System for Collaborative Design. In Virtual Systems and Multi-Media (VSMM '96).

15. Rohs, M. 2004. Real-World Interaction with Camera Phones. In 2nd International Symposium on Ubiquitous Computing Systems (UCS 2004).

16. Shoemake, K. 1992. ARCBALL: a user interface for specifying three-dimensional orientation using a mouse. In Proceedings of Graphics Interface '92, Vancouver, British Columbia, Canada, 151–156.

17. Wagner, D., Pintaric, T., Ledermann, F., Schmalstieg, D. 2005. Towards Massively Multi-User Augmented Reality on Handheld Devices. In Third International Conference on Pervasive Computing (Pervasive 2005).

18. Zhai, S. 1998. User performance in relation to 3D input device design. SIGGRAPH Computer Graphics 32, 4, 50–54.