Practical Usefulness of Structure from Motion (SfM) Point Clouds Obtained from Different Consumer Cameras

Patrick Ingwer, Fabian Gassen, Stefan Püst, Melanie Duhn, Marten Schälicke, Katja Müller, Heiko Ruhm, Josephin Rettig, Eberhard Hasche, Arno Fischer, Reiner Creutzburg

Fachhochschule Brandenburg - Brandenburg University of Applied Sciences, Fachbereich Informatik und Medien, P.O. Box 2132, D-14737 Brandenburg an der Havel, Germany
{ingwerp|gassen|puest|duhn|schaelic|muellerk|ruhm|rettigj|hasche|fischer|creutzburg}@fh-brandenburg.de

ABSTRACT

This paper examines the usefulness and accuracy of point clouds obtained from different consumer cameras using structure from motion (SfM) algorithms. It summarizes research results on the practical use of the SfM method in applications where highly accurate point clouds are required.

Keywords: structure from motion, point cloud, 3D modelling, match moving, camera tracking
1. INTRODUCTION – PROBLEM STATEMENT

Key technologies are advancing rapidly in areas such as medicine, structural engineering, and automotive development. There is yet another field where modern technologies significantly affect workflows and results: the entertainment sector. In this field certain technologies are essential; without them, no profitable results would be possible. Examples include camera tracking, chroma keying, image-based lighting with its photorealistic lighting and rendering, motion capturing, and fur, smoke, and fire simulation. One very new technology supporting content creation is modeling with the help of point clouds, together with point cloud supported camera tracking.

One way to create a usable point cloud is to employ LiDAR scanning equipment such as the scanners made by FARO. A LiDAR scan is very precise, but the scanner must always stand on solid ground. Under some circumstances a solid stand is not possible, or certain positions cannot be taken due to height or danger for the operator. Another way is to calculate a point cloud using the structure from motion (SfM) method. Here, point clouds are calculated from a series of pictures of small objects or large areas. Where LiDAR scanning is not possible, SfM is a good solution, possibly in conjunction with cranes, aerial drones, and other highly mobile equipment. The first question is how accurate and precise point clouds from such SfM calculations are, so that they can be used in high-quality entertainment productions or other applications such as structural engineering. The second, and most important, question is which kinds of consumer cameras are most suitable for obtaining the best and most useful results.

One of the most important techniques in the entertainment sector is camera tracking (match moving), which closely corresponds to the SfM method (see figure 1). Without camera tracking, compositing of 3D elements and live action footage would not be possible. The technology behind camera tracking is similar to the SfM procedure: the key feature is generating a virtual camera that mimics exactly the movement of the real camera, but point clouds are determined as well, using photogrammetry concepts. Because of the high quality standards in film making, camera tracking must deliver a very accurate result, usually less than a one-pixel error. To improve the result of the match moving process, the use of external high-precision point clouds is desirable.

This paper examines different methods of camera tracking and their accuracy and suitability for the entertainment sector. The most important question is which kinds of consumer cameras are sufficient to obtain the best possible results for further use with autonomous drones. The paper presents research on the practical use of the SfM method in cases where high-precision point clouds are mandatory.
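Both match moving and SfM rest on the same pinhole projection model. The following relation uses standard textbook notation and is added here only for clarity; it does not appear in the original paper:

    x ~ K [ R | t ] X

Here X is a homogeneous 3D scene point, R and t are the rotation and translation of the camera (the motion a match move recovers), K is the matrix of internal camera parameters, and x is the projected image point. SfM and camera tracking both invert this relation: they jointly estimate R and t (and, if unknown, K) together with the 3D points X by minimizing the reprojection error, which is also the error value reported by tracking software.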
Figure 1. From point observations and internal knowledge of the camera parameters, the 3D structure of the scene is computed from the estimated motion of the camera [5].
2. TEST ENVIRONMENT

2.1 SfM

In two test runs, recordings are taken in video mode with a Canon 5D MKII (50 mm), a GoPro Hero3 (wide mode), and an iPhone 4S, each of a defined small object (a box with markers) as well as a bigger object (a building exterior), with almost identical perspective and parallax properties. The point clouds are calculated from every tenth frame of the video footage without any preprocessing. The Canon EF 50 mm f/1.4 lens, with an approximate angle of view of 46 degrees (in conjunction with the Canon 5D MKII), has moderate distortion. The GoPro Hero3 in wide mode, with an angle of view of approximately 150 degrees, has distortion comparable to a fish-eye lens. For the iPhone 4S, no information about the optical system of the camera is available.

The point clouds themselves are generated with Agisoft Photoscan and VisualSFM. These results are loaded into CloudCompare and compared to LiDAR scans of the same objects captured with a FARO 3D X330, which is accurate to the millimeter. The same angle of each object is measured in every point cloud for each camera (with Photoscan and VisualSFM) and compared with the reference point cloud (LiDAR). This process is repeated three times with different angles to sufficiently reduce measuring errors. The arithmetic mean of the angle differences is used for the statement on the quality of the SfM point clouds. The three different angles of each object must cover at least one surface angle and two interior angles.
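The angle comparison described above can be illustrated with a short sketch. It assumes the point patches on either side of an edge have already been selected (for example, exported from CloudCompare); all function names are illustrative and not part of any of the tools used in the tests.

import numpy as np

def fit_plane_normal(points):
    # Fit a plane to an (N, 3) patch by SVD; the singular vector with
    # the smallest singular value is the unit normal of the plane.
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1]

def angle_between_patches(patch_a, patch_b):
    # Dihedral angle in degrees between two planar patches; the sign
    # of the fitted normals is arbitrary, hence the absolute value.
    cos_angle = abs(np.dot(fit_plane_normal(patch_a), fit_plane_normal(patch_b)))
    return np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))

def mean_angle_error(sfm_angles, lidar_angles):
    # Arithmetic mean of the deviations from the LiDAR reference
    # angles, as used for the quality statement in the text.
    return np.mean(np.abs(np.array(sfm_angles) - np.array(lidar_angles)))

Averaging over three angles per object, as described above, damps individual measuring errors without hiding systematic deviations.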
2.2 Camera Tracking

In a test series, a dollied sample shot was taken with a GoPro in all three available modes (narrow, mid, wide), a Nikon Coolpix, an iPhone, a Canon 5D MKIII (20 mm), and a Red Scarlet (20 mm). The shot itself is an outside shot of a building. Additionally, a LiDAR scan of the building was captured for the test. With the footage from each camera, four methods of tracking are performed. The software in use is Syntheyes by Andersson Technologies (build 14.09.12). For the evaluation, the RMS error is noted for all trackings and their sub-steps so that the different methods can be compared.

The first tracking uses the automatic tracking in Syntheyes on the original footage, with the lens distortion left in place. After that, all trackers with an error higher than 30 are deleted, followed by those with errors higher than 15 and 5. The next run is identical to the first except that the footage is undistorted in Syntheyes by removing the lens distortion. For the last two test runs, the LiDAR scan is utilized to improve the tracking. To bind the LiDAR scan to the tracking, twelve tracking markers are manually added to the footage and constrained to the scan. The first of these trackings is again done with lens distortion. The tracking starts with the seed solver, which takes into account only the twelve trackers and their constraints to the points in the LiDAR point cloud. After that, additional blips are generated by processing the live action footage and used to refine the solve. Then trackers with errors higher than 30, 15, and 5 are removed step by step, analogous to the first two test runs. For visual inspection, screenshots of every tracking are taken in the top view. Because the shot features a building's corner, we know that it should measure 90 degrees.
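The stepwise tracker filtering can be sketched as follows. Syntheyes performs this internally; the sketch only illustrates the procedure, with hypothetical per-tracker errors in pixels, and omits the re-solve that the software performs after each removal.

import numpy as np

def rms(errors):
    # Root mean square of the per-tracker errors.
    errors = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))

def filter_and_report(tracker_errors, thresholds=(30.0, 15.0, 5.0)):
    # Drop trackers above each threshold in turn and report the new
    # RMS, mirroring the 30 / 15 / 5 sub-steps of the test runs.
    remaining = np.asarray(tracker_errors, dtype=float)
    print(f"initial: RMS {rms(remaining):.2f} px, {remaining.size} trackers")
    for t in thresholds:
        remaining = remaining[remaining <= t]
        print(f"cutoff {t}: RMS {rms(remaining):.2f} px, {remaining.size} trackers")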
3. CONCLUSION

3.1 SfM

Taking into account the determined values and the visual results, a distinction has to be made between the technical and the artistic relevance of the generated point clouds. It has to be accepted that the internal processing of the footage (with regard to lens distortion) in the different SfM applications leads to different results.

The determined values (figure 6) show that point clouds generated from GoPro footage of small objects generally have a higher error, both with Photoscan and VisualSFM. Point clouds of larger objects produce an acceptable result, despite the enormous lens distortion. Figures 2 and 3 show exemplary screenshots of the angle measurements. For point cloud based camera tracking, this method is definitely a legitimate alternative to expensive LiDAR scans; nevertheless, this has to be evaluated further. To reproduce small objects with high precision in 3D applications, it is apparently wise to use footage from cameras (and optical systems) with minor lens distortion.

When comparing point clouds generated by Agisoft Photoscan and VisualSFM from footage of the Canon 5D MKII with the 50 mm lens, the Photoscan result for the larger object in the test run is nearly useless. One reason might be the reduced angle of view compared to the other camera types; another could be insufficient image information. This result has to undergo further evaluation. Generally, the values in figure 6 clearly show that there are huge differences in the precision of the point clouds generated by Agisoft Photoscan and VisualSFM from identical footage. This might be due to different handling and processing of the lens distortion. It is possible that thorough preprocessing of the footage with regard to lens distortion would yield an improvement.

Considering the exploitation of the point clouds from an artistic point of view, there are huge differences depending on the kind of camera used as well as on the SfM application. By the artistic point of view we mean that the results are used to generate 3D models based on point clouds. There are several algorithms for converting point clouds into meshes for use in 3D applications such as Autodesk Maya, Autodesk 3ds Max, or ZBrush (one such reconstruction is sketched below). In this case it is mandatory to have a dense, consistent, and low-noise point cloud; otherwise, the generated meshes are visually falsified or need more effort in postprocessing. For smaller objects, Agisoft Photoscan provides better results than VisualSFM when using a camera with high image quality (in this case the Canon 5D MKII / 50 mm). When capturing larger objects with cameras like the GoPro or the iPhone 4S, Agisoft Photoscan leads to usable results. VisualSFM produces less noisy results but with larger 'holes'; point clouds from Agisoft Photoscan, on the other hand, are more consistent. The increased noise can be reduced in further processing with specialized applications like Geomagic Studio.

In conclusion, after all the previous tests of generating point clouds with Agisoft Photoscan and VisualSFM from the footage of three very different consumer cameras, it is possible to state: it depends on the intended use of the point cloud and the kind of object that needs to be scanned. The camera and the SfM application have to be chosen accordingly.
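The meshing step mentioned above can be sketched with the open-source Open3D library. Open3D was not used in the tests, Poisson reconstruction is only one of the several algorithms alluded to, and the file names are placeholders.

import open3d as o3d

# Load an SfM point cloud exported from Photoscan or VisualSFM.
pcd = o3d.io.read_point_cloud("sfm_cloud.ply")

# Poisson reconstruction needs consistently oriented normals.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
pcd.orient_normals_consistent_tangent_plane(30)

# Sparse or noisy regions yield low-density vertices; this is where
# the 'holes' and visually falsified geometry mentioned above appear.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
o3d.io.write_triangle_mesh("sfm_mesh.obj", mesh)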
3.2 Camera Tracking

For reference, Andersson Technologies recommends an error value below 1.0. The value represents the divergence of a pixel from its designated position. Figure 7 shows the error values for all five cameras and the four methods. At first glance it is obvious that automatic tracking with lens distortion is useless. By filtering out the high-error tracking points, the error drops drastically and lies within the norm given by Andersson Technologies. However, with regard to the geometry representation (figure 4), a compositing would still not work: as mentioned earlier, the angle of the building should be 90 degrees, which is not achieved with this camera tracking.
Figure 2. Example of a measured angle on a small object.
Figure 3. Example of a measured angle on a large object.
Figure 4. Nikon Coolpix, automatic tracking, all tracking points with an error greater than 5 removed. The highlighted angle should measure 90 degrees.
Removing the lens distortion lowers the RMS error even further, but in the visual inspection the angle still does not match the angle in reality.

Taking the LiDAR scan into account without removing the lens distortion from the footage, the error values also drop when the high-error markers are sorted out, but they never reach the low error values of the first two methods. Nevertheless, the geometry representation matches the angle of the building's corner better. The higher error values are caused by not removing the lens distortion and, to a certain degree, by the manual constraining of the twelve tracker points to the LiDAR scan point cloud. Using the seed solve with additional blips increases the RMS error: because of the manual tracking points set at the beginning to constrain the scan to the footage, Syntheyes has a conflict between the manually set tracking points and the blips generated by the software. The result is a compromise between the software's calculations and the user's tracking points. When the lens distortion is removed, the error drops into the recommended range. It fails to match the quality of the values in experiment two, because a human is not capable of matching the tracking points to the scan accurately enough, but it provides a workable compromise between accuracy of the geometry representation and the live-action-only calculations.

Taking a closer look at the GoPro and its three modes (narrow, mid, wide), it can be stated that the lens and the focal length are important, too. The error values increase from narrow to wide. For the narrow mode the GoPro uses only the center of the lens, increasing the radius for mid and wide; as with every lens, the distortion in the center is always the lowest (see the sketch at the end of this section). The focal length behaves similarly: with a shorter focal length the image is more distorted and harder to rectify, so the tracking is not as accurate as it should be for further production. Syntheyes is not capable of removing the lens distortion completely. Comparing the GoPro with the Canon 5D MKIII (figure 8), all the described influences on the tracking can clearly be seen in figure 5. Only in the narrow mode does the GoPro come close to the Canon 5D MKIII in terms of the error value. With regard to the geometry representation, the Canon 5D MKIII is very close to the 90 degree angle; in contrast, it is not even possible to find the angle in the GoPro wide footage, and even the GoPro narrow shot does not come close to 90 degrees. It is necessary to remove the lens distortion in the first place to achieve acceptable results.

In a nutshell, it is not only important to have a low error value. A higher value might lead to a better result for further processing in the pipeline if the geometry representation is more accurate.
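The observation about the GoPro modes follows directly from the usual radial (Brown) distortion model, sketched below with illustrative coefficients that are not measured values for any of the tested cameras.

import numpy as np

def radial_displacement(r, k1=-0.30, k2=0.10):
    # Displacement of an image point at normalized radius r from the
    # optical center under the Brown model: it grows with r, so a
    # central crop (GoPro narrow mode) sees the least distortion.
    return r * (k1 * r**2 + k2 * r**4)

for r in (0.2, 0.5, 1.0):  # narrow central crop -> full wide field
    print(f"r = {r:.1f}: displacement {radial_displacement(r):+.4f}")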
Figure 5. Comparison between Canon 5D MKIII and GoPro. The highlighted angle should measure 90 degrees.
Figure 6. Three measured angles for each camera, at almost identical positions, for one small and one large object each, in Agisoft Photoscan and VisualSFM, compared to the LiDAR scans.
Figure 7. Comparison between all tested cameras. The figure shows the RMS error values for each method and sub-step.
Figure 8. Overview of the Canon 5D MKIII tracking results.
REFERENCES

[1] Dobbert, T.: Matchmoving - The Invisible Art of Camera Tracking. John Wiley & Sons, 2nd edition, 2012.
[2] Report – Bildsequenzen. ETH Zürich, Institut für Geodäsie und Photogrammetrie, Report, 2009.
[3] Koutsoudis, A., et al.: Multi-image 3D reconstruction data evaluation. University of Ljubljana, Report, 2013.
[4] Bartos, K.: Analysis of low-cost photogrammetric procedures in the process of historical objects survey. Technical University of Kosice, Report, 2014.
[5] SfM: Structure-from-Motion, http://openmvg.readthedocs.org/en/latest/software/SfM/SfM/