Sensors 2018, 18, 225; doi:10.3390/s18010225 - Article

Rapid 3D Reconstruction for Image Sequence Acquired from UAV Camera

Yufu Qu *, Jianyu Huang and Xuan Zhang

Department of Measurement Technology & Instrument, School of Instrumentation Science & Optoelectronics Engineering, Beihang University, Beijing 100191, China; [email protected] (J.H.); [email protected] (X.Z.)
* Correspondence: [email protected]; Tel.: +86-010-8231-7336

Received: 23 November 2017; Accepted: 11 January 2018; Published: 14 January 2018

Abstract: In order to reconstruct three-dimensional (3D) structures from an image sequence captured by an unmanned aerial vehicle's (UAV) camera and to improve the processing speed, we propose a rapid 3D reconstruction method based on an image queue that exploits the continuity and relevance of UAV camera images. The proposed approach first compresses the feature points of each image into three principal component points by using principal component analysis (PCA). In order to select the key images suitable for 3D reconstruction, the principal component points are used to estimate the interrelationships between images. Second, these key images are inserted into a fixed-length image queue. The positions and orientations of the images are calculated, and the 3D coordinates of the feature points are estimated using weighted bundle adjustment. With this structural information, the depth maps of these images can be calculated. Next, we update the image queue by deleting some of the old images and inserting some new images, and the structural calculation of all the images can be performed by repeating the previous steps. Finally, a dense 3D point cloud can be obtained using the depth-map fusion method. The experimental results indicate that when the texture of the images is complex and the number of images exceeds 100, the proposed method can improve the calculation speed by more than a factor of four with almost no loss of precision. Furthermore, as the number of images increases, the improvement in the calculation speed becomes more noticeable.

Keywords: UAV camera; multi-view stereo; structure from motion; 3D reconstruction; point cloud

1. Introduction

Because of the rapid development of the unmanned aerial vehicle (UAV) industry in recent years, civil UAVs have been used in agriculture, energy, environment, public safety, infrastructure, and other fields. By carrying a digital camera on a UAV, two-dimensional (2D) images can be obtained. However, as application requirements have grown and matured, 2D images can no longer meet the needs of many applications, such as three-dimensional (3D) terrain and scene understanding. Thus, there is an urgent need to reconstruct 3D structures from the 2D images collected by UAV cameras.

The study of methods by which 3D structures are generated from 2D images is an important branch of computer vision. In this field, researchers have proposed many methods and theories [1-17]. Among them, the three most important categories are the simultaneous localization and mapping (SLAM) [1-3], structure from motion (SfM) [4-14], and multiple view stereo (MVS) [15-17] algorithms, which have been implemented in many practical applications. As the number of images and their resolution increase, the computational times of these algorithms increase significantly, limiting their use in some high-speed reconstruction applications.

The two major contributions of this paper are a method for selecting key images and an SfM calculation method for sequential images. Key image selection is very important to the success of 3D reconstruction.


In this paper, a fully automatic approach to key frame extraction without initial pose information is proposed. Principal component analysis (PCA) is used to analyze the correlation of features over frames and thereby automate the key frame selection. Considering the continuity of the images taken by a UAV camera, this paper proposes a 3D reconstruction method based on an image queue. To ensure smoothness between two consecutive point clouds, an improved bundle adjustment, named weighted bundle adjustment, is used in this paper. By using a fixed-size image queue, the global structure calculation is divided into several local structure calculations, improving the speed of the algorithm with almost no loss of accuracy.

2. Literature Review

The general 3D reconstruction algorithm without a priori position and orientation information can be roughly divided into two steps. The first step involves recovering the 3D structure of the scene and the camera motion from the images; the problem addressed in this step is generally referred to as the SfM problem. The second step involves obtaining the 3D topography of the scene captured by the images. This step is usually completed by generating a dense point cloud or mesh from multiple images; the problem addressed in this step is generally referred to as the MVS problem. In addition, research into real-time SLAM and 3D reconstruction of the environment has become popular over the past few years. The positions and orientations of a monocular camera and a sparse point map can be obtained from the images by using a SLAM algorithm.

2.1. SfM

The SfM algorithm is used to obtain the structure of the 3D scene and the camera motion from images of stationary objects. There are many similarities between SLAM and SfM: both estimate the positions and orientations of the camera along with sparse features, and nonlinear optimization is widely used in both. Researchers have proposed improved algorithms for different situations based on early SfM algorithms [4-6]. A variety of SfM strategies have emerged, including incremental [7,8], hierarchical [9], and global [10-12] approaches. Among these methods, a very typical one was proposed by Snavely [13], who used it in the 3D reconstruction of real-world objects. With the help of feature point matching, bundle adjustment, and other technologies, Snavely completed the 3D reconstruction of objects by using images of famous landmarks and cities. The SfM algorithm is limited in many applications because of its time-consuming calculation. With the continuous development of computer hardware, multicore technologies, and GPU technologies, the SfM algorithm can now be used in several areas, and in many applications it faces higher requirements for computing speed and accuracy. Several improved SfM methods, such as those proposed by Wu [8,14], can improve the speed of the structure calculation without loss of accuracy. Among incremental, hierarchical, and global SfM, incremental SfM is the most popular strategy for the reconstruction of unordered images. Two important steps in incremental SfM are feature point matching between images and bundle adjustment. As the resolution and number of images increase, the number of matching points and the number of parameters optimized by bundle adjustment increase dramatically.
This results in a significant increase in the computational complexity of the algorithm and makes it difficult to use in many applications.

2.2. MVS

When the positions and orientations of the cameras are known, the MVS algorithm can reconstruct the 3D structure of a scene by using multiple-view images. One of the most representative methods was proposed by Furukawa [15]. This method estimates the 3D coordinates of the initial points by matching difference-of-Gaussians and Harris corner points between different images, followed by patch expansion, point filtering, and other processing. A patch-based matching method is used to match the remaining pixels between images. After that, a dense point cloud and mesh can be obtained.


Inspired by Furukawa's method, some researchers have proposed several 3D reconstruction algorithms [16-18] based on depth-map fusion. These algorithms can obtain reconstruction results with an even higher density and accuracy. The method proposed by Shen [16] is one of the most representative approaches. The important difference between this method and Furukawa's method is that it uses the position and orientation information of the cameras as well as the coordinates of the sparse feature points generated from the structure calculation. The estimated depth maps are obtained from the mesh data generated by the sparse feature points. Then, after depth-map refinement and depth-map fusion, a dense 3D point cloud can be obtained. An implementation of this method can be found in the open-source software openMVS [16].

Furukawa's approach relies heavily on the texture of the images. When processing weakly textured images, it is difficult for this method to generate a dense point cloud. In addition, the algorithm must repeat the patch expansion and point cloud filtering several times, resulting in a significant increase in the calculation time. Compared to Furukawa's approach, Shen's method directly generates a dense point cloud using depth-map fusion and can therefore obtain a dense point cloud easily and rapidly. Considering the characteristics of the problems that must be addressed in this study, we use a method similar to Shen's approach to generate a dense point cloud.

2.3. SLAM

SLAM consists of the simultaneous estimation of the localization of a robot and a map of its environment. The map obtained by SLAM is often required to support other tasks. The popularity of SLAM is connected with the need for indoor applications of mobile robotics, and as the UAV industry rises, SLAM algorithms are widely used in UAV applications. Early SLAM approaches are based on Extended Kalman Filters, Rao-Blackwellised Particle Filters, and maximum likelihood estimation; without priors, MAP estimation reduces to maximum-likelihood estimation. Most SLAM algorithms are based on iterative nonlinear optimization [1,2]. The biggest problem of SLAM is that some algorithms easily converge to a local minimum, which usually yields a completely wrong estimate. Convex relaxation has been proposed by some authors to avoid convergence to local minima; these contributions include the work of Liu et al. [3]. Various improved SLAM algorithms have been proposed to adapt to different applications, some of which are used for vision-based navigation and mapping.

3. Method

3.1. Algorithm Principles

The first step of our method involves building a fixed-length image queue, selecting the key images from the video image sequence, and inserting them into the image queue until it is full. A structural calculation is then performed for the images in the queue. Next, the image queue is updated: several images are deleted from the front of the queue, and the same number of new images is placed at the end of the queue. The structural calculation of the images in the queue is then repeated until all images are processed. On an independent thread, the depth maps of the images are calculated and saved in the depth-map set. Finally, all depth maps are fused to generate dense 3D point cloud data. Without the use of ground control points, the result of our method loses the absolute scale of the model. The algorithm flowchart is outlined in Figure 1.


Figure 1. Algorithm flowchart.
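The flow in Figure 1 amounts to a simple control loop over the image queue. The following sketch is only a minimal outline of that loop, not the authors' implementation; the callables select_key_images, sfm_on_queue, estimate_depth_maps, and fuse_depth_maps are hypothetical stand-ins for the steps detailed in Sections 3.2-3.4.

```python
from collections import deque

def reconstruct(image_stream, select_key_images, sfm_on_queue,
                estimate_depth_maps, fuse_depth_maps, m=15, k=6):
    """Outline of the image-queue reconstruction loop (Figure 1).

    image_stream yields frames from the UAV video; the four callables
    stand in for key-image selection (Section 3.2), queue SfM with the
    weighted bundle adjustment (Section 3.3), depth-map estimation, and
    depth-map fusion (Section 3.3.4)."""
    queue = deque()
    depth_maps, structure = [], None

    for image in select_key_images(image_stream):   # Section 3.2
        queue.append(image)
        if len(queue) < m:
            continue                                 # fill the queue first
        # Structure calculation for the images currently in the queue;
        # 'structure' carries the control points between queue updates.
        structure = sfm_on_queue(list(queue), structure)
        # The paper computes depth maps on an independent thread;
        # here they are computed inline for the k images about to leave.
        depth_maps.extend(estimate_depth_maps(list(queue)[:k], structure))
        for _ in range(k):                           # queue update (Section 3.3.2)
            queue.popleft()

    # Any images still in the queue would be handled by one final pass (omitted).
    return fuse_depth_maps(depth_maps)               # dense 3D point cloud
```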

3.2. Selecting Key Images

In order to complete the dense reconstruction of the point cloud and improve the computational speed, the key images (those suitable for the structural calculation) must first be selected from the large number of UAV video images captured by the camera. The selected key images should have a good overlap area of the captured scenes. Two consecutive key images meet the key image constraint (denoted as R(I1, I2)) if they have a sufficient overlap area. In this study, we propose a method for directly selecting key images from the UAV camera's images (the GPS equipped on the UAV can only reach an accuracy on the order of meters; using GPS information as a reference for the selection of key images would produce discontinuous images). The overlap area between images can be estimated from the correspondence between the feature points of the images. In order to reduce the computational complexity of feature point matching, we propose a method of compressing the feature points based on principal component analysis (PCA). It is assumed that the images used for reconstruction are rich in texture. Three principal component points (PCPs) are generated from PCA, each reflecting the distribution of the feature points in an image. If two images are captured at almost the same position, their PCPs almost coincide; otherwise, the PCPs move and are located at different positions on the image.

The process steps are as follows. First, we use the scale-invariant feature transform (SIFT) [19] feature detection algorithm to detect the feature points of each image (Figure 2a). There must be at least four feature points, and the centroid of these feature points can then be calculated as follows:

p = \frac{1}{n}\sum_{i=1}^{n} P_i, \qquad P_i = \begin{pmatrix} x \\ y \end{pmatrix}    (1)

where P_i is the pixel coordinate of the feature point, and p is the centroid. The following matrix is formed by the image coordinates of the feature points:

A = \begin{bmatrix} (P_1 - p)^T \\ \vdots \\ (P_n - p)^T \end{bmatrix}    (2)

Then, the singular value decomposition (SVD) of matrix A yields two principal component vectors, and the principal component points (PCPs) are obtained from these vectors (Equations (3) and (4)), compressing a large number of feature points (Figure 2b) into three PCPs:

U \Sigma V^{*} = \mathrm{svd}(A)    (3)

p_{m1} = p, \qquad p_{m2} = V_1 + p, \qquad p_{m3} = V_2 + p    (4)

where p_{m1}, p_{m2}, and p_{m3} are the three PCPs, and V_1 and V_2 are the two vectors of V^{*}. The PCPs reflect the distribution of the feature points in the image. After that, by calculating the positional relationship of the corresponding PCPs between two consecutive images, we can estimate the overlap area between the images. The average displacement d_p between PCPs, as expressed in Equation (5), reflects the relative displacement of the feature points: when d_p < D_l, it is likely that the two images were captured at almost the same position, and when d_p > D_h, the overlap area of the two images becomes too small. In this paper, we use 1/100 of the resolution as the value of D_l and 1/10 of the resolution as the value of D_h. When d_p is within the range given in Equation (6), the two images meet the key image constraint R(I_1, I_2):

d_p = \frac{1}{3}\sum_{i=1}^{3} \left[ (p_{1i} - p_{2i})^{T} (p_{1i} - p_{2i}) \right]^{0.5}    (5)

R(I_1, I_2): \quad D_l < d_p < D_h    (6)

where p_{1i} is the ith PCP of the first image (I_1), and p_{2i} is that of the second image (I_2). The result is presented in Figure 2c. This is a method for estimating the overlap areas between images; it is not necessary to calculate the actual correlation between the two images when selecting key images. Moreover, the algorithm is not time-consuming for either the calculation of the PCPs or the estimation of the distance between PCPs. Therefore, this method is suitable for quickly selecting key images from a UAV camera's video image sequence.

Figure 2. Feature point compression. (a) Detecting the feature points of an image; (b) calculating the principal component points (PCPs) of the feature points; and (c) matching the PCPs.
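For illustration, the key-image test of Equations (1)-(6) can be written in a few lines of NumPy and OpenCV. The sketch below is an assumption-laden illustration rather than the authors' code: it treats V_1 and V_2 as the unit right-singular vectors of A, and it takes D_l and D_h as 1/100 and 1/10 of the image width.

```python
import cv2
import numpy as np

def principal_component_points(gray):
    """Compress the SIFT feature points of one image into three PCPs
    (Equations (1)-(4)): the centroid p, and p + V1, p + V2 from the SVD
    of the centered coordinate matrix A."""
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray, None)
    pts = np.array([kp.pt for kp in keypoints], dtype=np.float64)   # n x 2
    if len(pts) < 4:
        raise ValueError("at least four feature points are required")
    p = pts.mean(axis=0)                       # Equation (1)
    A = pts - p                                # rows are (P_i - p)^T, Equation (2)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)   # Equation (3)
    v1, v2 = Vt[0], Vt[1]                      # assumed: unit principal directions
    return np.stack([p, p + v1, p + v2])       # Equation (4): pm1, pm2, pm3

def meets_key_constraint(pcp1, pcp2, d_low, d_high):
    """Key image constraint R(I1, I2) from Equations (5) and (6):
    the mean PCP displacement must lie between D_l and D_h."""
    d_p = np.mean(np.linalg.norm(pcp1 - pcp2, axis=1))   # Equation (5)
    return d_low < d_p < d_high                          # Equation (6)

# Hypothetical usage for 1920-pixel-wide frames:
# pcps_a = principal_component_points(cv2.imread("frame_a.jpg", cv2.IMREAD_GRAYSCALE))
# pcps_b = principal_component_points(cv2.imread("frame_b.jpg", cv2.IMREAD_GRAYSCALE))
# is_key_pair = meets_key_constraint(pcps_a, pcps_b, 1920 / 100, 1920 / 10)
```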


3.3. Image Queue SfM

This study focuses on the 3D reconstruction of UAV camera images. Considering the continuity of these images, we propose an SfM calculation method based on an image queue. This method constructs a fixed-size image queue and places key images into the queue until it is full. Then, the structure of the images in the queue is computed, and the queue is updated with new images. Eventually, we complete the structural calculation of all images by repeating the structural computation and the queue update. The image queue SfM thus includes two steps: the first is the SfM calculation of the images in the queue, and the second is updating the images in the image queue.

3.3.1. SfM Calculation for the Images in the Queue

We propose the use of an incremental SfM algorithm. The process is illustrated in Figure 3. The collection of all images used for the reconstruction is first recorded as set C, and the total number of images in C is assumed to be N. The size of the initial fixed queue is m (it is preferred that any two images in the queue have overlapping areas, and m can be modified according to the requirements on calculation speed: when m is chosen smaller, the speed increases, but the precision decreases correspondingly). In order to keep the algorithm stable, the value of m is generally taken greater than 5, and k is less than half of m. Then, m key images are inserted into the image queue. All of the images in the image queue are recorded as Cq, and the structure of all of the images in Cq is calculated.


Figure 3. Structure from motion (SfM) calculation of the images in the queue.

Considering the accuracy and speed of the algorithm, the SfM approach used in this study is an incremental SfM algorithm [7]. The steps of the algorithm are summarized below.

1. The SIFT [19] feature detection algorithm is used to detect the feature points on all images in the queue, and the correspondences of the feature points are then obtained by feature point matching [20] between every two images in the queue.
2. Two images are selected from the queue as the initial image pair using the method proposed in [21]. The fundamental matrix of the two images is obtained by the random sample consensus (RANSAC) method [22], and the essential matrix between the two images is then calculated when the intrinsic matrix (obtained by the calibration method proposed in [23]) is known. The first two terms of the radial and tangential distortion parameters are also obtained and used for image rectification: after remapping the pixels onto new locations based on the distortion model, the image distortion caused by the lens can be eliminated. Then, the positions and orientations of the images can be obtained by decomposing the essential matrix according to [24] (a minimal illustration of steps 2 and 3 is sketched after this list).
3. According to the correspondence of the feature points in different images, the 3D coordinates of the feature points are obtained by triangulation (the feature points are denoted as Pi (i = 1, ..., t)).
4. The parameters calculated in the previous steps are passed into the bundle adjustment [25] for nonlinear optimization [26].
5. The structure of the initial image pair is calculated, and the coordinate system of one of the two cameras of the image pair is set as the global coordinate system. The images of the queue that have completed the structure calculation are placed into the set CSFM (CSFM ⊂ Cq).
6. A new image (Inew) is placed into the set CSFM, and the structural calculation is performed. The new image must meet two conditions: first, there should be at least one image in CSFM that has common feature points with Inew; second, at least six of these common feature points must be in Pi (i = 1, ..., t) (in order to improve the stability of the algorithm, this study requires at least 15 common feature points). Finally, all of the parameters from the structure calculation are optimized by bundle adjustment.
7. Repeat step 6 until the structure of all of the images inside the queue has been calculated (CSFM = Cq).
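Steps 2 and 3 correspond to a standard two-view initialization, which can be illustrated with OpenCV's geometry routines. The sketch below assumes already-matched pixel coordinates and a calibrated intrinsic matrix K; it illustrates the general technique, not the exact implementation used in the paper.

```python
import cv2
import numpy as np

def initialize_two_view(pts1, pts2, K):
    """Two-view initialization (steps 2-3): estimate the essential matrix
    with RANSAC, recover the relative pose, and triangulate the matches.

    pts1, pts2 : (n, 2) float arrays of matched pixel coordinates
    K          : (3, 3) intrinsic matrix from calibration
    """
    # Essential matrix with RANSAC (step 2); inliers are flagged in 'mask'.
    E, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    inl = mask.ravel().astype(bool)
    # Decompose E into R, t; the cheirality check keeps points in front of both cameras.
    _, R, t, _ = cv2.recoverPose(E, pts1[inl], pts2[inl], K)

    # Projection matrices: the first camera defines the global coordinate system.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])

    # Triangulate the inlier correspondences (step 3).
    X_h = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)  # 4 x n homogeneous
    X = (X_h[:3] / X_h[3]).T                                       # n x 3 points P_i
    return R, t, X
```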

3.3.2. Updating the Image Queue

After the above steps, the structural calculation of all of the images in Cq can be performed. In order to improve the speed of the structural calculation of all of the images in C, this study proposes an improved SfM calculation method in which the structural calculation of the images is processed in the form of an image queue. Figure 4 illustrates the process of the algorithm. We delete k images at the front of the queue, save their structural information, and then place k new images at the tail of the queue; these k new images are recorded as a set Ck. The (m − k) images left in the queue are recorded as a set Cr (Cq = Cr ∪ Ck), so now CSFM = Cr. The structure of the images in Cr is known, and the structural information contains the coordinates of the 3D feature points (marked as Pr). The corresponding image pixels of Pr are marked as a set Ur, and the projection relationship is expressed as P : Pr → Ur. Then, the pixels of the feature points (marked as Uk) of the images in Ck are detected, and the pixels in Uk and Ur are matched. We obtain the correspondence M : Urc ↔ Ukc (Urc ∈ Ur, Ukc ∈ Uk), where Urc and Ukc are the image pixels of the same object points (marked as Pc) in different images from Cr and Ck, respectively, expressed as P : Pc → Ukc, Pc → Urc; the points Pc are the control points. The projection matrices of the images in Ck can be estimated from the projection relationship between Pc and Ukc, and the positions and orientations of the cameras can then be calculated. In addition, Pc is used in the later weighted bundle adjustment to ensure the continuity of the structure. Then, we repeat step 6 until CSFM = Cq. Finally, the structure of all of the images can be calculated by repeating the following two procedures alternately: calculate the SfM of the images in the queue and update the image queue.


Figure 4. Updating the image queue (m = 5, k = 1).
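The projection matrices of the new images in Ck are estimated from the control-point correspondences Pc ↔ Ukc. One common way to do this, shown below as a hedged sketch (the paper does not specify the exact solver), is a RANSAC-based PnP estimation.

```python
import cv2
import numpy as np

def pose_of_new_image(object_points, image_points, K, dist_coeffs=None):
    """Estimate the position and orientation of a new image in Ck from
    control points: 3D points Pc already reconstructed from Cr and their
    pixel projections Ukc observed in the new image.

    object_points : (n, 3) control-point coordinates Pc
    image_points  : (n, 2) corresponding pixels Ukc in the new image
    """
    if dist_coeffs is None:
        dist_coeffs = np.zeros(4)            # images are assumed already rectified
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        object_points.astype(np.float64),
        image_points.astype(np.float64),
        K, dist_coeffs, reprojectionError=4.0)
    if not ok:
        raise RuntimeError("pose estimation from control points failed")
    R, _ = cv2.Rodrigues(rvec)               # rotation vector -> rotation matrix
    return R, tvec, inliers
```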

3.3.3. Weighted Bundle Adjustment

An important part of the SfM algorithm is bundle adjustment. Our method divides a large number of images into small groups of images in the form of an image queue. When calculating the structure by the queue, optimization by bundle adjustment causes the parameters to reach a subregion optimum rather than the global optimum. Small differences in the parameters between subregions will result in discontinuous structures. This problem can be addressed by using control points, which are the points connecting two sets of adjacent feature points of the image, as shown in Figure 5. When we use bundle adjustment to optimize the parameters, we must keep the control points unchanged, or change them as little as possible. This is achieved by weighting the error term of the control points. After the first update of the image queue, the formula for the projection error of the bundle adjustment used in step 6 is altered.

For a single image, Equation (7) is the projection formula of the 3D point to the image pixel, Equation (8) is the standard reprojection error formula, and Equation (9), discussed below, is its weighted counterpart:

\begin{pmatrix} v_i^f \\ u_i^f \\ 1 \end{pmatrix} = K[R, t] \begin{pmatrix} P_i \\ 1 \end{pmatrix} = f(R, T, P_i)    (7)

e_{project} = \sum_{i=1}^{n} \left[ \begin{pmatrix} v_i \\ u_i \end{pmatrix} - \begin{pmatrix} v_i^f \\ u_i^f \end{pmatrix} \right]^{T} \left[ \begin{pmatrix} v_i \\ u_i \end{pmatrix} - \begin{pmatrix} v_i^f \\ u_i^f \end{pmatrix} \right]    (8)

e_{project} = \sum_{i=1}^{n} \left[ \begin{pmatrix} v_i \\ u_i \end{pmatrix} - \begin{pmatrix} v_i^f \\ u_i^f \end{pmatrix} \right]^{T} \left[ \begin{pmatrix} v_i \\ u_i \end{pmatrix} - \begin{pmatrix} v_i^f \\ u_i^f \end{pmatrix} \right] + w_j \sum_{j=1}^{c} \left[ \begin{pmatrix} v_j \\ u_j \end{pmatrix} - \begin{pmatrix} v_j^f \\ u_j^f \end{pmatrix} \right]^{T} \left[ \begin{pmatrix} v_j \\ u_j \end{pmatrix} - \begin{pmatrix} v_j^f \\ u_j^f \end{pmatrix} \right]    (9)


where K is the internal matrix of the camera, R and T are the external parameters, P_i is the 3D feature point, (v_i, u_i)^T is the actual pixel coordinate of the feature point, and (v_i^f, u_i^f)^T is the pixel coordinate calculated from the structural parameters. The number of control points is c. The calculation of the bundle adjustment is a nonlinear least-squares problem: the structural parameters (R, T, P_i (i = 1, ..., n)) are optimized by minimizing e_{project} while changing the values of the parameters.


Figure 5. Weighted bundle adjustment.

The difference between the weighted bundle adjustment and the standard bundle adjustment is the weight applied to the control points' projection error. The weight is w_j (after an experimental comparison, a value of 20 was found to be suitable for w_j). Equation (9) is the reprojection error formula of the weighted bundle adjustment.
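A compact way to realize Equation (9) with a generic solver is to stack the ordinary reprojection residuals with the control-point residuals scaled by the square root of the weight. The sketch below, written for scipy.optimize.least_squares, treats only the 3D points of a single view as free parameters for brevity; the full weighted bundle adjustment also optimizes all camera poses. The weight value 20 follows the text, while everything else is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import least_squares

def project(K, R, t, X):
    """Pinhole projection of 3D points X (n x 3) with pose (R, t), Equation (7)."""
    x = (K @ (R @ X.T + t.reshape(3, 1))).T
    return x[:, :2] / x[:, 2:3]

def weighted_residuals(params, K, R, t, obs_free, obs_ctrl, ctrl_points, w=20.0):
    """Residual vector whose squared norm is Equation (9): the ordinary
    reprojection error of the free 3D points plus a w-weighted reprojection
    error of the (fixed) control points.

    params      : flattened free 3D points being optimized (n x 3)
    obs_free    : (n, 2) observed pixels of the free points
    obs_ctrl    : (c, 2) observed pixels of the control points
    ctrl_points : (c, 3) control-point coordinates Pc, kept fixed
    """
    X = params.reshape(-1, 3)
    r_free = (project(K, R, t, X) - obs_free).ravel()
    r_ctrl = (project(K, R, t, ctrl_points) - obs_ctrl).ravel()
    # least_squares minimizes sum(r**2), so scaling by sqrt(w) reproduces
    # the w-weighted squared error term of Equation (9).
    return np.concatenate([r_free, np.sqrt(w) * r_ctrl])

# Hypothetical usage, given an initial point estimate X0:
# result = least_squares(weighted_residuals, X0.ravel(),
#                        args=(K, R, t, obs_free, obs_ctrl, ctrl_points))
```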

3.3.4. MVS

For the dense reconstruction of the object, considering the characteristics of the problem addressed in this study, we use a method based on depth-map fusion to obtain the dense point cloud. The method is similar to that proposed in [16]. The algorithm first obtains the feature points of the structure calculated by the SfM. By using Delaunay triangulation, we can obtain mesh data from the 3D feature points. The mesh is then used as an outline of the object and is projected onto the image planes to obtain the estimated depth maps. The depth maps are optimized and corrected using a patch-based pixel matching algorithm. Finally, dense point cloud data can be obtained by fusing these depth maps.
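The estimated depth maps can be seeded by projecting the sparse SfM points into a view and interpolating depth over a 2D Delaunay triangulation of their projections. The following sketch is a simplified, hypothetical version of that initialization step using SciPy; the refinement and fusion stages of [16] are not reproduced here.

```python
import numpy as np
from scipy.spatial import Delaunay

def estimated_depth_map(points3d, K, R, t, width, height):
    """Rough initial depth map for one view (Section 3.3.4): project the
    sparse 3D feature points, build a 2D Delaunay mesh over their pixels,
    and interpolate the projected depths inside each triangle."""
    cam = (R @ points3d.T + t.reshape(3, 1)).T        # points in the camera frame
    depth = cam[:, 2]
    pix = (K @ cam.T).T
    pix = pix[:, :2] / pix[:, 2:3]                    # projected pixel coordinates

    valid = depth > 0
    tri = Delaunay(pix[valid])
    d = depth[valid]

    # Query every pixel of the image against the triangulation.
    ys, xs = np.mgrid[0:height, 0:width]
    query = np.column_stack([xs.ravel(), ys.ravel()]).astype(np.float64)
    simplex = tri.find_simplex(query)

    depth_map = np.zeros(height * width)
    inside = simplex >= 0
    # Barycentric interpolation of depth inside each triangle.
    T = tri.transform[simplex[inside]]                # (m, 3, 2) affine transforms
    b = np.einsum('mij,mj->mi', T[:, :2, :], query[inside] - T[:, 2, :])
    bary = np.column_stack([b, 1.0 - b.sum(axis=1)])
    depth_map[inside] = np.einsum('mi,mi->m', bary,
                                  d[tri.simplices[simplex[inside]]])
    return depth_map.reshape(height, width)
```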

4. Experiments

4.1. Data Sets


In order to test the accuracy and speed of the algorithm proposed in this study, real outdoor photographic images taken from a camera fixed on a UAV, together with standard images and a standard point cloud provided by roboimagedata [27], are used to reconstruct various dense 3D point clouds. The object models and images provided by roboimagedata are scanned with a high-precision structured light setup consisting of two Point Grey Research GS3-U3-91S6C-C industrial cameras with a resolution of 9.1 Mp and an LG-PF80G DLP projector with a resolution of 1140 × 912 pixels, mounted on a rigid aluminum frame; in addition, a high-precision New-mark Systems RT-5 turntable is used to provide automatic rotation of the object. Figure 6a–e present some of the outdoor images (different resolution images taken with the same camera) taken from a camera carried by the DJI Phantom 4 Pro UAV (camera hardware: 1/2.3-inch CMOS with 12.4 million effective pixels; lens: FOV 94°, 20 mm (35 mm format equivalent), f/2.8, focal point at infinity). Figure 6f presents some images of an academic building taken by a normal digital camera that moves around the building (the camera's depth of field is near infinity). Figure 6g,h present some of the standard images [28] taken by a camera fixed to a robotic arm (with known positions and orientations), which are provided by roboimagedata. Table 1 lists all of the information for the experimental image data and the parameters used in the algorithm. We used a computer running Windows 7 64-bit with 8 GB of RAM and a quad-core 2.80-GHz Intel Xeon CPU.


Figure 6. Images for experiment. (a) Garden; (b) Village; (c) Building; (d) Botanical Garden; (e) Factory land; (f) Academic building; (g) Pot; and (h) House.

Resolution Resolution 1920 × 1080 1920 × 1080 1920 × 1080 1920 × 1080 1280 × 720 1280 720 1920× × 1080 1280 × 720 1920 × 1080 1920 × 1080 1280 720 1600× × 1200 1600 × 1200 1920 × 1080 1600 × 1200 1600 × 1200

(m,(m, k)

k)

(15, 7)(40, 15) (15,6)(20, 6)(20, 7)(40, (15, 6)(20, 7)(40, 15) (15, 6)(20, 7)(40, (15, 6)(20, 7)(40, 15) (15,6)(20, 6)(20, 7)(40, (15, 7)(40, 15) (15, 6)(20, 7)(40, 15) (15, 6)(20, 7)(40, (15, 6)(20, 7)(40, 15) (15, 6)(20, 7)(40, (8, 3)(10, 4)(15, 6) (8, 3)(10, 4)(15, 6) (15, 6)(20, 7)(40,

15) 15) 15) 15) 15) 15) (8, 3)(10, 4)(15, 6) (8, 3)(10, 4)(15, 6)

Dl 25 25 20 25 20 25 20 20

𝑫D𝒍 h 150 25 150 25 150 20 150 150 25 150 20 200 200 25 20 20

𝑫𝒉 150 150 150 150 150 150 200 200

Sensors 2018, 18, 225

11 of 20



4.2. Precision Evaluation

4.2.InPrecision order toEvaluation test the accuracy of the 3D point cloud data obtained by the algorithm proposed in this study, we compared cloud bycloud our algorithm (PC)by with standard point cloud In order to testthe thepoint accuracy of generated the 3D point data obtained the the algorithm proposed in PCthis which we is captured bythe structured light scans (The error of all ground truth poses ispoint within STL study, compared point cloud generated by RMS our algorithm (PC) with the standard 0.15 mm)PC provided [27]. The accuracy of theRMS algorithm determined by calculating cloud STL whichby is roboimagedata captured by structured light scans (The error ofisall ground truth poses is thewithin nearest neighbor distance of the two point clouds [28]. First, the position of the point cloud 0.15 mm) provided by roboimagedata [27]. The accuracy of the algorithm is determined by is registered by the thenearest iterative nearest distance point method. For point the common partFirst, of PCthe and PCSTL ,ofeach point p1 calculating neighbor of the two clouds [28]. position the point is registered by the iterative common part of PC and PC , each in cloud the PC, PCSTL is searched for the nearest nearest point pointmethod. p1 ’, and For the the Euclidean distance between p1STLand p1 ’ is point p1 inThe thedistance PC, PCSTL is searched the nearest p1’, andcalculation the Euclidean distance p1 calculated. point cloud isfor obtained afterpoint the distance of each pointbetween and marked and p 1 ’ is calculated. The distance point cloud is obtained after the distance calculation of each point with different color. We compare the results of our method to those of openMVG [7], openMVS [16] and marked[29–31] with different color. We compare thepackages). results of our to thoseofofopenMVG openMVG is [7], and MicMac (three open-source software Themethod main concern SfM openMVS [16] and MicMac [29–31] (three open-source software packages). The main concern of calculation, while the main concern of openMVS is dense reconstruction. MicMac is a free open-source openMVG is SfM calculation, theinmain concern of reconstruction openMVS is dense reconstruction. MicMac photogrammetric suite that can while be used a variety of 3D scenarios. They both achieved is a free open-source photogrammetric suite that can be used in a variety of 3D reconstruction state-of-the-art results. An open source software named Cloud Compare [32] is used for the test. scenarios. They both achieved state-of-the-art results. An open source software named Cloud The results are presented in Figures 7–12. Compare [32] is used for the test. The results are presented in Figures 7–12. In the first experiment. As shown in Figure 7, point cloud shown in Figure 7a is generated by our In the first experiment. As shown in Figure 7, point cloud shown in Figure 7a is generated by method from 49 images (m = 15, k = 6). The number of points in the point cloud is 2,076,165. Figure 7b our method from 49 images (m = 15, k = 6). The number of points in the point cloud is 2,076,165. Figure the number of points in point cloud of openMVG + openMVS is 2,586,511. Figure 7c the number of 7b the number of points in point cloud of openMVG + openMVS is 2,586,511. Figure 7c the number points in point cloudcloud generated by MicMac is 270,802. And Figure 7d is standard point cloud of points in point generated by MicMac is 270,802. And Figure 7d is standard pointprovided cloud byprovided roboimagedata. The number of points is 2,880,879. by roboimagedata. 
The number of points is 2,880,879.

(a)

(b)

(c)

(d) Figure 7. 7. Point cloud comparison. (a) Point cloud of our (m = (m 15, =k = Point cloud of Figure Point cloud comparison. (a) Point cloud of method our method 15,6);k (b) = 6); (b) Point openMVG openMVS;+(c) Point cloud MicMac; cloud. point cloud. cloud of +openMVG openMVS; (c)of Point cloud(d)ofStandard MicMac;point (d) Standard

The distance point clouds areare shown in in Figure 8a–c. The calculation ofof distance is is performed only The distance point clouds shown Figure 8a–c. The calculation distance performed ononly the common part ofpart the of two clouds. Different colorcolor means different value of distance. on the common thepoint two point clouds. Different means different value of distance.

Sensors 2018, 18, 225 Sensors 2018, 18, 225

12 of 20 12 of 20

Sensors 2018, 18, 225

12 of 20

(b)

(a)

(b)

(a)

(c) (c)

Figure 8. (a) Distance point cloud between theproposed proposed method’s result and thestandard standard Figure Distance point cloud between proposed method’s result Figure 8. 8. (a)(a) Distance point cloud between thethe method’s result andand thethe standard point point cloud; (b) Distance point cloud between openMVG + openMVS’s result and the point(b)cloud; (b)point Distance cloud between openMVGresult + openMVS’s result point and the cloud; Distance cloud point between openMVG + openMVS’s and the standard cloud; standard point cloud; (c) Distance point cloud of MicMac and the standard point cloud. point cloud cloud;of(c) Distance cloud ofpoint MicMac and the standard point cloud. (c)standard Distance point MicMac andpoint the standard cloud. Distance histograms inin Figure 9a–c isisstatistics results of distance point cloud Figure 8a–c. For Distance histograms Figure 9a–c isstatistics statistics results of distance point cloud in Figure 8a–c. Distance histograms in Figure 9a–c results of distance point cloud ininFigure 8a–c. For the pot experiment, most distances are less than 1.5 cm when thethe pot than 200 cm (the For the potexperiment, experiment, most distances than when pot ishigher higher than 200 cm (the the pot most distances areare lessless than 1.51.5 cmcm the pot isis higher than 200 cm (the relative error isisless than 1%). relative error isless less than 1%). relative error than 1%).

(a) (a)

(b) (b)

(c)

(c)

Figure 9. (a) Distance histogram of our result; (b) Distance histogram of openMVG + Figure 9.9.(a) histogram of our of result; Distance of openMVG + openMVS’s result; Figure (a)Distance Distance histogram our(b) result; (b)histogram Distance openMVS’s result; (c) Distance histogram of MicMac’s result. histogram of openMVG + (c) Distance histogram MicMac’s result. openMVS’s result; (c)ofDistance histogram of MicMac’s result.

In the second experiment. As shown in Figure 10, Figure 10a point cloud is generated by our Inthe thesecond secondexperiment. experiment.As As shownininFigure Figure 10,Figure Figure 10apoint point cloudisisgenerated generatedby byour our In method from 49 images (m = 10, kshown = 5). The number 10, of points in10a the point cloud cloud is 2,618,918. Figure method from from 49 49 images (m == 10, 5). number the point cloud is 2,618,918. Figure 10b method images (min 10,kk==cloud 5).The The numberofof+points pointsinin the point cloud is 2,618,918. Figure 10b the number of points Point of openMVG openMVS is 2,695,354. Figure 10c the number the number of points in Point cloud of openMVG + openMVS is 2,695,354. Figure 10c the number of 10bofthe number of points Point cloud of openMVG + openMVS Figurecloud 10c the number points in point cloudingenerated by MicMac is 321,435. And (d)isis2,695,354. standard point provided points in point cloud generated by MicMac is 321,435. And (d) is standard point cloud provided by of points in point cloud MicMac is 321,435. And (d) is standard point cloud provided by roboimagedata. Thegenerated number ofby points is 3,279,989. roboimagedata. The number of points is 3,279,989. by roboimagedata. The number of points is 3,279,989.

Sensors 2018, 18, 225

13 of 20

Sensors 2018, 18, 225

13 of 20

Sensors 2018, 18, 225

13 of 20

(a) (a)

(b) (b)

(c)

(c)

(d) (d) Figure 10. Point cloud comparison. (a)(a) Point cloud of our (m = 15, 6);kk(b) Point Figure Point cloud comparison. (a)Point Point cloud ofmethod method (m ==6); Point Figure 10.10. Point cloud comparison. cloud of our method (mk===15, 15, 6);(b) (b)cloud Pointof openMVG + openMVS; (c) Point cloud of MicMac; (d) Standard point cloud. cloud of openMVG + openMVS;(c) (c)Point Pointcloud cloudof of MicMac; MicMac; (d) Standard cloud of openMVG + openMVS; Standardpoint pointcloud. cloud.

Distance histograms in Figure 12a–c are statistics resultsofofdistance distance point clouds in Figure 11a–c. Distance histograms Figure12a–c 12a–care arestatistics statisticsresults results Distance histograms inin Figure of distance point pointclouds cloudsininFigure Figure11a–c. 11a–c. ForFor thethe house experiment, most distances are less than 1cm when the house is higher than 150 cm (the house experiment, mostdistances distancesare areless less than than 1cm 1cm when when the For the house experiment, most the house houseisishigher higherthan than150 150cm cm relativerelative error iserror less is than 1%). less than 1%). (the(the relative error is less than 1%).

(a)

(a)

(b)

(b)

(c)

Figure 11. (a) Distance point cloud between the proposed method’s result and the standard

(c) Figure 11. (a) Distance point cloud between the proposed method’s result and the standard point point cloud; (b) Distance point cloud between openMVG + openMVS’s result and the cloud; (b) pointpoint cloud cloud between openMVG openMVS’s result and the standard cloud; Figure 11.Distance (a) Distance between the+proposed method’s result and thepoint standard standard point cloud; (c) Distance point cloud between MicMac’s result and the standard (c) Distance point cloud between MicMac’s result and the standard point cloud. point cloud; (b) Distance point cloud between openMVG + openMVS’s result and the

point cloud. standard point cloud; (c) Distance point cloud between MicMac’s result and the standard point cloud.

Sensors 2018, 18, 225 Sensors 2018, 18, 225

14 of 20 14 of 20

(a)

(b)

(c)

Figure 12. of our result; (b) Distance histogram of openMVG + openMVS; Figure 12. (a) (a) Distance Distancehistogram histogram of our result; (b) Distance histogram of openMVG + (c) Distance histogram of MicMac. openMVS; (c) Distance histogram of MicMac.

The number of points of the point clouds generated by our algorithm are almost the same as The number of points of the point clouds generated by our algorithm are almost the same as openMVG + openMVS’s results, and much more than those of MicMac. MicMac’s result is smoother openMVG + openMVS’s results, and much more than those of MicMac. MicMac’s result is smoother but less dense. The accuracy of our method is almost the same as openMVG + openMVS and MicMac but less dense. The accuracy of our method is almost the same as openMVG + openMVS and MicMac (state-of-the-art methods), but the speed is much faster than them. (state-of-the-art methods), but the speed is much faster than them. 4.3. Speed Evaluation 4.3. Speed Evaluation In order to test the speed of the proposed algorithm, we compared the time consumed by our In order to test the speed of the proposed algorithm, we compared the time consumed by our method with those consumed by openMVG and MicMac. Different m and k values for the algorithm method with those consumed by openMVG and MicMac. Different m and k values for the algorithm are selected, and the same image data are used to run the program under the same hardware conditions. are selected, and the same image data are used to run the program under the same hardware The running times of the algorithm are recorded in Table 2, and the precision is 1 s. conditions. The running times of the algorithm are recorded in Table 2, and the precision is 1 s. Table 2. Running Time Comparison. Table 2. Running Time Comparison. Name Name Garden Garden Village Village Building Building Botanical Garden Botanical Factory GardenLand Factory Land Academic building Academic

building

Images Images

Our Method Time Our Method Time (s)(s) mm = 15, k=6 = 15,

126 126145 145 149 149 42 42 170

1920 × 1080 1920 × 1080 1920 × 1080 1920 × 1080 1280 × 720 1280 × 720 1920 × 1080 19201280 × 1080 × 720

k284.0 =6 284.0 169.0 169.0 171.0 171.0 77.0 77.0 170.0

170 128 128

1280 × 720 1920 × 1080 1920 × 1080

170.0 124.0 124.0 m = 15, k = 6

49

1600 × 1200

49

1600 × 1200 1600 × 1200 1600 × 1200

35.0 m = 15, k=6 59.0 35.0 59.0

Pot House Pot House

Resolution Resolution

49 49

m =m 20,=k20, =7

OpenMVG OpenMVG Time (s) Time (s)

MicMac MicMac Time(s) Time(s)

m =m40, k = 15 = 40,

k=7 291.0 291.0 209.0 209.0 164.0 164.0 82.0 82.0 207.0

k336.0 = 15 336.0 319.0 319.0 268.0 268.0 99.0 99.0 343.0

1140.0 1140.0 857.0 857.0 651.0 651.0 93.0 93.0 1019.0

3072.0 3072.0 2545.0 2545.0 2198.0 2198.0 243.0 243.0 3524.0

207.0 182.0

343.0 277.0 277.0 m = 8, k = 3 m47.0 = 8, k=3 54.0 47.0 54.0

1019.0 551.0 551.0

3524.0 4597.0 4597.0

182.0 m = 10, k = 4 m =39.0 10, k = 4 53.0 39.0 53.0

56.0

351.0

74.0 56.0 74.0

467.0 351.0 467.0

The accuracy of our result is almost the same as result of openMVG and MicMac, but the speed of The accuracy of ourthan result is almost the same as result our algorithm is faster them. As is shown in Table 2. of openMVG and MicMac, but the speed of ourThere algorithm is faster than them. Asthe is shown 2. are two aspects that affect speed in of Table the algorithm. For most feature point matching There are two aspects that affect the speed of the algorithm. For most feature point matching algorithms, all images must match each other; thus, the time complexity of matching is O(N2 ). n 2). After algorithms, must match each thus, the time complexity of matching After usingall theimages methods proposed in thisother; study, the time complexity becomes O(m ×isk)O(N because the 𝑛 k using the calculation methods proposed in for thisthe study, theinside timethe complexity becomes 𝑂(𝑚 × 𝑘)k because the matching occurs only images image queue. Although m and are fixed and 𝑘 their values are generally much N, the speed the matching is greatly improved. matching calculation occurs onlysmaller for thethan images inside theofimage queue. Although m and k areSecond, fixed for the SfM calculations, most much of the smaller time is spent bundle adjustment. Bundleisadjustment itself is a and their values are generally than on N, the speed of the matching greatly improved. nonlinear problem most that optimizes the structural parameters; theadjustment calculation Second, forleast-squares the SfM calculations, of the time is camera spent onand bundle adjustment. Bundle time is will increase because of the increase theoptimizes number ofthe parameters. Thestructural proposedparameters; method divides itself a nonlinear least-squares probleminthat camera and the calculation time will increase because of the increase in the number of parameters. The proposed method divides the global bundle adjustment, which optimizes a large number of parameters, into

Sensors 2018, 18, 225

15 of 20

the global bundle adjustment, which optimizes a large number of parameters, into several local bundle adjustments so that the number of the parameters remains small and the calculation speed of the algorithm improves greatly. 4.4. Results The result is shown in Figure 13 (m = 15, k = 6).The scene in this case is captured by an UAV camera in a garden of YanJiao. The flight height is about 15 m from the ground and is kept unchanged. The flight distance is around 50 m. The images’ resolution is 1920 × 1080. And the number of points in point cloud is 4,607,112. The result is shown in Figure 14 (m = 15, k = 6). The scene in this case is captured by a UAV camera in a village. The UAV is launched from the ground and flies over the house. The maximum flight 2018, height Sensors 18, is 225around 6 m. The flight distance is around 20 m. The images’ resolution is 1920 × 151080. of 20 And the number of points in the point cloud is 3,040,551. several adjustments of the parameters remainsbysmall the Thelocal resultbundle is shown in Figure 15so(mthat = 15,the k = number 6). The scene in this case is captured a UAVand camera calculation speed of the algorithm improves greatly. in a village. The UAV flight over the top of the buildings. The flight height is around 80 m and is kept unchanged. The flight distance is around 150 m. The images’ resolution is 1280 × 720 and the number 4.4. Results of points in point cloud is 2,114,474. The result shown in Figure 16=(m k = 3). In this case, theisUAV flightby is an over a botanical The result isis shown in Figure 13 (m 15,=k =10, 6).The scene in this case captured UAV camera garden. The flight blocks are integrated for many parallel strips. The flight height is around 40 m The and in a garden of YanJiao. The flight height is about 15 m from the ground and is kept unchanged. kept unchanged. The flight distance is around 50 m. The images’ is 1920of×points 1080 and the flight distance is around 50 m. The images’ resolution is 1920 × 1080.resolution And the number in point number of points in point cloud is 2,531,337. cloud is 4607112.

Figure 13. Reconstruction result of a garden. (a) Part of the images used for reconstruction; (b) Structure calculation of image queue SfM (green points represent the positions of the camera); (c) Dense point cloud of the scene.

The result is shown in Figure 14 (m = 15, k = 6). The scene in this case is captured by a UAV camera in a village. The UAV is launched from the ground and flies over the house. The maximum flight height is around 6 m. The flight distance is around 20 m. The images' resolution is 1920 × 1080, and the number of points in the point cloud is 3,040,551.

The result is shown in Figure 15 (m = 15, k = 6). The scene in this case is captured by a UAV camera in a village. The UAV flies over the top of the buildings. The flight height is around 80 m and is kept unchanged. The flight distance is around 150 m. The images' resolution is 1280 × 720, and the number of points in the point cloud is 2,114,474.

The result is shown in Figure 16 (m = 10, k = 3). In this case, the UAV flies over a botanical garden. The flight blocks are composed of many parallel strips. The flight height is around 40 m and is kept unchanged. The flight distance is around 50 m. The images' resolution is 1920 × 1080, and the number of points in the point cloud is 2,531,337.


Figure 14. Reconstruction result of a village. (a) Part of the images used for reconstruction; (b) Structure calculation of image queue SfM (green points represent the positions of the camera); (c) Dense point cloud of the scene.

Figure 15. Reconstruction result of buildings. (a) Part of the images used for reconstruction; (b) Structure calculation of image queue SfM (green points represent the positions of the camera); (c) Dense point cloud of the scene.


Figure 16. Reconstruction result of a botanical garden. (a) Part of the images used for reconstruction; (b) Structure calculation of image queue SfM (green points represent the positions of the camera); (c) Dense point cloud of the scene.

The result is shown in Figure 17 (m = 20, k = 5). In this case, the UAV flies over factory land. The flight height is around 90 m and is kept unchanged. The flight distance is around 300 m. The images' resolution is 1280 × 720, and the number of points in the point cloud is 9,021,836.

Figure 17. Reconstruction result of a factory. (a) Part of the images used for reconstruction; (b) Structure calculation of image queue SfM (green points represent the positions of the camera); (c) Dense point cloud of the scene.

The result is shown in Figure 18 (m = 25, k = 8). In this case, a ground-based camera instead of a UAV camera is used: the camera is moved around an academic building while taking images. The images' resolution is 1920 × 1080, and the number of points in the point cloud is 23,900,173. The result shows that our algorithm can also be used for reconstruction from normal digital camera images, as long as the images are taken continuously.


Figure 18. Reconstruction result of an academic building. (a) Part of the images used for reconstruction; (b) Structure calculation of image queue SfM (green points represent the positions of the camera); (c) Dense point cloud of the scene.

The results for the experimental image sets used in this paper are presented in Figures 13–18. For each example, subfigure (a) shows some of the images used for 3D reconstruction. In subfigure (b), the four most representative views of the SfM calculation results are selected to present the process of the image queue SfM: green points represent the positions of the camera, red points are control points, and white points are structural feature points. The positions and orientations of the cameras, together with the object feature points, are derived in the order of camera movement. As shown in subfigure (c), the 3D point cloud is generated by depth–map fusion. Accurate results can be obtained with our method as long as the images are captured continuously, and the final results accurately reproduce the appearance of the scenes.
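For readers who want a feel for the depth–map fusion step that produces these clouds, the sketch below back-projects each depth map into world coordinates using the intrinsics K and the camera pose (R, t) recovered by the SfM stage, and concatenates the per-view points into one cloud. This is only an illustrative simplification under assumed conventions (pinhole model, camera-to-world pose, dense depth arrays); the fusion used in this paper additionally enforces consistency between neighbouring depth maps.

```python
import numpy as np

def backproject_depth_map(depth, K, R, t):
    """Back-project one depth map (H x W) to world-space 3D points.

    Assumes a pinhole camera with intrinsics K and a camera-to-world
    pose (R, t); pixels with depth <= 0 are treated as invalid.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    pixels = np.stack([u[valid], v[valid], np.ones(valid.sum())])  # 3 x M homogeneous pixels
    rays = np.linalg.inv(K) @ pixels                               # viewing rays in camera frame
    points_cam = rays * depth[valid]                               # scale rays by depth
    points_world = (R @ points_cam) + t.reshape(3, 1)              # transform to world frame
    return points_world.T                                          # M x 3

def fuse_depth_maps(views):
    """Concatenate back-projected points from all views into one cloud.

    `views` is an iterable of (depth, K, R, t) tuples produced by the
    depth-estimation and SfM stages.
    """
    clouds = [backproject_depth_map(d, K, R, t) for d, K, R, t in views]
    return np.vstack(clouds)
```

A production implementation would also filter depths that are inconsistent across neighbouring views before merging them into the final cloud.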
5. Conclusions

In order to reconstruct the 3D structure of scenes using image sequences, we propose a rapid and accurate 3D reconstruction method based on an image queue. First, a principal component analysis of the feature points is used to select the key images suitable for 3D reconstruction, which ensures that the algorithm improves the calculation speed with almost no loss of accuracy. Then, considering the continuity and relevance of the UAV camera's images, we propose a method based on an image queue. Our method divides a global bundle adjustment calculation into several local bundle adjustment calculations, greatly improving the calculation speed of the algorithm and keeping the structures continuous. Finally, dense 3D point cloud data of the scene are obtained by using depth–map fusion.

The experiments demonstrate that when the texture of the images is complex and the number of images exceeds 100, the proposed method can improve the calculation speed by more than a factor of four with almost no loss of calculation accuracy. Furthermore, as the number of images increases, the improvement in the calculation speed becomes more noticeable.

When the scene is too long, for example when the flight distance exceeds 300 m, the structure of the reconstruction will be distorted due to accumulated errors. This problem is solved in global SfM [7] by using a loop closure constraint. Our future work will be aimed at eliminating cumulative errors and obtaining higher accuracy. With the rise of artificial intelligence research, the parameters m and k could be selected automatically by using deep learning and machine learning. Improving the performance of the algorithm in parameter selection is also part of our future work.


Acknowledgments: This work was financially supported by the National Natural Science Foundation of China (NSFC) (51675033).

Author Contributions: Yufu Qu analyzed the weak aspects of existing methods and set up the theoretical framework. Jianyu Huang designed the method of selecting key images from the image sequence and the SfM calculation for the UAV camera's images, implemented the methods in code, and performed the experiments. Xuan Zhang collected the experimental image data, helped improve the performance of the algorithm, and analyzed the results. Jianyu Huang wrote the paper and Yufu Qu revised it.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Polok, L.; Ila, V.; Solony, M.; Smrz, P.; Zemcik, P. Incremental Block Cholesky Factorization for Nonlinear Least Squares in Robotics. Robot. Sci. Syst. 2013, 46, 172–178.
2. Kaess, M.; Johannsson, H.; Roberts, R.; Ila, V.; Leonard, J.J.; Dellaert, F. iSAM2: Incremental smoothing and mapping using the Bayes tree. Int. J. Robot. Res. 2012, 31, 216–235. [CrossRef]
3. Liu, M.; Huang, S.; Dissanayake, G.; Wang, H. A convex optimization based approach for pose SLAM problems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 1898–1903.
4. Beardsley, P.A.; Torr, P.H.S.; Zisserman, A. 3D model acquisition from extended image sequences. In Proceedings of the European Conference on Computer Vision, Cambridge, UK, 14–18 April 1996; pp. 683–695.
5. Mohr, R.; Veillon, F.; Quan, L. Relative 3-D reconstruction using multiple uncalibrated images. Int. J. Robot. Res. 1995, 14, 619–632. [CrossRef]
6. Dellaert, F.; Seitz, S.M.; Thorpe, C.E.; Thrun, S. Structure from motion without correspondence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC, USA, 15 June 2000; Volume 552, pp. 557–564.
7. Moulon, P.; Monasse, P.; Marlet, R. Adaptive structure from motion with a contrario model estimation. In Proceedings of the Asian Conference on Computer Vision, Daejeon, Korea, 5–9 November 2012; pp. 257–270.
8. Wu, C. Towards linear-time incremental structure from motion. In Proceedings of the International Conference on 3DTV-Conference, Aberdeen, UK, 29 June–1 July 2013; pp. 127–134.
9. Gherardi, R.; Farenzena, M.; Fusiello, A. Improving the efficiency of hierarchical structure-and-motion. In Proceedings of the Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 1594–1600.
10. Moulon, P.; Monasse, P.; Marlet, R. Global fusion of relative motions for robust, accurate and scalable structure from motion. In Proceedings of the IEEE International Conference on Computer Vision, Portland, OR, USA, 23–28 June 2013; pp. 3248–3255.
11. Crandall, D.J.; Owens, A.; Snavely, N.; Huttenlocher, D.P. SfM with MRFs: Discrete-continuous optimization for large-scale structure from motion. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2841–2853. [CrossRef] [PubMed]
12. Sweeney, C.; Sattler, T.; Höllerer, T.; Turk, M. Optimizing the viewing graph for structure-from-motion. In Proceedings of the IEEE International Conference on Computer Vision, Los Alamitos, CA, USA, 7–13 December 2015; pp. 801–809.
13. Snavely, N.; Simon, I.; Goesele, M.; Szeliski, R.; Seitz, S.M. Scene reconstruction and visualization from community photo collections. Proc. IEEE 2010, 98, 1370–1390. [CrossRef]
14. Wu, C.; Agarwal, S.; Curless, B.; Seitz, S.M. Multicore bundle adjustment. In Proceedings of the Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 3057–3064.
15. Furukawa, Y.; Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1362–1376. [CrossRef] [PubMed]
16. Shen, S. Accurate multiple view 3D reconstruction using patch-based stereo for large-scale scenes. IEEE Trans. Image Process. 2013, 22, 1901–1914. [CrossRef] [PubMed]
17. Li, J.; Li, E.; Chen, Y.; Xu, L. Bundled depth-map merging for multi-view stereo. In Proceedings of the Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2769–2776.
18. Schönberger, J.L.; Zheng, E.; Frahm, J.M.; Pollefeys, M. Pixelwise View Selection for Unstructured Multi-View Stereo; Springer International Publishing: New York, NY, USA, 2016.
19. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]
20. Moulon, P.; Monasse, P. Unordered feature tracking made fast and easy. In Proceedings of the European Conference on Visual Media Production, London, UK, 5–6 December 2012.
21. Moisan, L.; Moulon, P.; Monasse, P. Automatic homographic registration of a pair of images, with a contrario elimination of outliers. Image Process. Line 2012, 2, 329–352. [CrossRef]
22. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Read. Comput. Vis. 1987, 24, 726–740.
23. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [CrossRef]
24. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2003.
25. Triggs, B.; Mclauchlan, P.F.; Hartley, R.I.; Fitzgibbon, A.W. Bundle adjustment—A modern synthesis. In Proceedings of the International Workshop on Vision Algorithms: Theory and Practice, Corfu, Greece, 21–22 September 1999; pp. 298–372.
26. Ceres Solver. Available online: http://ceres-solver.org (accessed on 14 January 2018).
27. Sølund, T.; Buch, A.G.; Krüger, N.; Aanæs, H. A large-scale 3D object recognition dataset. In Proceedings of the Fourth International Conference on 3D Vision, Stanford, CA, USA, 25–28 October 2016; pp. 73–82. Available online: http://roboimagedata.compute.dtu.dk (accessed on 14 January 2018).
28. Jensen, R.; Dahl, A.; Vogiatzis, G.; Tola, E. Large scale multi-view stereopsis evaluation. In Proceedings of the Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 406–413.
29. Pierrot Deseilligny, M.; Clery, I. Apero, an Open Source Bundle Adjusment Software for Automatic Calibration and Orientation of Set of Images. In Proceedings of the ISPRS—International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XXXVIII-5/W16, Trento, Italy, 2–4 March 2012; pp. 269–276.
30. Galland, O.; Bertelsen, H.S.; Guldstrand, F.; Girod, L.; Johannessen, R.F.; Bjugger, F.; Burchardt, S.; Mair, K. Application of open-source photogrammetric software MicMac for monitoring surface deformation in laboratory models. J. Geophys. Res. Solid Earth 2016, 121, 2852–2872. [CrossRef]
31. Rupnik, E.; Daakir, M.; Deseilligny, M.P. MicMac—A free, open-source solution for photogrammetry. Open Geosp. Data Softw. Stand. 2017, 2, 14. [CrossRef]
32. Cloud Compare. Available online: http://www.cloudcompare.org (accessed on 14 January 2018).

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
