Image-based 3D Reconstruction and Recognition for Enhanced Highway Condition Assessment

Berk Uslu1, Mani Golparvar-Fard2, and Jesus M. de la Garza3

1 Graduate Student, Construction Engineering and Management Group, Via Dept. of Civil and Environmental Engineering, Virginia Tech, Blacksburg, VA; PH (540) 905-8525; FAX (540) 231-7532; email: [email protected]
2 Assistant Professor, Construction Engineering and Management Group, Via Dept. of Civil and Environmental Engineering, and Myers-Lawson School of Construction, Virginia Tech, Blacksburg, VA; PH (540) 231-7255; FAX (540) 231-7532; email: [email protected]
3 Vecellio Professor, Construction Engineering and Management Group, Via Dept. of Civil and Environmental Engineering, and Myers-Lawson School of Construction, Virginia Tech, Blacksburg, VA; PH (540) 231-7255; FAX (540) 231-7532; email: [email protected]

ABSTRACT
Frequent and accurate condition assessment is essential for effective transportation system operation and asset management. Despite this importance, current manual data collection methods for highway assets are time consuming, subjective, and sometimes unsafe. There is a need for an automated and efficient data collection method that does not have a significant cost impact and can achieve automation, accuracy, and safety in condition assessment. Over the past few years, advances in technology, such as cheap high-resolution digital cameras and the availability of vast data storage, have allowed a number of computer vision models to be developed that can detect and assess the condition of some individual assets. However, none of these vision-based methods recognize and locate the assets, assess their condition, and visualize their most updated status in a 3D environment. This paper proposes a new approach, based on 3D image-based reconstruction and integrated recognition of color, shape, and texture for highway assets, and presents preliminary results from the developed system on a real-world case study.

INTRODUCTION
Infrastructure systems are recognized as the fundamental foundation of societal and economic functions such as transportation, communication, energy distribution, wastewater collection, and water supply. Most infrastructure systems are geographically extensive and have a long service life. It is expensive to provide and manage any physical infrastructure over spatially extensive areas and for long time spans. This spatial and temporal range causes a high degree of uncertainty in setting up numerical models of deterioration rates. These characteristics complicate planning for future maintenance, repair, and reconstruction of existing facilities.
High costs, tight budgets, and previous decisions based on inaccurate predictions of infrastructure performance are resulting in serious consequences (Maser 2005). The American Society of Civil Engineers estimates that $2.2 trillion is needed over five years to repair and retrofit U.S. infrastructure to a good condition (ASCE 2009). This issue is not limited to the U.S., as infrastructure in other countries is also aging and failing.

Copyright ASCE 2011
Computing in Civil Engineering 2011
Downloaded 11 Aug 2011 to 128.173.204.147. Redistribution subject to ASCE license or copyright. Visit http://www.ascelib

Although managing and maintaining infrastructure is not a new problem, in recent decades a significant expansion in the size and complexity of infrastructure networks has posed several new engineering and management problems concerning how existing infrastructure can be monitored, prioritized, and maintained in a timely fashion. One of the grand challenges in restoring and improving urban infrastructure, as identified by the National Academy of Engineering (NAE 2010), is to devise techniques to efficiently create records of the locations and up-to-date status of the infrastructure. The need for frequent tracking and condition assessment is not specific to existing infrastructure; it also affects new construction projects, owing to a lack of techniques to easily and quickly track, analyze, and visualize the as-built status of a project and monitor performance metrics (Golparvar-Fard et al. 2010, 2009a&b). To address these inefficiencies in an all-inclusive manner, this research looks into creating a new technique based on close-range imagery of infrastructure, and explores how the current challenges of creating up-to-date records of new and existing civil infrastructure (recognizing and locating assets), in addition to assessing their condition, can be proactively addressed. This paper proposes a new approach, based on 3D image-based reconstruction and integrated recognition of color, shape, and texture for highway assets, and presents preliminary results from the developed system on a real-world case study.

PROBLEM STATEMENT
In current practice, assessing asset conditions is still a predominantly manual and thus time-consuming process. The subjectivity and experience of the raters undoubtedly influence the final assessment (Bianchini et al. 2010). In addition, most maintenance decision-making approaches employ a discrete representation of condition.
For example, pavements are usually evaluated in five different condition states varying from excellent to very poor (de la Garza and Krueger 2007). Advances in continuous condition-based decision-making are of interest to the infrastructure management community, since infrastructure damage variables are typically continuous in nature. Rapid advances in automated inspection techniques are making these damage variables easy to measure, and practical benefits from considering this more natural representation of condition are increasingly possible. These advances foster further research in formulating, solving, and implementing infrastructure management methods using continuous representations of important condition variables.

Some research studies have already addressed the problem of automated detection, classification, and assessment of assets in a discrete fashion (Mashford et al. 2009, Meegoda et al. 2006). Current research efforts in devising a computer vision model for highway asset detection are roughly divided into three stages: segmentation, detection, and condition assessment. Bascon et al. (2010) presented a Support Vector Machine approach to recognize road signs. Krishnan (2009) presented a triangulation and bundle adjustment approach for identifying road signs. Hu and Tsai (2010) and Wu and Tsai (2006) used nearest-neighbor assignment of feature descriptors in an image recognition model for developing a sign inventory. Although most of these techniques have achieved automation and accuracy to a reasonable level, none of these systems uses the same visual information to locate the assets and, more importantly, to detect them in a continuous fashion.


The specific goal of this research is to create an automated condition assessment tool for low-cost, accurate, frequent, and continuous data collection. Contrary to the systems currently in use, the new condition assessment system will not focus solely on one type of asset; it will be a comprehensive system that can perform automated condition assessment for many different assets (such as guardrails, signs, paved ditches, and lighting fixtures). By utilizing this system, highway agencies would not only obtain low-cost, accurate, and frequent condition data, but could also use this consistent data to set the discrete representation of conditions of low-capital assets and formulate the deterioration rates for these assets. Consequently, this would allow better investment planning for low-capital assets. Furthermore, this new approach is built on a newly developed 3D image-based reconstruction technique (Golparvar-Fard et al. 2010 & 2009a), which enables assets to be located and visualized in a common 3D environment and integrates 3D reconstruction with 2D recognition of elements.

RESEARCH APPROACH
The proposed approach and the developed system are intended to exceed the minimum requirements of standards on safety, efficiency, and consistency by utilizing visual sensing techniques. The working principle of the system is summarized in Figure 1. The steps followed to create the proposed system are as follows:
1. 3D image-based reconstruction of all objects using the D4AR reconstruction approach (Golparvar-Fard 2010), which integrates structure-from-motion, multi-view stereo, and voxel coloring/labeling;
2. Utilizing the Semantic Texton Forest (STF) algorithm to independently segment each image into proper asset categories;
3. Integrating the camera parameters recovered in the reconstruction step with the segmented areas to stitch relevant image parts into a panoramic image (necessary for large assets that are present in more than one frame, such as guardrail and pavement);
4. Projecting and visualizing the results in a common 3D environment, accessible through ubiquitous devices in onsite and remote coordination centers.

Figure 1. The data and process in our developed system
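Steps 3 and 4 depend on relating reconstructed 3D points to segmented pixels through the recovered camera parameters. The following is a minimal sketch of that idea, assuming an ideal pinhole camera without lens distortion. The function names, the intrinsic matrix K, and the pose (R, t) are illustrative assumptions and are not taken from the D4AR implementation.

```python
import numpy as np

def project_points(points_xyz, K, R, t):
    """Project Nx3 world points to pixel coordinates with a pinhole camera (assumed model)."""
    cam = points_xyz @ R.T + t                # world frame -> camera frame
    in_front = cam[:, 2] > 0                  # only points in front of the camera can be seen
    uvw = cam @ K.T                           # apply the intrinsic matrix
    z = np.where(in_front, uvw[:, 2], 1.0)    # dummy depth for points behind the camera
    return uvw[:, :2] / z[:, None], in_front

def label_points(points_xyz, label_image, K, R, t, void_label=0):
    """Give each 3D point the segmentation label of the pixel it projects to."""
    h, w = label_image.shape
    uv, in_front = project_points(points_xyz, K, R, t)
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    visible = in_front & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    labels = np.full(len(points_xyz), void_label, dtype=label_image.dtype)
    labels[visible] = label_image[rows[visible], cols[visible]]
    return labels

# Hypothetical 100x100 label image: left half is category 1, right half is category 3.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
label_image = np.full((100, 100), 1)
label_image[:, 50:] = 3
points = np.array([[0.1, 0.0, 1.0], [-0.2, 0.0, 1.0], [0.0, 0.0, -1.0], [10.0, 0.0, 1.0]])
print(label_points(points, label_image, K, np.eye(3), np.zeros(3)).tolist())  # [3, 1, 0, 0]
```

Points behind the camera or outside the frame fall back to the void label, which mirrors how unlabeled regions are treated in the ground truth images.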


3D Image-based Reconstruction
The state of the art in 3D reconstruction has undergone significant improvement over the past few years. The availability of cheap, high-resolution imagery and large data storage capacity, in addition to advances in computing, has created a great opportunity to run 3D image-based reconstruction at large scales. A few research groups (Furukawa et al. 2010, Gallup et al. 2010) have already demonstrated dense and accurate image-based reconstruction results. Application of image-based 3D reconstruction in the construction industry is relatively new. The images available in this domain are traditionally unordered and uncalibrated, and usually include a significant amount of occlusion, which makes the application of existing 3D reconstruction algorithms difficult. Recently, Golparvar-Fard et al. (2010, 2009a) proposed a new dense reconstruction algorithm based on Structure-from-Motion (SfM), Multi-View Stereo (MVS), and a voxel coloring/labeling mechanism. In this research, the 3D image-based reconstruction module builds upon this algorithm and is tested in the context of sequentially captured images for highways.

Semantic Texton Forest (STF) for Recognition
Textons and visual words have proven to be powerful discrete image representations for categorization and segmentation. In these approaches, filter-bank responses (e.g., derivatives of Gaussians, wavelets) or invariant descriptors (e.g., SIFT) are computed across a training set. The collected descriptors are clustered to produce a codebook of visual words, typically with the simple but effective k-means, followed by nearest-neighbor assignment. Unfortunately, this three-stage process is extremely slow and is often the most time-consuming part of the whole system, even with optimizations such as kd-trees, the triangle inequality, or hierarchical clusters, which makes its application less attractive for highway asset management.
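To make the conventional three-stage pipeline concrete, the sketch below clusters precomputed descriptors into a codebook with k-means, assigns each descriptor to its nearest visual word, and builds a word histogram. It assumes the descriptors are already available as rows of a NumPy array. The farthest-point initialization is a choice made here for reproducibility, not something the paper specifies, and all function names are illustrative.

```python
import numpy as np

def assign_words(descriptors, centers):
    """Nearest-neighbor assignment of each descriptor to a visual word."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def init_centers(descriptors, k, rng):
    """Farthest-point initialization (an assumption of this sketch)."""
    centers = [descriptors[rng.integers(len(descriptors))]]
    while len(centers) < k:
        chosen = np.array(centers)
        d2 = ((descriptors[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(descriptors[d2.argmax()])
    return np.array(centers, dtype=float)

def kmeans_codebook(descriptors, k, iters=20, seed=0):
    """Cluster descriptors into k visual words with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = init_centers(descriptors, k, rng)
    for _ in range(iters):
        words = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[words == j]
            if len(members):                  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def word_histogram(descriptors, centers):
    """Normalized histogram of visual words for one image or region."""
    words = assign_words(descriptors, centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

Each call to `assign_words` compares every descriptor with every codebook entry, which gives a sense of why the paragraph above describes this stage as a bottleneck.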
The STF algorithm (Shotton et al. 2008) provides an efficient and powerful low-level feature that can be effectively employed in the semantic segmentation of images. Semantic texton forests do not need the expensive computation of filter-bank responses or local descriptors. The STF algorithm is built upon a randomized decision tree structure where the nodes in the trees provide (i) an implicit hierarchical clustering into semantic textons and (ii) an explicit local classification estimate. Finally, these features are used in a machine learning algorithm that performs segmentation and detection with a semi-supervised technique (the algorithm trains itself with ground truth images that are created by the user).

Randomized Decision Trees
As illustrated in Figure 2, a decision forest is a group of T decision trees. P(c|n) is the learned probability distribution over classes c associated with each node n in the tree. A decision tree works by branching down the tree according to a learned binary function of the feature vector until a leaf node l is reached. The whole forest achieves an accurate and robust classification by averaging the class distributions over the leaf nodes L = (l_1, ..., l_T):


$$P(c \mid L) = \frac{1}{T} \sum_{t=1}^{T} P(c \mid l_t) \tag{1}$$

Figure 2. Decision forests. A forest consists of T decision trees. A feature vector is classified by descending each tree. This gives, for each tree, a path from root to leaf, and a class distribution at the leaf. As an illustration, the root-to-leaf paths are highlighted in yellow and class distributions in red for one input feature vector.

Randomized Learning
Each tree is trained separately on a small random subset of the training data I. Learning proceeds recursively, splitting the training data I_n at node n into left and right subsets I_l and I_r according to a threshold t of some split function f of the feature vector v:

$$I_l = \{\, i \in I_n \mid f(v_i) < t \,\} \tag{2}$$

$$I_r = I_n \setminus I_l \tag{3}$$

At each split node, several candidates for the function f and threshold t are generated randomly, and the one that maximizes the expected gain in information about the node categories is chosen:

$$\Delta E = -\frac{|I_l|}{|I_n|} E(I_l) - \frac{|I_r|}{|I_n|} E(I_r) \tag{4}$$

where E(I) is the Shannon entropy of the classes in the set of examples I (Shotton et al. 2008). Training continues until a maximum depth D is reached or no further information gain is possible. The class distributions P(c|n) are estimated as a histogram of the class labels c_i of the training examples i that reached node n.

Bag of Semantic Textons
The bag of semantic textons combines a histogram of semantic textons over an image region with a region prior category distribution. The bag of semantic textons is used with a support vector machine (SVM) classifier whose image-level prior over categories enables the segmentation to emphasize those categories that the SVM believes to be present.
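The randomized learning rule in Equations (2) to (4) and the forest averaging in Equation (1) can be illustrated with a short sketch. It assumes that a candidate split function f simply reads one dimension of a feature vector, which is a simplification chosen here for illustration; the split functions actually used by the STF algorithm operate on raw pixel values. All function names are hypothetical.

```python
import numpy as np

def entropy(labels, n_classes):
    """Shannon entropy E(I) of the class labels in a set of examples."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def split_gain(f_values, labels, t, n_classes):
    """Delta E of Eq. (4) for the split I_l = {i : f(v_i) < t}, I_r = the rest."""
    left = f_values < t
    n = len(labels)
    n_l = int(left.sum())
    n_r = n - n_l
    return (-(n_l / n) * entropy(labels[left], n_classes)
            - (n_r / n) * entropy(labels[~left], n_classes))

def best_random_split(features, labels, n_classes, n_candidates=50, seed=0):
    """Generate random (f, t) candidates and keep the one with the largest Delta E."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_candidates):
        dim = int(rng.integers(features.shape[1]))   # candidate f: one feature dimension
        t = rng.uniform(features[:, dim].min(), features[:, dim].max())
        gain = split_gain(features[:, dim], labels, t, n_classes)
        if best is None or gain > best[0]:
            best = (gain, dim, t)
    return best

def forest_posterior(leaf_distributions):
    """Eq. (1): average the leaf class distributions P(c|l_t) over the T trees."""
    return np.mean(np.asarray(leaf_distributions, dtype=float), axis=0)
```

Because Delta E is the negated weighted entropy of the two subsets, it is never positive, and a value of zero means both subsets are pure.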


Figure 3. Bags of semantic textons. Within a region r of image I, the semantic texton histogram and region prior are generated. The histogram incorporates the implicit hierarchy of clusters in the STF, containing both STF leaf nodes (green) and split nodes (yellow). The region prior is computed as the average of the individual leaf node class distributions P(c|l).

RESEARCH EXPERIMENTS
The developed asset management system first performs a 3D image-based reconstruction using images that are collected in a sequential fashion. Next, the STF algorithm is applied to perform segmentation and classification of the images acquired from the highway. The performance of the STF algorithm for the segmentation and detection of highway assets is evaluated within the newly created automatic condition assessment system. The recognition algorithm uses a dataset consisting of images and their ground truths (the same images labeled in a supervised fashion), which are used to create the decision trees.

Two experiments were performed to evaluate the performance of this algorithm. The first experiment was performed with a new dataset of fourteen images with four categories (i.e., guardrail, pavement, poles, and signs) plus the void category, to investigate the performance of the algorithm for the segmentation and detection of the highway asset images. These images were taken on Virginia Tech’s Smart Road, a 2.1-mile-long facility used for highway research located in Blacksburg, VA. An initial 3D reconstruction was performed with this dataset. The results of this initial, controlled experiment suggested that the number of categories used for training should be increased in order to obtain correct segmentation with minimal segmentation confusion (wrong recognition of the category).

Subsequently, the second experiment was performed by extending the dataset created for the first experiment. By adding background objects (such as sky, grass, soil, or trees) as new categories for the algorithm to be trained on, the confusion was reduced significantly.
The dataset for this experiment consisted of twelve different categories plus a void category to train the algorithm. Similar to the first experiment, a 3D image-based reconstruction was performed with this dataset. Table 1 compares the performance of the 3D image-based reconstruction algorithm with the state-of-the-art Structure from Motion algorithm (Snavely et al. 2007) on the dataset.

Table 1. Results of the 3D image-based reconstruction.

Experiment   # of images   SfM point cloud   D4AR point cloud   SfM computational   D4AR computational   Recall2
                           resolution        resolution         time1               time
#1           120           108,621           1,437,001          6hr 13min           8hr 25min            0.93
#2           171           175,737           2,076,887          8hr 54min           10hr 17min           0.98

1 Computation times are benchmarked on an Intel i7 core with 12 GB of RAM.
2 Recall: percentage of the images that are successfully registered to the point cloud.
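A short calculation helps in reading Table 1. The sketch below derives how much denser the D4AR point cloud is than the SfM point cloud and how much longer D4AR takes, assuming the smaller point counts and shorter times in each row belong to SfM (D4AR builds on SfM and adds multi-view stereo and voxel coloring). The dictionary keys and function names are illustrative.

```python
def minutes(hours, mins):
    """Convert an 'Xhr Ymin' entry to minutes."""
    return 60 * hours + mins

# Values copied from Table 1; the SfM/D4AR column assignment is an assumption.
experiments = {
    "#1": {"images": 120, "sfm_points": 108_621, "d4ar_points": 1_437_001,
           "sfm_min": minutes(6, 13), "d4ar_min": minutes(8, 25)},
    "#2": {"images": 171, "sfm_points": 175_737, "d4ar_points": 2_076_887,
           "sfm_min": minutes(8, 54), "d4ar_min": minutes(10, 17)},
}

def summarize(row):
    """Density gain and run-time ratio of D4AR relative to SfM."""
    return {
        "density_gain": row["d4ar_points"] / row["sfm_points"],
        "time_ratio": row["d4ar_min"] / row["sfm_min"],
    }

for name, row in experiments.items():
    s = summarize(row)
    print(f"{name}: {s['density_gain']:.1f}x more points, {s['time_ratio']:.2f}x run time")
```

Under this reading, the dense reconstruction yields roughly 12 to 13 times more points for about 1.2 to 1.4 times the computation time.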


Figure 4. 3D image-based reconstruction results.

Table 2 presents the segmentation categories, the number of images used per category, and the specific color that was assigned to each category for supervised training and automated testing. For this purpose, the regions of interest were highlighted with these colors in a supervised fashion, and the rest of each image was highlighted in black, representing the void category.

Table 2. Thirteen segmentation categories for experiment #2.

Category Name       Images (#)   Color (R,G,B)
Void                7            (0,0,0)
Asphalt Pavement    7            (0,128,0)
Concrete Pavement   7            (0,0,128)
Guardrail           7            (128,0,0)
Poles               7            (128,128,0)
Signs               7            (128,128,128)
Trees               7            (0,128,128)
Grass               7            (128,0,128)
Soil                7            (255,0,0)
Sky                 7            (0,255,0)
Safety Cones        7            (0,0,255)
Traffic Lights      7            (255,128,255)
Pavement Markings   7            (128,255,255)

[Figure 5 panels (a-1) to (d-2); visible region labels: Guardrail, Asphalt, Pole, Soil, Pavement Markings]
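The color coding of Table 2 can be turned into a lookup that converts a color-coded ground truth image into per-pixel category indices. This is a minimal sketch, assuming the pairing of category names and colors as read from the flattened table and assuming the image is an H x W x 3 array of 8-bit RGB values. The names `CATEGORY_COLORS` and `decode_ground_truth` are illustrative.

```python
import numpy as np

# Category colors as read from Table 2 (the row pairing is an assumption).
CATEGORY_COLORS = {
    "void": (0, 0, 0),
    "asphalt pavement": (0, 128, 0),
    "concrete pavement": (0, 0, 128),
    "guardrail": (128, 0, 0),
    "poles": (128, 128, 0),
    "signs": (128, 128, 128),
    "trees": (0, 128, 128),
    "grass": (128, 0, 128),
    "soil": (255, 0, 0),
    "sky": (0, 255, 0),
    "safety cones": (0, 0, 255),
    "traffic lights": (255, 128, 255),
    "pavement markings": (128, 255, 255),
}

def decode_ground_truth(rgb_image):
    """Map an HxWx3 color-coded image to an HxW array of category indices."""
    names = list(CATEGORY_COLORS)
    colors = np.array([CATEGORY_COLORS[n] for n in names])
    # Compare every pixel with every category color: result has shape (H, W, 13).
    matches = (rgb_image[:, :, None, :] == colors[None, None, :, :]).all(axis=3)
    index = matches.argmax(axis=2)
    index[~matches.any(axis=2)] = 0    # colors not in the table are treated as void
    return index, names
```

Index 0 corresponds to the void category, so any pixel that was not highlighted during labeling is ignored in the same way as the black background.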

Figure 5. Supervised segmentation of the ground truth images.

Results and Discussion
Results of the second experiment were investigated further to evaluate the performance of the STF algorithm. For each category, three images along with their segmentations were randomly selected. Ten principal pixels were selected per image from the asset of interest, and the segmentation result was evaluated by reading the RGB value at these points. If the RGB value at a point matched the color assigned to the asset of interest, the point was counted as a True Positive (TP); if it did not match, it was counted as a False Negative (FN). The results of this analysis were plotted in a Receiver Operating Characteristic (ROC) plot (Figure 6).

[Plot: ROC Plot For Training Categories. Axes: True Positive Rate (Percent) versus False Negative Rate (Percent). Legend: Asphalt Pavement, Concrete Pavement, Guardrail, Poles, Signs, Trees, Grass, Soil]

Figure 6. ROC plot for trained categories.

As demonstrated in Figure 6, the results of this preliminary experiment were mostly reasonable. All of the images except one have a true positive rate above the 50% line. Although minor segmentation confusions were present, as demonstrated in Figure 7, most of the images were segmented successfully. The high success rates in the segmentations are encouraging and suggest that the STF algorithm can be used to perform the segmentations for the newly created automated condition assessment system.
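The pixel-sampling evaluation described in this section can be sketched in a few lines. The example assumes that the ten sampled pixels of an image are given as RGB tuples read from the segmentation output, and that the expected color is the one assigned to the asset of interest. The function name and the sample values are hypothetical.

```python
def true_positive_rate(sampled_rgb, expected_rgb):
    """Fraction of sampled pixels whose segmented color matches the asset's color."""
    tp = sum(1 for rgb in sampled_rgb if tuple(rgb) == tuple(expected_rgb))
    fn = len(sampled_rgb) - tp          # every non-matching sample is a false negative
    return tp / (tp + fn)

# Hypothetical samples: 8 of 10 pixels carry the expected color, 2 were confused.
expected = (128, 128, 128)
samples = [expected] * 8 + [(128, 0, 0), (0, 0, 0)]
rate = true_positive_rate(samples, expected)
print(rate, 1 - rate)
```

Because only true positives and false negatives are counted, the false negative rate is always one minus the true positive rate, so each image corresponds to a single point determined by its true positive rate.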


Figure 7. The segmentation and asset recognition results.

The results show that if distinct features of a highway asset are present, the success rate in segmenting that asset increases. As shown in Figure 6, the true positive rates for the signs are among the highest. This is caused by the distinct green color of these signs. In contrast, the segmentation results for the poles are among the lowest, since the features of these assets resemble those of other assets such as guardrails. The computational time confirms that application of such a machine learning algorithm is much faster and more convenient compared to other algorithms used for segmentation. The machine learning kernel allows the thresholds for the filter bank to be automatically trained through the ground truth data and dynamically finds the threshold surface. This flexibility is an important attribute for the highway asset condition assessment system, yet it also confirms that a more robust segmentation and categorization of assets requires a more systematic collection of training data.

Conclusion
The automated and integrated image-based 3D reconstruction and recognition asset management system presented in this paper demonstrates promising results. Given its low cost and accuracy, along with the high safety associated with its application, this technology can replace the current manual and subjective data analysis and/or the computer vision systems that are currently in use. The implementation of this algorithm is the first step in creating this new condition assessment system. With this approach, there is no need to compute filter-bank responses or local descriptors, which are computationally expensive. More experiments need to be conducted by expanding the training dataset and testing performance on different datasets with different levels of visibility and occlusion. Since the 3D image-based reconstruction algorithm geo-registers and associates images together, the segmentation results in any of these paired images can help in boosting the confidence in segmentation and recognition of any new training image. This integration will also be tested and reported in the near future.

References
ASCE. (2009). The 2009 report card for America’s infrastructure. http://www.asce.org/reportcard/2009. Accessed Jan. 10, 2011.
Bascon S. M., Rodriguez J. A., Arroyo S. L., Caballero A. F., and Lopez-Ferreras F. (2010). “An optimization on pictogram identification for the road-sign recognition task using SVMs.” CVIU, 14 (3), 373-383.
Bianchini A., Bandini P., and Smith D. W. (2010). “Interrater reliability of manual pavement distress evaluations.” ASCE J. of Transp. Eng., 136 (2), 165-172.
de la Garza J. M., and Krueger D. A. (2007). “Simulation of highway renewal asset management strategies.” Proc., ASCE Conf. of Computing in Civil Eng., 527-541.
Furukawa Y., Curless B., Seitz S. M., and Szeliski R. (2010). “Towards internet-scale multi-view stereo.” Proc., Computer Vision and Pattern Recognition Conf.
Gallup D., Frahm J.-M., and Pollefeys M. (2010). “A heightmap model for efficient 3D reconstruction from street-level video.” Proc., Int. Conf. on 3D Data Processing, Visualization and Transmission (3DPVT2010).
Golparvar-Fard M., Peña-Mora F., and Savarese S. (2010). “D4AR – 4 dimensional augmented reality tools for automated remote progress tracking and support of decision-enabling tasks in the AEC/FM industry.” Proc., The 6th Int. Conf. on Innovations in AEC.
Golparvar-Fard M., Peña-Mora F., and Savarese S. (2009a). “D4AR- a 4-dimensional augmented reality model for automating construction progress data collection, processing and communication.” Journal of Information Technology in Construction (ITcon), 14, 129-153.
Golparvar-Fard M., Peña-Mora F., Arboleda C. A., and Lee S. H. (2009b). “Visualization of construction progress monitoring with 4D simulation model overlaid on time-lapsed photographs.” ASCE J. of Computing in Civil Engineering, 23 (6), 391-404.
Hu Z. and Tsai Y. (2010). “Image Recognition Model for Developing a Sign Inventory.” ASCE J. of Comp. in Civil Eng., in press.
Krishnan A. (2009). “Computer vision system for identifying road signs using triangulation and bundle adjustment.” MS Thesis, Computer Engineering, Kansas State University, Manhattan, Kansas.
Maser K. J. (2005). “Automated systems for infrastructure condition assessment.” ASCE J. Infrastruct. Syst., 11, 153.


Mashford J., Davis P., and Rahilly M. “Pixel-based colour image segmentation using support vector machine for automatic pipe inspection.” Proc., the 20th Australian Joint Conf. on AI, vol. 4830, 739-743.
Meegoda J. N., Juliano T. M., and Banerjee A. (2006). “A Framework for Automatic Condition Assessment of Culverts.” Paper No. 06-2414, 85th Annual Meeting of the Transportation Research Board, Washington, DC.
NAE, National Academy of Engineering (2010). Grand Challenges for Engineering. NAE of the National Academies.
Shotton J., Johnson M., and Cipolla R. (2008). “Semantic Texton Forests for Image Categorization and Segmentation.” Proc., Int. Conf. Computer Vision and Pattern Recognition.
Snavely N., Seitz S. M., and Szeliski R. (2007). “Modeling the World from Internet Photo Collections.” Int. J. of Comp. Vis.
Wu J. and Tsai Y. (2006). “Enhanced Roadway Inventory Using 2-D Sign Video Image Recognition Algorithm.” J. of Computer-Aided Civil & Infrastructure Eng., 21, 369-382.
