Construction Research Congress 2012 © ASCE 2012
Automated Vision-based Recognition of Construction Worker Actions for Building Interior Construction Operations Using RGBD Cameras

Víctor Escorcia1, María A. Dávila1, Mani Golparvar-Fard2 and Juan Carlos Niebles1

1 Electrical and Electronic Engineering Dept., Universidad del Norte, Barranquilla, Colombia, PH +57 (5) 350-9270; FAX (540) 231-7532; email: {escorciav, mdavilaa, njuan}@uninorte.edu.co
2 Assistant Professor, Vecellio Construction Eng. and Mgmt., Via Dept. of Civil and Env. Eng., and Myers-Lawson School of Construction, Virginia Tech, Blacksburg, VA; PH (540) 231-7255; FAX (540) 231-7532; email: [email protected]
ABSTRACT
In this paper we present a novel method for reliable recognition of construction workers and their actions using color and depth data from a Microsoft Kinect sensor. Our algorithm is based on machine learning techniques in which meaningful visual features are extracted from the estimated body poses of workers. We adopt a bag-of-poses representation for worker actions and combine it with powerful discriminative classifiers to achieve accurate action recognition. The discriminative framework is able to focus on the visual aspects that are distinctive and can detect and recognize actions from different workers. We train and test our algorithm using 80 videos of four workers involved in five drywall-related construction activities. These videos were all collected from drywall construction activities inside a dining hall facility under construction. The proposed algorithm is further validated by recognizing the actions of a construction worker that was never seen in the training dataset. Experimental results show that our method achieves an average precision of 85.28 percent. The results reflect the promise of the proposed method for automated assessment of craft workers' productivity, safety, and occupational health in indoor environments.

INTRODUCTION
Activity analysis, the continuous and detailed process of benchmarking, monitoring, and improving the amount of time craft workers spend on different construction activities, can play an important role in improving construction productivity, safety, and occupational health. As a workface assessment tool, activity analysis examines the proportion of time workers spend on specific construction activities. The combination of detailed assessment and continuous improvement significantly differentiates activity analysis from work sampling and can provide recommendations for activity monitoring, for improvements, and for assessing the applicability of those improvements (Gouett et al. 2011, CII 2010). In recent years, many companies have experienced the benefits of activity analysis and are now proactively working towards implementing it in their projects (ENR 2011). Despite the benefits of activity analysis, the accurate and detailed assessment of work in progress requires an observer for every construction activity, which can be prohibitively expensive. In addition, due to the variability in how construction tasks are carried out and in the duration of each activity, it is often necessary to record several cycles of operations. Not only are traditional time studies labor intensive, but
also the significant amount of information that needs to be manually collected and analyzed can adversely affect the quality of the process. Given the significance of these observations, the state-of-the-art practice of activity sampling (CII 2010) suggests defining sampling population methods such as the Tour or Modified Crew methods. These methods require routes and times to be initially selected to make sure results are representative of the actual work performed. The length of these studies may vary based on the stage of a construction project, the timeframe of the study (e.g., when the work begins, before or after lunch), and the site layout and congestion of activities. This may require observers to take a few complete days to observe the entire site for each type of activity. Furthermore, differences in the techniques used across studies complicate cross-examination of activity analysis results between projects (Whiteside 2008). Thus, there is a need for a low-cost and reliable method which can systematically and automatically track construction workers and analyze their actions across all construction projects. Such a method can significantly minimize the challenges associated with sampling methods or manual observations.
Over the past few years, several research studies (e.g., Brilakis et al. 2011, Yang et al. 2010) have proposed vision-based methods for tracking project entities. These methods have the potential to address some of the need for an inexpensive tracking mechanism. Nonetheless, they do not propose any solution for automated recognition of construction worker actions. In particular, limited line of sight, static and dynamic occlusions, and varying illumination can all significantly affect the applicability of these approaches in unstructured and dynamic indoor construction environments. To address these limitations, this paper proposes and implements a novel method for reliable recognition of construction workers and their actions using an RGBD camera.
The remaining sections are organized as follows. First, the state of knowledge in the areas of automated craft activity analysis, human action recognition, and applications of the Microsoft Kinect sensor is briefly reviewed. Next, the research objectives and methodologies are presented. Finally, we discuss the experimental validation of the proposed method for benchmarking and monitoring of construction worker actions.

RESEARCH BACKGROUND
Current Research in Automated Craft Activity Analysis
Prior studies have compared several existing 2D vision-based tracking algorithms on construction sites and identified several challenges arising from workforce interactions (Arif and Vela 2009). More recently, a supervised tracking algorithm was developed which requires the user to manually identify the construction workers, after which a machine-learning algorithm learns and tracks the target (Yang et al. 2011). As indicated by the authors, the proposed algorithm fails under severe changes in lighting conditions and does not perform well with partial occlusions; conditions that are predominant on construction sites. Other methods such as Brilakis et al. (2011) also focus only on tracking construction entities and do not propose any solution for action recognition and real-time tracking of workers, which are the primary objectives of this paper. Peddi et al. (2009) have also proposed a blob-tracking algorithm to track the productivity of workers by classifying their poses into categories such as effective, ineffective, and contributory. The assumption is that workers are
always moving and each pose belongs to a certain type of action. The simple assignment of each static pose to an action can be a major limitation when it comes to activity analysis. There is a need for a new method that can simultaneously perform real-time tracking and action recognition of multiple interacting construction workers.
Current Research in Worker Tracking and Action Recognition
The problems of visually tracking human movements and recognizing their actions in video have been widely studied in the computer vision community. Here, we present a brief overview of related research and refer the reader to more detailed literature surveys (Aggarwal and Ryoo 2011, Forsyth et al. 2005). There are two main approaches to human tracking in video. The first considers humans as blobs and uses fast generic object tracking methods (Yilmaz et al. 2006). These approaches usually assume that the target object is a rigid rectangular blob and have difficulty dealing with the extreme articulations of the human body. The second uses top-down models which explicitly incorporate prior information about the articulation and movements of the human body (Ramanan et al. 2007). Most of these methods rely on manual initialization and mainly work in controlled environments with good foreground/background separation. Learning and inference procedures for these methods are generally complex and computationally expensive, and as a result their application is prohibitively slow.
The methods outlined above focus on tracking people using regular RGB cameras. There is limited research on tracking people with color+depth sensors (such as the Microsoft Kinect). Recently, a method for real-time estimation of human poses from single depth images was proposed by Shotton et al. (2011). This method avoids performing the difficult search for body parts in color images by operating directly on the depth data. Such a framework can be extended to perform real-time tracking of people using the Microsoft Kinect sensor (OpenNI Organization 2011).
Meanwhile, the problem of video-based human action recognition can be addressed from several perspectives (Aggarwal and Ryoo 2011). Popular approaches include extracting visual characteristics in the form of spatio-temporal volumes, local features obtained from spatio-temporal interest points, trajectories of tracked feature points, and pose-based features. These visual characteristics can then be used within recognition models to discover specific patterns that discriminate human actions. In this work, we are particularly interested in creating a method that can classify construction worker actions using characteristics of their body configurations. In this context, several researchers have proposed representing actions as a sequence of human poses. Feng and Perona (2002) presented the use of short sequences of poses, or movelets, and modeled their sequential occurrence with a Hidden Markov Model. Weinland and Boyer (2008) proposed building a set of keyposes to represent actions, and used a simple generative model based on a Gaussian distribution for classification. On the other hand, Thurau and Hlaváč (2009) proposed representing human poses as a linear combination of pose primitives, and recognized human actions with a probabilistic classifier. These methods focus on classifying actions in color video data. Unfortunately, due to the computational difficulty of estimating human poses directly from video, the application of these methods to extensive action recognition tasks is not practical.
Recently, Sung et al. (2011) presented a classification algorithm for recognizing human actions from color+depth (RGBD) videos. Their framework combines pose-based and local image features into a hierarchical Maximum Entropy Markov Model classifier. The complexity and high dimensionality of this classifier make it difficult to learn from limited training data. We address this issue by introducing a simpler human action representation and combining it with a powerful discriminative classifier to achieve successful worker action recognition.

OBJECTIVES AND SCOPE
This paper proposes a novel method for real-time tracking and action recognition of multiple interacting construction workers using the Microsoft Kinect sensor. The focus is on indoor operations, and in particular on typical drywall construction operations. Our specific objectives are to create a comprehensive method to (1) track construction workers and their body skeletons in real-time; (2) track worker actions under various body postures and Kinect configurations; and (3) test and validate the proposed method in real-world settings. In the following, the methodology and validation that address these objectives are detailed.

AUTOMATED WORKER TRACKING AND ACTION RECOGNITION
Figure 1 depicts an overview of the proposed system. The first key component of our system is the use of a Kinect sensor for capturing color and depth information of the scene. This allows the system to avoid the difficult problem of recognizing actions from color image data only, by performing real-time human pose estimation from depth images. The second important component is a new visual representation for worker actions. The key observation is that typical worker actions on a construction site can be characterized by the temporal set of poses that the worker executes. We encode this observation by adopting a powerful bag-of-poses representation of worker actions. This representation captures the occurrence of different pose codewords within the action sequences. We exploit this representation to build highly discriminative human action classifiers. Finally, experimental results show that the proposed algorithm is able to classify novel worker actions with high accuracy.
Figure 1: Overview of the proposed action representation and model learning. The pipeline proceeds from RGBD video to 3D pose estimation, pose-codebook generation (K-means clustering), the bag-of-poses histogram, and a multiclass SVM action classifier.
Automated Tracking of Workers and Their Body Skeletons
The first goal of our system is to locate and track workers in the scene. For this purpose, we segment the workers from the background clutter and focus the activity recognition analysis on the relevant worker movements. Automatically tracking workers in color videos from unconstrained environments is still an unsolved problem. However, recent advances in human pose estimation from depth images, together with the availability of inexpensive color+depth sensors, have enabled wide application of real-time algorithms for tracking and estimating human body configurations. In this work, we adopt the algorithms for human pose estimation proposed by Shotton et al. (2011) in conjunction with a Microsoft Kinect sensor and apply them to the context of tracking and estimating worker poses. In practice, we use available implementations as provided by the OpenNI framework
and NITE libraries (OpenNI Organization 2011). As a result, a rich amount of visual video data is captured from the scene: color images, depth data, and real-time tracking and estimation of worker body poses (Figure 2). Once each worker in the scene is tracked and his/her pose is estimated, the novel algorithm can focus on recognizing which action is being performed by each subject. By using tracking as an intermediate output, the following stage can ignore irrelevant visual data that could otherwise overwhelm/distract the recognition algorithm. A side benefit is that the amount of data to be further processed is reduced significantly, which translates into shorter computational processing times.
Figure 2: The Microsoft Kinect sensor captures color video and depth data, which can be used to track workers and estimate their body poses.
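For illustration, the per-frame output of this tracking stage can be thought of as fifteen 3D joint locations expressed in the camera coordinate system. The following minimal sketch (in Python with NumPy) shows one way such skeleton frames could be stored and assembled into a sequence; the joint names, array layout, and helper names are assumptions for illustration and are not the OpenNI/NITE API.

```python
import numpy as np

# Hypothetical joint ordering; the actual set depends on the tracking middleware.
JOINT_NAMES = [
    "head", "neck", "torso",
    "l_shoulder", "l_elbow", "l_hand",
    "r_shoulder", "r_elbow", "r_hand",
    "l_hip", "l_knee", "l_foot",
    "r_hip", "r_knee", "r_foot",
]

def make_skeleton_frame(joint_xyz):
    """Store one tracked pose as a (15, 3) array of joint positions (meters)
    in the Kinect camera coordinate system."""
    frame = np.asarray(joint_xyz, dtype=float)
    assert frame.shape == (len(JOINT_NAMES), 3)
    return frame

def make_sequence(frames):
    """A video sequence is a (T, 15, 3) array: T frames of tracked poses."""
    return np.stack([make_skeleton_frame(f) for f in frames], axis=0)
```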
Automated Worker Action Recognition
In this paper, the idea of using the sequence of body poses to represent worker actions is adopted and extended (Thurau and Hlaváč 2009). We argue that poses encode sufficient information about the action being executed and show that adopting a visual representation of actions based on poses leads to successfully recognizing worker activities on a construction site. The proposed algorithm represents actions by analyzing the pose of the worker in each frame and counting the occurrence of each pose within the sequence. For example, a "walking" sequence would be composed of many poses with legs forming a scissor shape, but very few poses with standing legs. Such an occurrence pattern can be captured by a histogram, where the entries that correspond to walk-like poses would have high counts while other poses would have low counts. This representation of actions in the form of a histogram of poses is denoted as bag-of-poses. It has several advantages: it is straightforward, can be computed quickly, and captures the overall pose statistics that occur within an action.
Given a video sequence that contains an action, the bag-of-poses representation requires computing a histogram of the poses that appear in the sequence. However, the space of all possible poses is large and continuous, which imposes an important challenge: the need for defining a finite set of representative poses, denoted as the pose codebook. This effectively quantizes the pose space and enables the computation of the proposed bag-of-poses histogram. We denote the entries in the pose codebook as pose codewords, which should be comprehensive enough to represent commonly occurring body poses. Once a pose codebook is defined, the bag-of-poses representation is computed as a histogram that counts the occurrence of each pose codeword within the input sequence.
The pose codebook proposal extends the idea of pose exemplars recently presented by Weinland and Boyer (2008). Their method based on pose exemplars focused on 2D poses and is limited to action recognition in color videos. Nonetheless, our tracking and pose estimation stage provides detailed full 3D pose estimation, and as a result the method for 2D poses proposed in Weinland and Boyer (2008) is not
directly applicable. In the following, we discuss the details of the proposed method for building a pose codebook from the training data.
Measuring Pose Similarity
The bag-of-poses representation requires computing a histogram that counts the occurrence of each pose codeword within the sequence. In order to count such occurrences, we need to assign each pose in the input sequence to the most similar pose codeword. However, measuring the similarity between full 3D poses is not straightforward. Hence, the 3D poses are represented by the locations of fifteen body joints (Figure 3) with respect to the camera coordinate system. Müller et al. (2005) argued that comparing poses directly in the joint location space can produce undesired results: semantically similar motions may not necessarily be numerically alike. We address this issue by defining a transformation that compares poses in a feature space. The proposed feature space is invariant to anthropomorphic differences between workers and to their relative locations with respect to the Kinect sensor. Such invariance is achieved by defining features based on the angles between body parts and articulations. These features are an extended subset of the pose features proposed by Chen et al. (2010) for retrieval of motion capture data. In summary, each 3D pose is transformed by computing the features depicted in Figure 3. Furthermore, the similarity between two poses is defined as the inverse of the Euclidean distance between their corresponding feature descriptors.
Figure 3: Thirteen pose features for computing similarities between worker poses.
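A minimal sketch of this feature transformation and similarity measure is shown below, assuming the (15, 3) joint arrays described earlier. The specific angle triplets are hypothetical stand-ins for the thirteen features of Figure 3 (which extend Chen et al. 2010); only the overall recipe — joint angles followed by inverse Euclidean distance — follows the description above.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (radians) at joint b formed by segments b->a and b->c."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical subset of angle features (indices into the joint list above);
# the paper uses thirteen such angles between body parts and articulations.
ANGLE_TRIPLETS = [
    (3, 4, 5),    # left shoulder-elbow-hand
    (6, 7, 8),    # right shoulder-elbow-hand
    (9, 10, 11),  # left hip-knee-foot
    (12, 13, 14), # right hip-knee-foot
    (1, 2, 9),    # neck-torso-left hip
    (1, 2, 12),   # neck-torso-right hip
]

def pose_features(frame):
    """Map a (15, 3) skeleton frame to an angle-based feature vector that is
    invariant to body size and to the worker's position relative to the sensor."""
    return np.array([joint_angle(frame[a], frame[b], frame[c])
                     for a, b, c in ANGLE_TRIPLETS])

def pose_similarity(f1, f2):
    """Similarity as the inverse of the Euclidean distance between two pose
    feature descriptors (larger means more similar)."""
    return 1.0 / (np.linalg.norm(pose_features(f1) - pose_features(f2)) + 1e-9)
```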
Pose Codebook
A key piece of the proposed bag-of-poses representation is the pose codebook. We need to build a set of representative poses that occur commonly in the construction actions of interest. Instead of manually building this set, we use a data-driven approach, which leverages all the poses obtained from the image/video training dataset. In practice, the K-means clustering algorithm was used to group all poses in the training set into K groups of similar poses. The metric defined in the previous section is further used to measure the distance between different poses. The resulting K group centers automatically form the selected pose codewords. Figure 4 shows a visualization of the top eight pose codewords in the codebook.
Figure 4: Our algorithm automatically computes a pose codebook that contains representative body poses. The top eight poses shown are the codewords in the pose codebook.
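A minimal sketch of this data-driven codebook construction is given below, clustering the angle features of all training frames with scikit-learn's K-means and reusing the pose_features helper from the previous sketch. The codebook size K and the use of standard Euclidean K-means in the feature space are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_pose_codebook(training_sequences, k=50, seed=0):
    """Cluster all training poses into K groups; the cluster centers act as
    the pose codewords. `training_sequences` is a list of (T, 15, 3) arrays."""
    all_feats = np.vstack([
        np.array([pose_features(frame) for frame in seq])
        for seq in training_sequences
    ])
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(all_feats)
    return kmeans  # kmeans.cluster_centers_ holds the K pose codewords
```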
Bag of Poses Representation
Given an input video sequence and the pose codebook, the input sequence is represented by a histogram of pose codewords. Note that to make this representation invariant to the length of the sequence, the resulting histogram is normalized so that the sum of its elements is equal to 1. Figure 5 shows an example of the bag-of-poses representation for the walking action class.
Figure 5: Bag-of-poses representation of an input "walking" video sequence. The input video is represented by a histogram of poses in the pose codebook.
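A minimal sketch of the bag-of-poses computation follows: each frame's feature vector is assigned to its nearest codeword, occurrences are counted, and the histogram is normalized to sum to 1 so that it is invariant to sequence length. It reuses the pose_features helper and the fitted K-means model from the earlier sketches.

```python
import numpy as np

def bag_of_poses(sequence, kmeans):
    """Represent a (T, 15, 3) pose sequence as a normalized histogram of
    pose-codeword occurrences."""
    feats = np.array([pose_features(frame) for frame in sequence])
    codeword_ids = kmeans.predict(feats)          # nearest codeword per frame
    k = kmeans.cluster_centers_.shape[0]
    hist = np.bincount(codeword_ids, minlength=k).astype(float)
    return hist / hist.sum()                      # normalize to sum to 1
```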
Learning and Recognition of Worker Actions
The proposed visual representation is leveraged for action classification by training a discriminative model for each action category. Particularly, a binary Support Vector Machine (SVM) classifier with a non-linear kernel is independently trained for each action category. SVM classifiers are discriminative binary classifiers that optimize a decision boundary between two classes. Note that the choice of a non-linear kernel is due to its suitability for classification of histograms (Maji et al. 2008). In order to extend the binary classification decision of each SVM classifier to multiple classes, the one-versus-all multiclass classification scheme is adopted. When training the SVM classifier that corresponds to each action class, we set all the examples from that class as positives and the examples from all other classes as negatives. The result of the training process is one binary SVM classifier per action of interest. Given a novel testing video, we apply all binary classifiers and select the action class corresponding to the classifier with the highest score.

EXPERIMENTS AND DISCUSSION OF RESULTS
In order to validate the proposed method, we focus on drywall construction activities that are typical in many building construction projects. Drywall activities include: 1) Installing Metal Studs; 2) Hanging Drywall; 3) Applying Tape; 4) Installing Insulation; 5) Laying Out Walls; 6) Framing Openings; 7) Framing Wall-Beam Intersections; and 8) Applying Fire Caulking. In order to test our prototype, we developed a set of new actions which are visually distinct and can support drywall activity analysis. The hardware setup and data collection process are as follows:
Hardware Setup – In this study, the sensing device is a Microsoft Kinect sensor, which is an RGB camera coupled with an infrared camera for inferring depth. It outputs an RGB image at a frame rate of 30Hz together with aligned depths generated by the infrared camera at each pixel. This produces a 640×480 depth image with a practical application range of 1.0 to 3.5m. In our experiments, the sensor was placed
on a camera tripod and was fixed during the experiment. Figure 6 shows the hardware setup.

Figure 6: Hardware setup.

Experiment Data Collection – To prove the concept and the method presented in this paper, data was collected from multiple single-worker actions. Particularly, the dataset was collected in a controlled setting for five actions that are visually distinct and can help determine typical drywall construction activities. The duration of each action varied from 5 to 15 seconds. One of these actions was predetermined to be idle. Here, we define idle to be any non-related activity, which may include short-distance walking, talking on the phone, turning around, stretching the body, or pointing to an object on the site. The variation in the visual form of these actions, their duration, and the level of randomness that is fused into the data collection further support the practicability of the proposed method. Four different people with construction expertise but no knowledge about the model or the algorithms used were included in the experiment. This helped avoid biasing the experiment toward better precision or recall on the outcome. Basic instructions on how to perform the activities, such as hammering or caulking, were provided. Each action was recorded four times for each individual participating in the experiment, bringing the total number of videos to 80. Figure 7 shows an example from our training dataset.
Figure 7: Our dataset contains 5 drywall construction actions performed by 4 actors multiple times, for a total of 80 sequences.

Experimental Setting – The proposed method is validated by testing the accuracy of the worker action classification. We adopt the leave-one-person-out setting, where we set videos from one subject as testing examples and use videos from the other subjects for training. We repeat this process for each subject in the dataset and compute the average precisions for evaluation purposes. Note that during the training stage, our algorithm does not have access to any videos from the worker in the testing set. Therefore, our experiments measure how well the proposed algorithm can generalize to new and unseen subjects and recognize their actions.
Results – The classification accuracy of the proposed method is tested using the procedure detailed above. A pose codebook of size K is used in these experiments, while the SVM parameter is set to 0.7. The multi-class classification results are summarized with the confusion matrix shown in Figure 8a. The overall accuracy of our system when recognizing five drywall activities is 76.25%. Note that a random classifier would only achieve an average accuracy of 20%.
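A minimal sketch of the training and evaluation procedure described above — one-versus-all SVMs over bag-of-poses histograms, evaluated with the leave-one-person-out split — is given below using scikit-learn. The RBF kernel and parameter values are placeholders; the paper only specifies a non-linear kernel suited to histogram classification (cf. Maji et al. 2008), and a histogram-intersection or chi-squared kernel could be substituted via a precomputed kernel matrix.

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_all(histograms, labels, classes):
    """Train one binary non-linear SVM per action class (positives = that class,
    negatives = all other classes)."""
    models = {}
    for c in classes:
        y = (labels == c).astype(int)
        models[c] = SVC(kernel="rbf", C=1.0, gamma="scale").fit(histograms, y)
    return models

def predict_action(models, hist):
    """Apply all binary classifiers and return the class with the highest score."""
    scores = {c: m.decision_function(hist.reshape(1, -1))[0] for c, m in models.items()}
    return max(scores, key=scores.get)

def leave_one_person_out(histograms, labels, subjects, classes):
    """Hold out all videos of one subject per fold and report mean fold accuracy."""
    accuracies = []
    for s in np.unique(subjects):
        train, test = subjects != s, subjects == s
        models = train_one_vs_all(histograms[train], labels[train], classes)
        preds = [predict_action(models, h) for h in histograms[test]]
        accuracies.append(np.mean(np.array(preds) == labels[test]))
    return float(np.mean(accuracies))
```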
The proposed model achieves good performance for the classification task. We note that the most challenging action category is "idle", as reflected by the large confusion in the confusion matrix (Figure 8a). This challenge is due to the extremely large diversity of poses performed in the "idle" action when compared to the other action classes. We also analyzed the performance of the proposed approach by taking out the idle action class while keeping the other four action classes. In this setting, the parameters are adjusted accordingly, and the model achieves an overall accuracy of 84.38% (Figure 8b). With the limited amount of available training data, our model is capable of capturing the diversity of poses and motions within these four actions. A possible direction to improve the performance of our system for the "idle" action category is to increase the number and variety of examples in the training set. Additionally, the performance of each binary action classifier is individually validated in terms of Average Precision (AP) (see Figure 8c). The classifiers trained with 5 actions give a mean AP of 85.28%, while the classifiers trained with 4 actions obtain a mean AP of 91.04%.
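For reference, the reported metrics can be computed from per-video predictions and classifier scores; a minimal sketch with scikit-learn's confusion_matrix and average_precision_score is shown below, with variable names as placeholders.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, average_precision_score

def evaluate(y_true, y_pred, scores, classes):
    """y_true/y_pred: arrays of class labels; scores: dict mapping each class
    to the decision values of its binary classifier on the test videos."""
    cm = confusion_matrix(y_true, y_pred, labels=classes)   # rows = true classes
    accuracy = np.trace(cm) / cm.sum()
    # Average Precision of each binary (one-vs-all) classifier.
    aps = {c: average_precision_score((y_true == c).astype(int), scores[c])
           for c in classes}
    mean_ap = float(np.mean(list(aps.values())))
    return cm, accuracy, aps, mean_ap
```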
Figure 8: (a) Recognition accuracy summarized by a confusion matrix for a 5-class action model. (b) Confusion matrix for a 4-class action model, produced by taking out the "idle" action category. (c) Performance evaluation for the binary classifiers in terms of Average Precision.

CONCLUSIONS AND FUTURE WORK
In this paper, we focused on the problem of detecting and recognizing actions of construction workers in indoor, dynamic, and cluttered construction environments. We used an inexpensive color+depth sensor (Microsoft Kinect) to collect the input data. The choice of a low-cost sensor enables wide applicability of the proposed method on all construction sites. The preliminary experimental results show that up to six construction workers can be simultaneously tracked in real-time. They also show the promise of the proposed action recognition method for automated performance assessment in cluttered and dynamic indoor construction environments.
Future work includes compiling a larger dataset of similar construction worker actions with different visual forms (e.g., finishing concrete with a trowel vs. a machine) and under various degrees of occlusion. There is also a need to investigate the possibility of detecting multiple action classes within the same RGBD videos. Computing craftsmen productivity, occupational health, and safety using the outcome of the proposed method will also be investigated, and results will be presented in the near future.
REFERENCES
Aggarwal, J. K. and Ryoo, M. S. (2011). "Human activity analysis: A review." ACM Comput. Surv., 43(3).
Arif, O., and Vela, P. (2009). "Kernel covariance image region description for object tracking." Proc., IEEE Int. Conf. on Image Processing.
Brilakis, I., Park, M.W. and Jog, G. (2011). "Automated Vision Tracking of Project Related Entities." Elsevier J. of Adv. Eng. Informatics, 25(4), 713-724.
CII (2010). "Guide to activity analysis." Construction Industry Institute's Implementation Resource 252-2a, Austin, TX, 1-76.
Chen, C., Zhuang, Y., Nie, F., Yang, Y., Wu, F. and Xiao, J. (2010). "Learning a 3D Human Pose Distance Metric from Geometric Pose Descriptor." IEEE Transactions on Visualization and Computer Graphics, 1676-1689.
ENR (2011). "Don't blame the workers." Engineering News-Record, Bruce Buckley (Jun. 1, 2011).
Feng, X. and Perona, P. (2002). "Human Action Recognition By Sequence of Movelet Codewords." Proc. 3DPVT, 717-723.
Forsyth, D. A., Arikan, O., Ikemoto, L., O'Brien, J. and Ramanan, D. (2005). "Computational studies of human motion: Part 1, tracking and motion synthesis." Foundations and Trends in Computer Graphics and Vision.
Gouett, M., Haas, C., Goodrum, P., and Caldas, C. (2011). "Activity analysis for direct-work rate improvement in construction." ASCE J. of Constr. Eng. Mgmt.
Maji, S., Berg, A. and Malik, J. (2008). "Classification using Intersection Kernel Support Vector Machines is Efficient." Proc. IEEE Conf. CVPR, 1-8.
Müller, M., Röder, T., and Clausen, M. (2005). "Efficient content-based retrieval of motion capture data." Proceedings of ACM SIGGRAPH, 677-685.
OpenNI Organization. (2010). "OpenNI Documentation." OpenNI website, http://75.98.78.94/ (2011).
Peddi, A., Huan, L., Bai, Y. and Kim, S. (2009). "Development of human pose analyzing algorithms for the determination of construction productivity in real-time." Proc., Construction Research Congress, Seattle, WA, 11-20.
Ramanan, D., Forsyth, D. A. and Zisserman, A. (2007). "Tracking people by learning their appearance." IEEE Trans. Pattern Anal. Machine Intell., 29(1), 65-81.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A. and Blake, A. (2011). "Real-time human pose recognition in parts from single depth images." Proc. IEEE Conf. CVPR, 1297-1304.
Sung, J., Ponce, C., Selman, B. and Saxena, A. (2011). "Human Activity Detection from RGBD Images." AAAI Workshop on Pattern, Activity and Intent Recognition.
Thurau, C. and Hlaváč, V. (2009). "Recognizing Human Actions by Their Pose." Statistical and Geometrical Approaches to Visual Motion Analysis, 169-192.
Weinland, D. and Boyer, E. (2008). "Action recognition using exemplar-based embedding." Proc. IEEE Conf. CVPR, 23-28.
Yang, J., Arif, O., Vela, P.A., Teizer, J., and Shi, Z. (2010). "Tracking Multiple Workers on Construction Sites using Video Cameras." J. of Adv. Eng. Informatics, 24(4).
Yilmaz, A., Javed, O. and Shah, M. (2006). "Object tracking: A survey." ACM Comput. Surv., 38(4), 13.
Whiteside, J. D. (2006). "Construction Productivity." Proc., 2006 AACE International Transactions, EST.08, 1-8.