Downloaded from ascelibrary.org by University of Michigan on 03/28/16. Copyright ASCE. For personal use only; all rights reserved.
Automated Information Retrieval for Hazard Identification in Construction Site

H. Kim1, H. S. Lee2, M. Park3, B. Choi4

1 Ph.D. Student, Dept. of Architecture and Architectural Engineering, Seoul National Univ., 1 Gwanak-ro, Gwanak-gu, Seoul 151-742, Republic of Korea. PH +82-2-880-8311; FAX +82-2-887-8923; email: [email protected]
2 Prof., Dept. of Architecture and Architectural Engineering, Seoul National Univ., 1 Gwanak-ro, Gwanak-gu, Seoul 151-742, Republic of Korea. PH +82-2-880-8311; FAX +82-2-887-8923; email: [email protected]
3 Prof., Dept. of Architecture and Architectural Engineering, Seoul National Univ., 1 Gwanak-ro, Gwanak-gu, Seoul 151-742, Republic of Korea. PH +82-2-880-8311; FAX +82-2-887-8923; email: [email protected]
4 M.S. Course Student, Dept. of Architecture and Architectural Engineering, Seoul National Univ., 1 Gwanak-ro, Gwanak-gu, Seoul 151-742, Republic of Korea. PH +82-2-880-8311; FAX +82-2-887-8923; email: [email protected]

ABSTRACT

The repeated occurrence of similar accidents is one of the prevalent features of construction disasters. Similar accident cases provide direct information for determining the risk of scheduled activities and planning safety countermeasures. Researchers have developed many systems to retrieve and use past accident cases. Although these systems have clear and limited targets, most of them rely on ad hoc retrieval methods, which make them inconvenient to use. To overcome these limitations, this study proposes an automated information retrieval system that can search for and provide similar accident cases. The retrieval system extracts building information modeling (BIM) objects and composes a query set by combining the BIM objects with a project management information system (PMIS). Based on the results of this study, users can greatly reduce the effort of query generation. Furthermore, they can more easily avoid risks by receiving similar past accident cases that could recur while they work.

INTRODUCTION

Despite the enormous efforts made with regard to safety, the construction industry has a poor record of preventing accidents. One of the noticeable characteristics of construction accidents is their repetitive occurrence (Abudayyeh et al. 2003). Accident cases are the strongest stimuli in safety management and in raising laborers' safety awareness. This is because knowledge from past accidents is directly related to the prevention of future accidents and to laborers' heightened safety awareness (Lindberg et al. 2010).
Computing in Civil Engineering (2013)
Information retrieval technology can be used to improve the efficiency and quality of safety management. Some approaches, such as case-based reasoning (CBR) and information retrieval systems based on accident cases, have attempted to put accident cases to practical use (Ye 1998). Despite the contributions of current systems to safety management, laborers have not adopted them, because entering a large number of queries to find the desired information is inconvenient and discourages use. Therefore, this study suggests an information retrieval system that can overcome these limitations. The suggested system automatically searches for similar past accident cases. Based on the push system concept, it provides past accident cases to the laborers whose work is related to each case.

BACKGROUND

User and Usability. Many researchers use safety information by relating it to construction activities or by retrieving accident cases for safety planning. Several research studies on safety information systems have been performed at the national level. These systems provide information on related past accident cases when a user inputs queries. However, requiring the user to generate a query for every retrieval makes the current systems inconvenient. Because laborers tend to be overconfident in their skills and experience, they usually do not recognize the necessity of using past accident cases. This issue can be solved by a push system, which can be defined as automatically providing the right knowledge to the right person (Meso and Smith 2000). To provide similar accident cases to a laborer through a push system, information is required about where an activity is performed and who is related to the activity.

Query generation and providing results. Generally, information retrieval can be achieved by using either ad hoc systems or filtering systems (Jukna 2012). In ad hoc systems, the user inputs a query whenever new information is required. Filtering systems store user interests in a user profile, and their main objective is to select newly generated information and discard useless information. A routing system, which is one type of information filtering system, also automatically pushes search results to users. If users want to identify countermeasures by retrieving accident cases, current safety retrieval systems based on the ad hoc model require them to input as many accurate queries as there are risk factors. With the routing model, however, users do not need to generate a query for every retrieval. This model retrieves information whenever there is a change in the user profile or the database (DB), and it can push the results to the related users.

Retrieval Method. Most safety information retrieval systems have been developed using the Boolean model to compare queries with the DB and to extract related cases. The Boolean model works on the basis of whether or not a document contains the query terms. Although the Boolean model is simple and clear, its matching rule can retrieve too many or too few results (Yves and Hammer 2010).
Another problem is that most researchers weight every index equally, which decreases the effectiveness and usefulness of the retrieval results (Clote and Kranakis 2002). Lin and Soibelman (2009) suggested an extended Boolean model, a hybrid retrieval model incorporating the vector model, which can overcome the limitation of the original Boolean model. To improve on existing systems, this study suggests an information retrieval system that combines the routing model and the extended Boolean model. When these two models are combined, the system can automatically provide ranked retrieval results.

INFORMATION RETRIEVAL SYSTEM

Indices for information retrieval. In previous studies (Lee et al. 2012a; Choudhry et al. 2009), various risk factors were extracted and evaluated. This study follows the approach to hazard identification and risk assessment of Lee et al. (2012a) and therefore adopts the ten factors extracted in that study. The extraction process consists of two steps. The first step is to collect various risk influence factors from research studies and disaster investigations on the code standard. By reviewing related studies, 27 factors were finally selected. The second step is to select the factors with much more influence than the others by reviewing the results of a survey conducted with 42 qualified safety managers. Lee et al. (2012a) suggested influence factors affecting the occurrence of an accident, and ten factors were selected by experts with safety management experience. Although all the factors are used as indices for composing queries, not all of them are used in every phase of safety planning. Safety management plans are usually organized into three steps: preliminary plan, monthly plan, and daily plan. The uncertainty of construction projects makes it necessary to acquire information at each safety planning phase.
Therefore, the factors must be classified to compose the most suitable queries for each phase. A committee of thirteen members, each with more than ten years of experience, presented opinions to determine the classification of the ten factors. Determining the weights of the indices is also necessary to improve the effectiveness of the similarity measurement (Kolodner 1992). To weight the indices, the analytic hierarchy process (AHP) developed by Saaty (1980) was used. The questionnaire for calculating the weights follows that of Lee et al. (2012a). It was sent to fifty experts, each of whom had worked for more than ten years as a safety manager. The results of the surveys and the AHP analysis are shown in Table 1.

To judge the degree of congruency for each index, it is important to determine the proper data format (Ye 1998). The calculation method for the similarity index (SI) varies with the type of data. If the data follow a numeric format, SI is calculated from the distance between the construction site condition value and the accident case value using Formula 1:

$$SI = 1 - \frac{|A - B|}{A} \qquad (1)$$
where A = condition value of a current site; and B = condition value of a past accident case.

Table 1. Classification and weight indices by safety planning phase.

Influence Factors                                           Preliminary  Monthly  Daily
Work process rate                                           0.241        0.167    0.099
Cost of construction                                        0.175        0.131    0.077
Work type                                                   0.334        0.238    0.125
Building type                                               0.250        0.173    0.096
Occupation type                                             -            0.181    0.100
Date                                                        -            0.109    0.059
Age                                                         -            -        0.069
Workdays on current site                                    -            -        0.118
Safety training                                             -            -        0.166
Temperature                                                 -            -        0.089
No. of distributed surveys                                  50           50       50
No. of collected surveys                                    43           43       43
No. of collected surveys with a consistency index below 0.1 43           31       23

When the type of data follows a string format, SI is 1 if the condition value of a current site and the condition value of a past accident case are identical and 0 if they are completely different. After calculating the similarity index of each influence factor, the similarity score (SS) was determined. This value can be expressed as the sum of the multiplication of the similarity index (SI) and weight (M), as seen in Formula 2:

$$SS = \sum_{i=1}^{n} M_i \, SI_i \qquad (2)$$
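As an illustration of Formulas 1 and 2, the following minimal Python sketch computes a similarity score for the preliminary plan using the weights from Table 1. It assumes that Formula 1 uses the absolute difference between the two values and that negative results are clamped to 0; the function names, field names, and the example site and case values are hypothetical and are not taken from the study.

```python
from typing import Dict, Union

Value = Union[float, str]

# Preliminary-plan weights copied from Table 1.
PRELIMINARY_WEIGHTS: Dict[str, float] = {
    "work_process_rate": 0.241,
    "cost_of_construction": 0.175,
    "work_type": 0.334,
    "building_type": 0.250,
}


def similarity_index(site: Value, case: Value) -> float:
    """Similarity index (SI) between a site value A and a past-case value B."""
    if isinstance(site, str) or isinstance(case, str):
        # String format: 1 if identical, 0 otherwise.
        return 1.0 if site == case else 0.0
    if site == 0:
        # Assumption: Formula 1 is undefined for A = 0, so only an exact match counts.
        return 1.0 if case == 0 else 0.0
    # Numeric format (Formula 1), assuming an absolute difference clamped at 0.
    return max(0.0, 1.0 - abs(site - case) / abs(site))


def similarity_score(site: Dict[str, Value], case: Dict[str, Value],
                     weights: Dict[str, float]) -> float:
    """Similarity score (SS): weighted sum of the per-factor SI values (Formula 2)."""
    return sum(w * similarity_index(site[name], case[name])
               for name, w in weights.items())


# Hypothetical current-site conditions and one hypothetical past accident case.
site = {"work_process_rate": 50.0, "cost_of_construction": 100.0,
        "work_type": "formwork", "building_type": "apartment"}
case = {"work_process_rate": 40.0, "cost_of_construction": 100.0,
        "work_type": "formwork", "building_type": "office"}

score = similarity_score(site, case, PRELIMINARY_WEIGHTS)
print(round(score, 4))  # approximately 0.7018
```

Ranking the accident case DB by this score, instead of applying a strict Boolean match, is what allows partially matching cases to be returned with a rank.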
Data source for system development. An appropriate data source is one of the most important aspects of developing an information retrieval system. The structure of the data source in this study incorporates the benefits of PMIS and commercial BIM programs. Lee et al. (2012b) demonstrated the possibility of combining and linking information between PMIS and influence factors. Navon and Kolton (2006) suggested a model combining schedule and safety information based on AutoCAD or BIM. Based on these methods, the DB is established outside the BIM program, and the values of the influence factors and accident cases are saved to the DB. Using the extraction function of BIM, geometric values are extracted and connected with the DB. With this system composition, users can save, handle, and use as much information as they need, and property values can be used without decreasing the efficiency of commercial BIM programs.

Information retrieval system framework. The retrieval system framework suggested in this study is shown in Figure 1. The system consists of input, data processing, and output. The input comprises the factors described under "Indices for information retrieval," the weight of each factor, and the accident case
DB. The factor weight DB stores the weight values acquired through the AHP analysis. These values are used as the index weights in the information retrieval module. The accident case DB is used to search for similar accident cases by comparing the cases with queries based on construction site conditions.
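The weights held in the factor weight DB come from AHP pairwise comparisons. As a hedged sketch of how such weights and a consistency index could be derived, the Python code below uses the row geometric-mean approximation of the principal eigenvector, which is one common way to approximate AHP weights and is not necessarily the procedure used in the study. The function name and the 3 x 3 comparison matrix are hypothetical and do not reproduce the survey data.

```python
import math
from typing import List, Tuple


def ahp_weights(matrix: List[List[float]]) -> Tuple[List[float], float]:
    """Return (weights, consistency index) for a reciprocal pairwise comparison matrix.

    Weights are approximated by normalised row geometric means.
    """
    n = len(matrix)
    if n < 2:
        raise ValueError("AHP needs at least two factors to compare")
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]
    # Estimate lambda_max as the mean of (A w)_i / w_i over all rows.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return weights, consistency_index


# Hypothetical, perfectly consistent judgments for three factors.
judgments = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, ci = ahp_weights(judgments)
usable = ci < 0.1  # screening rule reported in Table 1
```

A response whose consistency index is 0.1 or higher would be excluded, which matches the shrinking number of usable surveys reported in Table 1 as the number of compared factors grows.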
Figure 1. Retrieval system framework.

Data processing comprises the BIM drawings module, the extraction module, and the information retrieval module. The BIM drawings module forms 3D-oriented parametric models. It holds the geometric property values of objects and provides objects that include additional property values. The extraction module extracts objects from the BIM drawings. The extracted objects are then combined with the information corresponding to the influence factors. Information processed in the extraction module is sent to the information retrieval module and is used to find similar accident cases.

The most relevant case for each laborer can be found through this type of query generation. PMIS data, including information related to laborers, are combined with BIM objects, and this combination makes a query set. In this process, the property values of the labor category generate a suitable query for each laborer. For example, crews that work on the same activity share the same values for work and work conditions, whereas an individual laborer's property values vary. These property values are incorporated into the query generation process, so the generated queries include not only the characteristics of the work and work conditions but also the characteristics of the individual laborer.

The information retrieval module determines the type of retrieval method from the number and degree of BIM property values. The AHP result values are then loaded according to the type of retrieval method, and queries are generated and converted to internal representations. Based on the query sets for each safety planning phase, the system retrieves similar past accident cases. Similarity indices are calculated by comparing accident cases with the internal representations based on construction site values.

The output presents the similar past accident cases retrieved by the query sets of each safety planning phase. The similar accident cases are classified by the number of inputted factors. Each case provides the coordinates and remaining
period of risk factors by combining objects extracted from BIM drawings. Retrieved past accident cases are extracted by the property values connected with a BIM object. The object connected with the retrieved past accident cases includes, as property values, the labor work type, work start time, work finish time, coordinates of the work area, etc. The suggested system searches for the laborers whose work type or work conditions correspond to the retrieved past accident cases, and these cases are then provided to the related laborers. Each BIM object, which includes geometric information and additional properties, has an identifier. Thus, the labor ID in the additional properties can be linked with the object's identifier. The identifier is used to search for the laborer or laborers relevant to the object, and the system provides the retrieval results to them.

APPLICATIONS

The sample case is an apartment building construction project located in Seoul, Korea. The project ran from July 2009 to April 2012 and consists of 4 buildings with 24 stories and 340 households. The suggested system was used on this project for 2 months (February to March 2011). The results of information retrieval are shown in Figure 2. Safety managers and laborers were provided with similar past accident cases sorted by similarity score. Queries were generated from combinations of influence factors. Based on the query set, the retrieval process was performed, and the results presented the similarity of each past accident case as well as the related laborers' information. The cases included the similarity score, the related laborers, the coordinates of the risk area extracted from BIM objects, and the duration of the risk factors.
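The query generation step described for the framework, in which BIM object properties are combined with PMIS laborer records and linked through the labor ID, could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, field names, and the dictionary standing in for the PMIS are hypothetical and do not correspond to any BIM or PMIS product API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BimObject:
    """Hypothetical extracted BIM object with additional property values."""
    object_id: str
    work_type: str
    coordinates: Tuple[float, float, float]
    laborer_ids: List[str]  # labor IDs stored in the additional properties


@dataclass
class LaborerRecord:
    """Hypothetical PMIS record for one laborer."""
    laborer_id: str
    occupation_type: str
    age: int


def build_queries(obj: BimObject,
                  pmis: Dict[str, LaborerRecord]) -> Dict[str, dict]:
    """Compose one query per laborer linked to the BIM object.

    Work-related values are shared by the crew; laborer values vary per person.
    """
    queries: Dict[str, dict] = {}
    for laborer_id in obj.laborer_ids:
        record = pmis.get(laborer_id)
        if record is None:
            continue  # no PMIS data for this ID, so no query can be composed
        queries[laborer_id] = {
            "object_id": obj.object_id,
            "work_type": obj.work_type,
            "coordinates": obj.coordinates,
            "occupation_type": record.occupation_type,
            "age": record.age,
        }
    return queries


# Hypothetical data: one object, two laborers known to the PMIS, one unknown ID.
slab = BimObject("obj-001", "formwork", (12.0, 4.5, 9.0), ["L1", "L2", "L9"])
pmis = {
    "L1": LaborerRecord("L1", "carpenter", 45),
    "L2": LaborerRecord("L2", "carpenter", 28),
}
queries = build_queries(slab, pmis)
```

Because the resulting query set is keyed by labor ID, the cases retrieved for each query can be pushed to the corresponding laborer without any manual query input.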
Figure 2. Example of the information retrieval results.

The suggested system was validated by comparing its retrieval performance with that of the established accident case retrieval system of KOSHA (Korea Occupational Safety and Health Agency). Precision and recall are the two most frequently used basic measures of information retrieval effectiveness (Manning et al. 2008). The definitions of precision and recall are as follows:
$$\text{Precision} = \frac{\#(\text{relevant items retrieved})}{\#(\text{retrieved items})} = P(\text{relevant} \mid \text{retrieved}) \qquad (3)$$

$$\text{Recall} = \frac{\#(\text{relevant items retrieved})}{\#(\text{relevant items})} = P(\text{retrieved} \mid \text{relevant}) \qquad (4)$$
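The two measures can be computed from sets of case identifiers, as in the minimal Python sketch below. The function name and the example identifiers are hypothetical, and returning 0 for an empty set is a convention assumed here to avoid division by zero.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query, following Formulas 3 and 4."""
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 cases retrieved, 3 cases actually relevant, 2 in common.
retrieved_cases = {"c1", "c2", "c3", "c4"}
relevant_cases = {"c2", "c3", "c5"}
p, r = precision_recall(retrieved_cases, relevant_cases)
# p is 2/4 and r is 2/3
```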
Generally, Text REtrieval Conference (TREC) collections have been used to evaluate the performance of information retrieval systems (Zobel et al. 1996). However, it is difficult to apply the TREC text collections directly to the suggested system because its DB is structured around past accident cases. Therefore, the precision and recall of this study were evaluated through comparison with KOSHA's past fatal accident case retrieval system. One hundred past accident cases were extracted from KOSHA's fatal construction accident DB by using the random extraction function. Precision and recall were calculated for each safety planning step: preliminary plan, monthly plan, and daily plan. The number of indices differs for each safety plan. Precision and recall were estimated by using Formulas 3 and 4, and the results are presented in Table 2. The suggested system achieved higher precision and recall than KOSHA's retrieval system in every case except precision in the preliminary plan phase (0.68 versus 0.72). This indicates that the suggested system has the potential to retrieve more related information and that the retrieved cases are more likely to be used. Overall, the retrieval performance of the suggested system can be considered higher than that of the currently used system.

Table 2. Retrieval effectiveness comparison results (P = precision; R = recall).

Safety planning phase   Preliminary plan   Monthly plan    Daily plan
Number of indices       4                  6               10
Search system           P      R           P      R        P      R
KOSHA system            0.72   0.61        0.63   0.41     0.84   0.34
Suggested system        0.68   0.82        0.79   0.75     0.92   0.61

CONCLUSION

This study proposes an accident case retrieval system that can automatically generate queries based on construction site conditions. The results also include time and geometric information, and they are automatically provided to the laborers through the push system. To develop the suggested information retrieval system, BIM, PMIS, the AHP result DB, and the accident case DB were structured. Then, information retrieval algorithms were defined.
Finally, the push system was established to provide retrieved accident cases to the related laborers. The results of the suggested system include similar cases that happened in the past, related works, and the coordinates of similar cases and the work time on the current construction site. These results can help safety managers prepare safety countermeasures in the safety planning steps and also help raise laborers' attention to safety. The automatic retrieval of accident cases increases the efficiency of identifying risk factors. Despite these advantages, this study has some limitations. If no rich data source such as a PMIS is available, considerable effort is required to input data. Also,
accumulated accident cases may not be sufficient for application to safety management. Interworking between 3D models and the coordinates in the retrieval results is needed to visualize the risk areas or factors. Future research should address these limitations to help improve safety management.

ACKNOWLEDGEMENT

This research was supported by a grant (code #09 R&D A01) from the Super-Tall Building R&D Project funded by the Ministry of Land, Transport and Maritime Affairs of the Korean government.

REFERENCES

Abudayyeh, O., Fredericks, T., Palmquist, M., and Torres, H. (2003). "Analysis of Occupational Injuries and Fatalities in Electrical Contracting." J. Constr. Eng. Manage., 129(2), 152-158.
Clote, P., and Kranakis, E. (2002). Boolean Functions and Computation Models, Springer, New York, USA.
Jukna, S. (2012). Boolean Function Complexity: Advances and Frontiers, Springer, New York, USA.
Kolodner, J. L. (1992). "An introduction to case-based reasoning." Artificial Intelligence Review, 6(1), 3-34.
Lee, H., Kim, H., Park, M., Teo, E. A. L., and Lee, K. (2012a). "Construction risk assessment using site influence factors." J. Comput. Civ. Eng., 26(3), 319-330.
Lee, H., Lee, K., Park, M., Baek, Y., and Lee, S. (2012b). "RFID-Based Real-Time Locating System for Construction Safety Management." J. Comput. Civ. Eng., 26(3), 366-377.
Lin, K., and Soibelman, L. (2009). "Incorporating domain knowledge and information retrieval techniques to develop an architectural/engineering/construction online product search." J. Comput. Civ. Eng., 23(4), 201-210.
Manning, C. D., Raghavan, P., and Schutze, H. (2008). Introduction to information retrieval, Cambridge University Press, New York, USA.
Meso, P., and Smith, R. (2000). "A resource-based view of organizational knowledge management systems." Journal of Knowledge Management, 4(3), 224-234.
Navon, R., and Kolton, O. (2006). "Model for Automated Monitoring of Fall Hazards in Building Construction." J. Constr. Eng. Manage., 132(7), 733-740.
Saaty, T. L. (1980). The analytic hierarchy process, McGraw-Hill, New York, USA.
Ye, T. (1998). "Reasoning Model of the Case-Based Construction Safety Management System." Master's dissertation, Seoul National Univ., Seoul, Korea.
Yves, C., and Hammer, P. L. (2010). Boolean Models and Methods in Mathematics, Computer Science, and Engineering, Cambridge University Press, London.
Zobel, J., Moffat, A., and Ramamohanarao, K. (1996). "Guidelines for presentation and comparison of indexing techniques." ACM SIGMOD Record, 25(3), 10-15.