Available online at www.sciencedirect.com
ScienceDirect Procedia Engineering 122 (2015) 57 – 64
Operational Research in Sustainable Development and Civil Engineering - meeting of EURO working group and 15th German-Lithuanian-Polish colloquium (ORSDCE 2015)
The Case-Based Reasoning Model Of Cost Estimation At The Preliminary Stage Of A Construction Project Krzysztof Zimaa * a
Cracow University of Technology, Warszawska 24,Kraków 31-155, Poland
Abstract The paper presents calculation of a unit price of construction elements or works with the use of the case-based reasoning (CBR) method. The primary source of knowledge in the CBR method is a collection of cases encountered in previous problems and retained in memory. The solution to a completely new problem is generated by finding the most suitable cases and adjusting them to the new case. A method of a unit price calculation will be shown which uses the prices found in the accessible set of previously occurring cases, i.e. it uses the prices listed in estimations presented in offer bids. The CBR systems consist of data bases in which the cases are stored along with their solutions. So the paper presents a concept of a knowledge base supporting the process of cost estimation at the preliminary stage of a construction project. © Published by by Elsevier Ltd. Ltd. This is an open access article under the CC BY-NC-ND license © 2015 2015The TheAuthors. Authors. Published Elsevier (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the organizing committee of the Operational Research in Sustainable Development and Civil Peer-review under responsibility of the organizing committee of the Operational Research in Sustainable Development and Civil Engineering - meeting of EURO working group and 15th German-Lithuanian-Polish colloquium. Engineering - meeting of EURO working group and 15th German-Lithuanian-Polish colloquium Keywords: Case Based Reasoning; construction costs; cost estimation.
1. Introduction Knowledge acquisition is related to machine learning. It consists in improving a knowledge base by formulation of new notions, examination of data patterns, acquisition of new notions with the use of examples and analogies and elimination of redundancy.
* Krzysztof Zima. Tel.:+48 12 628 3090; fax:+48 12 628 2329. E-mail address:
[email protected]
1877-7058 © 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the organizing committee of the Operational Research in Sustainable Development and Civil Engineering - meeting of EURO working group and 15th German-Lithuanian-Polish colloquium
doi:10.1016/j.proeng.2015.10.007
58
Krzysztof Zima / Procedia Engineering 122 (2015) 57 – 64
Knowledge acquisition with the use of examples is a special case of inductive learning. This method of knowledge acquisition is quite frequently used in creating knowledge bases. The method consists in generating a general description of concepts (classes) on the basis of a set of examples objectively representing these concepts (classes). Thus the general description is developed following the principle of induction. The methods of examplebased knowledge acquisition may be differentiated depending on the source of examples. Reasoning is usually modelled as a process in which the conclusion is arrived at by linking rules representing generalized knowledge into induction chains. In case-based reasoning the primary source of knowledge are not rules but a set of cases originating from previously encountered problems and retained in memory. Solutions to completely new problems are generated by finding the most suitable cases and adjusting them to the new case. CBR may be defined as models solving new problem cases by adapting the results which were used in solving some old cases. A new problem is solved by finding a similar case in the available set of cases and applying to it the solution that was used in the found case or cases. The CBR systems thus consist of a data base in which the cases are stored along with their solutions. Additionally, they have instruments implementing the algorithms capable of comparing cases and adapting solutions. The problems of construction cost or indirect cost estimation have been discussed inter alia in [1], [2], and the problems related to exceeding the said cost have been described inter alia in [3]. Numerous attempts are currently being made to improve the accuracy of construction cost estimation. For example, some of them make use of neural networks. The basic methods and problems of unit price estimation for construction works have been discussed in [4]. Gunaydın et al. [5] presented a model of early cost calculation of structural systems of building using neural networks. They based the cost estimation process on 9 project parameters. Gwang-Hee Kima et al. [6] took into account 12 parameters in their calculations and combined the neural networks with genetic algorithms. Similarly, Juszczyk in his work [7] used neural networks for calculating the cost of erection of multifamily buildings at the early stage of planning. In contrast, in [8] the multi-criteria analysis methodology was used to calculate the execution cost of one selected element, which were the walls. Case based reasoning may be used in cost calculations of construction of buildings. For example, Sung-Hoon et al. [9] proposed a model using previous experience in cost calculations. AHP was used to elicit domain knowledge from the experts systematically. CBR provides a simple method of construction costs measurement, given the fact that most studies have identified the existence of nonlinear relations between the cost and the influence of factors. [10] and [11] have demonstrated it on the example of bridge and road works, whereas [12], [13] have presented a comparison for construction works. Cost calculation also uses hybrid systems, in which case-based reasoning is a component. For example, in [14] the CBR system has been supported in cost calculation by the AHP method used for the purpose of determining the weight of individual criteria. This study focused on the development of a more accurate estimate technique for highway projects in South Korea at the early stage of cost estimation. Other methods are also used in cost calculation. Studies comparing CBR to other predictive methods, such as neuron networks [15] and parametric methods [15], [16], have demonstrated that the CBR cost estimation model may retain a high quality of prediction for a longer time – better than other models, and its solution finding capability remains high in spite of the imprecision of certain information. The cost model developed by [17] was based on the CBR methodology. The cost estimation model is based on similar historical cases as the basis for the cost calculations. Mesaros et al. analyzed the empirical approaches of Slovak construction companies to cost management and examined wide range of cost management and cost controlling methods and IT tools [18]. 2. The case-based reasoning system – CBR Reasoning on the basis of known cases is a method using similar cases which have been once solved operating on the presumption that similar cases have similar solutions. An important feature differentiating CBR from other reasoning methods is that this method is characterized by a mechanism of the system’s continuous learning in a way that is independent of the expert. The learning process takes place by accumulation of cases (solutions) in a data base and making them accessible for solving new problems in the future. Solved cases stored in the data base usually
59
Krzysztof Zima / Procedia Engineering 122 (2015) 57 – 64
contain the knowledge that is specific to a given problem. The data base does not have to contain complete or comprehensive knowledge from a given field [19]. In order to acquisition knowledge from the cost estimates, technical specifications may be used, for example, text mining methods that have been described, among others, in [20]. The operation cycle of the system implementing the CBR method may be described in short with the use of four Rs: x Retrieve – finding the most similar case or set of cases, x Reuse – using the knowledge contained in this case for solving the problem, x Revise – assessing the usefulness of the suggested solution, x Retain – storing the experience in memory for the purpose of its later use while solving new problems in the future. The quality and effectiveness of the CBR method depends on four factors: x the manner and accuracy of the modelled area mapping, x the quality of the comparison making, x the methods of solution adaptation, x accuracy in case description. 3. Preliminary cost estimation model using the CBR methodology The CBR method may be used in construction inter alia for the purposes of cost management or preparing reliable project cost estimations for the investor, reflecting the market prices. The main idea of this concept is viewing the prepared cost estimate as a data base and making the assumption that the information on costs will be accumulated and stored. The first stage of the presented method is creation of an adequate data base containing costs of construction works or elements. The data base on costs is based on real estimates submitted in offer bids for construction of various sports facilities, such as: sports halls, football pitches, tennis courts, skate parks, etc. The offered estimates have been broken down within the data base into individual uniform groups of works W 1, W2, …. Wn or construction elements E1, E2, …. En. Each construction work or element has been described providing various types of information included in the estimates, such as: the cost of work, the scope of works, date of realization, location of the facility where the work or element belongs, etc. ݁ݏܽܥ ൌ ሺ݊݅ݐݑ݈ݏ ǡ ݊݅ݐܽݑݐ݅ݏ ǡ ݀݁݊݅ݐ݅ݎܿݏ ሻܥݎ ൌ ሺܵ ǡ ܵݐ ǡ ܦ ሻ
(1)
where: Si – solution of the i-th case, i.e. the unit cost of a construction work or an element, Sti – situation for the i-th case, i.e. the estimate situation for the i-th case (date of execution, project location, the scope of works), Di – description of the i-th case, i.e. additional information on the i-th case (such as the type or kind of element, advantages of the solution, preferences). And thus the data base is a collection of all cases (construction works or elements) containing the solutions of the problem, situations and descriptions. ܤܦൌ σୀଵ ܥ ൌ σୀଵሺܵ ǡ ܵݐ ǡ ܦ ሻ
(2)
where: DB – relative data base, Ci – the i-th case, n – the number of cases. The data base structure has a hierarchic form so that the information could be arranged in an orderly fashion, which facilitates finding the desired construction elements E 1, E2, …. En or works W1, W2, …. Wn making up the scope of work for the new estimate calculation. The information has been ordered by indexing construction elements
60
Krzysztof Zima / Procedia Engineering 122 (2015) 57 – 64
Ei and works Wi, which will allow reducing the number of necessary similarity examinations. Similarity examinations will only be carried out within the groups of uniform works or elements, according to the ascribed indices. The procedure of cost estimation based on CBR has been presented in Fig. 1.
Fig. 1. Cost estimation based on CBR scheme.
The next step is assessment of similarity between the cases retrieved from the data base and the new case. We are thus looking for a case in which the situation for the i-th case from the data base Sti is equal to the situation for the new case StN. It is an ideal situation, where we could assume that solution Si = SN, which means that the unit price of the work or element will be the same or very close to the i-th solution for the new case. If such situation does not occur, we must find the solutions that are the most similar to the new case. So, we must find a work or element that is the closest taking into account the description parameters. In order to do that, we need to determine local similarities for individual construction elements or works SIM (S N, Si) [0,1], where 0 means no similarity and 1 – complete similarity. Three situations may be enumerated in the created data base DB for which local similarities are calculated with the use of different formulas [based on 21]: x for the sub-criteria adopting numerical values within the range [V min – Vmax] ܵܯܫ൫ܸே ǡ ܸ ൯ ൌ ͳ െ x
(3)
for the sub-criteria described by linguistic values
ܵܯܫ൫ܸே ǡ ܸ ൯ ൌ ͳ െ x
หಿ ିೕ ห ೌೣ ି
หሺಿ ሻିሺೕ ሻห ெିଵ
(4)
for the sub-criteria in which one or both values are unknown
ܵܯܫሺܸே ǡ ݊ܽݐܽ݀ሻ ൌ Ͳ
(5)
The next step is adaptation, i.e. modification of the solution to make it more suitable for the solution of the new case. The process of unit cost modification in the model is aimed at adjusting the cost of construction works to the
61
Krzysztof Zima / Procedia Engineering 122 (2015) 57 – 64
location of a given project and at introducing further corrections to accommodate the indexation of cost of materials as well as labor and equipment operation prices. The model implements automatic adaptation in this respect, based on the data base of regional and indexation rates. However, the model also features the option of adaptation by the user if no similar case has been found in the data base DB. In such situation, the user – by definition familiar with the cost estimation methods and processes – will prepare a classic cost estimate for individual construction works for which the model has found no solution. The last step is evaluation of the obtained solution based on local or global testing of the model. Testing may be done both on the local level – for individual construction works and elements – and on the global level – referring it to the final outcome of the cost estimate. Practical verification of the results obtained from the proposed model may be done by comparing the estimated total cost of the new case with the actual cost for the selected testing set, so that the mean absolute estimation error could be calculated with the use of the following formula: ൌ
หి ห షి σ ిా ఽి ిఽి
୬
ͲͲͳ כΨ
(6)
where: MAEE – mean absolute estimation error, CCBR – estimated cost of the new case, CACT – actual cost of the case from the selected testing set, N – the number of tested cases. 4. Example of cost estimation based on the CBR model Simplified calculations for a sports field construction in the town of Alwernia will now be discussed as an example. The project includes construction works consisting in removal of the hummus layer, making the sub-base and paving the area around the field with setts as well as installation of the field surface made of synthetic turf and setting up the football goals. Table 1. Breaking down into construction elements and works in the data base, together with indexing. Main group
Index
Subgroup
Construction works
R1010
Removal of hummus
R1020
Excavation and Fill
R2010
Subbase Layer
…
…
…
Elements
E3010
Pavement surfaces
E4010
Pedestrian Pavement Curbs
…
…
E5010
Sports fields surfaces
…
…
E7010
Sports Equipment
E8010
Protective/safety nets
…
…
… …
…
The data base DB has been developed by breaking down the cost estimates prepared by prospective contractors and presented in their offer bids in the tender procedure. The data base includes only selected offers, rejecting projects which have not been completed yet or have ended in failure. Breaking down the cost estimates is combined with the process of construction elements and works indexing with the purpose of creating the first level of
62
Krzysztof Zima / Procedia Engineering 122 (2015) 57 – 64
automated selection of desired cases and thus limiting the number of comparisons in the process of finding identical solutions. A fragment of the base DB with indexed construction elements and works has been presented in Table 1. The indexing system has been invented by the Author and adopted for sports fields. Each case included in the base DB has been described following formula (1). An example entry for an element is presented below: Casei(E3010) = (76.68 PLN/m2; 28.02.14, Dąbrówka Nowa, 392m2; concrete sett, 19.8×16.2×6.0) The above entry signifies the i-th case with index E3010 – pavement surfaces. The stored information is as follows: x solution – unit price of pavement surface installation: 76.68 PLN/m2; x situation – the date of submitting the offer: 22 April 2014, location of the project: Dąbrówka Nowa, the amount of works from the bill of quantities: 392 m2. x description – the type of material: concrete sett, dimensions (length, width, thickness): 19.8 x 16.2 x 6.0 cm. A fragment of the data base has been presented in Tab. 2. Table 2. Fragment of the data base for the element Pavement surfaces. Wi Project name
Unit price
Si Date of submitting the offer
Di
Project location
Amount of works [m2]
Material
Construction of a multi-purpose sports field at Primary School no 44 in Bytom, 6 Grota-Roweckiego street
69.86 PLN
24.02.2014
Bytom
781
concrete sett, 19.8×16.2×6.0
Construction of a multi-purpose sports field in Dąbrówka Nowa
76.68 PLN
28.02.2014
Dąbrówka Nowa
392
concrete sett, 19.8×16.2×6.0
Construction of a sports field in the village of Biskupie
no data
04.03.2014
Biskupie
no data
carpet lawn created from the seed
Construction of a sports field at the branch of the Primary School in Suchy Las (the area of Forteczna street) together with the accompanying infrastructure, Suchy Las commune
86.34 PLN
07.04.2014
Suchy Las
100
concrete pavement slabs of dimensions: 35x35x5
Construction of a multi-purpose sports field in Gymnasium no 2 in Augustów
16.45 PLN
26.08.2014
Augustów
3000
carpet lawn created from the seed
Construction of a multi-purpose sports field at Primary School no 1 in Głowno
17.64 PLN
02.05.2014
Głowno
4840
carpet lawn created from the seed
The first step for the new case in the CBR model is to see whether the base DB contains an identical case, disregarding, obviously, the date of submitting the offer. If there is no such case, and this is what always happens in practice, we must proceed to calculate the similarities in the main indexed groups which make up the scope of works in the new case. Example calculations for the main group E3010 – Pavement surfaces – have been presented in Tab. 3. SIME3010(NS, Sim) is a partial assessment of similarity for the element pavement surfaces, which is a sum of products of weights and local similarities SIM material(vn, vj) and SIMamount of works(vn, vj). A similar calculation procedure, yet with different assessment criteria, is applicable to the remaining construction works and elements. Automatic modification of the selected solution requires correcting the selected price by the indexing and regional coefficient. Due to its high similarity (0.99), the selected solution has the unit price equal to 76.68 PLN/m2. After correcting it first by the regional coefficient and then by the indexation coefficient: C j(E3010) = 76.68 * 0.98 * 0.964 = 72.81 PLN/m2.
63
Krzysztof Zima / Procedia Engineering 122 (2015) 57 – 64
Assessment criteria
New case
Material
concrete sett, 19.8×16.2×6.0
concrete sett, 19.8×16.2×6.0
Table 3. Calculation of similarity for the main group – Pavement surfaces.
Amount of works
500
781
SIMmaterial(Vn, Vj) SIMamount of work(Vn, Vj) SIME3010(NS, Sim)
-
1 0.94 0.98
Dąbrówka Nowa
Biskupie
Suchy Las
Augustów
Głowno
concrete sett, 19.8×16.2×6.0
carpet lawn created from the seed
concrete pavement slabs of dimensions: 35x35x5
carpet lawn created from the seed
carpet lawn created from the seed
Bytom
Cases from the data base DB
100
3000
4840
392 no data New case – similarity calculation 1 0.98 0.99
0 0 0
0 0.92 0.27
0 0.47 0.14
0 0.08 0.03
The last step is determination of the final estimated price of the new case equal to the sum of construction elements or works making up the total amount of works multiplied by the amount of works from the bill of work quantities: ܥே ൌ ݆ܥଵᇱ ሺܴͳͲͳͲሻ ܮ כଵ ሺܴͳͲͳͲሻ ݆ܥଶᇱ ሺܴʹͲͳͲሻ ܮ כଶ ሺܴʹͲͳͲሻ ݆ܥଷᇱ ሺͲͳͲ͵ܧሻ ܮ כଷ ሺͲͳͲ͵ܧሻ ݆ܥସᇱ ሺܧͷͲͳͲሻ ܮ כସ ሺܧͷͲͳͲሻ ݆ܥହᇱ ሺܧͲͳͳሻ ܮ כହ ሺܧͲͳͳሻ That is: ܥே ൌ Ͳǡͺ ͲͲͻʹ כ ʹʹǡͳͺ כͷ ʹǡͺͳ כͷͲͲ ͳͺͲͳǡʹͷ ʹͳ כͷǡͲͲ ʹ ͵ כͷͲͲ ൌ ʹʹͳͺǡͷܲܰܮ where: ܥே – the cost estimate price for the new case, ݆ܥଵᇱ ሺܴͳͲͳͲሻ – the unit price of work no 1 with index R1010 removal of hummus (indexing in compliance with Tab. 1.) L1(R1010) – the amount of works for the new case related to work with index R1010 removal of hummus. Global mean absolute estimation error in this particular case is as follows:
ൌ
σ
ȁʹʹͳͺǡͷ െ ʹͺͻʹͳͻǡͷȁ ʹͺͻʹͳͻǡͷ ൌ ͲǡͲͷ ͳ
Assuming that the mean calculation error in Poland, viewed as the difference between the calculation prepared by the investor and the calculation of the chosen offer ranges within 30 – 40 %, the result may be considered satisfactory. 5. Conclusions The unquestionable advantages of the presented model are its simplicity and the calculation speed. In comparison with other methods, the good point of CBR is undoubtedly that it allows avoiding the laborious development of
64
Krzysztof Zima / Procedia Engineering 122 (2015) 57 – 64
rules. The mean absolute estimation error is influenced by the size of the case base (the greater the base, the higher the chances of finding more similar cases) and by the correction coefficients. Unfortunately, the passage of time and the territorial differences in prices influence the final outcome of calculations. Solution adaptation taking into account the project location and the inflation is therefore a necessary component, yet it promotes a greater estimation error since precise determination of the values of such factors is difficult. In general, however, the method may be applied to cost estimation of a project at its conceptual stage or to the investor’s cost calculations based on the design documentation. Nevertheless, it requires a large data base composed of cost calculations taken from a great number of offer bids. References [1] A. Leśniak, The application of artificial neural networks in indirect cost estimation, AIP Conference Proceedings, Volume 1558 Issue 1 (2013) 1312-1315. [2] A. Tazikova, M. Kozlovska, Achievement economic and environmental efficiency of construction using the parallel calculating costs, International Multidisciplinary Scientific GeoConference Surveying Geology and Mining Ecology Management, SGEM Vol. 2 (2013) 11-18. [3] E. Plebankiewicz, A. Leśniak, Overhead costs and profit calculation by Polish contractors, Technological and Economic Development of Economy, Volume 19 Issue 1 (2013) 141-161. [4] A. Czarnigowska, Unit rate-based cost estimating - input and methods. Czasopismo Techniczne/Technical Transactions 111(5), (2014) 95103. [5] H. Murat Gunaydın, S. Zeynep Dogan, A neural network approach for early cost estimation of structural systems of buildings, International Journal of Project Management 22 (2004) 595–602. [6] G. H. Kim, J. E. Yoon, Ǥ Ǥ , H. H. Cho, K. I. Kang, Neural networkmodel incorporating a genetic algorithm in estimating construction costs, Building and Environment 39 (2004) 1333 – 1340. [7] M. Juszczyk, The use of artificial neural networks for residential buildings conceptual cost estimation, AIP Conference Proceedings Volume 1558 Issue 1 (2013) 1302-1306. [8] E. K. Zavadskas, L. Ustinovičius, Z. Turskis, G. Ambrasas, V. Kutut, Estimation of external walls decisions of multistorey residential buildings applying methods of multicriteria analysis, Technological and Economic Development of Economy, Volume 11 Issue 1 (2005) 5968. [9] S. H. An, G. H. Kim, K. I. Kang, A case-based reasoning cost estimating model using experience by analytic hierarchy proces, Building and Environment 42 (2007) 2573–2579. [10] I. N. Chengalur-Smith, D. P. Ballou, H. L. Pazer, Modeling the costs of bridge rehabilitation. Transportation Research Part A: Policy and Practice, 31(4) (1997) 281–293. [11] J. S. Chou, M. Peng, K. Persad, J. O'Connor, Quantity-based approach to preliminary cost estimates for highway projects. Transportation Research Record, 1946 (1) (2006) 22–30. [12] M. W. Emsley, D. J. Lowe, A. R. Duff, A. Harding, A. Hickson, Data modelling and the application of a neural network approach to the prediction of total construction costs. Construction Management and Economics, 20(6) (2002) 465-472. [13] D. J. Lowe, M. W. Emsley, A. Harding, Predicting construction cost using multiple regression techniques. Journal of Construction Engineering and Management, 132(7) (2006) 750–758. [14] S. Kim, Hybrid forecasting system based on case-based reasoning and analytic hierarchy process for cost estimation. Journal of Civil Engineering and Management, Vol. 19 Issue 1 (2013) 86-96. [15] G. H. Kim, S. H. An, K. I. Kang, Comparison of construction cost estimating models based on regression analysis, neural networks, and case-based reasoning. Building and Environment, 39(10) (2004) 1235–1242. [16] P. Duverlie, J. M. Castelain, Cost estimation during design step: Parametric method versus case based reasoning method. International Journal of Advanced Manufacturing Technology, 15(12) (1999) 895–906. [17] Ch. W. Koo, T-H. Hong, Ch. T. Hyun, S. H. Park, J-oh Seo, A study on the development of a cost model based on the owner's decision making at the early stages of a construction project, International Journal of Strategic Property Management Vol. 14 Issue 2 (2010) 121-137. [18] P. Mesároš, T. Mandičák, A. Mesárošová, J. Selín, Empirical analysis of approaches to cost management of architectural and construction projects in Slovakia. In: SGEM 2014: International Multidisciplinary Scientific Conferences on Social Sciences and Arts : Arts, Performing Arts, Architecture and Design, Conference proceedings: 1-10 September, 2014, Albena, Bulgaria. Sofia, (2014) 665-672. [19] S. Wrona,P. Zadora, Przetwarzanie wiedzy pochodzącej z przeszłości w celu rozwiązania problemów klasyfikacyjnych, in: H. Sroka, S. Stanek: Inteligentne systemy wspomagania decyzji w zarządzaniu. Transformacje systemów, AE Katowice, Katowice, 1997. [20] M. Gajzler, Text and data mining techniques in aspect of knowledge acquisition for decision support system in construction industry, Technological And Economic Development Of Economy, 16(2), (2010) 219-232. [21] W. Traczyk, Inżynieria wiedzy, EXIT, Warszawa, 2010.