Artificial Neural Networks versus Multiple Regression in the Valuation of Residential Property
Peter Rossini
Lecturer, School of Economics, Finance and Property, University of South Australia

Keywords:

Artificial Neural Networks, Artificial Intelligence, Valuation Methodology, Residential Valuation.

Abstract:

This paper researches the application of Artificial Neural Networks (ANN) to residential property valuation. Neural Networks (NN) are a multivariate analytical tool which promises to become the next major tool used for computerized mass appraisal. Current literature on ANN is reviewed. The technique is then applied in three procedures, and its results are compared with those from Multiple Regression Analysis (MRA) and with the actual sale prices. The paper helps to clarify the current position of ANN and examines its practical use compared to MRA.

Introduction

Artificial Neural Networks (ANN) are a multivariate analytical tool which promises to become the next major tool used for computerized mass appraisal of residential real estate. Tay (1992) recognised that property appraisal is essentially a problem of “pattern recognition” and noted that ANN should be able to learn from historical sales and apply sale prices to the respective ‘pattern’ identified. Borst (1995) suggests that ANN is the next logical step in the chain of tools used for mass appraisal, building upon the two most commonly used tools, Multiple Regression Analysis and the Feedback Method. ANN are modelled on the structure of the brain, specifically on neurons, the smallest unit in the brain. An ANN forms layers of interconnected neurodes, as Borst (1991) explains:

“there is an input layer, a hidden layer, and an output layer. In mass appraisal, the input neurodes represent the input data in much the same fashion as the X, (independent variables) in the linear model. The output layer represents the output sought by the model of the process of interest. In mass appraisal, one output neuron would be used to represent estimated selling price or perhaps estimated rent. The hidden layer allows for the combination of input data in a near infinite number of ways.”

This typical topology is illustrated in Figure 1.

Figure 1 - Typical Neural Network Topology: input layer (inputs 1 to 4), hidden layer, and output layer (a single output)
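A forward pass through a three-layer network of this kind can be sketched in a few lines of NumPy. This is a minimal illustration assuming a tanh activation in the hidden layer and a single linear output neurode; the function name, the number of hidden neurodes and all weights are hypothetical, not taken from any model in this paper.

```python
import numpy as np

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate one property's inputs through input -> hidden -> output layers.

    x:        (n_inputs,) input neurode values, e.g. scaled property attributes
    w_hidden: (n_inputs, n_hidden) weights from the input layer to the hidden layer
    b_hidden: (n_hidden,) hidden-layer biases
    w_out:    (n_hidden,) weights from the hidden layer to the single output neurode
    b_out:    output bias
    """
    # Each hidden neurode combines every input, then applies a tanh "squashing" function
    hidden = np.tanh(x @ w_hidden + b_hidden)
    # The single output neurode is a weighted sum of hidden activations (estimated price)
    return float(hidden @ w_out + b_out)

# Hypothetical network: 4 inputs (as in Figure 1) and 3 hidden neurodes, random weights
rng = np.random.default_rng(0)
estimate = forward(np.array([0.2, -1.0, 0.5, 0.0]),
                   rng.normal(size=(4, 3)), np.zeros(3), rng.normal(size=3), 0.1)
print(estimate)
```

The weights play the role that coefficients play in a regression; training, discussed next, is the search for weight values that make the output match observed prices.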


Weights (analogous to regression coefficients) are applied to the input data to model the output layer. Training the network involves establishing weights such that the average squared error over the training set is minimized. As can occur with Multiple Regression Analysis (MRA), it is possible to “over-train” the model. In the case of MRA this can occur when the number of explanatory variables approaches the number of observations. Similarly, an ANN can be over-trained or over-fitted, so that the model tends to work only with that particular data set. Robust testing of ANN models therefore requires split sets of data: one set for training the model and a further set for testing.

Such model testing has occurred in several countries, using the property attribute files peculiar to those locations. For example, Borst (1991) reported the use of ANN on data sets of family residences in New England. Tay and Ho (1992) examined sets in Singapore, using 833 residential apartment properties for training and testing the result against a set of 222 similar apartment properties. Do and Grudnitski (1992) used data from a multiple listing service in California, while Evans et al. (1993) worked with residential housing in the United Kingdom. The most recent work comes from Worzala (1995), Borst (1995, 1996) and McCluskey (1996a, 1996b). These latest studies use multiple training sets and compare the output of ANNs with MRA. Summarizing, Borst (1995) concludes that:
1. Accuracy will likely rival or exceed that of the linear model calibrated by MRA.
2. The analyst need not be a trained statistician.
3. Software implementations of NNTs are plentiful and relatively inexpensive.
4. Explainability is no longer a deficiency of NNTs.
5. Strong consideration should be given to their use in mass appraisal. They can be used as a primary valuation tool, or as a quality check on values estimated by other methods.
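Training by minimising the average squared error, and then checking the trained network on a held-out test set, can be sketched with NumPy as follows. This is an illustrative full-batch back-propagation (gradient descent) routine assuming one tanh hidden layer and a linear output; the learning rate, epoch count, one-third test split and synthetic data are hypothetical choices, not settings of the software used in this paper.

```python
import numpy as np

def train_network(x, y, n_hidden=4, learning_rate=0.1, epochs=2000, seed=0):
    """Fit a one-hidden-layer network by full-batch gradient descent on mean squared error.

    x: (n, p) array of scaled inputs; y: (n,) array of scaled prices.
    Returns a function that predicts prices for new input rows.
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape
    w1 = rng.normal(scale=0.5, size=(p, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)                 # hidden activations, shape (n, n_hidden)
        err = h @ w2 + b2 - y                    # output error for each training sale
        g_out = 2.0 * err / n                    # derivative of MSE w.r.t. each output
        # Back-propagate through the output weights and the tanh derivative (1 - h^2)
        g_hidden = np.outer(g_out, w2) * (1.0 - h ** 2)
        w2 -= learning_rate * (h.T @ g_out)
        b2 -= learning_rate * g_out.sum()
        w1 -= learning_rate * (x.T @ g_hidden)
        b1 -= learning_rate * g_hidden.sum(axis=0)
    return lambda x_new: np.tanh(x_new @ w1 + b1) @ w2 + b2

def split_train_test(n, test_fraction=1 / 3, seed=0):
    """Randomly partition n sale indices into training and test index arrays."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

# Synthetic, already-scaled data standing in for property attributes and prices
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(300, 2))
y = 0.5 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(0, 0.05, 300)
train, test = split_train_test(len(y))
predict = train_network(x[train], y[train])
mse_train = np.mean((predict(x[train]) - y[train]) ** 2)
mse_test = np.mean((predict(x[test]) - y[test]) ** 2)
print(f"training MSE {mse_train:.4f}, test MSE {mse_test:.4f}")
```

A test error much larger than the training error would be the symptom of over-training described above; the held-out set is what makes that visible.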
Worzala (1995) sought to reproduce the methodologies employed by previous researchers (Borst 1991, Do and Grudnitski 1992) by applying ANN to their own data set. They worked with 288 cases, using three approaches to the analysis. Case 1 used the whole data set; Case 2 used properties within the price range of the Do and Grudnitski study ($105,000 to $288,000); and Case 3 used a small ‘homogeneous’ set of houses similar to that of the Borst (1991) study. They sought to examine whether the ANN model was superior to regression and whether results were consistent between the two major software products, NeuroShell and @Brain. The results (mean percentage error) were:
Case 1: NeuroShell 12.1%, @Brain 17.4% and Regression 20.1%
Case 2: @Brain 11.1%, Regression 11.1% and NeuroShell 13.1%
Case 3: NeuroShell 11.6%, @Brain 11.7% and Regression 12.8%

While Borst (1995), Do (1992) and Tay (1992) strongly support the use of ANN, the results of Worzala (1995) cast some doubt on this. Their results are inconsistent at best and demonstrate a problem with ANN: even with the same variable inputs, different software can produce different answers. Borst (1996) uses a mathematical comparable sales analysis as well as MRA and concludes that neural networks performed at least as well as the other models used to predict prices. The literature shows mixed success with this method, probably due to different variable inputs and market conditions. While Borst and McCluskey (1996) state that the predictive abilities of ANN are well established through investigative studies, James and Lam (1996, 6) feel that more work must be done on “real world data sets in order to validate the methods for use in appraisal”. This project applies the technique to South Australian data.
The results of this project should go some way towards establishing the usefulness of this method in Australian, and in particular South Australian, market conditions.

Methodology

In this paper three general procedures are used to assess the suitability of neural networks for residential valuation. Each procedure uses sales data recorded by the Department of Environment and Natural Resources (DENR) in South Australia and accessed through the UPmarket sales retrieval system. DENR records details of all sales occurring in South Australia and makes these available in digital format. A wide range of data is available for each property, including details of the sale, assessed values, legal descriptors, locational information and, for residential properties, a physical description. UPmarket allows detailed filtering of sales, including the exclusion of non-market transactions such as sales which involve related parties, government agencies, other land or other inclusions, or which are specifically tagged as non-market transactions. For this research these sales were excluded and only sales of detached dwellings were used. The variables used are summarised in Table 1.
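The exclusion rules above can be expressed as a simple filter. This sketch uses hypothetical field names chosen for illustration; it does not reflect the actual UPmarket or DENR record layout.

```python
from dataclasses import dataclass

@dataclass
class SaleRecord:
    """Hypothetical sale record; field names are illustrative, not the UPmarket schema."""
    price: float
    property_type: str                  # e.g. "detached dwelling"
    related_parties: bool = False
    government_agency: bool = False
    includes_other_land: bool = False
    other_inclusions: bool = False
    tagged_non_market: bool = False

    def is_market_transaction(self) -> bool:
        # Exclude each category of probable non-market sale listed in the text
        return not (self.related_parties or self.government_agency
                    or self.includes_other_land or self.other_inclusions
                    or self.tagged_non_market)

def modelling_sample(sales):
    """Keep market transactions of detached dwellings only."""
    return [s for s in sales
            if s.is_market_transaction() and s.property_type == "detached dwelling"]

sales = [SaleRecord(150000, "detached dwelling"),
         SaleRecord(90000, "detached dwelling", related_parties=True),
         SaleRecord(120000, "unit")]
print(modelling_sample(sales))
```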


Table 1 - Description of Variables used in Analysis

Variable | Description | Conversion Used in Modeling
Sale Date | Actual sale date, recorded as dd/mm/yy | Converted to the number of months from the sale date to the date when the model is to be applied
Sale Price | Price in dollars recorded on the Transfer | Used in current format
Suburb | Suburb the property is located in | Converted to dummy variables where appropriate
Improvements | String of improvement descriptors | Converted to dummy variables where appropriate
Land Area | Area in hectares | Used in current format
Zone | Government code for the zone, which determines development controls | Converted to dummy variables where appropriate
Rooms | Number of main rooms in the building | Used in current format
Equivalent Building Area | Calculated equivalent area of buildings, based on a weighted average formula for main buildings and other buildings | Used in current format
Condition | Scaled code from 1 (demolition level) to 9 (high quality, new condition) | Used in current format
Wall Type | Categorical variable allowing for 9 wall cladding types | Converted to dummy variables where appropriate
Roof Type | Categorical variable allowing for 9 roof cladding types | Converted to dummy variables where appropriate
Building Style | Categorical variable allowing for 48 building styles | Converted to dummy variables where appropriate
Year of Construction | Date of construction of the main building; dates varied from 1880 to 1996 | Converted to age of building in years
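The conversions in Table 1 are straightforward to automate. A minimal sketch follows; the function names are hypothetical, and dropping the first category as a reference level is an assumption (the table says only “converted to dummy variables where appropriate”) made to avoid perfect collinearity with the regression intercept.

```python
from datetime import date, datetime

def parse_sale_date(text):
    """Parse a dd/mm/yy sale date (two-digit years 69-99 map to 1969-1999)."""
    return datetime.strptime(text, "%d/%m/%y").date()

def months_before(sale_date, model_date):
    """Whole calendar months from the sale date to the date the model is applied."""
    return (model_date.year - sale_date.year) * 12 + (model_date.month - sale_date.month)

def building_age(year_built, model_year):
    """Convert year of construction to age of the building in years."""
    return model_year - year_built

def dummy_variables(values, categories):
    """One 0/1 column per category except the first, which serves as the reference level."""
    return [[1 if v == c else 0 for c in categories[1:]] for v in values]

sale = parse_sale_date("15/03/96")
print(months_before(sale, date(1996, 12, 1)), building_age(1880, 1996))
print(dummy_variables(["brick", "stone", "timber"], ["brick", "timber", "stone"]))
```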

The data have proved quite adequate in the past for valuation prediction, but there are some obvious shortcomings. Most of the variables are quantitative or descriptive, with a lack of qualitative measures. While the condition code is qualitative, it is rather simplistic and cannot capture sufficient variation. It is typically assessed from an inspection of the exterior of the property and does not encompass internal condition or quality. Environmental characteristics are not represented. Much of the environmental and locational variation may be eliminated by studying properties over a small geographic area (Adelaide suburbs are typically small, covering less than 2 square kilometres), yet large variation in environmental quality can remain. Aspects such as streetscape, landscape, views and outlook may vary significantly, and in some cases these are major contributors to price variation. The data have been modeled for detached houses for some 20 years using techniques such as multiple regression. Typically, areas with homogeneous environmental characteristics record estimates with mean absolute errors of five to ten percent, while areas with wide variation in these characteristics may be estimated with mean predictive errors as high as twenty-five percent. Given these past outcomes, it was considered appropriate to assess the results of neural network analysis in several locations, using regression as a comparison.

Assessment Procedure 1

The first procedure largely follows the work of other authors such as Tay and Ho (1992), Do and Grudnitski (1992), Worzala (1995), Borst (1995) and McCluskey (1996). The data set covered the Local Government Area of Payneham, north-east of Adelaide’s CBD. A total of 223 sales were used for training and 111 for testing. Results from the neural network analysis were compared with those of multiple regression analysis and with the actual sale prices. In each case the training sales were used to produce a model which was then applied to the test properties; the test properties were not included in the original models. No sales were removed during the modeling process. The widely used practice of removing poorly predicted sales, on the basis that they might be non-market transactions, was avoided because it tends to overestimate the predictive ability of models. The models were assessed by the mean absolute error between the predicted values and the actual sale prices. The errors for the neural network and MRA models were then split into price ranges to test whether the errors vary at different price levels, and ANOVA was used to test for differences in mean and variance. Replicating these methods tests whether the Adelaide data produce results similar to those found elsewhere.
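The error measures and stratified tests described above can be sketched in pure Python. This is a minimal illustration assuming errors are absolute percentage errors relative to sale price and that the thirds are formed by sorting the test sales on actual price; the function names are hypothetical. Hartley’s F-max assumes roughly equal group sizes, and judging significance would still require comparing both statistics with tabulated critical values.

```python
from statistics import mean, variance

def absolute_percentage_errors(actual, estimated):
    """|estimate - actual| / actual for each test sale."""
    return [abs(e - a) / a for a, e in zip(actual, estimated)]

def errors_by_price_third(actual, errors):
    """Split errors into lowest, middle and highest thirds of actual sale price."""
    order = sorted(range(len(actual)), key=lambda i: actual[i])
    n = len(order)
    cuts = [0, n // 3, 2 * n // 3, n]
    return [[errors[i] for i in order[cuts[k]:cuts[k + 1]]] for k in range(3)]

def one_way_anova_f(groups):
    """F statistic: between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def hartley_f_max(groups):
    """Hartley's F-max: largest sample variance / smallest sample variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical test sales: actual prices and model estimates
actual = [90000, 120000, 150000, 180000, 210000, 240000]
estimated = [99000, 114000, 141000, 198000, 180000, 300000]
errors = absolute_percentage_errors(actual, estimated)
thirds = errors_by_price_third(actual, errors)
print([round(mean(g), 3) for g in thirds], one_way_anova_f(thirds), hartley_f_max(thirds))
```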


While these studies are good for testing the general viability of the methods, the number of sales used and the emphasis on overall results make them most applicable to mass appraisal situations. The research was therefore extended to consider more typical residential valuations.

Assessment Procedure 2

The second procedure is designed to replicate more closely the way in which a valuer might apply neural networks in a typical residential valuation. Six well-distributed suburban areas were chosen: one in each of Adelaide’s northern, southern, eastern and western (inner to middle distance) suburbs, plus developing suburbs in Adelaide’s outer northern and southern areas. This selection gave variation in locational, environmental and housing characteristics, with an expectation that the models would perform better in some locations than in others. For each area, multiple regression and neural network models were produced using residential sales from the preceding 12 months. Probable non-market transactions were removed; no other sales were removed, regardless of how well the models explained their prices. These models were saved and then applied to the sales which occurred in the following month, resulting in between five and seven test sales in each location. While this number may seem low, the aim of this procedure was to establish the percentage of properties likely to be valued to a reasonable level of accuracy using a process closer to commercial valuation practice.
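The training/test split used for each suburb can be sketched as follows, assuming sales are held as (sale date, record) pairs; the function names are hypothetical. Sales in the twelve calendar months before the test month form the training set, and sales within the test month form the test set.

```python
from datetime import date

def month_number(d):
    """Count months continuously so that consecutive months differ by one."""
    return d.year * 12 + d.month - 1

def training_and_test_sales(sales, test_month, training_months=12):
    """Split (sale_date, record) pairs into the preceding-months training set
    and the sales occurring in the test month itself."""
    t = month_number(test_month)
    training = [s for s in sales if t - training_months <= month_number(s[0]) < t]
    test = [s for s in sales if month_number(s[0]) == t]
    return training, test

sales = [(date(1995, 12, 5), "a"), (date(1996, 6, 1), "b"), (date(1996, 11, 30), "c"),
         (date(1996, 12, 2), "d"), (date(1997, 1, 1), "e")]
train, test = training_and_test_sales(sales, date(1996, 12, 1))
print([r for _, r in train], [r for _, r in test])
```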

Assessment Procedure 3

To further establish the usefulness of the technique, another location was selected and data collected to test the stationarity of the models. The outer suburb of Salisbury, north of the city, was randomly selected for this procedure, which involved producing eight models for both MRA and NN. Each model used six months of sales, progressing so that in each successive model one month’s data was removed and data from the next month was added. By the end of the modeling process, completely new data were being used. The models were then applied to the details of the three most recent sales, and the estimates were graphed to show their stationarity. If the models are stable, estimates of value for each property will remain much the same over the period. This again more closely replicates the needs of typical residential valuation: a valuer would certainly not expect the value to change significantly if the same property were valued in each of eight successive months using the last six months of sales in each case. One problem which could arise with this procedure is a change in value due to a general increase or decrease in the market. If the market were in fact changing, this should appear as a trend in the final estimates, and the sale date variable should be significant.
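The successive six-month windows can be generated mechanically. A minimal sketch, assuming months are numbered 1 to 13 and that each model (MRA or NN) is supplied through a hypothetical `fit` callable that returns a prediction function; it yields the Months 1 - 6 to Months 8 - 13 sequence.

```python
def successive_windows(n_models=8, window=6):
    """Inclusive month ranges for each successive model: (1, 6), (2, 7), ..., (8, 13)."""
    return [(first, first + window - 1) for first in range(1, n_models + 1)]

def stability_estimates(sales_by_month, subjects, fit, n_models=8, window=6):
    """Refit on each window and re-estimate the same subject properties.

    sales_by_month: dict mapping month number -> list of sales
    fit: hypothetical callable taking a list of sales and returning predict(subject),
         standing in for either an MRA or a neural network model
    """
    estimates = []
    for first, last in successive_windows(n_models, window):
        training = [s for m in range(first, last + 1) for s in sales_by_month.get(m, [])]
        predict = fit(training)
        estimates.append([predict(subject) for subject in subjects])
    return estimates

# Toy illustration: each "sale" is just a price and the "model" predicts the window mean
sales_by_month = {m: [100.0 * m] for m in range(1, 14)}
mean_model = lambda sales: (lambda subject: sum(sales) / len(sales))
print(stability_estimates(sales_by_month, ["house"], mean_model))
```

With stable models the rows of estimates for a given property should change little; a steady drift would instead point to a market trend, as noted above.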

Results

Procedure 1 - Mass Appraisal Model - Payneham


Figure 2 - Distribution of Training and Test Data (percentage of sales by price band, mid points from $75,000 to $475,000, for the training and testing data)

The data consisted of 223 sales for training the models and 111 for testing, distributed as shown in Figure 2. The distribution of house prices is fairly typical: roughly normal but skewed to the right. The distributions of the training and test data are quite similar. A multiple regression model was established using ordinary least squares with no data transformations other than the conversions described earlier. The model used 10 independent variables, each significant at better than the 98% level. Several neural network models were produced, using either all variables or only those found significant in the MRA, and with various network structures. Many of the results were similar, with the models using the smaller number of variables usually superior. The simplest topology, three layers with 10 input neurodes and 10 hidden neurodes using the ten variables from the regression, proved at least as good as most others and was chosen for the comparison. Applying the test data to the models yielded the results in Table 2.
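An ordinary least squares fit of the kind used for the MRA model, with the t-statistics used to judge each variable’s significance, can be sketched with NumPy. The same fit also yields an interval for an individual predicted price. This is a minimal sketch on synthetic data; the function names are hypothetical, and the critical t value must be supplied by the user (for example from statistical tables) because NumPy itself does not provide the t distribution.

```python
import numpy as np

def fit_ols(x, y):
    """OLS with an intercept: returns coefficients, t-statistics, residual variance
    and (X'X)^-1, which is reused for prediction intervals."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    n, k = X.shape
    sigma2 = residuals @ residuals / (n - k)           # residual variance
    xtx_inv = np.linalg.inv(X.T @ X)
    t_stats = beta / np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, t_stats, sigma2, xtx_inv

def prediction_interval(x_new, beta, sigma2, xtx_inv, t_crit):
    """Interval for one new sale: y_hat +/- t_crit * sqrt(sigma2 * (1 + x0'(X'X)^-1 x0))."""
    x0 = np.concatenate([[1.0], np.atleast_1d(x_new)])
    y_hat = x0 @ beta
    half_width = t_crit * np.sqrt(sigma2 * (1.0 + x0 @ xtx_inv @ x0))
    return y_hat - half_width, y_hat + half_width

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)                            # synthetic single attribute
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 100)            # synthetic prices
beta, t_stats, sigma2, xtx_inv = fit_ols(x, y)
# t_crit of about 1.98 corresponds to a 95% interval with 98 degrees of freedom
low, high = prediction_interval(5.0, beta, sigma2, xtx_inv, t_crit=1.98)
print(beta, t_stats, (low, high))
```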


The comparison indicates that MRA produced better results on most counts. Both the MRA and neural network models fall within the expected error range: a mean percentage error of ten to twenty percent was expected due to general market imperfections and the lack of suitable environmental and internal condition variables. The mean absolute error (MAE) of the regression model (12.75% against 19.97%) suggests it is clearly superior in this case. This is supported by the correlation between each model’s estimated prices and the actual observed values (0.86 for MRA against 0.69 for NN). A significant finding comes from the stratified results. By splitting the test data into three equal parts by price, it was possible to see the variation in the predictive power of the models over the sample space. For MRA the overall MAE of 12.75% is approximated in each of the three groups; analysis of variance shows that the group means do not differ significantly, and neither do the variances. Similar analysis of the neural network outcomes shows that the error increases with the sale price, with the lowest priced group having an MAE of only 10.25% while the highest priced properties blow out to 29.29%. Analysis of variance, including Hartley’s test, shows that both the means and the variances differ significantly. Generally speaking, the neural network produced better results for low priced properties, while MRA outperformed NN on medium and high priced properties.

This is highlighted in Figure 3, which indicates that the MRA model is generally superior and that, while both models suffer from heteroscedasticity, it is more noticeable with the NN model. For each model the relationship between actual and estimated price should approach a one-to-one relationship, with an intercept near zero and a beta (slope) of one. The MRA model comes close, with an intercept of 749 and a beta of 0.9963. The neural network model shows serious systematic error, with an intercept of 49,563 and a beta of 0.7864.

Table 2 - Comparison of Models - Procedure 1

Measure | MRA | NN
Mean Absolute Error (MAE) | 12.75% | 19.97%
Correlation (with Actual) | 0.86 | 0.69
Minimum Error | 0% | 1%
Maximum Error | 68% | 77%
Lowest 3rd of Prices MAE | 14.54% | 10.25%
Middle 3rd of Prices MAE | 9.80% | 20.36%
Highest 3rd of Prices MAE | 13.34% | 29.29%
Lowest 3rd of Prices Variance | 0.016 | 0.004
Middle 3rd of Prices Variance | 0.006 | 0.030
Highest 3rd of Prices Variance | 0.017 | 0.025

Figure 3 - Neural Network and Multiple Regression Estimates Against Price (actual price plotted against estimated price, $0 to $400,000, for the NN and MRA models)
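The intercept and beta quoted for each model come from a straight-line fit between actual and estimated prices. A minimal sketch of that calculation follows, assuming (as the axis labels of Figure 3 suggest) that actual price is regressed on estimated price; the function name is hypothetical and the example figures are illustrative, not the Payneham data.

```python
def calibration_line(estimated, actual):
    """Least-squares line actual = intercept + beta * estimated."""
    n = len(actual)
    mean_e = sum(estimated) / n
    mean_a = sum(actual) / n
    s_ea = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimated, actual))
    s_ee = sum((e - mean_e) ** 2 for e in estimated)
    beta = s_ea / s_ee
    return mean_a - beta * mean_e, beta

# Illustrative estimates that are systematically too spread out relative to actual prices
intercept, beta = calibration_line([100000, 200000, 300000], [81000, 161000, 241000])
print(intercept, beta)
```

An unbiased model gives an intercept near zero and a beta near one; a large positive intercept with a beta well below one, as reported for the neural network, indicates that low estimates tend to be too low and high estimates too high.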

Procedure 2 - Valuations based on Small Data Sets in Six Suburbs

The results from the first procedure were used as the basis for the models in the second procedure. Because of the large number of models which would be produced by experimenting with different variables and topologies, it was decided to establish an MRA model first, then use the same variables in the NN model, with the same number of neurodes in a single hidden layer. The decision to use the sales from the preceding twelve months for each suburb led to variable training sample sizes: the smallest training set was for Tranmere with 40 sales, the largest for Golden Grove with 119 sales. The results of applying the test data to the trained models are shown in Table 3.

Table 3 - Results - Procedure 2 (MAE (Valuation) is the absolute percentage error of each estimate; MAE (Suburb) is shown on the first row of each suburb)

Property | Sale Price | NN Estimate | MRA Estimate | MAE NN (Valuation) | MAE MRA (Valuation) | Best | MAE NN (Suburb) | MAE MRA (Suburb)
Broadview 1 | $185,000 | $213,968 | $195,097 | 16% | 5% | MRA | 15% | 13%
Broadview 2 | $128,000 | $99,481 | $135,254 | 22% | 6% | MRA
Broadview 3 | $129,000 | $95,044 | $101,193 | 26% | 22% | MRA
Broadview 4 | $99,000 | $84,863 | $112,238 | 14% | 13% | MRA
Broadview 5 | $180,000 | $156,817 | $158,269 | 13% | 12% | MRA
Broadview 6 | $85,000 | $92,530 | $96,834 | 9% | 14% | NN
Broadview 7 | $220,000 | $214,463 | $266,741 | 3% | 21% | NN
Flinders Park 1 | $97,500 | $115,562 | $98,453 | 19% | 1% | MRA | 13% | 12%
Flinders Park 2 | $107,000 | $114,036 | $109,009 | 7% | 2% | MRA
Flinders Park 3 | $107,500 | $93,835 | $92,405 | 13% | 14% | NN
Flinders Park 4 | $139,500 | $114,204 | $101,646 | 18% | 27% | NN
Flinders Park 5 | $125,150 | $112,330 | $103,758 | 10% | 17% | NN
Golden Grove 1 | $158,000 | $119,588 | $133,981 | 24% | 15% | MRA | 10% | 10%
Golden Grove 2 | $205,000 | $170,337 | $177,855 | 17% | 13% | MRA
Golden Grove 3 | $180,000 | $205,158 | $189,569 | 14% | 5% | MRA
Golden Grove 4 | $139,000 | $144,120 | $150,069 | 4% | 8% | NN
Golden Grove 5 | $105,000 | $105,109 | $95,100 | 0% | 9% | NN
Golden Grove 6 | $110,000 | $105,345 | $97,235 | 4% | 12% | NN
Golden Grove 7 | $113,000 | $106,929 | $106,849 | 5% | 5% | NN
Marion 1 | $93,500 | $109,046 | $96,502 | 17% | 3% | MRA | 5% | 5%
Marion 2 | $117,000 | $125,929 | $124,453 | 8% | 6% | MRA
Marion 3 | $120,000 | $125,454 | $120,266 | 5% | 0% | MRA
Marion 4 | $127,000 | $132,508 | $137,247 | 4% | 8% | NN
Marion 5 | $134,000 | $132,459 | $128,958 | 1% | 4% | NN
Marion 6 | $166,000 | $165,110 | $145,606 | 1% | 12% | NN
Marion 7 | $172,500 | $169,646 | $176,662 | 2% | 2% | NN
Seaford Rise 1 | $124,000 | $83,919 | $105,521 | 32% | 15% | MRA | 27% | 7%
Seaford Rise 2 | $87,000 | $97,815 | $82,625 | 12% | 5% | MRA
Seaford Rise 3 | $83,000 | $68,881 | $83,368 | 17% | 0% | MRA
Seaford Rise 4 | $68,000 | $112,822 | $66,493 | 66% | 2% | MRA
Seaford Rise 5 | $90,000 | $98,209 | $103,392 | 9% | 15% | NN
Tranmere 1 | $156,000 | $101,146 | $161,523 | 35% | 4% | MRA | 28% | 16%
Tranmere 2 | $80,000 | $153,946 | $90,185 | 92% | 13% | MRA
Tranmere 3 | $101,000 | $111,565 | $142,960 | 10% | 42% | NN
Tranmere 4 | $85,000 | $85,564 | $90,844 | 1% | 7% | NN
Tranmere 5 | $150,000 | $149,673 | $170,833 | 0% | 14% | NN

Table 4 - Comparison of Results Procedure 2

Measure | NN | MRA
Mean Absolute Error | 15.27% | 10.42%
Std Dev Errors | 18.36% | 8.41%
Correlation (with Actual) | 0.78 | 0.90
Minimum Error | 0% | 0%
Maximum Error | 92% | 42%
Number of Times Best Estimate | 17 | 19

Table 3 shows that in each suburb MRA produces the lowest or equal lowest MAE. At suburb level NN is never superior, but many individual NN estimates are better than those of MRA. The aggregate results in Table 4 support this: while the overall MAE is lower for MRA, the number of times MRA produces the best estimate (19) is close to the number for NN (17). A closer inspection of the results suggests that while regression produces consistently good results, the NN quite often produces excellent results. Generally the NN produces results which are either very good or very poor, while MRA more often produces a moderate result. This is highlighted in Figure 4. The NN produces more estimates with less than 5% error, but at error thresholds greater than 8% MRA produces more estimates within the threshold. Eighty-three percent of properties are estimated with less than 15% error using MRA, while only sixty-four percent meet this standard using NN.

Figure 4 - Percentage of Valuations with Error Levels (percentage of properties valued by NN and MRA with errors of less than 5%, 10%, 15%, 20%, 25% and 30%)
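The aggregate figures in Table 4 and the 15% threshold shares can be recomputed from the individual estimates in Table 3. The sketch below treats each “MAE (Valuation)” entry as an absolute percentage error, |estimate - price| / price; under that assumption it should reproduce the 15.27% and 10.42% mean errors, the 17 and 19 best-estimate counts and the 64% and 83% shares. The dictionary layout and function name are illustrative.

```python
# (sale price, NN estimate, MRA estimate) for the 36 test sales in Table 3
TABLE_3 = {
    "Broadview": [(185000, 213968, 195097), (128000, 99481, 135254), (129000, 95044, 101193),
                  (99000, 84863, 112238), (180000, 156817, 158269), (85000, 92530, 96834),
                  (220000, 214463, 266741)],
    "Flinders Park": [(97500, 115562, 98453), (107000, 114036, 109009), (107500, 93835, 92405),
                      (139500, 114204, 101646), (125150, 112330, 103758)],
    "Golden Grove": [(158000, 119588, 133981), (205000, 170337, 177855), (180000, 205158, 189569),
                     (139000, 144120, 150069), (105000, 105109, 95100), (110000, 105345, 97235),
                     (113000, 106929, 106849)],
    "Marion": [(93500, 109046, 96502), (117000, 125929, 124453), (120000, 125454, 120266),
               (127000, 132508, 137247), (134000, 132459, 128958), (166000, 165110, 145606),
               (172500, 169646, 176662)],
    "Seaford Rise": [(124000, 83919, 105521), (87000, 97815, 82625), (83000, 68881, 83368),
                     (68000, 112822, 66493), (90000, 98209, 103392)],
    "Tranmere": [(156000, 101146, 161523), (80000, 153946, 90185), (101000, 111565, 142960),
                 (85000, 85564, 90844), (150000, 149673, 170833)],
}

def summarise(table):
    """Mean absolute percentage error, best-estimate counts and share under 15% error."""
    rows = [r for sales in table.values() for r in sales]
    nn = [abs(est - price) / price for price, est, _ in rows]
    mra = [abs(est - price) / price for price, _, est in rows]
    return {
        "mae_nn": sum(nn) / len(nn),
        "mae_mra": sum(mra) / len(mra),
        "best_nn": sum(a < b for a, b in zip(nn, mra)),
        "best_mra": sum(b < a for a, b in zip(nn, mra)),
        "share_under_15_nn": sum(e < 0.15 for e in nn) / len(nn),
        "share_under_15_mra": sum(e < 0.15 for e in mra) / len(mra),
    }

summary = summarise(TABLE_3)
print(f"MAE NN {summary['mae_nn']:.2%}, MAE MRA {summary['mae_mra']:.2%}")
print(f"best estimates: NN {summary['best_nn']}, MRA {summary['best_mra']}")
print(f"under 15% error: NN {summary['share_under_15_nn']:.0%}, "
      f"MRA {summary['share_under_15_mra']:.0%}")
```

Changing the 0.15 threshold gives the other points plotted in Figure 4.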

Procedure 3 - Stationarity Test

This procedure was designed to test the stationarity of the models through logical successive sample selection. Eight MRA and eight NN models were required. To simplify the approach, only four variables were used in each model; these proved highly significant in each successive MRA model. The models were then applied to the three test properties. The results are shown in Table 5 and graphically in Figure 5. The results are remarkably stable, especially for the lower priced properties, and this is true for both the MRA and NN models. Each model tends to consistently overestimate or underestimate the actual price of a given property. Neural networks produce consistently good estimates for Properties 1 and 3, while MRA produces better estimates for Property 2.

Table 5 - Estimates for Three Properties using Successive Sales Sampling

Property 1 sold for $80,000, Property 2 for $75,000 and Property 3 for $120,000.

Sales Used | Property 1 MRA | Property 1 ANN | Property 2 MRA | Property 2 ANN | Property 3 MRA | Property 3 ANN
Months 1 - 6 | $78,752 | $81,327 | $82,791 | $84,607 | $96,861 | $109,385
Months 2 - 7 | $75,320 | $81,073 | $79,280 | $84,849 | $99,183 | $123,419
Months 3 - 8 | $76,914 | $80,537 | $81,840 | $85,118 | $96,825 | $118,152
Months 4 - 9 | $76,611 | $80,847 | $81,479 | $84,133 | $96,637 | $115,943
Months 5 - 10 | $76,649 | $82,973 | $81,495 | $87,771 | $98,905 | $130,077
Months 6 - 11 | $77,880 | $85,354 | $83,788 | $87,290 | $104,723 | $109,956
Months 7 - 12 | $78,103 | $84,492 | $83,879 | $87,398 | $101,244 | $104,983
Months 8 - 13 | $80,293 | $84,230 | $86,629 | $87,152 | $92,813 | $104,885

Figure 5 - Price Estimate Stability Tests (estimated price, $60,000 to $140,000, for the MRA and ANN models of Property 1 (sale price $80,000), Property 2 ($75,000) and Property 3 ($120,000), by sales used in the analysis, Months 1 - 6 to Months 8 - 13)
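The stability shown in Table 5 can be quantified by the spread of each model’s eight estimates relative to the sale price. A minimal sketch using the Table 5 figures; the two summary measures (range as a share of sale price, and mean signed error) are illustrative choices, not measures used in this paper.

```python
SALE_PRICES = {"Property 1": 80000, "Property 2": 75000, "Property 3": 120000}

# Estimates from Table 5, one per successive window (Months 1 - 6 ... Months 8 - 13)
ESTIMATES = {
    ("Property 1", "MRA"): [78752, 75320, 76914, 76611, 76649, 77880, 78103, 80293],
    ("Property 1", "ANN"): [81327, 81073, 80537, 80847, 82973, 85354, 84492, 84230],
    ("Property 2", "MRA"): [82791, 79280, 81840, 81479, 81495, 83788, 83879, 86629],
    ("Property 2", "ANN"): [84607, 84849, 85118, 84133, 87771, 87290, 87398, 87152],
    ("Property 3", "MRA"): [96861, 99183, 96825, 96637, 98905, 104723, 101244, 92813],
    ("Property 3", "ANN"): [109385, 123419, 118152, 115943, 130077, 109956, 104983, 104885],
}

def stability_summary(estimates, sale_prices):
    """Range of the estimates and mean signed error, both as shares of the sale price."""
    summary = {}
    for (prop, model), values in estimates.items():
        price = sale_prices[prop]
        summary[(prop, model)] = {
            "range_pct": (max(values) - min(values)) / price,
            "mean_error_pct": (sum(values) / len(values) - price) / price,
        }
    return summary

for (prop, model), s in stability_summary(ESTIMATES, SALE_PRICES).items():
    print(f"{prop} {model}: range {s['range_pct']:.1%}, mean error {s['mean_error_pct']:+.1%}")
```

For example, the MRA estimates for Property 1 span $4,973 (about 6.2% of the sale price), while the ANN estimates for Property 3 span $25,192 (about 21%).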


General Findings

James (1996) points out the advantages of neural networks for small data sets, and this is also an observation from this research. Neural networks would seem to be a better tool for smaller data sets, while regression is clearly superior for larger data sets. Regression is statistically weak with small data sets, a problem not encountered by neural networks. Regression results can be calculated very quickly regardless of the size of the problem, while the time needed to produce neural networks seems to increase exponentially with the size of the data set. The models established using the smaller data sets were generally achieved in less than one minute. However, the neural network established for the largest problem (223 cases and 42 variables) had not reached a suitable global minimum after 23 hours, when an unfortunate power failure left the model stranded. Clearly a suitable model must be achieved in minutes to be commercially viable, which also supports the use of neural networks for small data sets.

Generally speaking, the results for MRA are consistently good, while NN produces some excellent estimates but some which are equally poor. It is perhaps safe to conclude that in this research MRA produced less “risky” results, as evidenced by the lower mean and variance of the absolute errors. While this conclusion is valid on a pragmatic test basis, it does not reflect other aspects of the test procedures. MRA has been applied successfully to residential property data for many years, whereas neural networks are relatively new, and this disparity in experience may explain much of the variation. Another important factor in the future application of neural networks is improvement in software. The approach is notoriously “black-box” in nature: few tools exist to establish the appropriate network topology or the influence of variables in the final model. This, however, is changing rapidly.
Tay and Ho (1994) used a method to approximate the influence of the variables in the final model, and Borst (1995) suggests that significant further progress has been made through the graphical outputs of modern software. Similarly, there have been advances in the technology used to select the appropriate topology. The software used in this paper required the user to select a topology, but McCluskey (1996, 44) allows ANN “to work in dynamic mode i.e. allowing it the freedom to determine its own typology”. New tests need to be established to help decide when to stop training; in practice this is very difficult. If the network is under-trained, there are likely to be large errors with both the training and test data. Over-training will lead to good results for the training data but large errors in the test applications. Another issue is the variation in the model which can result from using different software, or from repeated trials using the same software, as highlighted by Worzala (1995). One clear advantage of multiple regression over neural networks is the ability to test the model statistically and to establish a confidence interval for predictions. Users of MRA are able to use this confidence interval in conjunction with other data to make more reliable estimates. Many of these issues are summarised by James (1996) when he states: “In practice very careful methodologies should be followed to ensure that the neural networks are proper estimators of the population, and that they have adequately identified the major patterns in the data. The back-propagation algorithm is notorious for stopping at local rather than global patterns in a data set.”
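One widely used way of deciding when to stop training is early stopping: keep the weights with the lowest error on held-out validation sales, and stop once that error has not improved for a set number of epochs. It is offered here as an illustration, not as the procedure of any software discussed in this paper; the `step` and `validation_error` callables and the `patience` value are hypothetical.

```python
def train_with_early_stopping(step, validation_error, max_epochs=10_000, patience=50):
    """Run training epochs until validation error stops improving.

    step():              runs one training epoch and returns the current weights
    validation_error(w): error of weights w on held-out validation sales
    patience:            epochs allowed without improvement before stopping
    Returns the best weights, their validation error and the epoch they were found.
    """
    best_weights, best_error, best_epoch = None, float("inf"), -1
    for epoch in range(max_epochs):
        weights = step()
        error = validation_error(weights)
        if error < best_error:
            best_weights, best_error, best_epoch = weights, error, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: further training is likely over-fitting
            break
    return best_weights, best_error, best_epoch

# Simulated validation curve: error falls until epoch 30, then rises (over-training)
epochs = iter(range(10_000))
best_w, best_err, best_epoch = train_with_early_stopping(
    step=lambda: next(epochs), validation_error=lambda w: (w - 30) ** 2)
print(best_w, best_err, best_epoch)
```

The patience setting trades the risk of stopping in a temporary plateau (under-training) against wasted epochs; it does not by itself address the local-minimum problem James describes.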

Conclusions

This paper assesses the application of artificial neural networks and multiple regression to residential valuation. The assessment supports the use of MRA ahead of NN, but this is not completely conclusive. While this is the case in these tests, the results of researchers such as Do and Grudnitski (1992) suggest that NN is a superior technique. It is certainly much too early to dismiss neural networks as a suitable valuation tool. Early users of regression were highly skeptical; advancements in computer technology and improvements to regression techniques and statistics have since made it a more viable analytical tool. Similar improvements for neural networks may establish them as the preeminent tool for property valuation.


References

Borst, R.A. and McCluskey (1996) The Role of Artificial Neural Networks in the Mass Appraisal of Real Estate, paper presented to the Third European Real Estate Society Conference, Belfast, June 26-28.
Borst, R.A. (1991) Artificial Neural Networks: The Next Modelling/Calibration Technology for the Assessment Community? Property Tax Journal, IAAO, 10(1):69-94.
Borst, R.A. (1995) Artificial Neural Networks in Mass Appraisal, Journal of Property Tax Assessment & Administration, 1(2):5-15.
Do, A.Q. and Grudnitski, G. (1992) A Neural Network Approach to Residential Property Appraisal, The Real Estate Appraiser, Dec 1992:38-45.
Evans, A., James, H. and Collins, A. (1993) Artificial Neural Networks: an Application to Residential Valuation in the UK, Journal of Property Valuation & Investment, 11:195-204.
James, H. and Lam, E. (1996) The Reliability of Artificial Neural Networks for Property Data Analysis, paper presented to the Third European Real Estate Society Conference, Belfast, June 26-28.
McCluskey, W. (1996a) Predictive Accuracy of Machine Learning Models for Mass Appraisal of Residential Property, New Zealand Valuer’s Journal, July:41-47.
McCluskey, W., Dyson, K., McFall, D. and Anand, S. (1996b) Mass Appraisal for Property Taxation: An Artificial Intelligence Approach, Land Economics Review, 2(1):25-32.
O’Roarty, B., Adair, A., McGreal, S. and Patterson, D. (1996) Computer Assisted Techniques and the Determination of Retail Rents, paper presented to the Third European Real Estate Society Conference, Belfast, June 26-28.
Tay, D.P.H. and Ho, D.K.K. (1992) Artificial Intelligence and the Mass Appraisal of Residential Apartment, Journal of Property Valuation & Investment, 10:525-540.
Tay, D.P.H. and Ho, D.K.K. (1994) Intelligent Mass Appraisal, Journal of Property Tax Assessment & Administration, 1(1):5-25.
Worzala, E., Lenk, M. and Silva, A. (1995) An Exploration of Neural Networks and Its Application to Real Estate Valuation, The Journal of Real Estate Research, 10(2).

Peter Rossini, Lecturer - University of South Australia School of Economics, Finance and Property North Terrace, Adelaide, Australia, 5000 Phone (61-8) 83022649 Fax (61-8) 83021512 Mobile 041 210 5583 E-mail [email protected]


Figure 1 - Typical three-layer network topology: input layer (inputs 1 to 4), hidden layer, output layer (single output)

Weights (analogous to regression coefficients) are applied to the input data to model the output layer. Training the network involves finding the weights that minimize the average squared error over the training set. As can occur with Multiple Regression Analysis (MRA), it is possible to “over-train” the model. In MRA this can occur when the number of explanatory variables approaches the number of observations. Similarly, an ANN can be over-trained, or over-fitted, so that the model tends to work only with that particular data set. Robust testing of ANN models therefore requires the data to be split into two sets, one for training the model and another for testing it. Such testing has occurred in several countries, using the property attribute files particular to each location. For example, Borst (1991) reported applying ANN to data sets of family residences in New England. Tay and Ho (1992) examined data from Singapore, training on 833 residential apartment properties and testing against a set of 222 similar apartment properties. Do and Grudnitski (1992) used data from a multiple listing service in California, while Evans (1993) worked with residential housing in the United Kingdom. The most recent work comes from Worzala (1995), Borst (1995, 1996) and McCluskey (1996a, 1996b). These latest studies use multiple training sets and compare the output of ANNs with MRA. Summarizing, Borst (1995) concludes that:

1. Accuracy will likely rival or exceed that of the linear model calibrated by MRA.
2. The analyst need not be a trained statistician.
3. Software implementations of NNTs are plentiful and relatively inexpensive.
4. Explainability is no longer a deficiency of NNTs.
5. Strong consideration should be given for their use in mass appraisal. They can be used as a primary valuation tool, or as a quality check on values estimated by other methods.
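The sketch below makes the training and testing idea concrete. It uses Python with NumPy, which is an assumption, since the studies cited used commercial packages. The network has three layers and a single linear output neurode. It is trained by back-propagation with plain full-batch gradient descent to minimise the mean squared error on a training set, and is then checked on a separate test set. The tanh activation, learning rate, number of epochs and the synthetic data are all illustrative choices. They are not the settings of any study cited here.

```python
import numpy as np

def train_mlp(X, y, n_hidden=10, lr=0.01, epochs=3000, seed=0):
    """Fit an input-hidden-output network by full-batch gradient descent,
    back-propagating the mean squared error over the training set."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))  # input -> hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)                # hidden -> output weights
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                       # hidden-layer activations
        err = H @ W2 + b2 - y                          # single linear output neurode
        g_out = 2.0 * err / n                          # d(MSE)/d(output)
        g_hid = np.outer(g_out, W2) * (1.0 - H ** 2)   # back-propagate through tanh
        W2 = W2 - lr * (H.T @ g_out)
        b2 = b2 - lr * g_out.sum()
        W1 = W1 - lr * (X.T @ g_hid)
        b1 = b1 - lr * g_hid.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def mse(params, X, y):
    """Average squared error of the network's predictions."""
    return float(np.mean((predict(params, X) - y) ** 2))

# Hypothetical standardised attributes with a non-linear price response
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] ** 2 + 0.3 * X[:, 2] + rng.normal(0.0, 0.1, 300)
X_train, y_train = X[:200], y[:200]   # training set
X_test, y_test = X[200:], y[200:]     # held-out test set

params = train_mlp(X_train, y_train)
print("train MSE", mse(params, X_train, y_train))
print("test MSE ", mse(params, X_test, y_test))
```

Comparing the training and test errors is the simplest check for over-training. If the training error keeps falling while the test error rises, the network is probably fitting peculiarities of the training set.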
Worzala (1995) sought to reproduce the methodologies of previous researchers (Borst 1991, Do and Grudnitski 1992) by applying ANN to their own data set. They worked with 288 cases and used three approaches to the analysis. Case 1 used the whole data set. Case 2 used properties within the same price range as the Do and Grudnitski study ($105,000 to $288,000). Case 3 used a small ‘homogeneous’ set of houses similar to that of the Borst (1991) study. They sought to examine whether the ANN model was superior to regression, and whether the results were consistent between two major software packages, NeuroShell and @Brain. The mean percentage errors were as follows. In Case 1, NeuroShell gave 12.1%, @Brain 17.4% and regression 20.1%. In Case 2, @Brain gave 11.1%, regression 11.1% and NeuroShell 13.1%. In Case 3, NeuroShell gave 11.6%, @Brain 11.7% and regression 12.8%. While Borst (1995), Do (1992) and Tay (1992) strongly support the use of ANN, the results of Worzala (1995) cast some doubt on it. Their results are inconsistent at best. They also demonstrate a problem with ANN: even with the same variable inputs, different software can give different answers. Borst (1996) uses a mathematical comparable sales analysis as well as MRA, and concludes that neural networks performed at least as well as the other models in predicting prices. The literature therefore shows mixed success with this method, probably because of differing variable inputs and market conditions. Borst and McCluskey (1996) state that the predictive abilities of ANN are well established through investigative studies. In contrast, James and Lam (1996, 6) feel that more work must be done on “real world data sets in order to validate the methods for use in appraisal”. This project seeks to apply the technique to South Australian data. Its results should go some way towards establishing the usefulness of the method in Australian, and in particular South Australian, market conditions.

Methodology

In this paper three general procedures are used to assess the suitability of neural networks for residential valuation. Each procedure uses sales data recorded by the Department of Environment and Natural Resources (DENR) in South Australia and accessed through the UPmarket sales retrieval system. DENR records details of all sales occurring in South Australia and makes them available in digital format. A wide range of data is available for each property. This includes details of the sale, assessed values, legal descriptors, locational information and, for residential properties, a physical description. UPmarket allows detailed filtering of sales, including the exclusion of non-market transactions. These include sales involving related parties or government agencies, sales that include other land or other items, and sales specifically tagged as non-market transactions. For this research these sales were excluded, and only sales of detached dwellings were then used. The variables used are summarised in Table 1.


Table 1 - Description of Variables used in Analysis

| Variable | Description | Conversion Used in Modeling |
|---|---|---|
| Sale Date | Actual sale date, recorded as dd/mm/yy | Converted to the number of months from the sale date to the date when the model is to be applied |
| Sale Price | Price in dollars recorded on the transfer | Used in current format |
| Suburb | Suburb the property is located in | Converted to dummy variables where appropriate |
| Improvements | String of improvement descriptors | Converted to dummy variables where appropriate |
| Land Area | Area in hectares | Used in current format |
| Zone | Government code for the zone, which determines development controls | Converted to dummy variables where appropriate |
| Rooms | Number of main rooms in the building | Used in current format |
| Equivalent Building Area | Calculated equivalent area of buildings, based on a weighted average formula for main buildings and other buildings | Used in current format |
| Condition | Scaled code from 1 (demolition level) to 9 (high quality, new condition) | Used in current format |
| Wall Type | Categorical variable allowing for 9 wall cladding types | Converted to dummy variables where appropriate |
| Roof Type | Categorical variable allowing for 9 roof cladding types | Converted to dummy variables where appropriate |
| Building Style | Categorical variable allowing for 48 building styles | Converted to dummy variables where appropriate |
| Year of Construction | Date of construction of the main building; dates varied from 1880 to 1996 | Converted to age of building in years |
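The conversions in Table 1 can be sketched in Python as shown below. The function names, the model date and the category labels are hypothetical. The dummy coding drops the first category as a reference level. This is one common convention for regression, and it may not match how the DENR codes were actually grouped "where appropriate".

```python
from datetime import date, datetime

def parse_sale_date(text: str) -> date:
    """Sale dates are recorded as dd/mm/yy."""
    return datetime.strptime(text, "%d/%m/%y").date()

def months_to_model_date(sale: date, model_date: date) -> int:
    """Whole calendar months from the sale to the date the model is applied."""
    return (model_date.year - sale.year) * 12 + (model_date.month - sale.month)

def building_age(year_built: int, model_date: date) -> int:
    """Year of construction converted to age of the building in years."""
    return model_date.year - year_built

def dummy_columns(value: str, categories: list) -> list:
    """0/1 indicator per category, omitting categories[0] as the reference
    level so the dummies are not perfectly collinear with the intercept."""
    return [1 if value == c else 0 for c in categories[1:]]

# Hypothetical example: a valuation model applied on 1 September 1996
model_date = date(1996, 9, 1)
sale = parse_sale_date("15/03/96")
print(months_to_model_date(sale, model_date))   # 6
print(building_age(1880, model_date))           # 116
walls = ["timber", "brick", "stone"]            # hypothetical wall cladding codes
print(dummy_columns("brick", walls))            # [1, 0]
```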

The data have proved quite adequate for valuation prediction in the past, but they have some obvious shortcomings. Most of the variables are quantitative or descriptive, and there are few qualitative measures. The condition code is qualitative, but it is rather simplistic and cannot capture sufficient variation. It is typically assessed from an inspection of the exterior of the property and does not encompass internal condition or quality. Environmental characteristics are not represented. Much of the environmental and locational variation may be eliminated by studying properties over a small geographic area, since Adelaide suburbs are typically small and cover less than 2 square kilometres. Even so, large variation in environmental quality can still exist. Aspects such as streetscape, landscape, views and outlook may vary significantly, and in some cases they are major contributors to price variation. The data have been modeled for detached houses for some 20 years using techniques such as multiple regression. Areas with homogeneous environmental characteristics typically record estimates with mean absolute errors of five to ten percent. Areas with wide variation in these characteristics may be estimated with mean predictive errors as high as twenty-five percent. Given these past outcomes, it was felt appropriate to assess the results of neural network analysis in several locations, using regression as a comparison.

Assessment Procedure 1

The first procedure largely follows the work of other authors such as Tay and Ho (1992), Do and Grudnitski (1992), Worzala (1995), Borst (1995) and McCluskey (1996). The data set covered the Local Government Area of Payneham, north-east of Adelaide’s CBD. A total of 223 sales were used for training and 111 for testing. In each case the training sales were used to produce a model, which was then applied to the test properties. The test properties were not included in the original models. The results of the neural network analysis were compared with those of multiple regression analysis and with the actual sale prices. No sales were removed during the modeling process. A widely used practice is to remove poorly predicted sales on the basis that they might be non-market transactions. This practice was avoided because it tends to overestimate the predictive ability of models. The models were assessed by calculating the mean absolute error between the predicted values and the actual sale prices. The errors for the neural network and MRA models were then split into price ranges to test whether the errors vary at different price levels. ANOVA was used to test for differences in mean and variance. Duplicating these methods tests whether the Adelaide data produce results similar to those found elsewhere.
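The assessment measures described above can be sketched in Python with NumPy, under a few stated assumptions. Errors are taken as absolute percentage errors, |estimate - price| / price, which matches the percentage MAE figures reported in the results. The test set is split into thirds by sale price. The one-way ANOVA F statistic and Hartley's F-max ratio (largest group variance over smallest) are computed directly rather than with a statistics package, so critical values still have to be looked up separately. The sample prices are hypothetical.

```python
import numpy as np

def abs_pct_errors(actual, estimate):
    """Absolute percentage error of each estimate, |estimate - actual| / actual."""
    actual = np.asarray(actual, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return np.abs(estimate - actual) / actual

def split_by_price_thirds(actual, errors):
    """Sort the errors by sale price and split them into three near-equal groups."""
    order = np.argsort(actual)
    return np.array_split(np.asarray(errors)[order], 3)

def anova_f(groups):
    """One-way ANOVA F statistic for a difference in group means."""
    values = np.concatenate(groups)
    grand_mean = values.mean()
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def hartley_fmax(groups):
    """Hartley's F-max: largest sample variance divided by the smallest."""
    variances = [g.var(ddof=1) for g in groups]
    return max(variances) / min(variances)

# Hypothetical test sales: actual prices and one model's estimates
prices = [100000, 200000, 300000, 400000, 500000, 600000]
estimates = [110000, 180000, 300000, 440000, 500000, 660000]
errors = abs_pct_errors(prices, estimates)
thirds = split_by_price_thirds(prices, errors)
print("MAE", errors.mean())
print("MAE by third", [g.mean() for g in thirds])
print("ANOVA F", anova_f(thirds))
```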


While these studies are good for testing the general viability of the methods, the number of sales used and the emphasis on overall results make them most applicable to mass appraisal. The research was therefore extended to consider more typical residential valuations.

Assessment Procedure 2

The second procedure is designed to replicate more closely the way in which a valuer might apply neural networks in a typical residential valuation. Six well-distributed suburban areas were chosen. There was one in each of Adelaide’s northern, southern, eastern and western (inner to middle distance) suburbs, plus developing suburbs in Adelaide’s outer northern and southern areas. This selection gave variation in locational, environmental and housing characteristics, with the expectation that the models would perform better in some locations than in others. For each area, multiple regression and neural network models were produced using residential sales from the preceding 12 months. Probable non-market transactions were removed. No other sales were removed, regardless of how well the various models explained price variation. The models were saved and then applied to the sales that occurred in the following month. This resulted in each model being applied to between five and seven sales. While this number may seem low, the aim of this procedure was to establish the percentage of properties likely to be valued with reasonable accuracy using a process closer to commercial valuation practice.
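The date-based split used in this procedure trains on the preceding twelve months and tests on the following month. It can be sketched in Python as follows. The `(sale_date, record)` tuple layout and the function names are hypothetical; any sales table with a sale date would do.

```python
from datetime import date

def month_index(d: date) -> int:
    """Map consecutive calendar months to consecutive integers."""
    return d.year * 12 + (d.month - 1)

def split_by_month(sales, test_month: date, train_months: int = 12):
    """Train on sales from the train_months calendar months before test_month;
    test on the sales made within test_month itself."""
    t = month_index(test_month)
    train = [s for s in sales if t - train_months <= month_index(s[0]) < t]
    test = [s for s in sales if month_index(s[0]) == t]
    return train, test

# Hypothetical (sale_date, sale_id) records
sales = [(date(1995, 9, 30), 1), (date(1995, 10, 1), 2), (date(1996, 9, 15), 3),
         (date(1996, 10, 3), 4), (date(1996, 11, 1), 5)]
train, test = split_by_month(sales, date(1996, 10, 1))
print([i for _, i in train], [i for _, i in test])   # [2, 3] [4]
```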

Assessment Procedure 3

To further establish the usefulness of the technique, another location was selected and data were collected to test the stationarity of the models. The outer suburb of Salisbury, north of the city, was randomly selected for this procedure. It involved producing eight models for both MRA and NN. Each model used six consecutive months of sales. In each successive model, one month’s data was removed and data from the next month was added, so that by the end of the modeling process completely new data were being used. The models were then applied to the details of the three most recent sales, and the estimates were graphed to show their stationarity. If the models are stable, the estimates of value for each property will remain much the same over the period. This again more closely replicates the needs of typical residential valuation. A valuer would not expect the value to change significantly if the same property were valued in each of eight successive months, using the previous six months’ sales each time. One problem that could arise with this procedure is a change in value due to a general rise or fall in the market. If the market were in fact changing, this should appear as a trend in the final estimates, and the sale date variable should be significant.
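Below is a minimal sketch of the successive-window design in Python with NumPy. It covers the MRA side only, using ordinary least squares. Each of the eight windows (months 1-6 through months 8-13) is refitted, and the same test properties are valued each time. The four-attribute layout and the synthetic, noise-free data are hypothetical. With such data the estimates are identical across windows, which represents the ideal of perfect stationarity.

```python
import numpy as np

def rolling_windows(n_months=13, width=6):
    """(first, last) month pairs: (1, 6), (2, 7), ..., (8, 13)."""
    return [(s, s + width - 1) for s in range(1, n_months - width + 2)]

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def estimates_by_window(month, X, y, X_test, n_months=13, width=6):
    """Refit on each window and value the same test properties each time.
    Rows are windows, columns are test properties."""
    rows = []
    for first, last in rolling_windows(n_months, width):
        in_window = (month >= first) & (month <= last)
        coef = fit_ols(X[in_window], y[in_window])
        rows.append(coef[0] + X_test @ coef[1:])
    return np.array(rows)

# Hypothetical sales: 10 per month over 13 months, 4 attributes each,
# priced by a fixed linear rule (no noise and no market movement)
rng = np.random.default_rng(1)
month = np.repeat(np.arange(1, 14), 10)
X = rng.uniform(0.0, 1.0, size=(len(month), 4))
y = 60000 + X @ np.array([40000.0, 15000.0, -5000.0, 8000.0])
X_test = rng.uniform(0.0, 1.0, size=(3, 4))

est = estimates_by_window(month, X, y, X_test)
print(est.shape)   # (8, 3)
```

With real sales, any trend down the rows of `est` would point to either a moving market or an unstable model. This is why the sale date variable matters in this test.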

Results

Procedure 1 - Mass Appraisal Model - Payneham

Figure 2 - Distribution of Training and Test Data (percentage of sales by price band, mid-points $75,000 to $475,000; training data and testing data)

The data used consisted of 223 sales for training the models and 111 for testing. Their distribution is shown in Figure 2. It is a fairly typical distribution of house prices, roughly normal but skewed to the right, and the training and test data are distributed quite similarly. A multiple regression model was estimated using ordinary least squares, with no data transformations other than those described in Table 1. The model used 10 independent variables, each significant at the 98% level or better. Several neural network models were produced. They used either all of the variables or only those found significant in the MRA, and a variety of network structures were tried. Many of the results were similar, although the models using fewer variables were usually superior. The simplest topology used three layers, with 10 input neurodes and 10 hidden neurodes, and took the ten variables used in the regression. It proved at least as good as most others and was chosen for the comparison. Applying the test data to the models gave the results in Table 2.


The comparison indicates that MRA produced better results on most counts, although both models fall within the expected error range. A mean percentage error of ten to twenty percent was expected because of general market imperfections and the lack of suitable environmental and internal condition variables. The lower mean absolute error (MAE) for regression suggests it is clearly superior in this case. This is supported by the correlation between each model’s estimated prices and the actual observed values. A significant finding comes from the stratified results. Splitting the test data into three equal parts made it possible to see how the predictive power of the models varies over the sample space. For MRA, the overall MAE of 12.75% is approximated in each of the three groups. Analysis of variance shows that neither the group means nor the group variances differ significantly. Similar analysis of the neural network outcomes shows that the error increases with the sale price of properties. The lowest priced group has an MAE of only 10.25%, while the error for the highest priced properties blows out to 29.29%. Analysis of variance, including Hartley’s test, shows that both the means and the variances differ significantly. Generally speaking, the neural network produced better results for low priced properties, while MRA outperformed NN on the medium and high priced properties.

Table 2 - Comparison of Models - Procedure 1

| | MRA | Neural Net |
|---|---|---|
| Mean Absolute Error (MAE) | 12.75% | 19.97% |
| Correlation (with Actual) | 0.86 | 0.69 |
| Minimum Error | 0% | 1% |
| Maximum Error | 68% | 77% |
| Lowest 3rd of Prices MAE | 14.54% | 10.25% |
| Middle 3rd of Prices MAE | 9.80% | 20.36% |
| Highest 3rd of Prices MAE | 13.34% | 29.29% |
| Lowest 3rd of Prices Variance | 0.016 | 0.004 |
| Middle 3rd of Prices Variance | 0.006 | 0.030 |
| Highest 3rd of Prices Variance | 0.017 | 0.025 |

This is highlighted in Figure 3. It indicates that the MRA model is generally superior and that, while both models suffer from heteroscedasticity, the problem is more noticeable with the NN model. For each model, the relationship between actual and estimated price should approach a ratio relationship, with an intercept near zero and a beta of one. The MRA model comes close, with an intercept of 749 and a beta of 0.9963. The neural network model shows serious systematic error, with an intercept of 49,563 and a beta of 0.7864.

Figure 3 - Neural Network and Multiple Regression Estimates Against Price (Actual Price against Estimated Price, $0 to $400,000; series NN and MRA)
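The intercept and beta quoted for Procedure 1 can be obtained from a least-squares line through the (estimated, actual) pairs. The sketch below assumes that actual price is regressed on estimated price, following the axes of Figure 3. If the reverse regression was intended, swap the arguments. A well-calibrated model gives an intercept near zero and a beta near one. The sample values are hypothetical.

```python
import numpy as np

def calibration_line(estimated, actual):
    """Fit actual = intercept + beta * estimated by least squares."""
    beta, intercept = np.polyfit(np.asarray(estimated, float),
                                 np.asarray(actual, float), 1)
    return intercept, beta

# Hypothetical estimates, and sale prices related to them exactly
est = np.array([100000.0, 150000.0, 200000.0, 250000.0])
act = 1000.0 + 0.9 * est
print(calibration_line(est, act))
```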

Procedure 2 - Valuations based on small data sets in Six Suburbs

The results from the first procedure were used as the basis for the models in the second procedure. Experimenting with different variables and topologies would have produced a very large number of models. Instead, an MRA model was established first, and the same variables were then used in the NN model, with the same number of neurodes in a single hidden layer. Using the sales from the preceding twelve months for each suburb led to training samples of different sizes. The smallest training set was for Tranmere, with 40 sales, and the largest was for Golden Grove, with 119 sales. The results of applying the test data to the trained models are shown in Table 3.

Table 3 - Results - Procedure 2

| Suburb | Price | NN Estimate | MRA Estimate | NN Error (Valuation) | MRA Error (Valuation) | Best | NN MAE (Suburb) | MRA MAE (Suburb) |
|---|---|---|---|---|---|---|---|---|
| Broadview 1 | $185,000 | $213,968 | $195,097 | 16% | 5% | MRA | 15% | 13% |
| Broadview 2 | $128,000 | $99,481 | $135,254 | 22% | 6% | MRA | | |
| Broadview 3 | $129,000 | $95,044 | $101,193 | 26% | 22% | MRA | | |
| Broadview 4 | $99,000 | $84,863 | $112,238 | 14% | 13% | MRA | | |
| Broadview 5 | $180,000 | $156,817 | $158,269 | 13% | 12% | MRA | | |
| Broadview 6 | $85,000 | $92,530 | $96,834 | 9% | 14% | NN | | |
| Broadview 7 | $220,000 | $214,463 | $266,741 | 3% | 21% | NN | | |
| Flinders Park 1 | $97,500 | $115,562 | $98,453 | 19% | 1% | MRA | 13% | 12% |
| Flinders Park 2 | $107,000 | $114,036 | $109,009 | 7% | 2% | MRA | | |
| Flinders Park 3 | $107,500 | $93,835 | $92,405 | 13% | 14% | NN | | |
| Flinders Park 4 | $139,500 | $114,204 | $101,646 | 18% | 27% | NN | | |
| Flinders Park 5 | $125,150 | $112,330 | $103,758 | 10% | 17% | NN | | |
| Golden Grove 1 | $158,000 | $119,588 | $133,981 | 24% | 15% | MRA | 10% | 10% |
| Golden Grove 2 | $205,000 | $170,337 | $177,855 | 17% | 13% | MRA | | |
| Golden Grove 3 | $180,000 | $205,158 | $189,569 | 14% | 5% | MRA | | |
| Golden Grove 4 | $139,000 | $144,120 | $150,069 | 4% | 8% | NN | | |
| Golden Grove 5 | $105,000 | $105,109 | $95,100 | 0% | 9% | NN | | |
| Golden Grove 6 | $110,000 | $105,345 | $97,235 | 4% | 12% | NN | | |
| Golden Grove 7 | $113,000 | $106,929 | $106,849 | 5% | 5% | NN | | |
| Marion 1 | $93,500 | $109,046 | $96,502 | 17% | 3% | MRA | 5% | 5% |
| Marion 2 | $117,000 | $125,929 | $124,453 | 8% | 6% | MRA | | |
| Marion 3 | $120,000 | $125,454 | $120,266 | 5% | 0% | MRA | | |
| Marion 4 | $127,000 | $132,508 | $137,247 | 4% | 8% | NN | | |
| Marion 5 | $134,000 | $132,459 | $128,958 | 1% | 4% | NN | | |
| Marion 6 | $166,000 | $165,110 | $145,606 | 1% | 12% | NN | | |
| Marion 7 | $172,500 | $169,646 | $176,662 | 2% | 2% | NN | | |
| Seaford Rise 1 | $124,000 | $83,919 | $105,521 | 32% | 15% | MRA | 27% | 7% |
| Seaford Rise 2 | $87,000 | $97,815 | $82,625 | 12% | 5% | MRA | | |
| Seaford Rise 3 | $83,000 | $68,881 | $83,368 | 17% | 0% | MRA | | |
| Seaford Rise 4 | $68,000 | $112,822 | $66,493 | 66% | 2% | MRA | | |
| Seaford Rise 5 | $90,000 | $98,209 | $103,392 | 9% | 15% | NN | | |
| Tranmere 1 | $156,000 | $101,146 | $161,523 | 35% | 4% | MRA | 28% | 16% |
| Tranmere 2 | $80,000 | $153,946 | $90,185 | 92% | 13% | MRA | | |
| Tranmere 3 | $101,000 | $111,565 | $142,960 | 10% | 42% | NN | | |
| Tranmere 4 | $85,000 | $85,564 | $90,844 | 1% | 7% | NN | | |
| Tranmere 5 | $150,000 | $149,673 | $170,833 | 0% | 14% | NN | | |

Table 4 - Comparison of Results Procedure 2

| | NN | MRA |
|---|---|---|
| Mean Absolute Error | 15.27% | 10.42% |
| Std Dev Errors | 18.36% | 8.41% |
| Correlation (with Actual) | 0.78 | 0.90 |
| Minimum Error | 0% | 0% |
| Maximum Error | 92% | 42% |
| Number of Times Best Estimate | 17 | 19 |


The table shows that for each suburb MRA produces the lowest or equal lowest MAE. At a suburb level NN is never superior, but many individual NN estimates are better than those of MRA. The aggregate results in Table 4 support this. MRA has the lower overall MAE, yet it produces the best estimate only slightly more often than NN (19 times against 17). A closer inspection suggests that regression produces consistently good results, while NN quite often produces excellent ones. Generally, NN produces results that are either very good or very poor, while MRA more often produces a middling result. This is highlighted in Figure 4. NN produces more estimates with less than 5% error, but at error thresholds above 8% MRA produces more estimates within the threshold. Eighty-three percent of properties are estimated with less than 15% error using MRA, while only sixty-four percent meet this standard using NN.

Figure 4 - Percentage of Valuations with Error Levels (percentage of properties valued with less than 5%, 10%, 15%, 20%, 25% and 30% error; series NN and MRA)
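The aggregate figures in Table 4 and Figure 4 can be re-derived from Table 3. The short Python script below transcribes the price and the two estimates for each of the 36 test sales. It then computes each model's mean absolute percentage error, how often each model gives the better estimate, and the share of valuations within each error threshold.

```python
# (price, NN estimate, MRA estimate) for the 36 test sales in Table 3
TABLE3 = [
    (185000, 213968, 195097), (128000, 99481, 135254), (129000, 95044, 101193),
    (99000, 84863, 112238), (180000, 156817, 158269), (85000, 92530, 96834),
    (220000, 214463, 266741),                                    # Broadview
    (97500, 115562, 98453), (107000, 114036, 109009), (107500, 93835, 92405),
    (139500, 114204, 101646), (125150, 112330, 103758),          # Flinders Park
    (158000, 119588, 133981), (205000, 170337, 177855), (180000, 205158, 189569),
    (139000, 144120, 150069), (105000, 105109, 95100), (110000, 105345, 97235),
    (113000, 106929, 106849),                                    # Golden Grove
    (93500, 109046, 96502), (117000, 125929, 124453), (120000, 125454, 120266),
    (127000, 132508, 137247), (134000, 132459, 128958), (166000, 165110, 145606),
    (172500, 169646, 176662),                                    # Marion
    (124000, 83919, 105521), (87000, 97815, 82625), (83000, 68881, 83368),
    (68000, 112822, 66493), (90000, 98209, 103392),              # Seaford Rise
    (156000, 101146, 161523), (80000, 153946, 90185), (101000, 111565, 142960),
    (85000, 85564, 90844), (150000, 149673, 170833),             # Tranmere
]

def abs_error(price, estimate):
    """Absolute percentage error of one valuation."""
    return abs(estimate - price) / price

def share_within(errors, threshold):
    """Fraction of valuations whose absolute error is below the threshold."""
    return sum(e < threshold for e in errors) / len(errors)

nn = [abs_error(p, e) for p, e, _ in TABLE3]
mra = [abs_error(p, e) for p, _, e in TABLE3]
mae_nn = sum(nn) / len(nn)
mae_mra = sum(mra) / len(mra)
nn_best = sum(a < b for a, b in zip(nn, mra))   # a tie would count for MRA
mra_best = len(TABLE3) - nn_best

print(f"MAE: NN {mae_nn:.2%}, MRA {mae_mra:.2%}")
print(f"Best estimate: NN {nn_best}, MRA {mra_best}")
for t in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
    print(f"< {t:.0%} error: NN {share_within(nn, t):.0%}, MRA {share_within(mra, t):.0%}")
```

Run on these transcribed values, the script reproduces the MAEs of 15.27% (NN) and 10.42% (MRA) and the 17 and 19 best estimates in Table 4. At the 15% threshold it gives the 83% (MRA) and 64% (NN) shares quoted above.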

Procedure 3 - Stationarity Test

This procedure was designed to test the stationarity of the models through successive sample selection. Eight MRA and eight NN models were required. To simplify the approach, only four variables were used in each model, and these proved highly significant in each successive MRA model. The models were then applied to the three test properties. The results are shown in Table 5 and graphically in Figure 4 (Price Estimate Stability Tests). The results are remarkably stable, especially for the lower priced properties, and this is true for both the MRA and NN models. Each model tends to consistently over- or underestimate the actual price of a given property. Neural networks produce consistently good estimates for properties 1 and 3, while MRA produces better estimates for property 2.


Table 5 - Estimates for Three Properties using Successive Sales Sampling

Property 1 sold for $80,000, Property 2 for $75,000 and Property 3 for $120,000.

| Sales Used | Property 1 MRA | Property 1 ANN | Property 2 MRA | Property 2 ANN | Property 3 MRA | Property 3 ANN |
|---|---|---|---|---|---|---|
| Months 1 - 6 | $78,752 | $81,327 | $82,791 | $84,607 | $96,861 | $109,385 |
| Months 2 - 7 | $75,320 | $81,073 | $79,280 | $84,849 | $99,183 | $123,419 |
| Months 3 - 8 | $76,914 | $80,537 | $81,840 | $85,118 | $96,825 | $118,152 |
| Months 4 - 9 | $76,611 | $80,847 | $81,479 | $84,133 | $96,637 | $115,943 |
| Months 5 - 10 | $76,649 | $82,973 | $81,495 | $87,771 | $98,905 | $130,077 |
| Months 6 - 11 | $77,880 | $85,354 | $83,788 | $87,290 | $104,723 | $109,956 |
| Months 7 - 12 | $78,103 | $84,492 | $83,879 | $87,398 | $101,244 | $104,983 |
| Months 8 - 13 | $80,293 | $84,230 | $86,629 | $87,152 | $92,813 | $104,885 |

Figure 4 - Price Estimate Stability Tests (estimated price, $60,000 to $140,000, for each set of sales used in the analysis, Months 1 - 6 to Months 8 - 13; MRA and ANN series for Property 1, sale price $80,000, Property 2, sale price $75,000, and Property 3, sale price $120,000)
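One simple way to quantify the stability in Table 5 is to take the range and the coefficient of variation of each series across the eight windows. The sketch below uses only the Python standard library. The choice of these two measures is an illustration and not the paper's own test.

```python
from statistics import mean, pstdev

# Table 5 estimates, windows Months 1 - 6 through Months 8 - 13
TABLE5 = {
    "Property 1 MRA": [78752, 75320, 76914, 76611, 76649, 77880, 78103, 80293],
    "Property 1 ANN": [81327, 81073, 80537, 80847, 82973, 85354, 84492, 84230],
    "Property 2 MRA": [82791, 79280, 81840, 81479, 81495, 83788, 83879, 86629],
    "Property 2 ANN": [84607, 84849, 85118, 84133, 87771, 87290, 87398, 87152],
    "Property 3 MRA": [96861, 99183, 96825, 96637, 98905, 104723, 101244, 92813],
    "Property 3 ANN": [109385, 123419, 118152, 115943, 130077, 109956, 104983, 104885],
}

def stability(estimates):
    """Range and coefficient of variation of one series across the windows."""
    return max(estimates) - min(estimates), pstdev(estimates) / mean(estimates)

for series, values in TABLE5.items():
    spread, cv = stability(values)
    print(f"{series}: range ${spread:,}, CV {cv:.1%}")
```

For these values, the MRA estimates for Property 1 span $4,973, while the ANN estimates for Property 3 span $25,192. This is consistent with the observation that stability is greatest for the lower priced properties.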


General Findings

James (1996) points out the advantages of neural networks for small data sets, and this research supports that observation. Neural networks would seem to be a better tool for smaller data sets, while regression is clearly superior for larger ones. Regression is statistically weak with small data sets, a problem not encountered by neural networks. Regression results can be calculated very quickly regardless of the size of the problem. By contrast, the time needed to train neural networks seems to increase exponentially with the size of the data set. The models built from the smaller data sets were generally produced in less than one minute. However, the neural network for the largest problem (223 cases and 42 variables) had not reached a suitable global minimum after 23 hours, when an unfortunate power failure left the model stranded. Clearly a suitable model must be achieved in minutes to be commercially viable, and this also supports the use of neural networks for small data sets.

Generally speaking, the results for MRA are consistently good, while NN produces some excellent estimates but some equally poor ones. It is perhaps safe to conclude that in this research MRA produced less “risky” results, as evidenced by the lower mean and variance of the absolute errors. While this conclusion is valid on a pragmatic test basis, it does not reflect other aspects of the test procedures. MRA has been applied successfully to residential property data for many years, whereas neural networks are relatively new. This disparity in experience may explain much of the variation. Another important factor in the future application of neural networks is improvement in software. The approach is notoriously “black-box” in nature, and few tools exist to establish the appropriate network topology or the influence of variables in the final model. This, however, is changing rapidly. Tay and Ho (1994) used a method to approximate the influence of the variables in the final model. Borst (1995) suggests that significant further progress has been made through the graphical outputs of modern software. Similarly, there have been advances in the technology used to select the appropriate topology. The software used in this paper required the user to select a topology, but McCluskey (1996, 44) allows ANN “to work in dynamic mode i.e. allowing it the freedom to determine its own typology”.

New tests are needed to help decide when to stop training, because in practice this decision is very difficult. If the network is under-trained, there are likely to be large errors with both the training and test data. Over-training will lead to good results for the training data but large errors in the test applications. Another issue is the variation in models that can result from using different software, or from repeated trials using the same software, as highlighted by Worzala (1995). One clear advantage of multiple regression over neural networks is the ability to test the model statistically and to establish a confidence interval for predictions. Users of MRA can use this confidence interval in conjunction with other data to make more reliable estimates. Many of these issues are summarised by James (1996) when he states: “In practice very careful methodologies should be followed to ensure that the neural networks are proper estimators of the population, and that they have adequately identified the major patterns in the data. The back-propagation algorithm is notorious for stopping at local rather than global patterns in a data set.”
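MRA's ability to attach an interval to a prediction can be illustrated with the standard OLS prediction interval, estimate ± t * s * sqrt(1 + a0'(A'A)^-1 a0). Here A is the design matrix with an intercept column, a0 is the new property's row and s² is the residual variance. The sketch below uses NumPy. The caller supplies the Student-t critical value, since NumPy has no t quantile function. Dropping the 1 inside the square root gives the narrower confidence interval for the mean price rather than for an individual sale. The toy data are hypothetical.

```python
import numpy as np

def ols_prediction_interval(X, y, x0, t_crit):
    """OLS fit with an intercept, then a point estimate and prediction
    interval for a new property with attributes x0. t_crit is the two-sided
    Student-t critical value for len(y) - (k + 1) degrees of freedom."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])           # design matrix
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]
    s2 = resid @ resid / dof                            # residual variance
    a0 = np.concatenate([[1.0], np.atleast_1d(np.asarray(x0, dtype=float))])
    se = np.sqrt(s2 * (1.0 + a0 @ np.linalg.inv(A.T @ A) @ a0))
    estimate = a0 @ coef
    return estimate, estimate - t_crit * se, estimate + t_crit * se

# Toy data: one attribute, four sales; the 97.5% t value for 2 df is about 4.303
est, low, high = ols_prediction_interval([0, 1, 2, 3], [0, 1, 1, 2], 4, t_crit=4.303)
print(est, low, high)
```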

Conclusions

This paper has sought to assess the application of artificial neural networks and multiple regression to residential valuation. The assessment supports the use of MRA ahead of NN, though not conclusively. MRA performed better in these tests, but the results of research such as Do and Grudnitski (1992) suggest that NN is a superior technique. It is certainly much too early to dismiss neural networks as a suitable valuation tool. Early users of regression were highly skeptical. Since then, advances in computer technology and improvements to regression techniques and statistics have made it a more viable analytical tool. Similar improvements for neural networks may establish them as the preeminent tool for property valuation.
