International Journal of Computer & Mathematical Sciences IJCMS ISSN 2347 – 8527 Volume 5, Issue 6 June 2016
Selection of Input-Output Variables in Data Envelopment Analysis - Indian Commercial Banks
Subramanyam T
Department of Statistics, Christ University, Bangalore

ABSTRACT
Data Envelopment Analysis (DEA) is a nonparametric method used to evaluate the performance of profit and non-profit organizations. It assumes that the input and output variables are known a priori. In any environment there is a large number of candidate variables, each of which can serve as either an input or an output. When the number of input and output variables is large, the discriminatory power of DEA is reduced. To overcome this difficulty, the input and output variables may need to be reduced using appropriate scientific methods. This study proposes a new stepwise method that reduces the data set with the help of non-parametric tests. The results are encouraging: the proposed method appears suitable for removing insignificant input and output variables, and for the selected study it reduced the wastage of input variables by about 2%.

Keywords: DEA, Environmental Risk, Non-Performing Assets, Commercial Bank
1. Introduction
Productivity and efficiency are two key relative measures of how an organization or unit performs. The most popular nonparametric method in the literature for measuring the relative efficiency of an organizational unit is Data Envelopment Analysis (DEA), which originated with Charnes et al. (1978). DEA is an optimization method that uses linear programming techniques to evaluate the relative efficiency of organizational units where multiple inputs and outputs make comparison difficult. In DEA, the unit to be evaluated is known as a Decision Making Unit (DMU). The central idea is that the efficiency of each DMU is evaluated relative to the other DMUs by assigning the most favorable weights to its input and output variables. DEA forms a production possibility set from the available input and output variables and measures the efficiency of each DMU against it. The efficiency score of every DMU lies between 0 and 1. A DMU with an efficiency score of 1 is called efficient; otherwise it is inefficient. The discriminating power of DEA depends on the number of DMUs and on the number of inputs and outputs. DEA does not explain how to identify the relevant input and output variables, yet the efficiency score of a DMU depends on which variables are included in the analysis. As the number of input and output variables grows, the dimensionality of the production space increases and the discriminatory power of DEA decreases correspondingly. The greatest challenge in DEA is therefore to identify a parsimonious model. A major drawback of DEA is that it assumes the input and output variables are predefined, even though the efficiency of a particular DMU depends on the variables selected.
For example, there is an extensive literature on the efficiency of the Indian banking system, but no general agreement on the choice of input and output variables from the available data set (Subramanyam et al., 2008, 2014). The measured efficiency of a DMU therefore depends on the data that happen to be available. Regression analysis offers many well-established methods for identifying significant variables; DEA has no comparable methods for deciding which variables are important, and the literature on variable selection is sparse. Recently, some researchers have focused on strengthening parsimonious or statistics-based methods (Jenkins and Anderson, 2003; Morita and Avkiran, 2009; Nataraja and Johnson, 2011).
Some authors have argued that the number of input and output variables should be no more than one-third of the number of DMUs (Friedman and Sinuany-Stern, 1998). This paper proposes a simple method that nevertheless rests on robust parametric and non-parametric statistical techniques. Researchers working on DEA with little mathematical knowledge should also be able to understand this approach.

1.2 Objectives of the Study:
- To identify the insignificant input and output variables
- To test whether there is any significant difference between the efficiency scores of the full and reduced models
- To reduce the input losses using the reduced model

2. Review on variable selection methods
DEA itself does not provide any guidance on the selection of input and output variables; the selection is left to the user's discretion. There are, however, numerous studies on variable selection in DEA. Some authors have argued that correlation analysis, regression analysis and Principal Component Analysis (PCA) are useful for selecting input and output variables. Jenkins and Anderson (2003) presented a multivariate statistical approach to identify the significant variables with the least loss of information. Their paper examined the correlation among the input and output variables: correlation analysis is used as a tool for inclusion, and highly correlated input and output variables are included in the model, while the variance of the variables is used to decide which of them are appropriate to retain. Wagner and Shimshak (2007) presented a stepwise method for selecting input and output variables, based on the smallest difference in average efficiency scores. However, it offers no cut-off point at which to stop including or excluding variables. Morita and Avkiran (2009) discussed a method based on the design of experiments.
They selected output variables using two-level fractional factorial designs. The test statistic used to measure the distance between two variables is the Welch statistic. Nataraja and Johnson (2011) outlined variable selection techniques in DEA. Their paper discussed four widely used approaches to variable specification: the efficiency contribution measure (ECM), principal component analysis, a regression-based test, and a test based on bootstrapping. The ECM performs best when the correlation among variables is low.

3. Basic CCR and BCC DEA Models:
3.1 CCR Model:
Charnes, Cooper and Rhodes (1978) proposed a linear programming technique to measure the relative efficiency of decision making units (DMUs) in a competitive environment where multiple inputs are combined to produce multiple outputs.
Suppose there are n decision making units (DMUs), each with m inputs and s outputs. DMU_j, j = 1, 2, …, n, has input and output vectors X_j = (x_1j, x_2j, …, x_mj) and Y_j = (y_1j, y_2j, …, y_sj) respectively, where X_j > 0 and Y_j > 0. The CCR model for evaluating the relative efficiency of the DMU under evaluation, DMU_0, under constant returns to scale is:

$$\theta^{CCR} = \min \lambda$$

subject to

$$\sum_{j=1}^{n} \lambda_j x_{ij} \le \lambda x_{i0}, \quad i = 1, 2, \dots, m;$$
$$\sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{r0}, \quad r = 1, 2, \dots, s;$$
$$\lambda_j \ge 0, \quad j = 1, 2, \dots, n.$$

Here the unsubscripted λ is the scalar input-contraction factor and the λ_j are the weights placed on the observed DMUs.
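As a small worked illustration, assuming hypothetical figures rather than the bank data, consider three DMUs with a single input and a single output, (x, y) = (2, 2), (4, 2) and (8, 4). With one input and one output, the CCR programme reduces to comparing each DMU's output-input ratio with the best observed ratio:

$$\theta^{CCR}_0 = \frac{y_0 / x_0}{\max_j \,(y_j / x_j)}$$

The ratios are 1, 0.5 and 0.5, so the first DMU is efficient (score 1) and the other two each score 0.5. For the second DMU, for instance, the optimum is attained by putting weight 1 on the first DMU: it produces the required output of 2 with an input of 2, which is half of the 4 actually used.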
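3.2 BCC Model:
Banker, Charnes, and Cooper (1984) corrected for scale differences by introducing an additional constraint into the CCR model. The BCC model for evaluating the relative efficiency of DMUs under variable returns to scale is:

$$\theta^{BCC} = \min \lambda$$

subject to

$$\sum_{j=1}^{n} \lambda_j x_{ij} \le \lambda x_{i0}, \quad i = 1, 2, \dots, m;$$
$$\sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{r0}, \quad r = 1, 2, \dots, s;$$
$$\sum_{j=1}^{n} \lambda_j = 1; \quad \lambda_j \ge 0, \quad j = 1, 2, \dots, n.$$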
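The CCR and BCC programmes differ only in the constraint that the weights sum to one, so both can be solved by one routine. The following is a minimal Python sketch, assuming SciPy's `linprog` is available; the function name `dea_input_oriented`, the `vrs` flag and the three-DMU data set are illustrative assumptions, not part of the study. The variable `theta` plays the role of the unsubscripted λ in the models.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input_oriented(X, Y, k, vrs=False):
    """Input-oriented envelopment score of DMU k.

    X is an (n, m) array of inputs, Y an (n, s) array of outputs.
    vrs=False gives the CCR score, vrs=True the BCC score.
    The decision vector is [theta, lambda_1, ..., lambda_n].
    """
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]  # minimise theta
    A_ub = np.vstack([
        # inputs: sum_j lambda_j * x_ij - theta * x_ik <= 0
        np.hstack([-X[k].reshape(m, 1), X.T]),
        # outputs: -sum_j lambda_j * y_rj <= -y_rk
        np.hstack([np.zeros((s, 1)), -Y.T]),
    ])
    b_ub = np.concatenate([np.zeros(m), -Y[k]])
    # BCC only: the lambdas must sum to one
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1) if vrs else None
    b_eq = [1.0] if vrs else None
    # linprog's default bounds already impose theta >= 0 and lambda_j >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  method="highs")
    return res.fun


# Hypothetical toy data: 3 DMUs, one input, one output (not the bank data).
X = np.array([[2.0], [4.0], [8.0]])
Y = np.array([[2.0], [2.0], [4.0]])

ccr = np.array([dea_input_oriented(X, Y, k) for k in range(len(X))])
bcc = np.array([dea_input_oriented(X, Y, k, vrs=True) for k in range(len(X))])
scale = ccr / bcc  # scale efficiency as the ratio of the two scores

print("CCR  ", np.round(ccr, 4))
print("BCC  ", np.round(bcc, 4))
print("Scale", np.round(scale, 4))
```

On this toy data the third DMU is the only one with an output of 4, so it is BCC-efficient even though its CCR score is lower; dividing the CCR score by the BCC score gives its scale efficiency.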
This BCC model is known as the envelopment problem, since the production possibility set envelops all the observations tightly; hence the name Data Envelopment Analysis. Scale efficiency is calculated as the ratio $\theta^{CCR} / \theta^{BCC}$.

4. Reduction of Input/Output Set in DEA
The general procedure for evaluating the efficiency of DMUs is to identify suitable input and output variables a priori. Because the input and output variables differ from one researcher to another, the efficiency of a DMU also changes from one researcher to another, and there is no general agreement on the efficiency of DMUs. To reduce this discrepancy, this paper proposes a general method for selecting the significant input and output variables from the available data set. The earlier literature has shown that there is generally a high correlation between the input and output variables, so correlation and regression analysis are no longer suitable for reducing or identifying the significant variables in the data set. The following stepwise procedure is useful for identifying the significant variables or reducing the variables in the data set.

4.1 Stepwise Procedure
Assume that we have n decision making units with 'm' input variables (i = 1, 2, …, m) and 's' output variables (j = 1, 2, …, s) from the available data set. To reduce the number of input and output variables, we propose the following stepwise procedure.
Step1: Run the full model under constant returns to scale and store the efficiency scores in 'E'.
Step2: Drop one variable at a time and run the DEA model. Store the efficiency values in the set Eij, i=1,2,…,I, j=1,2,…,O.
Step3: Use a nonparametric test to test the significance of the dropped variable. Observe the percentage change in significance value.
Step4: If the percentage change in significance value is greater than 20%, retain the variable; otherwise exclude the variable from the analysis.
Step5: If more than one variable has a percentage change in significance value below 20%, compare the changes in mean efficiency and remove the variable with the smallest change.
Step6: Run the DEA model with the new set of variables and repeat steps 1-5.
Step7: Repeat the procedure until the percentage change in significance value is greater than 20% for all variables. At least one input and one output variable are required to run the final model.

5. Empirical Study: Indian Commercial Banks
The Indian banking system is a secure and stable industry compared with those of most developing countries. In India, the banks work in a heterogeneous environment in which management policies and the importance given to urban and rural areas differ greatly among managements. Researchers have nevertheless assumed that the banks operate under the same frontier when evaluating their efficiency (T. Subramanyam et al., 2008, 2012). If a bank operates in an efficient environment, it may be preferred to other banks.
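The drop-one loop of Section 4.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: it covers Steps 1-2 and the mean-efficiency comparison used in Step 5, it leaves out the significance test of Step 3, and the helper names `ccr_scores` and `drop_one_output`, as well as the four-DMU data set, are hypothetical and not taken from the study. SciPy's `linprog` is assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_scores(X, Y):
    """Input-oriented CCR scores for all DMUs; X is (n, m), Y is (n, s)."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]  # variables are [theta, lambdas]
    scores = []
    for k in range(n):
        A_ub = np.vstack([
            np.hstack([-X[k].reshape(m, 1), X.T]),  # input constraints
            np.hstack([np.zeros((s, 1)), -Y.T]),    # output constraints
        ])
        b_ub = np.concatenate([np.zeros(m), -Y[k]])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
        scores.append(res.fun)
    return np.array(scores)


def drop_one_output(X, Y, names):
    """Step 1: full-model scores. Step 2: re-run with each output dropped.

    Returns the full-model scores and, per dropped output, the fall in
    mean efficiency relative to the full model.
    """
    full = ccr_scores(X, Y)
    change = {}
    for j, name in enumerate(names):
        reduced = ccr_scores(X, np.delete(Y, j, axis=1))
        change[name] = full.mean() - reduced.mean()
    return full, change


# Hypothetical data for 4 DMUs: one input, three outputs (not the bank data).
# "investments" is exactly twice "deposits", so it adds no information.
X = np.array([[2.0], [4.0], [8.0], [5.0]])
Y = np.array([
    [2.0, 4.0, 1.0],
    [2.0, 4.0, 4.0],
    [4.0, 8.0, 2.0],
    [4.0, 8.0, 1.0],
])
names = ["deposits", "investments", "advances"]

full, change = drop_one_output(X, Y, names)
for name in names:
    print(f"{name:12s} change in mean efficiency = {change[name]:.4f}")
```

Because dropping an output removes constraints from a minimisation problem, the scores can only stay the same or fall, so each reported change is non-negative. An output whose removal leaves the mean efficiency unchanged is the natural first candidate for exclusion.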
To calculate the efficiency of a commercial bank using DEA models, we first need to identify the candidate input and output variables from the available data set. In this study, we select the following variables as the input and output variables.
Input Variables:
1. Number of employees
2. Fixed assets
Output Variables:
1. Deposits
2. Investments
3. Advances
4. Interest income
5. Other income

6. Results and analysis
6.1 Stepwise Method
The data relate to 26 public sector banks operating in the government sector. This study attempted to reduce the number of output variables in the analysis. Table 1 shows a high correlation among all the input and output variables. In general, DEA deals with variables that are highly correlated. In the given data there is strong correlation among the output variables, but technically they are not correlated in nature: the variables are independent and are all products of the banks. The biggest problem with correlation analysis is deciding which variables can be omitted with the least loss of information. The proposed stepwise method was applied to identify the insignificant variables in the available data set. The input-oriented DEA model was used to calculate the efficiency of the commercial banks. First, the input-oriented DEA model was run with 2 input and 5 output variables. Dropping one variable at a time gave the significant differences. In stage 1 (Table 2), the variable 'Investments' has the least significant difference, and the corresponding change in significance level is 0%. This variable does not have any significant impact on the overall efficiency, and among all the variables it is the appropriate one to exclude from the analysis. In stage 2 (Table 3), with 2 input and 4 output variables, the overall efficiency score is 0.7357.
Dropping each of the output variables separately produces changes in the average efficiency scores that seem fairly substantial. The variable 'Interest Income' has the least significant difference, with a 0% change in significance level, so it is the appropriate variable to eliminate at this stage. In stage 3 (Table 4), with 2 input and 3 output variables, the overall efficiency score is 0.7313. Dropping each of the output variables in turn changes the mean efficiency score by at least 1%. The change in significance level is more than approximately 20% for all the variables. Based on the changes in mean efficiency and significance level, all three variables, namely Other Income, Advances, and Deposits, were selected as significant output variables for the efficiency evaluation.

6.2 Statistical significance
The change in significance was assessed at the 5% level using the Wilcoxon matched-pairs signed-rank test, which examined the difference between the full and reduced models at each stage. The changes in CCR and scale efficiency are not statistically significant, whereas the change in BCC scores is significant.

               CCR     BCC      Scale
Full Model     0.74    0.87     0.85
Reduced Model  0.73    0.85     0.87
Sig.           0.068   0.012*   0.093
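A comparison of this kind can be run with SciPy's `scipy.stats.wilcoxon`, which implements the Wilcoxon signed-rank test for paired samples. The sketch below is illustrative only: the eight pairs of efficiency scores are hypothetical values chosen for the example, not the scores of the 26 banks in the study.

```python
from scipy.stats import wilcoxon

# Hypothetical efficiency scores for eight DMUs (not the study's bank data).
full_model = [0.74, 0.81, 0.65, 0.92, 0.58, 0.77, 0.69, 0.88]
reduced_model = [0.73, 0.79, 0.62, 0.915, 0.54, 0.755, 0.665, 0.845]

# Two-sided Wilcoxon matched-pairs signed-rank test on the paired scores.
result = wilcoxon(full_model, reduced_model)
significant = result.pvalue < 0.05  # 5% level, as in Section 6.2

print(f"statistic = {result.statistic}, p-value = {result.pvalue:.4f}")
print("significant at 5%" if significant else "not significant at 5%")
```

If every pair of scores is identical, as happens when a dropped variable has no effect at all, the test has no non-zero differences to rank, so that case is better handled separately before calling the function.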
In the CCR environment, the major changes occur in Central Bank of India (1.42%), State Bank of Hyderabad (5.08%) and State Bank of Travancore (3.99%). In the BCC environment, large changes occur in the efficiencies of several banks, namely Canara Bank (17.35%), Central Bank of India (8.10%), Punjab National Bank (13.61%), Union Bank of India (11.43%) and State Bank of Hyderabad (1.28%). There is no change in the number of efficient DMUs before and after the stepwise method. The major changes in scale efficiency scores occur in Canara Bank (15.25%), Central Bank of India (11.93%), Punjab National Bank (13.18%) and Union Bank of India (13.22%).
The comparison between the full and reduced models reveals a 1% change in the average efficiency scores in the CCR environment and a 2% change in the BCC environment. The reduced model reduced input wastage by about 2% compared with the full model.

7. Conclusions:
In this paper, a stepwise method was developed to reduce input waste by removing insignificant output variables from the data. The method relies on the change in mean efficiency and the change in significance, and it is usable by researchers with little statistical knowledge. The Wilcoxon signed-rank test was used to check whether the changes between the full and reduced models are statistically significant. The CCR model was used as the base for the stepwise method. This study fixed two input variables, namely the number of employees and fixed assets, and tried to reduce the output set, which leads to smaller input losses. In the CCR environment there is no significant change between the full and reduced models, but in the BCC environment the difference between the full and reduced models is statistically significant. The reduced model lowered the input loss by 2% on average for each DMU. This paper developed a new stepwise method with the help of the existing literature on DEA models. The method should give researchers some confidence in removing insignificant input and output variables.

8. References:
1. Banker, R. D., Charnes, A., & Cooper, W. W. (1984). Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science, 30(9), 1078-1092.
2. Charnes, A., Cooper, W. W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2(6), 429-444.
3. Friedman, L., & Sinuany-Stern, Z. (1998). Combining ranking scales and selecting variables in the DEA context: The case of industrial branches. Computers & Operations Research, 25(9), 781-791.
4. Jenkins, L., & Anderson, M. (2003). A multivariate statistical approach to reducing the number of variables in data envelopment analysis. European Journal of Operational Research, 147(1), 51-61.
5. Morita, H., & Avkiran, N. K. (2009). Selecting inputs and outputs in data envelopment analysis by designing statistical experiments. Journal of the Operations Research Society of Japan, 52(2), 163-173.
6. Nataraja, N. R., & Johnson, A. L. (2011). Guidelines for using variable selection techniques in data envelopment analysis. European Journal of Operational Research, 215(3), 662-669.
7. Subramanyam, T., & Reddy, C. S. (2008). Measuring the risk efficiency in Indian commercial banking - a DEA approach. East-West Journal of Economics and Business, 11(1-2), 76-105.
8. Reddy, C. S., & Subramanyam, T. (2011). Data Envelopment Analysis models to measure risk efficiency: Indian commercial banks. IUP Journal of Applied Economics, 10(4), 40.
9. Subramanyam, T. (2013). Technical and risk efficiency evaluation of Indian commercial banks using DEA models. International Journal of Information, Business and Management, 5(3), 105.
10. Wagner, J. M., & Shimshak, D. G. (2007). Stepwise selection of variables in data envelopment analysis: Procedures and managerial perspectives. European Journal of Operational Research, 180(1), 57-67.
Appendix
Table 1: Correlation

                 Employees  Fixed Assets  Deposits  Investments  Advances  Interest Income  Other Income
Employees        1
Fixed Assets     0.8794     1
Deposits         0.9631     0.9171        1
Investments      0.9693     0.9391        0.9828    1
Advances         0.9788     0.9139        0.9959    0.9861       1
Interest Income  0.9831     0.9246        0.9896    0.9948       0.9957    1
Other Income     0.9860     0.9006        0.9810    0.9849       0.9918    0.9942           1
Table 2: Stage 1
Overall Mean Efficiency (2 inputs, 5 outputs): Efficient Banks = 3, Mean Efficiency = 0.7357

Variables Dropped  Efficient Banks  Mean Efficiency  Change in mean efficiency  Sig.   Change in Sig. level (%)
Other Income       3                0.7183           0.0174                     0.998  40
Interest Income    3                0.7313           0.0044                     1.00   0
Advances           3                0.7286           0.0071                     1.00   0
Investments        3                0.7357           0.0000                     1.00   0
Deposits           3                0.7277           0.0080                     1.00   0
Table 3: Stage 2
Overall Mean Efficiency (2 inputs, 4 outputs): Efficient Banks = 3, Mean Efficiency = 0.7357

Variables Dropped  Efficient Banks  Mean Efficiency  Change in mean efficiency  Sig.   Change in Sig. level (%)
Other Income       3                0.7158           0.0199                     0.988  1.2
Interest Income    3                0.7313           0.0044                     1.00   0
Advances           3                0.7286           0.0071                     1.00   0
Deposits           3                0.7257           0.0100                     0.999  0.1

Table 4: Stage 3
Overall Mean Efficiency (2 inputs, 3 outputs): Efficient Banks = 3, Mean Efficiency = 0.7313

Variables Dropped  Efficient Banks  Mean Efficiency  Change in mean efficiency  Sig.   Change in Sig. level (%)
Other Income       3                0.7107           0.0206                     0.96   4.0
Advances           3                0.7188           0.0125                     0.99   1.0
Deposits           3                0.7197           0.0116                     0.99   1.0