11th International Conference on Data Envelopment Analysis, Samsun, Turkey, 2013
Implementing DEA models in the R program
José Francisco Moreira Pessanha (corresponding author), Rio de Janeiro State University, [email protected]
Alexandre Marinho, Rio de Janeiro State University, [email protected]
Luiz da Costa Laurencel, Rio de Janeiro State University, [email protected]
Marcelo Rubens dos Santos do Amaral, Rio de Janeiro State University, [email protected]
Abstract
This paper presents an implementation of classical DEA models in R, a free, open-source and highly extensible software environment that offers a variety of functions and graphical routines for data analysis. We show both the CRS and the VRS DEA models. The computational implementation is illustrated with real data from Brazilian electric power distribution utilities.
Keywords: Data Envelopment Analysis, classical models, R programming language
Introduction
Introduced by Charnes, Cooper and Rhodes in 1978, Data Envelopment Analysis (DEA) is an important branch of operations research, as well as of economics, as evidenced by the numerous publications with practical applications and theoretical developments over little more than three decades (EMROUZNEJAD et al., 2008; COOK & SEIFORD, 2009). In summary, DEA can be described as a nonparametric technique based on linear programming for evaluating the efficiency of organizations working in the same industry.
This paper presents a brief introduction to the implementation of classical DEA models in the R programming language. The models implemented are the CRS (Constant Returns to Scale) and the VRS (Variable Returns to Scale) models, both input oriented and in the multiplier form. The computational implementation is illustrated by the efficiency evaluation of the 18 largest Brazilian electric power distribution utilities.
There are several software tools available for DEA (BARR, 2004); however, the possibility of implementing DEA models in a programming language provides great flexibility in the application of the DEA methodology. The R project (R DEVELOPMENT CORE TEAM, 2013), which is free, open source and highly extensible, offers a variety of functions and graphical routines (packages) for data analysis. For example, Frontier Efficiency Analysis with R - FEAR (WILSON, 2008) and Benchmarking (BOGETOFT & OTTO, 2011) are two R packages dedicated to DEA.
However, R is more than a library of packages; it allows analysts to build their own programs.
Classical DEA models
DEA is a widely used technique for evaluating the efficiency of a set of N peer entities, called decision making units (DMUs), which convert multiple inputs into multiple outputs. In the general case, a DMU uses multiple inputs X=(x1,...,xs) to produce multiple outputs Y=(y1,...,ym), and its efficiency score is defined by the following quotient:
$$\text{efficiency} = \frac{u_1 y_1 + \cdots + u_m y_m}{v_1 x_1 + \cdots + v_s x_s} = \frac{U \cdot Y}{V \cdot X} \tag{1}$$
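As a small numerical illustration of the quotient (1), the sketch below evaluates the ratio for a single DMU. The input and output quantities and the weights are hypothetical values chosen only for illustration; in DEA the weights are not fixed by hand but determined by the linear program.

```r
# Hypothetical data for one DMU: two inputs and two outputs (illustrative values only)
X <- c(10, 4)         # input quantities x1, x2
Y <- c(6, 3)          # output quantities y1, y2

# Hypothetical weights; in DEA these come from the optimization, not from the analyst
V <- c(0.05, 0.125)   # input weights v1, v2
U <- c(0.10, 0.05)    # output weights u1, u2

# Efficiency as the ratio of weighted outputs to weighted inputs, as in (1)
efficiency <- sum(U * Y) / sum(V * X)
print(efficiency)
```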
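where V=(v1,...,vs) and U=(u1,...,um) denote the weights assigned to the input and output quantities, respectively. Charnes, Cooper and Rhodes (1978) suggest that the vectors U and V be determined by the linear programming problem (LPP) (2) in Table 1, called the CCR or CRS (Constant Returns to Scale) model, input oriented and in the multiplier form.
Table 1. DEA/CRS input oriented
Multiplier form:
$$\begin{aligned}
\text{efficiency} = \max_{u,v}\ & \sum_{i=1}^{m} u_i\, y_{i,j_0} \\
\text{s.t.}\ & -\sum_{i=1}^{s} v_i\, x_{ij} + \sum_{i=1}^{m} u_i\, y_{ij} \le 0, \quad j = 1,\ldots,j_0,\ldots,N \\
& \sum_{i=1}^{s} v_i\, x_{i,j_0} = 1 \\
& u_i \ge 0,\ i = 1,\ldots,m; \qquad v_i \ge 0,\ i = 1,\ldots,s
\end{aligned} \tag{2}$$
Envelopment form:
$$\begin{aligned}
\text{efficiency} = \min_{\theta,\lambda}\ & \theta \\
\text{s.t.}\ & \theta X_{j_0} \ge \sum_{j=1}^{N} \lambda_j X_j \\
& Y_{j_0} \le \sum_{j=1}^{N} \lambda_j Y_j \\
& \lambda_j \ge 0, \quad j = 1,\ldots,j_0,\ldots,N
\end{aligned} \tag{3}$$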
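The multiplier form of the CRS model can be sketched in R with the lp function of the lpSolve package. This is a minimal sketch, not the paper's own implementation: the toy dataset (four DMUs, one input, two outputs) and the helper name crs_multiplier are hypothetical, and the variables are ordered as (u1,...,um, v1,...,vs).

```r
library(lpSolve)  # assumes the lpSolve package is already installed

# Hypothetical toy data: 4 DMUs (rows), one input and two outputs (illustrative only)
X <- matrix(c(10, 20, 30, 40), ncol = 1)
Y <- cbind(c(10, 30, 30, 20), c(20, 20, 45, 20))

# Hypothetical helper: solves the multiplier-form CRS problem for DMU j0
crs_multiplier <- function(j0, X, Y) {
  N <- nrow(X); s <- ncol(X); m <- ncol(Y)
  # Objective: weighted outputs of the evaluated DMU (input weights get coefficient 0)
  objective <- c(Y[j0, ], rep(0, s))
  # One row per DMU: weighted outputs minus weighted inputs must not exceed zero
  ratio_rows <- cbind(Y, -X)
  # Normalization: weighted inputs of the evaluated DMU equal one
  norm_row <- c(rep(0, m), X[j0, ])
  res <- lp(direction = "max",
            objective.in = objective,
            const.mat = rbind(ratio_rows, norm_row),
            const.dir = c(rep("<=", N), "="),
            const.rhs = c(rep(0, N), 1))
  res$objval
}

# The LPP is solved once per DMU
scores <- sapply(seq_len(nrow(X)), function(j) crs_multiplier(j, X, Y))
print(round(scores, 3))
```

lp treats all decision variables as non-negative by default, which matches the sign constraints on the weights, so they need not be stated explicitly.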
The evaluated DMU (DMUj0) is efficient if the objective function is equal to one and all weights are positive at the optimal solution. Otherwise, the DMU is inefficient. Under the resource conservation approach (input orientation), the measure of technical efficiency θ (0 ≤ θ ≤ 1) of a DMU is defined as the maximum radial contraction of the input vector X that can still produce the same amount of outputs Y:
$$\text{efficiency} = \min\{\theta \mid (\theta X, Y) \in T(X,Y)\} \tag{4}$$
where T(X,Y) denotes the production possibilities set.
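The radial contraction in (4) can be computed as a linear program once the production possibilities set is approximated by non-negative combinations of the observed DMUs, which is what the envelopment form in Table 1 does. The following is a minimal sketch under that assumption; the toy data and the helper name crs_envelopment are hypothetical, and the variables are ordered as (θ, λ1,...,λN).

```r
library(lpSolve)  # assumes the lpSolve package is already installed

# Hypothetical toy data: 4 DMUs (rows), one input and two outputs (illustrative only)
X <- matrix(c(10, 20, 30, 40), ncol = 1)
Y <- cbind(c(10, 30, 30, 20), c(20, 20, 45, 20))

# Hypothetical helper: input-oriented CRS envelopment problem for DMU j0
crs_envelopment <- function(j0, X, Y) {
  N <- nrow(X); s <- ncol(X); m <- ncol(Y)
  # Minimize theta; the lambdas have zero cost
  objective <- c(1, rep(0, N))
  # Inputs: theta * x_{i,j0} - sum_j lambda_j * x_{ij} >= 0, one row per input
  input_rows <- cbind(X[j0, ], -t(X))
  # Outputs: sum_j lambda_j * y_{ij} >= y_{i,j0}, one row per output
  output_rows <- cbind(0, t(Y))
  res <- lp(direction = "min",
            objective.in = objective,
            const.mat = rbind(input_rows, output_rows),
            const.dir = rep(">=", s + m),
            const.rhs = c(rep(0, s), Y[j0, ]))
  res$objval
}

theta <- sapply(seq_len(nrow(X)), function(j) crs_envelopment(j, X, Y))
print(round(theta, 3))
```

A score below one means that the same outputs could be produced with that fraction of the observed inputs, according to the reference set formed by the other DMUs.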
Using duality theory in linear programming (COOPER et al., 2002), one can derive an equivalent model, known as the DEA model in the envelopment form under input orientation, whose mathematical formulation corresponds to model (3) in Table 1. In this case, the evaluated DMU is efficient if and only if θ=1. Otherwise, the DMU is inefficient. It should be emphasized that the LPP (2) or (3) must be solved once for each DMU in order to compute its efficiency score. Later, Banker, Charnes and Cooper (1984) added the constraint λ1+…+λN=1 to the envelopment form of the CRS model (3). The result is a DEA model called BCC or VRS (Variable Returns to Scale). The input-oriented VRS model in the multiplier form is given in (5) below, where the unconstrained variable u0 corresponds to the constraint λ1+…+λN=1 in the dual model.
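$$\begin{aligned}
\text{efficiency} = \max_{u,v}\ & \sum_{i=1}^{m} u_i\, y_{i,j_0} + u_0 \\
\text{s.t.}\ & -\sum_{i=1}^{s} v_i\, x_{ij} + \sum_{i=1}^{m} u_i\, y_{ij} + u_0 \le 0, \quad j = 1,\ldots,j_0,\ldots,N \\
& \sum_{i=1}^{s} v_i\, x_{i,j_0} = 1 \\
& u_i \ge 0,\ i = 1,\ldots,m \\
& v_i \ge 0,\ i = 1,\ldots,s
\end{aligned} \tag{5}$$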
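Model (5) can be sketched with lpSolve in the same way as the CRS model, with one extra step: lp assumes that every decision variable is non-negative, so the unconstrained u0 is written here as the difference of two non-negative variables. This is a minimal sketch under stated assumptions; the toy data, the helper name vrs_multiplier and the auxiliary variables u0_plus and u0_minus are hypothetical and not part of the original formulation.

```r
library(lpSolve)  # assumes the lpSolve package is already installed

# Hypothetical toy data: 4 DMUs (rows), one input and two outputs (illustrative only)
X <- matrix(c(10, 20, 30, 40), ncol = 1)
Y <- cbind(c(10, 30, 30, 20), c(20, 20, 45, 20))

# Hypothetical helper: multiplier-form VRS problem for DMU j0
vrs_multiplier <- function(j0, X, Y) {
  N <- nrow(X); s <- ncol(X); m <- ncol(Y)
  # Variables ordered as (u_1..u_m, v_1..v_s, u0_plus, u0_minus),
  # where u0 = u0_plus - u0_minus emulates a variable that is free in sign
  objective <- c(Y[j0, ], rep(0, s), 1, -1)
  # One row per DMU: weighted outputs - weighted inputs + u0 <= 0
  ratio_rows <- cbind(Y, -X, 1, -1)
  # Normalization: weighted inputs of the evaluated DMU equal one
  norm_row <- c(rep(0, m), X[j0, ], 0, 0)
  res <- lp(direction = "max",
            objective.in = objective,
            const.mat = rbind(ratio_rows, norm_row),
            const.dir = c(rep("<=", N), "="),
            const.rhs = c(rep(0, N), 1))
  res$objval
}

vrs_scores <- sapply(seq_len(nrow(X)), function(j) vrs_multiplier(j, X, Y))
print(round(vrs_scores, 3))
```

Because u0 is free, the VRS score of a DMU is never lower than its CRS score computed on the same data.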
An R code for DEA
The R code can be organized in three parts: loading the input data, processing, and reporting the output. To illustrate the R code for the DEA model, consider a dataset with the 18 largest Brazilian distribution utilities for the year 2009. Each utility is characterized by four variables: the annual operating expenditures in R$ (OPEX), the total length of the distribution network in kilometers (NETWORK), the total electricity consumption (MWH) and the number of consumers supplied by the utility (CONSUMERS).
The main outputs of the distribution utilities are the amount of distributed energy and the number of consumers. In addition, the operating expenses are also influenced by non-controllable factors, for example the geographical dispersion of consumers. To address this issue, the size of the distribution network can be included as an additional output variable. The output variables are the drivers of the operating expenditures: for a given level of output, the utility should operate at the lowest cost. Thus, in order to obtain an efficiency score that quantifies the potential reduction of the operating expenditures, we propose an input-oriented DEA model in which OPEX is the only input variable and the outputs are the three variables mentioned above.
Consider that the data are stored in an MS Excel file called data.xls in the directory c:\example. The data can be imported with the following commands (comments after #):
require(lpSolve) # load the previously installed lpSolve package
require(XLConnect) # load the previously installed XLConnect package
setwd('c:/example') # set the working directory
wb