Robust CSR Investment Screening. Filip Van den Bossche ... - Lirias

4 downloads 78 Views 980KB Size Report
2 Composite CSR indices for ethical investment screening: .... Based on the limited information at hand, which of the scenarios should be used for an investment ...
Robust CSR Investment Screening.

Filip Van den Bossche, Nicky Rogge, Kurt Devooght & Tom Van Puyenbroeck

HUB RESEARCH PAPER 2009/05 FEBRUARI 2009

1

Robust CSR Investment Screening

Filip Van den Bossche * , Nicky Rogge, Kurt Devooght & Tom Van Puyenbroeck

Hogeschool-Universiteit Brussel (HUBrussel) Centre for Economics & Management (CEM) Stormstraat 2, 1000 Brussels (Belgium) and Katholieke Universiteit Leuven (KULeuven) Faculty of Business and Economics Naamsestraat 69, 3000 Leuven (Belgium)

February 2009

Abstract Although a priori company screening is a constitutive feature of socially responsible investment (SRI) funds, it is not easy to substantiate that such screening effectively differentiates between companies on the basis of their Corporate Social Responsibility (CSR) calibre. Fundamentally, this is because CSR comprises several dimensions for which an undisputed aggregative model is lacking. We assess the robustness of companies’ CSR rankings with respect to several modelling assumptions. We then build on Gini’s transvariation concept to select/reject specific companies in the SRI eligible universe of assets. We illustrate our approach with some specific screening issues as confronted by the ethical advisory committee of a large Belgian bank.

Keywords: Corporate Social Responsibility, Asset Selection, Composite Indicators, Robustness Analysis

This paper benefited from comments of the members of KBC's SRI Department. remaining errors are, of course, our sole responsibility.

*

Corresponding author. Tel.: +32 2 608 14 17; fax: +32 2 217 64 64. E-mail address: [email protected]

Any

2

1

Introduction

In the slipstream of the growing attention for Corporate Social Responsibility (CSR), the measurement of the CSR performance of companies has become a booming business. Governments, non-governmental organisations, academics and not at least the companies themselves gather interest in quantitative indicators that measure companies’ corporate governance and ecological, social, ethical and economic performance. The growing industry of Ethical Investment or Socially Responsible Investment (SRI) also requires detailed CSR-related information to identify companies deemed acceptable for socially responsible investment. In fact, for ethical investment funds an adequate measurement of CSR is crucial. Their defining quality is the presence of so-called investment screens that are used to select or exclude assets prior to the actual investment decision (e.g. Renneboog et al., 2008) 1 . Taking it that the deliberate use of a restricted universe of assets is a characteristic property of SRI, the important issue addressed in this paper is to what extent the inclusion/exclusion of specific assets can be substantiated. The core difficulty when deciding whether or not an asset will be included in the SRI-universe is that the underlying selection standard, CSR, is a complex multi-dimensional phenomenon, which is described by a large array of one-dimensional indicators. In line with a fairly large literature, we will take it that a screening exercise requires reducing the information contained in various indicators in such a way that the eventual ranking of companies (and the concomitant selection of a “best-in-class” subset of these ranked companies) is based on the construction of company-specific composite CSR indices 2 . Yet this marks only the beginning of the real problem. For it is well-known that different construction methodologies for such composite indices exist and that, lacking any clear theoretical guidance as to what unequivocally constitutes the best underlying CSR model, the eventual ranking of companies may largely be driven by the specific modelling assumptions taken. Here, it should be stressed that the latter problem is not genuine to CSR indices; it is pervasive in the many fields for which composite indicators have been developed. Thus, we will address this concern using an appropriate methodology that has been advocated for composite indicators in general (see e.g. Saisana et al., 2005) and has been endorsed by supranational organisations such as the European Commission and the OECD (Nardo et al., 2008). What distinguishes the problem at hand from the conventional use of composite indicators is that SRI screening obliges to go beyond a mere ranking of companies. In turn, this requires complementing the existing robustness assessment methodology with a tool that helps to decide whether a company should be in the eligible universe or not.

1

As a rule, one indeed adheres to using a two-step procedure. One first creates an “eligible universe” of assets based on CSR criteria, after which the department that normally assumes the investment function selects assets from this universe, basing itself on standard investment criteria. The problem addressed in this paper is cast against this realistic division-of-labor background, although integrated approaches are conceivable (see e.g. Hallerbach et al., 2004).

2

Evidently, other avenues for CSR assessment have been explored as well. Ness et al. (2007) discuss, next to indices, also product-related assessments and integrated assessment tools. Specifically for CSR, Schäfer (2005) provides a closer look at the different rating approaches that are used, thus documenting a crucial stage in composite indicator construction that is not addressed in this paper. As pointed out by Schäfer, such approaches produce ratings that serve as an input for sustainability and CSR indices.

3 The SRI company selection problem is primarily a practical one. Indeed, to illustrate our proposed methodology we will address some concrete questions arising within the External Council of Sustainability Analysis that creates the SRI eligible universe for a large Belgian bank. Yet such practical problems relate to deeper issues regarding the desired level of transparency that should be upheld in SRI. Taking its own transparency seriously, the organization we use for our case study extensively documents the workings of its screening procedure on its website (www.kbcam.be). While this provides an exemplary signal to both screened companies and (potential) investors, this disclosure occasionally leads to discussions on the appropriateness of some of the dimensions included in the CSR index, the suitability of the defined dimension weights, and the potential influence of the used aggregation scheme. To deal with this dilemma, our methodology puts central the reality that it is effectively impossible to justify any exact CSR ranking of companies given the many modelling assumptions that such an exact ranking requires, and given the modelling uncertainty that inevitably arises when trying to gauge corporate social responsibility. This instigates (i) the need for a thorough uncertainty analysis, and (ii) an asset selection procedure that is both robust and easy to convey to stakeholders, including the (sometimes unsolicited) firms that eventually will find themselves positioned at one side of an investment screen. In this sense, the problem we study and the solution we propose are evidently not confined to the practical difficulties confronted by one particular SRI fund. More generally, the need for transparently and judiciously communicating CSR ratings bears on the credibility of various “infomediaries” that act as information brokers between firms and stakeholders in the field of CSR, and which have a potentially crucial role in fostering the CSR performance of firms (see e.g. Dubbink et al., 2008). 3,4 The rest of this paper is laid out as follows. The next section briefly recalls the subjectivity problem inherent in the construction of composite indicators, and its bearing on an investment screen by means of a simplified example. Section 3 describes the data and some specific questions raised by those responsible for the investment screen in our case study. Section 4 then assesses the robustness and sensitivity of companies’ rankings to different modelling assumptions. The toolbox we discuss broadens the scope of the robustness analysis advocated by Graafland et al. (2004). Building on Gini’s concept of “transvariazione”, we present a robust screening method in Section 5. Finally, Section 6 offers some summarizing comments.

2

Composite CSR indices for ethical investment screening: handle with care

The idea to use a composite index that aggregates the diverse dimensions of CSR into one number builds on the recognition that the very multi-dimensionality of CSR may impede an adequate comparison among companies. Ideally, a composite index enables to assess the CSR performance of a company quickly and efficiently and, moreover, allows for benchmarking companies within a particular sector (Krajnc and Glavič, 2005). Yet, while its use may be

For example, stakeholders may want to know why a firm is left in or out a CSR financial index (such as e.g. the Domini Social 400 Index, the FTSE 4 Good indices, the Ethibel Sustainability Index, the Dow Jones Sustainability Index, etc.).

3

4 Our methodology may thus be at least partially useful in addressing the critique of Schäfer (2005) that “while CSR rating institutions call for corporate transparency and operate as ‘social accountants’, the industry itself is currently lacking sufficient transparency.”

4 conducive to better CSR performance in general, and instrumental for CSR rating in particular, it is important that the transparency and simplicity induced by resorting to a summary index recognizes “that the researchers who construct the benchmark make choices for the stakeholders who want to judge the company. Stakeholders should thus be aware that these outcomes depend on many assumptions.” (Graafland et al., 2004, p. 139). When composite indices are to be used for investment screening purposes such concerns evidently become even more important, as the assumptions may ultimately be major drivers for companies’ rankings and, accordingly, for the composition of the SRI eligible universe. An example recalls the type of assumptions that have to be made when constructing an index. Suppose for simplicity that only three indicators are available, each of them capturing an important CSR aspect. The “People” indicator could for instance refer to the female/male ratio in top level jobs; the “Planet” indicator is based on a quantitative rating by experts from NGOs of the companies’ environmental policies, while the “Profit” indicator reports the figures for some ratio such as return on assets. Indicator scores for three firms are shown in Table 1. For each of the three indicators, a higher value for the indicator indicates a better performance, even if they do not all measure this performance using the same scale. The purpose is to select the best company based on an overall performance analysis. Table 1: Three dimensions of Corporate Social Responsibility

Company A Company B Company C

People 0.8 0.4 0.1

Planet 0.4 0.1 1

Profit 0.053 0.093 0.055

Being best practice in one dimension, all three companies are outperformed in the other two dimensions. Thus, although we consider only three companies we already face the problem that they cannot be ranked according to all indicators together, unless we somehow aggregate them. We therefore circumvent this difficulty by constructing a composite index. We consider the generic (and popular) form: ,

(1)

the composite indicator representing the overall performance of company on with the weight assigned to indicator , and the (possibly Corporate Social Responsibility, normalized) value for company weights are bounded in that

( = 1, …, ) on indicator i ( i = 1, …, and

). In general,

. While equation (1) certainly does not

encompass all possible types of composite indices that have been proposed in the literature, it is sufficiently flexible to capture some of the major concerns regarding the modelling options that First, there is the question which the constructor of a composite index faces. 5 dimensions/indicators should be included. Second, it refers to the problem that these original

Equation (1) implicitly allows for indices that use weighted geometric averages, but rules out other aggregation schemes such as those obtained by the “Benefit of the Doubt” method (see e.g. Cherchye et al., 2007) or non-compensatory multi-criteria analysis (see e.g. Munda, 2005). 5

5 indicators may well be incommensurable, and to the concomitant need to render them so by normalizing the original data in one way or another. And third, it displays the feature that these indicators must be weighted before they can be aggregated. Consider then four different scenarios to deal with these options. Specifically, a composite index is constructed: (a) using the simple arithmetic average of normalized numbers, where the original , where and indicators are normalized using the standardization method represent the sample average and standard deviation; (b) using the simple arithmetic average of normalized numbers, using the “rescaling” method

, where

and

(c) using unequal weights ( normalization method

,

; and

) and the

;

(d) same as (a), but without the people dimension (e.g., because data are missing or because the quality of the reported data is suspect). Applying scenario (a) produces a company ranking A-B-C, scenario (b) yields B-A-C, scenario (c) results in B-C-A, and scenario (d) generates the ranking C-B-A. In fact, any company ranking can be achieved – and hence any “best-in-class” company can be selected – contingent upon the choice of specific modelling assumptions. Of course, all reasonable composite indices would return the same logical ordering of companies in the trivial case where a multi-dimensional dominance relation at the level of original indicators exists. But settings in which a complete ordering can be achieved in such an uncontested manner are rare, if they exist at all. In actual fact, the ranking problem is endemic as more indicators are taken into account. The point we want to recall with our brief example is not so much that “anything goes” with composite CSR scores of companies. We refer to Wilson et al. (2007) and Mayer (2008) for some telling examples of diverging (country) rankings based on existing sustainable development indices. The deeper issue is that there are few theoretical guidelines for narrowing down the range of possible composite index construction options. The scenarios (a) and (b) illustrate the sensitivity of the composite CSR index with respect to the employed normalization scheme. This is quite worrisome given that there is no agreement on which procedure should be preferred: standardization is in se not superior to re-scaling or to other normalization methods. 6 Scenario (c) additionally demonstrates another disturbing source of dependency, as a set of weights is often selected ad hoc. Moreover, cases where weights are explicitly derived from experts’ or stakeholders’ opinions are often characterized by (possibly strong) inter-individual disagreements 7 . Finally, scenario (d) highlights the potential impact of yet another choice that has to be made, pertaining to the best (or best feasible) selection of underlying indicators.

Nardo et al. (2008) provide an elaborated discussion of the advantages and disadvantages of several normalization methods. In some cases, the choice of a specific method may be driven by insights from measurement theory. Ebert and Welsch (2004) use this perspective to advocate “meaningful” indices, where a composite indicator is only meaningful when its resulting ranking is independent of the normalization procedure as applied to the original indicators.

6

7

See e.g. Cherchye et al. (2008) for an illustration as well as a proposal to deal with this phenomenon.

6 Based on the limited information at hand, which of the scenarios should be used for an investment screening decision is truly an open question. As indicated in the Introduction, this open question may generate a testing criticism in the specific context of SRI. When an SRI fund fully discloses its screening procedure, companies that are excluded from the eligible spectrum may point at particular choices in the model specification to question the credibility of the fund itself. Transparency then could become selfdefying. Yet such companies have a point when the inclusion/exclusion decision is sensitive to what are in the end subjective modelling assumptions. We do not, however, take this as a sufficient reason to dispel with CSR composite indices. On the contrary, we concur with Graafland et al. (2004) and with Saisana et al. (2005) that composite indices remain (or even, become) reliable tools, provided their outcomes have been subject to an in-depth robustness and sensitivity analysis. We will apply such an analysis in section 4, based on the case that we introduce in the next section.

3

SRI screening in a Belgian bank

Decisions regarding the SRI investment screen in the bank that we use as our case study are taken by an External Council of Sustainability Analysis, gathering academic experts on CSRrelated aspects such as corporate governance, business ethics, human rights, economics, the environment, etc. To assess the CSR performance of companies, this advisory committee builds on the information that is collected in the EIRiS database, out of which it selected 29 indicators. 8 Each of these indicators gauges company performance on a specific facet of corporate socially responsible behaviour. Most indicators are based on a Likert(-like) point scale, but even then a limited incommensurability problem arises since the number of points on the scale may vary over the indicators. 9 The advisory committee further grouped the 29 criteria into the following six dimensions (see Appendix 1 for a detailed breakdown and the weights assigned to the 29 indicators): + Economic Policy + Corporate Governance & Business Ethics + Environment + Internal Social Policy + Human Rights - Involvement in Questionable Technologies/Practices The first five dimensions contribute positively to the companies’ overall evaluation while the opposite holds for “Involvement in Questionable Technologies/Practices”. This negative

The original EIRiS database collects approximately 60 ethical performance indicators relating to environment, corporate governance, human and labor rights, business ethics and social performance for over 2,700 companies worldwide, spread over multiple sectors. The data are gathered primarily by screening information made public by the companies themselves in annual reports, other company documents, as well as in completed EIRiS questionnaires. The bank itself adds (Belgian) firms to this database using the same indicators. 8

The indicator “customer/supplier relation”, for instance, evaluates the companies on a four-point scale with points ranging from 0 to 3. The indicator “Code of ethics” is an example of an indicator on which companies’ performances are evaluated on a five-point scale with points ranging from 0 to 4.

9

7 dimension entails a correction for companies that participate or invest in disputable activities or technologies. In cases where such engagement is too strong companies are directly excluded. 10 The introduction of the group structure and the penalty in the general equation (1) yields the following CSR composite index: ,

with weights

the performance of company and normalized indicators

),

in dimension

(2)

(using within-dimension

the weight of dimension

, and

the

correction term, in percentages, associated with the disputable activity or technology ( = 1, …,7). The composite index used by the bank builds on (i) the six dimensions listed (see appendix), (iii) the re-scaling normalization method above, (ii) unequal weights , and (iv) equal weights

for the aforementioned CSR dimensions.

A

company ranking is then constructed per industry sector, and the 40% highest ranked companies in each sector constitute the eligible universe. 11 The bank’s asset management group independently operates several ethical investment (equity and/or bond) funds based on these screening decisions. A recurring issue within the advisory council revolves around the exact specification of the weight values. Is, for instance, equal weighting of the five first dimensions the most appropriate choice? A second issue pertains to the inclusion/exclusion of dimensions itself, and of the “Economic Policy” dimension in particular. On the one hand, this dimension is important for an integral assessment of CSR (cf. the “Profit” aspect). On the other hand, the underlying economic data are typically taken into account in a regular investment analysis that is conducted anyhow (and at a more detailed level) by the asset management department. Third, there is the meta-issue that, whatever position is eventually upheld regarding the former two problems, these choices may affect the composition of the SRI asset universe.

The seven activities/technologies involved are the military industry, genetic engineering, nuclear power, alcohol, gambling, tobacco and animal testing. The extent of the correction further depends on a company’s participation level. For instance, when a company is present in the military industry, it is excluded from the eligible universe if it produces nuclear weapons or generates more than 5% of its revenue from investing in vital weapon components or complete weapon systems. If a company’s revenue from such activities is less than 5% of its total revenue, it gets a negative correction of 6%. For producing non-vital components the negative correction is smaller, etc. 10

For both the mining and the energy sector, only the 20% best are admitted. Furthermore, an additional negative exclusion criterion is applied; a company ranking across all sectors is applied for each dimension separately and the bottom 10% of companies within each dimension is excluded from the SRI universe. Further details about the screening procedure are provided at https://www.kbcam.be/IPA/D9e01/~E/~KBCAM/~BZL4LUJ/BZL3T9B/BZL1W9X/BZL1WKJ/BZL4QB5/BZL4QC1/~-BZL7P9U. 11

8

4

Robustness and sensitivity analysis of the CSR index

Given equation (2), the concerns of the advisory council are part of a broader set of modelling options that have to be specified. Elements that could be questioned are the dimension and ), the dimension and indicator weights ( and ), the indicator selection ( and normalization method, represented by the

in

, and the correction terms,

. We will

here focus on the impact of choices with respect to , and the normalization method on the ranking of 120 companies within the general retailing sector. Evidently, the subsequent analysis can readily be extended to include the impact of the other subjective choices as well and/or to other sectors. Our analysis builds on the methodology developed by Saisana et al. (2005). Consider , where is the model output, the composite CSR index, and is a vector of 3 input variables representing the we modelling decisions. For each of these variables several scenarios are defined. For consider six possible dimension selections: one selection includes all six dimensions, the other five selections each time exclude one of the positively contributing dimensions. 12,13 For the we consider several scenarios, with weights varying around the equal dimension weights weighting scheme that is currently used. Based on the information stemming from members of the advisory committee a limited degree of variation seems reasonable. Accordingly, we allow the weights of the positive dimensions to vary between 10% and 30% (using a 5% interval) under the constraint that the weights sum up to 100%. In total, this results in 381 weight combinations, all included in . To normalize the data, three normalization : the ratio or percentage differences from the mean procedures are defined in , the standardized normalisation , and the re-scaling normalization procedure

. Hence,

= (6 × 381 × 3) = 6,858 combinations

of modelling options are possible, each combination resulting in a particular construction method for the composite CSR index and, consequently, a different ranking and selection of companies in the eligible universe. We slightly depart from Saisana et al. (2005) in that we consider all these scenarios in the sequel, rather than a randomly generated subset. Consequently, for each of the 120 general retailers 6,858 composite CSR indices and a range of associated ranks are computed. The scenario as used by the advisory committee (i.e., all dimensions included and equally weighted, and the re-scaling normalization procedure) is one of these scenarios. The results are displayed in Figure 1. Companies are ordered from best to worst according to the “original” ranks, i.e. those generated by the advisory committee’s scenario (and indicated in the figure by a cross). The dots show the companies’ median ranks as computed from the 6,858

Note that, given the constraint that dimension weights should sum to unity, disregarding a dimension requires the weight of the other dimensions to be rescaled from 20% to 25%. 12

13

The penalty for involvement in questionable technologies and practices is not included in the sensitivity analysis, for two reasons. Firstly, the corrections are usually low for general retailing compared to other sectors. They range from 0.01 to 0.07 and apply to 28 out of 120 firms. Secondly, there is no discussion in the Advisory Committee about the inclusion of the correction in the decision making process. Nevertheless, the methodology is easily extended to the case where corrections are completely excluded or treated as an uncertain decision variable.

9 different scenarios, while the whiskers show the companies’ maximum and minimum ranks. The horizontal line evokes the “best-in-class” approach; only the 40% best performers within each sector are eligible for inclusion in an ethical fund. Figure 1 provides a mixed message regarding the robustness of CSR ratings. For most companies the 6,858 alternative CSR construction scenarios produce a median rank that is not too far away from the committee’s rank. Looking at the range of possible rankings per company, the figure further reveals that companies on top and at the bottom of the ranking are, on average, not too sensitive to alternative CSR model specifications. However, companies in the middle of the ranking are more susceptible to modelling uncertainty. Furthermore, for quite a few companies, including many that are unequivocally admitted to the CSR universe by the committee’s model, having a rank above or below the cut-off line in fact largely depends on the choices made by the advisory committee. We observe a few companies with a median rank that would tilt them over to the SRI universe. From this perspective, judging whether a company can satisfactorily be considered among the “best-in-class” of its sector becomes a more intricate matter.

0

20

Ranking

40

60

80

100

Bank's ranking Median ranking

C13 C3 C71 C14 C99 C75 C17 C57 C47 C65 C51 C62 C114 C76 C108 C31 C111 C118 C89 C68 C16 C107 C97 C63 C22 C40 C102 C73 C60 C38 C11 C83 C110 C2 C86 C82 C55 C7 C66 C117 C25 C18 C53 C45 C115 C19 C27 C9 C44 C28 C30 C88 C81 C85 C32 C69 C58 C74 C120 C23 C20 C90 C104 C64 C52 C77 C70 C42 C100 C35 C105 C78 C1 C37 C12 C43 C39 C96 C21 C10 C24 C33 C36 C5 C54 C113 C109 C48 C80 C112 C106 C116 C61 C67 C79 C95 C92 C93 C91 C4 C101 C34 C56 C59 C84 C26 C15 C50 C41 C87 C103 C119 C6 C46 C8 C72 C49 C98 C94 C29

120

Companies

Figure 1: A range of CSR company rankings

It is valuable to have a better idea about which modelling options have the largest impact on rankings. To perform this sensitivity analysis we employ variance-decomposition techniques (Saltelli et al., 2004) 14 . The basic idea of these methods is to examine the sensitivity of our output (viz. the composite CSR rank of a company) to a single modelling factor by examining the variance in the CSR rankings as driven by changes in this factor. Specifically, one can fix at one of the values it can take (e.g. ) and then calculate the expected but varying over the other scenarios, i.e. rank of a company given this fixed . The output variance as caused by factor is then obtained by can take, i.e. calculating the variance of the so-obtained statistic over all possible values that . This figure would be zero if does not contribute to the variation in a company’s CSR rankings. That is, given the value of all other input factors, . Analogously, we would obtain the same expected rank irrespective of the value of

14

Saisana et al. (2005) and Nardo et al. (2008) provide similar applications to composite indicators.

10 would equal the total, unconditional variance for this company.

if this dimension solely drives the final results

Figure 2(a) displays the total output variance and its decomposition along the lines just indicated for all companies, with the companies ordered as in Figure 1. Hence, companies whose ranking is relatively more sensitive to alternative CSR model specifications are located in the middle part of the graph. The figure shows that the output is most sensitive to choices regarding the ), followed by the choice of the dimension (exclusion of) CSR index dimensions ( weights ( ). Less variation in company rankings is explained by the experts’ ) and the interaction effects preferences in the choice of the normalization scheme ( (non additive variance). Exclusively looking at these first-order effects is legitimate if there are no interactions between the different modelling options. In that case one obtains and the model is said to be additive. Conceivably though, such interactions exist. For example, a company’s ranking may be particularly high (or low) given the exclusion of a specific dimension in which it is a weak (strong) performer and the parallel selection of a normalization scheme that is relatively sensitive to outliers. The total output variance is then composed of the sum of the dimensionspecific first-order effects of all input factors and the (higher order) interaction effects. The number of first and higher order sensitivity indices increases exponentially with the number of input factors. Calculating all possible interaction effects is therefore often not desirable. Alternatively, a mirror image of the first-order approach outlined above is obtained by considering , which is the part of total are fixed. As this unexplained output variance that remains unexplained if all factors but part implicitly captures all (first and higher order) effects associated with , it is called a total has a sensitivity effect (Saltelli et al., 2004). Again, a high value for this statistic means that large influence on the final ranking of a company. The magnitude of the difference between the first and total order indices for input factor indicates the importance of interactions with this factor in the total variance . Below, we divide the so-obtained numbers by the total variance to obtain total effect sensitivity indices . Thus, for each of the modelling factors ( = 1, 2, 3) we get: .

(3)

The total effect sensitivity indices are presented in Figure 2(b), with companies ranked similarly as in Figure 2(a). Both figures basically convey a comparable message. First, since the sum of is greater than unity for each company, Figure 2(b) implicitly signals the presence of all interaction terms (which are double-counted if we sum expression (3) over the different modelling factors). As we use sensitivity indices in Figure 2(b), we can readily observe that the largest part of total output variance can be attributed to the specific composition of (the inclusion/exclusion of specific dimensions), and that the exact definition of the dimension weights ( ) is the second-most significant decision. In this specific case, discussions on the proper normalization method are more a matter of achieving extra completeness instead of being fundamental to the trustworthiness of the composite indicators and their associated ranks. The advisory committee’s concerns with the possible impact of changing the CSR index construction methods on company rankings are, hence, largely justified. That is, the discussions within the advisory committee basically relate to the two modelling options that are deemed most influential by the above sensitivity analyses. As for the choice of dimensions, we recall

11 that the particular concern of the committee was whether the “Economic Policy” dimension should be included in the CSR index. Additional analysis reveals that this dimension is in fact the most influential of the six dimensions considered. 15

100

200

300

Non additive Aggregation Weights Dimensions

C13 C3 C71 C14 C99 C75 C17 C57 C47 C65 C51 C62 C114 C76 C108 C31 C111 C118 C89 C68 C16 C107 C97 C63 C22 C40 C102 C73 C60 C38 C11 C83 C110 C2 C86 C82 C55 C7 C66 C117 C25 C18 C53 C45 C115 C19 C27 C9 C44 C28 C30 C88 C81 C85 C32 C69 C58 C74 C120 C23 C20 C90 C104 C64 C52 C77 C70 C42 C100 C35 C105 C78 C1 C37 C12 C43 C39 C96 C21 C10 C24 C33 C36 C5 C54 C113 C109 C48 C80 C112 C106 C116 C61 C67 C79 C95 C92 C93 C91 C4 C101 C34 C56 C59 C84 C26 C15 C50 C41 C87 C103 C119 C6 C46 C8 C72 C49 C98 C94 C29

0

Variance of company rank

400

(a)

Company

0.5

1.0

1.5

Aggregation Weights Dimensions

C13 C3 C71 C14 C99 C75 C17 C57 C47 C65 C51 C62 C114 C76 C108 C31 C111 C118 C89 C68 C16 C107 C97 C63 C22 C40 C102 C73 C60 C38 C11 C83 C110 C2 C86 C82 C55 C7 C66 C117 C25 C18 C53 C45 C115 C19 C27 C9 C44 C28 C30 C88 C81 C85 C32 C69 C58 C74 C120 C23 C20 C90 C104 C64 C52 C77 C70 C42 C100 C35 C105 C78 C1 C37 C12 C43 C39 C96 C21 C10 C24 C33 C36 C5 C54 C113 C109 C48 C80 C112 C106 C116 C61 C67 C79 C95 C92 C93 C91 C4 C101 C34 C56 C59 C84 C26 C15 C50 C41 C87 C103 C119 C6 C46 C8 C72 C49 C98 C94 C29

0.0

Total effect sensitivity index

2.0

(b)

Company

Figure 2: (a) First-order (variances) and (b) Total order sensitivity indexes

5

Robust screening

The results of the sensitivity analysis can in principle be ignored, implying that the committee falls back on the status quo selection. But this is hardly a defensible way of coping with the underlying problem. A second option is to intensify internal discussions about the particular scenario that will be chosen as an alternative. But it seems equally difficult to sustain that

Excluding “Economic Policy” results in an average shift in the ranks of 5.59 places as compared to the original scenario (with a standard deviation of 2.21). Excluding “Corporate Governance & Business Ethics”, “Environment”, “Internal Social Policy”, and “Human Rights” respectively leads to average shifts in ranks of 5.10 (stdev.: 1.89), 4.58 (stdev.: 1.28), 4.68 (stdev.: 1.39), and 4.42 (stdev.: 1.39).

15

12 agreement on such an alternative is more likely than the one that was originally questioned. A third option, which we here advocate, is to keep on acknowledging the uncertainty that comes along with the construction of a CSR index. If different methodological alternatives produce company-specific ranges of CSR index values and rankings, then the SRI screening exercise itself should accordingly recognize the possibility of making assumption-driven classification errors. The relevant issue then becomes how to minimize such possibilities, i.e., how to come up with a robust “best-in-class” CSR categorization of companies. From the latter perspective, an encouraging observation stemming from the robustness analysis is that the impact of the modelling uncertainty problem is less important for companies on top or at the bottom of the original CSR ranking. In particular, the committee can single out companies that achieve rankings above the cut-off line in all possible scenarios. In our sample of 120 general retailers, 30 retailers always receive a rank smaller than or equal to 48 (the 40% cut-off line). As an example, consider general retailer #57 in Figure 3, with mean rank equal to 11.92, and a best and worst ranking of respectively 6 and 28. Similarly, there are 48 companies in the sample that are excluded from the top 40% performers in each of the 6,858 modelling scenarios. General retailer #43 in Figure 3, with a mean ranking of 76.84, and, respectively, a best and worst ranking of 52 and 93, illustrates such a case. Employing this line of reasoning, we can make robust inclusion/exclusion statements for 78 out of the 120 companies. Conversely, this means that there is a group of 42 companies for which the eventual classification is scenariodriven (i.e. with ranking distributions extending at both sides of the cut-off line).

2000 Company #57 Company #43

Frequency

1500

1000

500

0 0

20

40

60

80

100

120

Ranking

Figure 3: Robust ranking distributions for a strong (left) and poor (right) performing retailer

Figure 4 provides an aggregate overview. Contrary to Figure 1, this graph first puts a company in one of three groups before ordering them in terms of their median rank. The left group collects the 30 companies that would be included in the top 40% in each of the 6,858 scenarios (i.e. companies similar to general retailer #57 in Figure 3). At the right are the companies that are never included. The middle group collects the 42 companies for which inclusion/exclusion in the SRI universe is scenario-driven. We note that some companies in this middle group are ranked higher by the bank and/or have a higher median rank than those in the robust (first) SRI eligible group. This again demonstrates that any specific scenario may select general retailers which, in other scenarios, do not end up in the top 40%. Obviously, the solution cannot be to include only those companies that robustly belong to the “best-in-class”: it is plainly inconsistent to say that the top 40% performers comprises 25% of the firms, and trying to get around this contradiction by focusing on those that are (robustly) in the top 25% merely shifts the problem. The only CSR-universe that would be consistent with

13 itself in the sense just used would actually comprise all 120 companies. We therefore assume that the bank still uses an exogenously determined definition of “best-in-class”, which we take here to remain at the top 40% performers within the sector. But we propose to append this selection criterion with one that builds on the full distribution of the rankings. Specifically, we will consider companies that have a small probability of exceeding the cut-off rank and/or only do so to a relatively limited extent. As for our application, this approach allows all (30) companies in the first group to be in the eligible universe. Thus (18) additional companies have to be selected from the middle group in order to get at the top 40%.

0

20

Ranking

40

60

80

100

Bank's ranking Median ranking

C13 C3 C14 C71 C99 C75 C17 C47 C65 C51 C57 C62 C114 C76 C108 C89 C31 C111 C118 C16 C63 C107 C22 C73 C102 C40 C82 C86 C2 C55 C68 C97 C60 C110 C38 C11 C83 C66 C117 C7 C25 C18 C28 C44 C45 C53 C115 C19 C9 C27 C88 C81 C30 C32 C85 C69 C74 C23 C20 C58 C42 C64 C120 C35 C104 C70 C78 C96 C39 C36 C80 C112 C77 C90 C52 C100 C1 C105 C37 C43 C12 C21 C24 C109 C5 C33 C10 C54 C113 C48 C106 C79 C116 C61 C67 C92 C93 C95 C91 C4 C34 C56 C59 C84 C101 C15 C26 C50 C41 C103 C87 C119 C8 C6 C49 C72 C98 C46 C29 C94

120

Companies

Figure 4: Companies that are always/sometimes/never in the SRI universe

We use the information of the 6,858 scenarios to construct the scatter diagram of Figure 5, each point representing one of the 42 middle group companies. The horizontal axis indicates companies’ exclusion probability, i.e. the observed frequency with which a company ends up having a rank larger than 48. It is immediate that even within this group of companies there is large variation as regards this probability. At one extreme are companies that exceed the cut-off line less than 10 times out of 6858, whereas on the right side of the axis there are companies with more than 99% of their distribution mass at the wrong side of the cut-off line. It should be noted that companies with an exclusion probability not exceeding 0.5 are in fact companies with a favourable median rank, but with some mass of their ranking distribution situated at the other, “wrong” side of the cut-off line. We point out that for such companies the exclusion probability can directly be linked with Gini’s concept of “Transvariazione”. 16 Basically, a rank distribution and a cut-off rank are said to transvariate on , with if the sign of some respect to a measure of central tendency such as e.g. the median rank of the differences ( = 1, …, 6856) is opposite to that of . To get an idea about the extent of such transvariations, different measures have been proposed.

In the income inequality literature, this and related notions have been exploited to measure the distance between two distributions (i.e. the complement of the overlap of two distributions) or the “economic advantage” of one group over another (see e.g. Deutsch and Silber, 1999, and the references cited therein). 16

14 Dagum (1960) defined the concept of transvariability. Applied to companies with a favourable median ranking (that is: ), the appropriate measure is 17 : .

(4)

This is nothing but the exclusion probability we referred to above. In fact, exactly halve of the companies in the middle group have a favourable median rank, and since this is a sufficient number to complete the eligible spectrum we will focus on this group. 18

40 35

First Moment

30 25 20 15 10 C18

5 C9

0 0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Exclusion Probability

Figure 5: Exclusion probability and expected non-compliance with inclusion rule of middle group companies

The vertical axis in Figure 5 is similarly based on another measure associated with the transvariation concept, to wit, a measure of its expected intensity as given by the so-called first moment of transvariation (see again Dagum, 1960):

,

(5)

where the “transvariation” connotation is as such directly applicable to companies with xmed ≤ xcut −off , although equation (5) can evidently be used without such underlying

Dagum (1960) distinguishes transvariability as given in equation (4) from the “probability of transvariation”, which normalizes transvariability by dividing it with the probability that . 17

Since we use the median as measure of central tendency, this means that the probability of transvariation of a company equals two times its transvariability. Still, it is worthwhile to point out that there is also a straightforward connection between companies with an exclusion probability of more than 0.5 and the transvariability concept. For such companies (i.e. , the latter is given by , and the exclusion companies with 18

. Put otherwise, in general we advocate to use companies with a small probability is then probability of transvariation if , and, if necessary, to complement these with companies with , but with a large probability of transvariation.

15 interpretation for the other companies as well. Again, we observe a large variation, from a minimal value close to zero to expected cut-off rank violations of more than 38 places. Although it is clear that one would ideally like to select companies that are close to the origin, it is less obvious whether either the exclusion probability (i.e. equation (4)) or the expected intensity of exclusion (equation (5)) should be the most important statistic on which to base inclusion/exclusion decisions in conflicting cases. Since our major concern is with robustness, we here follow Montanari (2004) who pointed out that minimizing transvariation probabilities, i.e. “the bare counting of the transvariations”, is not affected by possibly large transvariations due to outlying data, and performs better with non-normally distributed data. Thus, we effectively read the graph from left to right and include the first 18 observations we encounter to complement the eligible asset universe in a robust way. This approach in fact means that we chose to put the dividing line between company #18 (included) and company #9 (excluded), although the first moment of transvariation of the former (6.42) is considerably higher than that of the latter (3.76). In fact, the ranking distribution of company #18, reproduced in Figure 6, provides additional evidence in favour of its inclusion. As can readily be inspected, the ranking has a bi-modal distribution. Detailed inspection of the scenarios reveals that the second peak is entirely driven by exactly one modelling assumption, viz. the option whether to include or exclude the “environment” dimension in the computation of the CSR composite index. Specifically, since excluding this dimension considerably penalizes the excellent environmental performance of this company, and recalling that exclusion of this dimension is never contemplated by the advisory committee itself, this in fact provides additional justification for keeping the company in the eligible spectrum.

500

Frequency

400

300

200

100

0 0

20

40

60

80

100

120

Ranking

Figure 6: Ranking distribution of the “marginal” company (#18)

We finally compare the ranking of the advisory committee with the one developed using the transvariability rule. Figure 7 plots the bank’s original ranking on the horizontal axis and the transvariability ranking on the vertical axis. In this figure we again focus on the companies which appeared in the middle-part of Figure 4. That ranking differences exist straightforwardly follows from the fact that companies are not situated on the 45 degree line. The relevant question is whether these differences carry over to the composition of the SRI universe. This can be inspected by adding the cut off line to the

16

80 20

40

60

Accepted by the SRI screening model Rejected by transvariation

Accepted by transvariation Rejected by the SRI screening model

0

Transvariation ranking

100

figure. The lower left and upper right quadrants collect companies for which there is no difference in the final decision when using the bank’s ranking or the one resulting from the transvariation rule. Companies in the lower left corner are selected by both criteria (and, to recall, added to the 30 companies that always remained in the eligible group, i.e. companies for which the observed transvariation probability was zero) while the ones emerging in the upper right quadrant are rejected in both settings.

0

20

40

60

80

100

Screening model ranking

Figure 7: Comparison of rating institution and transvariation ranking for middle-rank performers

The upper left and lower right quadrant contain companies for which the employed procedure for SRI screening does matter. In our sample, different evaluations eventually hold for 2 × 2 = 4 companies. Bearing in mind that the transvariation rule has the advantage of taking into account the probability of exceeding the 48 frontier, there are strong robustness-based reasons to claim that the two companies appearing in the lower right quadrant have been wrongfully rejected by the bank whereas the ones emerging in the upper left quadrant have been accepted on a too narrow basis. Overall then, given the data of our sample and the robustness concerns as considered by the advisory committee, a robust SRI screen would switch the classification of at most two pairs of companies. This, evidently, is a positive message about the quality of the original SRI screen in itself. Still, it is recommendable to alter the eventual classification for the few companies involved.

6

Concluding remarks

Most probably, the message emerging from robustness analyses will not always be as reassuring as was, by and large, the case for our specific example. In such cases, the results are still valuable to an advisory committee. For instance, they may be helpful in revealing the potential tension between an exogenously imposed cut-off rule and the robustness of the company classification that this rule entails. One obvious avenue for further research is indeed exactly concerned with endogenizing the cut-off rule, which requires to trade off robustness concerns with the need for an eligible spectrum that is neither so large as to include “all” companies regardless of their CSR performance nor too small to be useful for genuine investment portfolio construction. Similarly, by indicating where one should look for narrowing down possible construction scenarios, the sensitivity analyses implicitly guide the agenda about the substantive discussions regarding CSR assessment. They eventually may even lead to the conclusion that a

17 genuine ranking of companies is plainly too volatile to be useful. Even in this negative case, they still have the virtue of being based on an in-depth exploration of the data. The use of company-specific composite CSR indicators is not universally appreciated. Some praise composite indices for their ability to provide an adequate summarizing picture. Were it not for the presence of such summarizing figures, stakeholders would become lost in a myriad of information and, notwithstanding trivial cases, final ratings would remain difficult to justify in any case. On the other hand, opponents question the credibility of composite indices due to the lack of a generally accepted construction methodology, and, in particular, due to the inescapable subjectivity involved in their construction. Particularly worrisome from this perspective is the likely dependence of companies’ rankings on some of these subjective choices. If composite indicators are constructed and used without first having them subjected to an in-depth robustness analysis, such worries are surely legitimate. But it is precisely the same conviction which implies that composite indicators can be a helpful and reliable evaluation tool, provided their construction is sufficiently transparent. As we have indicated in this paper, an approach that fully recognizes and exploits the existence of company-specific probability distributions of scores/rankings may be fruitful in justifying the SRI classification of companies as well. In this sense, the methodology of the previous pages may help SRI funds to additionally assure stakeholders that the CSR screening issue is taken seriously.

18

References Cherchye, L., Moesen, W., Rogge, N., and Van Puyenbroeck, T., 2007. An introduction to ‘Benefit of the Doubt’ composite indicators. Social Indicators Research, 82(1):111-145. Cherchye, L., Moesen, W., Rogge, N., Van Puyenbroeck, T., Saisana, M., Saltelli, A., Liska, R., and Tarantola, S., 2008. Creating composite indicators with DEA and robustness analysis: the case of the technology achievement index. Journal of the Operational Research Society, 59(2):239-251. Dagum, C., 1960. Teoría de la Transvariación- sus Aplicaciones a la Economía. Metron, 20:2260. Deutsch, J., and Silber, J., 1999. Inequality Decomposition by Population Subgroups and the Analysis of Interdistributional Inequality. In J. Silber (Editor), Handbook of Income Inequality Measurement. Norwell, MA, Kluwer Academic Press, pp. 363-397. Dubbink, W., Graafland, J., and Van Liedekerke, L., 2008. CSR, Transparency and the role of intermediate organisations. Journal of Business Ethics, 82(2):391-406. Ebert, U. and Welsch, H., 2004. Meaningful environmental indices: a social choice approach. Journal of Environmental Economics and Management, 47(2):270-283. Graafland, J.J., Eijffinger, S.C.W., and Smid, H., 2004. Benchmarking of Corporate Social Responsibility: Methodological Problems and Robustness. Journal of Business Ethics, 53:137152. Hallerbach, W., Ning, H., Soppe, A. and Spronk, J., 2004. A Framework for Managing a Portfolio of Socially Responsible Investments. European Journal of Operational Research, 153(2):517-529. Krajnc, D. and Glavič, P., 2005. How to Compare Companies on Relevant Dimensions of Sustainability. Ecological Economics, 55(4):551-563. Mayer, A.L., 2008. Strengths and Weaknesses of Common Sustainability Indices for Multidimensional Systems. Environment International, 34(2):277-291. Montanari, A., 2004. Linear Discriminant Analysis and Transvariation. Journal of Classification, 21(1):71-88. Munda, G., 2005. Measuring Sustainability: a multi-criterion framework. Environment, Development and Sustainability, 7(1):117-134. Nardo, M., Saisana, M., Saltelli, A., Tarantola, S., Hoffman, A. and Giovannini, E., 2008. Handbook on Constructing Composite Indicators: Methodology and User Guide, OECD/Joint Research Centre for the European Commission. Paris/Ispra. Ness, B., Urbel-Piirsalu, E., Anderberg, S., and Olsson, L., 2007. Categorising Tools for Sustainability Assessment. Ecological Economics, 60(3):498-508. Renneboog, L., Ter Horst, J., and Zhang, C., 2008. Socially Responsible Investments: Institional Aspects, Performance, and Investor Behavior. Journal of Banking and Finance, 32(9):1723-1742. Saisana, M., Saltelli, A. and Tarantola, S., 2005. Uncertainty and Sensitivity Analysis as Tools for the Quality Assessment of Composite Indicators. Journal of the Royal Statistical Society Series A, 168(2):307-323.

19 Saltelli, A., Tarantola, S., Campolongo, F., and Ratto, M., 2004. Sensitivity Analysis in Practice. A Guide to Assessing Scientific Models. Wiley, Chichester. Schäfer, H., 2005. International Corporate Social Responsibility Rating Systems: Conceptual Outline and Empirical Results, Journal of Corporate Citizenship 20:107-120. Wilson, J., Tyedmers, P., and Pelot, R., 2007. Contrasting and Comparing Sustainable Development Indicator Metrics. Ecological Indicators, 7(2):299-314.

20

Appendix 1 Dimensions and sub-criteria of Corporate Social Responsibility Code

Indicator Economic policy Economic performance Customer/supplier relations Community involvement Corporate governance and business ethics Code of ethics Problems with the legal framework Management composition and working practices Stakeholder involvement Environment Policy Management Reporting Biodiversity Performance Internal social relations Equal opportunity policy Union and employee representation Job creation and security Training Employee participation Health and safety Human rights Responsibility for human rights Supply chain responsibility Bribery and corruption Presence in controversial countries/areas Socially questionable practices and technologies Weapons and military industry Genetic manipulation Nuclear energy Alcohol Gambling Tobacco industry Animal welfare

Weight 20% 24.0% 38.0% 38.0% 20% 33.0% 12.0% 33.0% 22.0% 20% 7.5% 12.5% 5.0% 10.0% 65.0% 20% 24.0% 17.0% 17.0% 17.0% 8.0% 17.0% 20% 42.0% 28.0% 15.0% 15.0%