Variability in Comparable Performance of Urban Bus Operations

Mark Trompet, Richard J. Anderson, and Daniel J. Graham

Railway and Transport Strategy Centre, Centre for Transport Studies, Department of Civil and Environmental Engineering, Imperial College London, Skempton Building, SW7 2AZ London, United Kingdom. Corresponding author: M. Trompet, [email protected].

Transportation Research Record: Journal of the Transportation Research Board, No. 2111, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 177–184. DOI: 10.3141/2111-20

Whether comparing the performance of urban bus operators through a benchmarking exercise is useful and justifiable is examined. Benchmarking can be deemed useful if performance comparisons exhibit sufficiently significant variability in performance between operators that lessons can be learned from one another. The exercise can be viewed as justifiable if different external conditions do not affect performance to the extent that the variability of the results must be judged incomparable. The data used for the study were collected by the International Bus Benchmarking Group, facilitated by Imperial College London, and relate to 10 medium to large bus operators from nine countries for 2001 to 2007. After data stratification and normalization, especially for differences in vehicle size, demand profile, and commercial speed, the results suggest that comparing the performance of urban bus operations through benchmarking is both useful and justifiable as long as there is a sufficient number of operators in the comparison that exhibit similar operating characteristics and urban environments.

In 2003, two large urban bus operators formed a group to compare performance and share best practices with peers in other large cities. For one of these operators, there were no sufficiently comparable bus systems within its country, and comparison had to be found at an international level. Other interested organizations were approached, and in August 2004 the International Bus Benchmarking Group (IBBG) was founded. As IBBG approaches its fifth annual phase, it is a good time to reflect on the benchmarking work of the group and to communicate to a wider audience whether quantitative benchmarking has been a useful and valid tool.

In many countries, operators in large cities often have no domestic peers with which to compare their performance. Benchmarking has the advantage that similar operators, comparable in size and operating characteristics, can nonetheless be found elsewhere in the world. Furthermore, lessons or otherwise interesting practices can be shared with organizations from other cultures and backgrounds. This can open the minds of managers to ideas and practices that otherwise would not have been apparent to them. However, these different cultures and backgrounds can lead to exogenous factors affecting the performance of operators beyond the control of management. Even in countries where peers can be found domestically, differences exist in local conditions that can affect performance. This can lower the direct comparability of data, at least when the data are viewed without explanatory background information.

PUBLIC TRANSPORT BENCHMARKING RESEARCH

Benchmarking as a tool for performance comparison and best-practice finding has been described and defined in many articles and papers. Fong et al. state that benchmarking draws wide attention from various disciplines and that definitions of benchmarking differ according to the process or practice that was benchmarked and the actual methodology itself (1). Fong et al. therefore provided three working definitions for benchmarking. Their working definition most closely related to the IBBG process is by Lema and Price: "A systematic and continuous measurement process; a process of continuously measuring and comparing an organisation's business process against business leaders anywhere else in the world to gain information which will help the organization to take action to improve its performance" (2). This paper neither provides an overview of benchmarking definitions nor aims to add to this list. However, an adapted working definition is given to better represent the IBBG benchmarking process, as follows: benchmarking is a systematic process of continuously measuring, comparing, and understanding organizations' performance, and change in performance, across a diversity of key business processes against comparable peers anywhere else in the world, to gain information that will help the participating organizations take action to improve their performance.

Benchmarking is applicable to many sectors, including public transport. An overview of public transport benchmarking initiatives has been provided in a variety of reports and papers (3–5). Other papers have described lessons learned from specific public transport benchmarking initiatives. Lessons from RailBench are described by Vaglio (6). Gudmundsson et al. describe lessons from Benchmarking European Sustainable Transport and Benchmarking of Benchmarking (7). Lessons learned from the CoMET and Nova metro benchmarking groups are described by Anderson (8, 9). However, the literature review revealed few examples of benchmarking activity within the public bus service industry. Mulley describes the process and lessons learned from the U.K. Bus Benchmarking Group (5), which started in 2001 and is based on the benchmarking handbook developed by EQUIP (10). Furthermore, Mulley states that some in-house benchmarking is performed by large private U.K. operators (e.g., Stagecoach). The history of IBBG, the benchmarking process, and lessons learned in developing IBBG are described in detail by Randall et al. (11).

Little research has been done on the impact of different internal and especially external factors on overall bus performance. The latter is most relevant to this study, since the variability of comparable performance of urban bus operations is discussed. Hensher and Daniels describe a 1993 productivity measurement initiative using data from 24 private and 8 public bus operators in Australia, focusing on the role of different institutional and regulatory constraints on relative financial performance (12). They found that the ability to attract patronage for each kilometer of vehicle provision (i.e., high capacity utilization) is closely linked to cost-effectiveness. Bus operators that serve less-busy routes as part of a public service obligation would therefore be less comparable than operators that serve densely populated areas. Hofmann and O'Mahony describe the impact of adverse weather conditions, in particular rain, on urban bus performance measures (13). They found that ridership as well as commercial speeds are lower on rainy days; however, bus services are more regular. Odeck researched the impact of congestion, ownership, region of operation, and scale on urban bus operation performance by using information from 33 Norwegian bus operators (14). He found no evidence of differences in performance caused by ownership or region of operation. He did, however, find an effect related to suboptimal input allocation, which is essentially predetermined by the size of bus operations.

DATA

Data used for this study were collected through IBBG, which is facilitated by the Railway and Transport Strategy Centre at Imperial College London. IBBG is now in its fifth year. Its current members are TMB in Barcelona, Spain; STIB in Brussels, Belgium; Dublin Bus in Dublin, Ireland; Carris in Lisbon, Portugal; London Buses; EMT in Madrid, Spain; STM in Montreal, Quebec, Canada; NYCT in New York City; RATP in Paris, France; and STA Buses in Sydney, Australia. All members are public organizations that provide normal passenger public bus services in large urban areas. Nine of the 10 IBBG members operate bus services themselves; London Buses is an authority that regulates the provision of bus services in London, which are operated mostly by private bus operators. These 10 bus organizations have provided data for 90 key performance-related data items. Some of these items are broken down into further subcategories, such as vehicle type or outsourced versus in-house labor. The performance-related data are supported by another 38 background data items that provide context and understanding. Up to 7 years of data, from 2001 to 2007, are available.

IBBG was developed by taking into account lessons learned from other public transport benchmarking exercises and is specifically based on 14 years of experience developed under the CoMET and Nova metro benchmarking groups, which are also facilitated by the Railway and Transport Strategy Centre at Imperial College London. Anderson stated that benchmarking requires a long-term approach (8). One-off benchmarking studies are rarely successful because it can take many years and iterative cycles to achieve comparable indicators that are reported on a consistent basis. For IBBG, it took 3 years of iterative definition development, data collection, and analysis before the member operators were sufficiently satisfied with the level of comparability to be able to use the key performance indicators (KPIs). Because comparability of data from different organizations is key to this study of variability in performance, it is helpful to summarize what was done in IBBG's first 3 years to gradually make the data set more comparable:

1. The original eight member organizations, from seven countries, were chosen on the basis of their similar characteristics. All IBBG members are urban bus operators and therefore do not operate interurban or international bus services. One of the selection criteria was fleet size, which was set at 1,000 or more buses. Other criteria were similarities in service characteristics, technological comparability, and the role of the operator within the city.
2. New members are chosen unanimously by the existing members. The selection of new members focused on world-class cities and the similarities in characteristics mentioned above.
3. Organizations that belong to an international group of private-sector operators are not invited to join IBBG. This ensures that existing members feel comfortable about openly sharing (confidential) data.
4. The members own the group and decide which topics and KPIs will be studied and measured.
5. The strict confidentiality agreement and the willingness of members to help and learn from each other create an open and honest information-sharing environment.
6. Member organizations are not directly (financially) rewarded on the basis of the benchmarking results. This minimizes the data-shielding effect that De Bruijn (15) warned of in his "law of diminishing effectiveness": without such rewards, there is no motivation to supply better-than-truth information, a practice frequently called gaming the numbers (16).
7. Based on the balanced scorecard success dimensions (17), performance is measured in different areas of urban bus operations. This ensures a holistic approach to performance comparison. As learned from the CoMET and Nova benchmarking groups, IBBG started with a concise set of only 23 KPIs spread over the different success dimensions. This number slowly increased to 40 over 4 years, after assurances that the original KPIs were clear, useful, and comparable. This has been reported as good practice for bus benchmarking by Mulley (5).
8. The 90 data items necessary to produce the KPIs have been carefully defined. Many items were adjusted or made more specific within the first 3 years on the basis of the pilot benchmarking results and member suggestions. When definitions changed, members reissued previously submitted information to adhere to the improved definition, ensuring correct time-series analysis.
9. The data items include only data from normal service operations, with the effect of charter, tourist bus, paratransit, and school bus services filtered out.
10. Background information and data are collected to clarify differences in performance. Examples are information on regulatory regimes, the importance and role of the bus operator within its city, and fare and funding structures.
11. The background data cannot yet be used to normalize quantitatively for differences between operators because of limitations in the number of observations. However, background data have been found to be very useful for understanding differences in performance qualitatively. This enables IBBG to show less comparable organizations within the same key performance indicator graph, although with appropriate caveats.
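To make the resulting data structure concrete, the sketch below shows one way such an unbalanced operator-year panel could be laid out. The operator labels, item names, and values are invented for illustration; the real IBBG items and figures are confidential.

import pandas as pd

# Hypothetical layout of a benchmarking panel: one row per operator-year,
# one column per carefully defined data item. All names and values here
# are invented and are not IBBG data.
panel = pd.DataFrame({
    "operator":          ["A", "A", "A", "B", "B", "C"],
    "year":              [2005, 2006, 2007, 2006, 2007, 2007],
    "vehicle_hours":     [3_100_000, 3_150_000, 3_200_000, 2_400_000, 2_450_000, 4_000_000],
    "staff_absent_pct":  [6.1, 5.8, 5.5, 7.9, 7.6, 4.2],
})

# The panel is unbalanced: operators do not report every item every year.
print(panel.groupby("operator")["year"].count())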

METHODOLOGY

To compare performance among bus operators, performance data must be normalized for scale as far as possible. When IBBG began, the common scale denominators used to normalize the data were passenger boardings, passenger kilometers, number of vehicles, vehicle kilometers, capacity kilometers, vehicle hours, and staff hours. For each KPI, the most suitable denominator was chosen. Where a single best denominator did not exist, the data were normalized twice by using two different denominators. For example, most KPIs are normalized by both vehicle kilometers and vehicle hours. Vehicle hours are the preferred denominator because they normalize for commercial speed; hours are also the more significant cost driver. However, because not all urban bus operators collect vehicle-hour data, vehicle kilometers are also used as a denominator. Kilometers are also the main cost driver in maintenance and are the final output.

Financial data are collected in local currencies. Because of differences in exchange rates, inflation rates, and purchasing power, financial data must be expressed in comparable units before being normalized. IBBG uses the World Bank's Purchasing Power Parity exchange rates for this purpose.

To answer the question of whether quantitative benchmarking is a useful and justifiable tool for comparing urban bus operators, a subset of KPIs was chosen for this analysis. KPIs reflecting performance areas from all balanced scorecard success dimensions were chosen for which data were available for at least eight operators before data stratification. The performance areas selected are listed in Table 1. The sample for each KPI was then stratified (cleaned) to remove extreme values in the data. A second round of stratification removed those organizations whose performance cannot be directly compared with that of other organizations, based on the expected impact of external factors or conditions that cannot be managed in the short or medium term. These factors are listed in Table 1.
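As a minimal sketch of the two normalization steps just described (PPP conversion of financial data, then division by the scale denominators), with all figures and the PPP rate invented for illustration:

operating_cost_local = 250_000_000  # annual operating cost in local currency (invented)
vehicle_km = 45_000_000             # annual vehicle kilometers (invented)
vehicle_hours = 3_200_000           # annual vehicle hours (invented)
ppp_rate = 1.35                     # assumed local currency units per international dollar

# Express financial data in comparable units before normalizing for scale.
operating_cost_ppp = operating_cost_local / ppp_rate

# Normalize twice, once by each common scale denominator.
cost_per_vehicle_km = operating_cost_ppp / vehicle_km       # cost per vehicle kilometer
cost_per_vehicle_hour = operating_cost_ppp / vehicle_hours  # cost per vehicle hour
print(round(cost_per_vehicle_km, 2), round(cost_per_vehicle_hour, 2))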

TABLE 1 Main Factors Affecting Key Performance Indicators of Urban Bus Operators

Ridership change
  External: Macroeconomic conditions, income, bus fares, fuel prices, availability of other modes, quality of other modes
  Internal: Quality of bus service, level-of-service supply

Staff training
  External: Regulation, fuel prices
  Internal: New technologies, focus on safety, emission reduction policy

Average bus load
  External: Demand profile, public service obligation routes, network design
  Internal: Fleet size, average vehicle capacity, off-peak service level, ridership, trip length, frequency, network design

Capacity utilization: all
  External: Demand profile, public service obligation routes, network design
  Internal: Fleet size, off-peak service level, ridership, trip length, frequency, network design

Capacity utilization: seat
  External: Demand profile, public service obligation routes, network design
  Internal: Fleet size, off-peak service level, ridership, trip length, frequency, network design, average number of seats per bus

Service availability
  External: Lost km for external reasons
  Internal: Spare factor, vehicle reliability, absenteeism, maintenance effectiveness

Dynamic passenger info
  External: Funding, customer expectations
  Internal: Technology availability, technology policy

Accessibility: low floor
  External: Regulation, funding
  Internal: Procurement policy, average fleet age

Accessibility: ramp
  External: Regulation, funding
  Internal: Procurement policy, average fleet age

Peak-time spare factor
  External: Demand profile
  Internal: Maintenance effectiveness, maintenance scheduling, spare policy, reliability

Vehicle utilization
  External: Lost km for external reasons
  Internal: Maintenance effectiveness, spare factor, internal policy, vehicle reliability, route length, commercial speed, layovers

Fleet age
  External: Regulation, funding
  Internal: Procurement policy, customer expectations, vehicle utilization, service specifications

Network efficiency
  External: Demand profile, depot location
  Internal: Interlining, off-peak service level, depot location

Maintenance productivity
  External: Work time rules and regulations, culture
  Internal: Absenteeism, training, work conditions

Staff absenteeism
  External: Culture, regulation, union power
  Internal: Salaries, holidays, work conditions

Vehicle reliability
  External: Road conditions, climate, funding
  Internal: Maintenance effectiveness and policy, fleet age, vehicle design and specifications

Lost kilometers: internal
  External: Union power
  Internal: Absenteeism, spare factors, vehicle reliability, bus priority

Lost kilometers: external
  External: Weather, demonstrations, road blocks, traffic accidents, congestion, bus priority
  Internal: Not applicable

Vehicle accidents
  External: Road conditions, road congestion, behavior of other road users
  Internal: Training, reporting policy, recruitment, vehicle condition, maintenance effectiveness

Staff accidents
  External: Health and safety regulations
  Internal: Safety culture, reporting policy, work conditions, training, recruitment

Passenger accidents
  External: Passenger behavior, passenger demographics
  Internal: Training, passenger education, vehicle type, reporting policy

Fuel economy
  External: Road conditions, road congestion
  Internal: Vehicle weight, air-conditioning, training, vehicle/engine design and specifications

Operating cost
  External: Labor cost, commercial speed, fuel prices
  Internal: Labor productivity/efficiency/effectiveness, commercial speed

Recovery ratio
  External: Fare policy, public service obligation routes
  Internal: Operating cost, ridership, nonfare commercial revenue

NOTE: Factors in italics can be influenced both externally and internally.


The list in Table 1 was produced on the basis of 14 years of public transport benchmarking experience in general and the 4 years of bus-specific benchmarking experience within Imperial College London. Comments from bus operation managers helped to develop these factors. The list is not exhaustive. Further econometric research is planned to understand which factors affect urban bus operator performance and to what extent. Naturally, where variability in performance exists because of different internal policies or management decisions, the data have not been stratified. On the contrary, this is the variability that one would like to observe to be able to ask the interesting questions that can lead to the identification of best practices.

Descriptive statistics are used to describe the variation of performance within the different KPIs after stratification. These statistics are the number of bus operators in the sample (B), the total number of observations in the sample (N), the mean (µ), the amplitude, and the standard deviation (σ). To compare variability of performance within the different indicators, the standard deviation is divided by the mean to create a normalized standard deviation (σ/µ), also known as the coefficient of variation. The higher this number, the more variable the performance measured in that particular sample. For example, when σ/µ = 0.05, the standard deviation of operator performance is 5% of the group average; when σ/µ = 0.30, it is 30% of the group average; in other words, performance is more variable. Most normalized standard deviations have values between 0 and 1; however, where the standard deviation is larger than the mean, the normalized standard deviation is greater than 1.0.

After examining the variability between comparable performances of bus operators, it is necessary to determine whether this variability is real. Real variability in performance effectively means statistically significant differences, and analysis of variance (ANOVA) is used to determine this. ANOVA can be thought of as an extension of the t-test. The purpose of ANOVA is the same, to test the significance of differences between groups, but the test can be applied to a much broader range of situations and is not limited to the two-sample case. For ANOVA, the null hypothesis is that the populations from which the samples are drawn are equal on the characteristic of interest. Means calculated from random samples are unlikely to be exactly the same even if the null hypothesis is true because of error or chance fluctuations in measurement. The question being asked in ANOVA is therefore not "Are there differences among the groups?" but "Are the differences among the groups large enough to justify a decision to reject the null hypothesis?" ANOVA proceeds by comparing the amount of variation between groups (i.e., operators) with the amount of variation within groups. The ANOVA test statistic is called the F-ratio; it is the ratio of the between-group estimate of population variance to the within-group estimate of population variance. The greater the variation between groups relative to the variation within groups, the higher the F-ratio and the more likely the rejection of the null hypothesis of "no difference."

The precise names and definitions of the KPIs and the data items from which they are constructed cannot be described because of an IBBG confidentiality agreement. Moreover, the example graphs of KPIs have been anonymized and ranked. In Table 2, the amplitude of the data (the difference between the maximum and minimum values in the sample) is shown for each key performance indicator, rather than the minimum and maximum values that are more commonly used to describe a data set. However, for the purpose of this research, more detailed descriptions of the KPI system and its data are not necessary. The important statistics needed to answer the research questions are the normalized standard deviations and sample sizes in the set of KPIs after data stratification, and the ANOVA results.
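The two statistics described above can be computed as in the following sketch, which applies scipy's one-way ANOVA to an invented unbalanced sample (one list of annual KPI observations per operator; none of the numbers are IBBG data):

import numpy as np
from scipy import stats

# Invented KPI observations, one list per operator; lengths differ to
# mimic the unbalanced panel described in the text.
operators = [
    [97.2, 97.5, 96.9, 97.8],
    [95.1, 94.8, 95.5],
    [98.6, 98.9, 98.4, 98.8, 98.7],
]

# Normalized standard deviation (sigma/mu) of the pooled sample.
pooled = np.concatenate(operators)
normalized_sd = pooled.std(ddof=1) / pooled.mean()

# One-way ANOVA: F is the ratio of the between-operator variance estimate
# to the within-operator variance estimate; a small p-value rejects the
# null hypothesis of "no difference" between operators.
f_ratio, p_value = stats.f_oneway(*operators)
print(f"sigma/mu = {normalized_sd:.3f}, F = {f_ratio:.2f}, p = {p_value:.5f}")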

RESULTS

Variability in Performance Using IBBG KPIs After Data Stratification

As described earlier, during the last 4 years much effort has been put into ensuring that the benchmarking output is as comparable as reasonably possible and desirable. These efforts resulted in a set of KPIs in which IBBG members have sufficient confidence. Members have used the results for setting realistic and objective targets, understanding where to focus resources to achieve maximum performance improvement, and internal decision making. Furthermore, members use the IBBG results to explain performance relative to international peers in anonymized presentations to government, authorities, media, unions, and other stakeholders.

Table 2 shows three types of capacity utilization indicators: average bus load, percent of total capacity used, and percent of seat capacity used. Within IBBG, after a pilot data comparison, the average bus load is no longer used to compare performance because of the substantial differences in average vehicle size. After stratification of the data on the basis of calculations of weighted average vehicle capacity, only 4 of 10 organizations were found to be comparable. This sample is too small for comparison. Here, insufficient comparability was found between operators, and the indicator was removed from the key performance indicator system and replaced with the more comparable indicators based on capacity kilometers.

Another example of how the performance indicators have been gradually made more directly comparable is "lost kilometers." As shown in Table 2, the subset of KPIs contains two types of lost kilometers: those due to internal reasons and those due to external reasons. When IBBG began, only total lost kilometers were collected. Although easier to collect, and interesting background information in itself, total lost kilometers were not adequate for performance comparison because of the impact of locally different external factors, such as congestion, demonstrations, road blocks, and snowfall. Currently, lost kilometers are collected for both internal reasons (see examples in Table 1), which are thought to be comparable, and external reasons as described. Lost kilometers due to external reasons are by default incomparable. When both average bus load and lost kilometers due to external reasons are excluded from further analysis for the reasons described, 27 different KPIs remain in the subset for this study.

To understand variability of directly comparable performance between operators, it was necessary to clean the data set used by IBBG in two steps: (a) remove extreme values that could not be explained and (b) remove data entries that were incomparable because of external factors or conditions that cannot be managed in the short or medium term. With 10 bus operators and 7 years of data, the maximum number of observations in most samples is N = 70. The number of observations in the first key performance area, change in ridership, is N = 10, because this indicator already incorporates 5 years of data per organization.


TABLE 2 Variability in Key Performance Indicators with Directly Comparable Organizations After Stratification, 2001–2007

Key Performance Area               B    N    µ        Amplitude  σ        σ/µ
% ridership change (a)             10   10   6.7      71.89      22.54    3.38
Staff training                     8    45   9.6      24.73      7.62     0.79
Average bus load                   4    23   16.79    5.55       1.84     0.11
% capacity utilization: all        7    34   25.55    14.36      3.61     0.14
% capacity utilization: seat       10   48   45.21    34.12      10.55    0.23
% service availability             10   64   97.12    12.06      2.69     0.03
% dynamic passenger information    10   63   4.70     61.70      10.08    2.14
% accessibility: low floor         10   57   62.34    80.74      25.61    0.41
% accessibility: ramp              10   52   52.88    96.75      26.89    0.51
% peak-time spare factor           10   64   86.07    12.63      3.37     0.04
Vehicle utilization, km            10   63   47,485   27,613     7,090    0.15
Vehicle utilization, hour          8    45   3,246    1,775      551      0.17
Fleet age (b)                      10   59   8.00     13.25      3.23     0.40
% network efficiency               8    52   84.53    9.63       2.79     0.03
Maintenance productivity, km       9    52   78.59    122.67     34.03    0.43
Maintenance productivity, hour     8    41   5.15     5.43       1.82     0.35
% staff absenteeism                8    44   6.78     9.36       2.11     0.31
Vehicle reliability, km            8    50   4,915    6,946      1,836    0.37
Vehicle reliability, hour          8    46   359.89   482.30     126.89   0.35
% lost kilometers: internal        9    52   0.94     2.73       0.69     0.73
% lost kilometers: external        0    —    —        not applicable —    —
Vehicle accidents, km              9    47   0.60     1.06       0.31     0.52
Vehicle accidents, hour            7    32   0.89     1.46       0.41     0.46
Staff accidents                    9    50   63.36    174.56     45.99    0.73
Passenger accidents                9    51   2.52     5.91       1.71     0.68
Fuel economy                       7    31   0.0081   0.0088     0.0022   0.28
Operating cost (b), km             10   60   5.45     6.72       1.98     0.36
Operating cost (b), hour           8    41   63.02    76.81      24.41    0.39
Recovery ratio                     10   63   0.52     0.50       0.13     0.25

NOTE: B = number of bus operators in sample; N = number of observations in sample; µ = sample average; amplitude = maximum value − minimum value; σ = standard deviation. σ/µ > 0.1 for 24 of 27 KPIs (88.9%); σ/µ > 0.2 for 21 of 27 (77.8%); σ/µ > 0.3 for 18 of 27 (66.7%).
(a) Ridership change over five years (2002–2007).
(b) This information relates to 2000–2006.

In this research, N < 70 because bus operators have not always supplied data for all years, and through the process of data stratification additional data points were taken out of the sample. Therefore, the data sets used for the statistics are unbalanced panels; that is, equal numbers of years of data from each operator do not always appear in the sample. As shown in Table 2, the sample size ranges from N = 31 to N = 64, based on data from 7 to 10 bus operators. Admittedly, the process of stratification was performed subjectively, using qualitative experience with the IBBG data and comments from bus operator managers. However, in most cases, variability in performance is considered to be the result of real differences in performance due to differences in management decisions and internal policies. The maximum number of organizations that needed to be removed from a sample was three, leaving a comparable sample of B = 7. This was the case for the capacity utilization and fuel economy indicators, because of different types of capacity definitions. The results of variability in performance after stratification are presented in Table 2. To show differences in variability, two examples of KPI graphs are included.

Figure 1 shows the KPI "mean vehicle hours between technical failures," representing vehicle reliability. This is an example of an indicator that, after stratification and normalization by vehicle hours, still shows interesting variability (σ/µ = 0.35). An example of an indicator in which performance between operators is more equal (σ/µ = 0.03) is service availability, expressed by the KPI "actual revenue vehicle kilometers per scheduled revenue vehicle kilometers." This example is shown in Figure 2. It is important to note that smaller differences in performance do not imply that there is no room for improvement or that no lessons can be learned. In the case of service availability, the amplitude (the difference between the best and the worst performing operator) is still 12.06%. If one of the low performers could learn from other operators how to increase service availability by just 5%, that would be a substantial improvement in service quality from the customer's point of view.

Twenty-seven different KPIs were analyzed for this study, and the results are shown in Table 2. This subset shows a reasonably useful amount of variability: 77.8% of KPIs have variability in the sample of σ/µ > 0.2, and 66.7% have variability of σ/µ > 0.3. Average variability is σ/µ = 0.54.


FIGURE 1 Example of variability in comparable performance after stratification: vehicle hours between technical failures, indexed to mean = 100. [Ranked bar chart of 8 operators omitted; index axis 0–200.]

FIGURE 2 Example of low variability in comparable performance after stratification: actual revenue vehicle kilometers per scheduled revenue vehicle kilometers, indexed to mean = 100. [Ranked bar chart of 10 operators omitted; index axis 60–105.]

Without the indicators normalized by vehicle kilometers, there are 22 different KPIs in the subset. In this case, 77.3% of KPIs in the sample have variability of performance of σ/µ > 0.2 and 63.6% have variability of σ/µ > 0.3; the average variability is σ/µ = 0.58. This indicates that in at least 77% of the researched KPIs, the standard deviation of operator performance exceeds 20% of the group average. This provides evidence that there are sufficient differences between operator performance to make it worthwhile to look for the reasons for these differences and to learn from the best performers.
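These summary shares follow directly from the σ/µ column of Table 2; the snippet below reproduces them for the 27 retained KPIs (average bus load and external lost kilometers excluded):

# sigma/mu values of the 27 KPIs retained from Table 2, in table order.
cv = [3.38, 0.79, 0.14, 0.23, 0.03, 2.14, 0.41, 0.51, 0.04, 0.15, 0.17,
      0.40, 0.03, 0.43, 0.35, 0.31, 0.37, 0.35, 0.73, 0.52, 0.46, 0.73,
      0.68, 0.28, 0.36, 0.39, 0.25]

print(sum(v > 0.2 for v in cv) / len(cv))  # 21/27 = 0.778
print(sum(v > 0.3 for v in cv) / len(cv))  # 18/27 = 0.667
print(round(sum(cv) / len(cv), 2))         # average variability, about 0.54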

Variability of Comparable Performance Within and Between Operators

Now that variability in comparable performance between bus operators has been established, it is necessary to understand whether this variability is real, that is, whether the differences are statistically significant. ANOVA is the verification tool used to determine this. As explained in the methodology section, ANOVA compares the amount of variation between the performances of operators with the amount of variation within the performance of each operator. Because ANOVA can deal with unbalanced panels, an ANOVA has been produced for all 27 indicators, as shown in Table 3. These ANOVA tests confirm that the observed variability is in all cases significant. All ANOVAs show F > Fcrit with a significance of P < 0.001. Therefore it can be concluded that the differences in performance between operators are based on real differences.

TABLE 3 ANOVA Results

Key Performance Area               dfT   F        P-Value
Ridership (a)                      68    312.84   .000
Staff training                     44    17.13    .000
% capacity utilization: all        33    43.72    .000
% capacity utilization: seat       47    139.39   .000
% service availability             63    11.15    .000
% dynamic passenger information    62    9.20     .000
% accessibility: low floor         56    14.11    .000
% accessibility: ramp              51    21.72    .000
% peak-time spare factor           63    21.16    .000
Vehicle utilization, km            62    500.91   .000
Vehicle utilization, hour          44    282.52   .000
Fleet age (b)                      58    16.23    .000
% network efficiency               51    151.84   .000
Maintenance productivity, km       51    190.94   .000
Maintenance productivity, hour     40    171.14   .000
% staff absenteeism                43    32.40    .000
Vehicle reliability, km            49    21.20    .000
Vehicle reliability, hour          45    16.13    .000
% lost kilometers: internal        51    10.36    .000
Vehicle accidents, km              46    169.69   .000
Vehicle accidents, hour            31    127.05   .000
Staff accidents                    49    71.38    .000
Passenger accidents                50    79.93    .000
Fuel economy                       28    60.26    .000
Operating cost (b), km             59    178.36   .000
Operating cost (b), hour           40    216.27   .000
Recovery ratio                     62    139.31   .000

NOTE: dfT = total degrees of freedom; F = F-ratio (mean square between operators divided by mean square within operators); P = significance.
(a) Not equal to % ridership change as used in Table 2.
(b) This information relates to 2000–2006.


CONCLUSIONS


Results from this study show that comparing urban bus operations through international benchmarking is both useful and justifiable.




It is useful because, on the basis of stratified data from 2001 to 2007, sufficient variability (σ/µ > 0.2) in performance between comparable urban bus operators could be concluded for most (77.8%) of the researched KPIs. The ANOVA tests confirm that the observed variability is in all cases significant (F > Fcrit with P < 0.001), that is, based on real differences in performance. A sufficiently high variability in comparable performance will lead to interesting questions about why operators achieve these differences in performance. Answers to these questions can in turn lead to suggestions for improvements for operators. Furthermore, apart from the comparable performance differences between urban bus operators, the trend of each KPI within an operator is useful. Even when external factors affect performance, the trend of the KPI data within each operator is always measured on a consistent basis. Understanding these trends can provide valuable information about why and how performance has changed, which can lead to suggestions for improvement. The 11.1% of researched KPIs that show little variability (σ/µ < 0.1) between urban bus operators are useful in their own right. They indicate which areas are less promising for performance improvements and help operators to focus resources on those areas in which performance improvements are more likely.

External factors and conditions that cannot be managed in the short or medium term and that cause direct incomparability of key performance indicator data between different international urban bus operators were identified. Important factors taken into account in the data stratification process are differences in commercial speed, demand profile, vehicle capacity, vehicle weight, and reporting policies. Although direct comparison may not be possible because of the effect of these conditions, with sufficient understanding of these effects an indirect (estimated) comparison may be possible. Within IBBG, such interpretation of the data occurs qualitatively. However, future studies by Imperial College London will research the extent to which these factors quantitatively affect urban bus operations. Different KPIs are affected by different external factors. This understanding is important: where Bus Operator 1 may not be comparable in KPI A, this operator may well be comparable in KPIs B through Z. Some KPIs are completely comparable (for example, vehicle accessibility: low floor). However, for many KPIs there are a number of operators for which additional information is necessary before an equal comparison can be made. In some cases, outlier operators should not be considered at all, and the data must be stratified accordingly.

International benchmarking of urban bus operations is also justifiable. Of a total of 10 bus organizations, a critical mass of B ≥ 7 comparable bus operators could be distinguished in each of the KPIs after data stratification. In most indicators (88.9%), the comparable sample is B ≥ 8. The maximum sample size is N = 70, and in 77.8% of indicators the sample size was N ≥ 45. This implies that in all KPIs, most operators can be compared directly. In combination with the observed variability, this makes benchmarking a worthwhile exercise. Of course, this is valid only on the condition that there is a sufficient critical mass of operators in the benchmarking comparison that exhibit similar operating characteristics and urban environments. On the basis of the preceding statistics, this number of operators is suggested to be eight or higher to ensure a high probability of five or more comparable operators in the benchmark with which an operator can compare itself.

This research shows that benchmarking is a useful and justifiable tool for comparing the performance of an international group of medium and large urban bus operators. The true usefulness of benchmarking will prevail only when the benchmarking results are disseminated throughout organizations and are used as a catalyst for change and to promote initiatives that can improve performance within the participating organizations.

ACKNOWLEDGMENTS

The authors thank the members of the International Bus Benchmarking Group for permission to use the benchmarking data. The authors also acknowledge the work of Eric Randall, former project manager of the International Bus Benchmarking Group, who helped improve the data and KPIs used for this study.

REFERENCES

1. Fong, S., E. Cheng, and D. Ho. Benchmarking: A General Reading for Management Practitioners. Management Decision, Vol. 36, No. 6, 1998, pp. 407–418.
2. Lema, N., and A. Price. Benchmarking: Performance Improvement Towards Competitive Advantage. Journal of Management in Engineering, Vol. 11, No. 1, 1995, pp. 28–37.
3. EQUIP—Extending the Quality of Public Transport. DG TREN, Brussels, Belgium, March 2000.
4. Geerlings, H., R. Klementschitz, and C. Mulley. Development of a Methodology for Benchmarking Public Transportation Organizations: A Practical Tool Based on an Industry Sound Methodology. Journal of Cleaner Production, Vol. 14, 2006, pp. 113–123.
5. Mulley, C. Improving Efficiency as a Means to Improving Transport Quality. Proceedings of the Institution of Civil Engineers: Municipal Engineer, Vol. 157, 2004, pp. 17–24.
6. Vaglio, M. Creating the Conditions for Successful Benchmarking. RailBench, Brussels, 2003.
7. Gudmundsson, H., A. Wyatt, and L. Gordon. Benchmarking and Sustainable Transport Policy: Learning from the BEST Network. Transport Reviews, Vol. 25, No. 6, Nov. 2005, pp. 669–690.
8. Anderson, R. Lessons from an International Railway Benchmarking Study: Process and Benefits. Presented at 2001 APTA Rail Transit Conference, Boston, Mass., 2001.
9. Anderson, R. Metro Benchmarking Yields Tangible Benefits. European Rail Outlook, March 2006, pp. 22–25.
10. EQUIP: Extending the Quality of Public Transport. The Benchmarking Handbook. DG TREN, Brussels, Aug. 2000.
11. Randall, E. R., B. J. Condry, and M. Trompet. International Bus System Benchmarking: Performance Measurement Development, Challenges, and Lessons Learned. Presented at 86th Annual Meeting of the Transportation Research Board, Washington, D.C., 2007.
12. Hensher, D., and R. Daniels. Productivity Measurement in the Urban Bus Sector. Transport Policy, Vol. 2, 1995, pp. 179–195.
13. Hofmann, M., and M. O'Mahony. The Impact of Adverse Weather Conditions on Urban Bus Performance Measures. Proc., 8th International IEEE Conference on Intelligent Transportation Systems, Vienna, Austria, 2005, pp. 431–436.
14. Odeck, J. Congestion, Ownership, Region of Operation, and Scale: Their Impact on Bus Operator Performance in Norway. Socio-Economic Planning Sciences, Vol. 40, 2006, pp. 52–69.
15. De Bruijn, H., R. Van Wendel de Joode, and H. Van der Voort. Potentials and Risks of Benchmarking. Journal of Environmental Assessment Policy and Management, Vol. 6, No. 3, 2004, pp. 289–309.
16. Osborne, D., and T. Gaebler. Reinventing Government. Addison–Wesley, Reading, Mass., 1992.
17. Kaplan, R. S., and D. Norton. The Balanced Scorecard: Measures That Drive Performance. Harvard Business Review, No. 92105, 1992, pp. 70–79.

The Transit Management and Performance Committee sponsored publication of this paper.
