
Int. J. Business Process Integration and Management, Vol. 7, No. 1, 2014

Estimating and applying service request effort data in application management services

Ying Li*, Ta-Hsin Li and Kaan Katircioglu
IBM T.J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, 10598, USA
E-mail: [email protected]
E-mail: [email protected]
E-mail: [email protected]
*Corresponding author

Abstract: In application management services (AMS), high resource utilisation, effective resource planning and optimal assignment of service requests to resources are critical to success. Meeting these objectives requires a systematic and repeatable approach to measuring resource utilisation, assessing workload and assigning service requests. In this paper, we present a two-step approach to help achieve these objectives. We first estimate the amount of effort that each resource likely spends on handling each service request (SR) based on a metadata model and a set of SR handling priority rules. We then measure resource utilisation and assess the SR assignment process based on the effort data. Furthermore, we propose a maximum likelihood method to estimate the mean effort for a class of tickets based on some statistical assumptions. A simulation study has been conducted to validate its feasibility.

Keywords: application management services; AMS; resource utilisation; resource planning; service requests; service request assignment; service request effort estimation.

Reference to this paper should be made as follows: Li, Y., Li, T-H. and Katircioglu, K. (2014) 'Estimating and applying service request effort data in application management services', Int. J. Business Process Integration and Management, Vol. 7, No. 1, pp.22–33.

Biographical notes: Ying Li has been a Research Staff Member at IBM T.J. Watson Research Center since 2003. Her research interests include digital image processing, audiovisual content analysis and management, computer vision, business analytics and service management. She has authored around 60 peer-reviewed conference and journal papers, as well as six books and book chapters on various multimedia, computer vision and service analytics topics. She obtained her MS and PhD degrees from the University of Southern California (USC) in 2001 and 2003, respectively. She is a Senior Member of the IEEE.

Ta-Hsin Li is a Research Staff Member at IBM T.J. Watson Research Center. He received his PhD in Applied Mathematics from the University of Maryland, College Park. His main research interests include business analytics, time series analysis and statistical signal processing. He is a Fellow of the American Statistical Association and a Senior Member of the IEEE.

Kaan Katircioglu is a Senior Research Scientist and Manager at IBM T.J. Watson Research Center. He has 17 years of industry experience in operations research/management science and joined IBM Research in 1996. He has over 35 scientific publications, book chapters and working papers, and has appeared at many business and academic events as an invited speaker or panelist for well-recognised organisations such as the World Bank, NSF, academic institutions and trade associations. His research areas include business analytics, optimisation, supply chain, sustainability and IT service management. He is a member of the IBM Academy of Technology.
This paper is a revised and expanded version of a paper entitled ‘Measuring and applying service request effort data in application management services’ presented at the IEEE International Conference on Services Computing, CA, USA, 27 June to 2 July 2013.

1 Introduction

As the number and complexity of applications grow within an organisation, application management, maintenance and development tend to need more effort. In order to ensure availability, improve performance levels and support mission-critical tasks, more and more businesses have recognised the importance of managing, integrating and enhancing their application portfolios.

Effective management of application costs requires deep expertise and know-how, yet many companies do not consider this within their core competency. Consequently, companies have turned to application management service (AMS) providers for assistance. AMS providers typically assume full responsibility for many application management tasks, including application development, enhancement, testing, production maintenance and support. To control management costs, some providers establish well-managed AMS centres in less expensive venues. However, good cost management takes much more than just moving to a low-cost location: it requires well-designed and well-executed management processes, and analytics tools that help optimise the planning and management of service operations.

Over the last decade, AMS outsourcing grew into a multi-billion dollar market and it is still growing. This gave service providers a good reason to invest in technologies that enable more efficient delivery of AMS, and supporting processes and methodologies have been developed. A good overview of what is involved in AMS delivery can be found in Toigo (2001).

Part of an AMS provider's responsibility is to plan and manage the human resources (a.k.a. service consultants) in a client account. Resources have different levels of experience, skill sets and job roles, and they could be located on-site or in remote geographies. Consequently, optimally planning resources and maximising their utilisation is critical for cost-effective management of an account.

In this paper, we describe our recent work on estimating the amount of effort spent on each AMS service request (SR) based on a metadata model and a set of task handling priority rules. We then illustrate how to use the effort data to measure resource utilisation and assess the optimality of the task assignment process. Furthermore, considering that undesired factors likely contribute to the estimated request effort, we propose a maximum likelihood method to estimate the mean effort for a class of tickets; such group-level effort data can be utilised for optimal resource planning. Figure 1 shows the high-level block diagram of the proposed system. It is our hope that this approach can help AMS providers better manage service demand and resources.

Figure 1 A high-level block diagram of the proposed system (see online version for colours). [Diagram: service request data feeds both effort estimation for service requests and mean effort estimation for a class of tickets; the resulting effort data drives resource utilisation analysis and assessment of the service request assignment process.]

Task effort measurement is not new, and many references can be found in the literature. For instance, a general approach to inferring the service times of different job classes in a networked-server environment is proposed in Zhang et al. (2002), developed on the basis of a queuing network optimisation technique. However, such approaches cannot be readily applied to the AMS domain, as SRs have priorities that vary with problem severity, application and other factors; moreover, human resources can only work a finite amount of time every day. Buco et al. (2012) propose a time volume capture (TVC) tool to measure the effort spent on each task in a large IT service delivery environment. TVC integrates several methods for effort measurement, such as pre-determined motion time systems (Zandin, 2002), activity sampling and analytical estimation (Thomas and Daily, 1983). Such a tool can capture effort data fairly accurately. However, since it has to be installed on each practitioner's computer and works as a stopwatch, it is time consuming and can be perceived as intrusive, so its deployment is likely to meet resistance.

Assigning resources with specific skills to available jobs is a complex task. If not optimised, it can result in service performance and resource utilisation problems. A realistic formulation of this assignment problem requires one to model the many practical constraints mentioned earlier. Naveh et al. (2007) provide a novel approach using constraint programming, considering many realistic constraints and situations regarding job roles, skill levels, geographical locations, language requirements and potential retraining. In this paper, however, our focus is not on proposing a new approach for task assignment; instead, we propose a way to assess how good the existing task assignment process is. It is our hope that by monitoring this process, we can identify ways to improve overall task handling performance.

The remainder of the paper is organised as follows. Section 2 describes the SR data used in our approach. Section 3 details our approach to estimating the effort for each SR. Section 4 demonstrates the usage of the effort data, specifically for performing resource utilisation analysis and assessing the SR assignment process. Section 5 presents the statistical approach to estimating the mean effort for a class of tickets. Finally, Section 6 concludes, along with a short discussion of some practical considerations and future research directions.

2 Data description

Our main data source is the SR data, which is pulled from ticketing systems. Generally speaking, a SR could be related to production support and maintenance (i.e., application support), application development, enhancement and testing. Sometimes, a SR is also conveniently referred to as a ticket.


Each SR, or ticket, consists of multiple attributes. The actual number of attributes can vary across accounts, depending on the ticket management tool as well as the way ticket data is recorded. Nevertheless, the ticket data almost always include the following attributes, which contain important information about each ticket.

• Ticket number, which is a unique serial number.

• Ticket status, such as open, resolved, closed or another in-progress status.

• Ticket open time, which indicates the time when the SR is received and logged.

• Ticket resolve time, which indicates the time when the ticket problem is resolved.

• Ticket close time, which indicates the time when the ticket is closed. A ticket is closed after the problem has been resolved and the client has acknowledged the solution.

• Ticket severity, such as critical, high, medium and low. Ticket severity determines how a ticket should be handled. For instance, critical and high severity tickets may need to be handled immediately no matter when they arrive.

• Ticket category, which indicates a specific module within a specific application. To some extent, the ticket category indicates the skills needed to handle the ticket.

• Assignee, which is the name (or the identification number) of the consultant who handled the ticket.

• Assignment group, which indicates the team to which the assignee belongs.

Besides the above critical attributes, there could be other attributes that provide additional information about the tickets, for instance, information about service level agreement (SLA) performance and assignees' geographical locations.

Another data source that can be used is the consultants' skill set information. Since ticketing systems do not keep records of this, we followed a practical approach in which we assumed that ticket attributes indicate (hence stand for) the required skills.
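To ground the data model, the sketch below shows how these core ticket attributes might be represented in code. It is a minimal illustration, assuming Python dataclasses; the field names and types are our own choices, not a schema from any particular ticketing tool.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# A minimal sketch of the core SR/ticket record described above. Field names
# are illustrative; real ticketing tools use account-specific schemas.
@dataclass
class Ticket:
    number: str                       # unique serial number
    status: str                       # e.g., 'open', 'resolved', 'closed'
    open_time: datetime               # when the SR was received and logged
    resolve_time: Optional[datetime]  # when the problem was resolved
    close_time: Optional[datetime]    # when the client acknowledged the solution
    severity: int                     # e.g., 1 = critical ... 4 = low
    category: str                     # application module; proxy for required skills
    assignee: str                     # consultant who handled the ticket
    assignment_group: str             # team to which the assignee belongs
```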

3 Effort estimation for SRs

Generally speaking, a SR usually goes through multiple stages, such as in progress and resolved, during its life cycle from open to close; Figure 2 shows an example of such a life cycle. However, while service management systems are able to record the timestamp of each stage, timestamps do not necessarily indicate the actual amount of effort spent on each SR. There are two big challenges in estimating the effort for each ticket:

1 The temporal duration between a ticket's open time and resolve time potentially includes non-business hours such as nights, weekends and holidays. Moreover, the time that the resource spends on non-ticket work during this period, such as training and meetings, should also be excluded from the effort.

2 When there are multiple tickets in the resource's queue at a given time instant, we need to determine which ticket he or she is actually working on.

Our solution to the first challenge is a metadata model which allows users to define any business hours w.r.t. ticket handling, as well as to enter resource work/shift schedules. Information about resources' other activities, such as meetings, training and vacations, can be gathered from other systems that manage resources' planned hours. Our solution to the second challenge is a set of ticket handling priority rules, which determine the processing order of tickets assigned to the same resource. Details of both solutions are given below.

Figure 2 An example of a ticket's life cycle from open to close (see online version for colours). [Diagram: the ticket is opened (open time) and forwarded to a queue, then assigned to a consultant; the consultant resolves the problem (resolve time), possibly awaiting feedback from the client along the way; the resolved ticket awaits the client's confirmation, and once acknowledged, the ticket is closed (close time).]

3.1 Metadata model

The metadata model is designed to capture any information that is related to or associated with either tickets or account operation data. Below, we describe the metadata needed for the ticket effort calculation.

3.1.1 Business hour definition

By business hour, we mean the time window during which the consultant is working, for instance, from 9 am to 6 pm on weekdays. Nevertheless, with the trend of global sourcing, consultants could be physically located in different countries even for the same account; consequently, many of them will have different work schedules in different time zones. The business hours could also depend on application criticality and/or the severity of the SRs. For instance, if a ticket is issued against a business-critical application, a 24 × 7 time window could be required. In short, different client accounts will likely have different business hour definitions, which could range from very simple to very complicated.

To address this issue, we first define a metadata table called business-hours-attributes which, as the name implies, specifies the attributes that determine the actual business hours. An example of such a table is shown in Table 1, where geography and severity are indicated as the key attributes.

Table 1 An example of 'business-hours-attributes'

Field
---------
Geography
Severity

We then define the actual business hours in the business-hours table. Table 2 shows an example of such a table. Here, the first several columns indicate the attributes defined in the business-hours-attributes table; in this case, geography and severity as defined in Table 1. The next four columns specify the starting and ending hours, the corresponding time zone, and the business days. Take the first data row as an example: when the geography is North America (NA) and the severity of the ticket is 1, the required business hours start at 0:00 and end at 24:00, i.e., a 24-hour window; the time zone is CST and the work days range from Monday to Sunday, i.e., the entire week. In short, it defines a 24 × 7 time window for the case of 'geography = NA and severity = 1'. When the severity drops to 3 or 4, a less stringent work schedule is defined; for instance, it is 10 × 5 for the case of 'geography = EU and severity = 3'.

3.1.2 Consultant's timeline definition

Consultants are likely to work different shifts, which is necessary considering that certain tickets need to be handled around the clock. Consequently, we have defined another metadata table called 'employee-timeline' to capture such information. Table 3 shows an example of such a table. Here, a single consultant could have multiple rows, with each row specifying one consistent work shift. For instance, the first row indicates that resource 'J. Smith' worked five days in the week of 4/1/2012, with each day starting at 8 am and ending at 5 pm. Then, during the week of 4/8/2012, he switched to a late shift starting at 5 pm and ending at 2 am, and worked six days in that week.

Table 2 An example of 'business-hours' metadata table w.r.t. the attribute definition in Table 1

Geography  Severity  Start  End    Time zone  Days
NA         1         0:00   24:00  CST        MTuWThFSaSu
NA         2         0:00   24:00  CST        MTuWThFSaSu
NA         3         7:00   17:30  CST        MTuWThF
NA         4         7:00   17:30  CST        MTuWThF
EU         1         0:00   24:00  CET        MTuWThFSaSu
EU         2         0:00   24:00  CET        MTuWThFSaSu
EU         3         8:00   18:00  CET        MTuWThF
EU         4         8:00   18:00  CET        MTuWThF

Table 3 An example of 'employee-timeline' metadata table

Name      From      To         Work day start  Work day end  Time zone  Work days
J. Smith  4/1/2012  4/7/2012   8:00            17:00         CST        MTuWThF
J. Smith  4/8/2012  4/14/2012  17:00           2:00          CST        MTuWThFSa
R. Clark  4/1/2012  4/7/2012   17:00           2:00          CST        MTuWThF
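As an illustration of how such metadata tables might be consulted, the sketch below checks whether a given minute counts as business time for a geography/severity pair. It is a hedged example: the in-memory table shape, the assumption that timestamps have already been converted to the row's time zone, and the function names are all ours; a lookup against the employee-timeline table would be analogous.

```python
from datetime import datetime

# Hypothetical in-memory excerpt of Tables 1-2. Keys and shapes are assumptions;
# a real implementation would load these rows from the metadata store.
BUSINESS_HOURS = {
    # (geography, severity) -> (start_hour, end_hour, workdays as weekday numbers, Mon=0)
    ('NA', 1): (0.0, 24.0, {0, 1, 2, 3, 4, 5, 6}),  # 24x7
    ('NA', 3): (7.0, 17.5, {0, 1, 2, 3, 4}),        # 10.5x5
    ('EU', 3): (8.0, 18.0, {0, 1, 2, 3, 4}),        # 10x5
}

def is_business_minute(ts: datetime, geography: str, severity: int) -> bool:
    """Check whether a timestamp (assumed already converted to the row's
    time zone) falls inside the business-hours window for this pair."""
    start, end, workdays = BUSINESS_HOURS[(geography, severity)]
    hour = ts.hour + ts.minute / 60.0
    return ts.weekday() in workdays and start <= hour < end
```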


Generally speaking, account management normally maintains such shift schedules for its consultants. However, it is not uncommon for this information to be scattered around, and not necessarily in the desired format, so it is very important to work closely with the account management team to gather all the necessary information. In the worst case, when no shift data is available, we assume that everyone works a single shift.

3.2 Ticket handling priority rules

When there are multiple tickets in a consultant's queue, he/she needs to decide the processing order according to certain ticket handling priority rules. Depending on the way an account operates, many different priority rules can be applied.

1 First in, first out (FIFO), i.e., tickets that arrived earlier get processed first. This is the simplest and most straightforward approach.

2 Severity-based, i.e., higher severity tickets get processed before lower severity ones. This also implies that if the consultant is already in the middle of handling a ticket when another ticket of higher severity arrives, he stops working on the previous ticket and starts handling the new one. This makes sense since tickets of higher severity usually have more stringent SLA requirements.

3 Equal time share, i.e., every active ticket in a consultant's queue gets an equal share of his time during a certain time period. This is where multi-tasking comes into play.

4 Any customer-specific rules; for instance, tickets of a particular type or application might need to be handled with a higher priority.

Once the ticket processing rule is clear, we can relate the consultant's time to individual tickets, and from there estimate each ticket's effort. The detailed algorithm is described below.

3.3 Ticket effort estimation

It is true that sometimes tickets of higher severity have a higher processing priority, but that is not always the case. In fact, many factors (explicit or implicit, static or dynamic) can affect a ticket's processing priority, and they cannot be captured by one or two simple rules. Perhaps only the consultant can explain why he/she worked on a specific ticket at a specific time; trying to sort that out from ticket data alone would be very challenging, if not impossible. Consequently, assuming that all tickets handled by the same employee get a fair share of his/her time over a certain period seems to fit most scenarios best. It also smooths out many complications inherent to the ticket management process. While this approach may not give us the most accurate effort information, the offsets will be evened out eventually when we use the data to calculate resource utilisation. Below are the five steps used in calculating ticket effort.

1 For each consultant or assignee A, identify the list of tickets that were handled and resolved by him/her over the entire ticket dataset. Denote them by Γ = {T1, …, TN}.

2 For each ticket Ti ∈ Γ, retrieve its open time and resolve time. Then, for every minute within this time range, check whether it falls within the defined business hours for this particular ticket and for assignee A. If yes, we term it a working minute (WM); otherwise, an idle minute (IM). This step essentially eliminates all non-business hours/minutes from the ticket's life cycle.

3 Align the time durations of the N tickets along the timeline and examine their temporal overlap. For every working minute of ticket Ti, check whether it is also a working minute for other tickets. If the minute is shared by M tickets in total (including Ti), then the effective portion of this minute that contributes to the effort on Ti is measured as 1/M (i.e., an equal fraction of the minute). This essentially resolves the double-counting problem.

4 For each working minute of every ticket, measure its effective portion that contributes to the ticket effort.

5 Finally, for each ticket, sum up the effective time portions of all of its working minutes. This gives us the estimated amount of time spent on the ticket (i.e., the effort).

Figure 3 illustrates the process of effort estimation. Here, two tickets are temporally aligned. Assuming that each square indicates one minute, we see that Ticket 1 spans six minutes in total, while Ticket 2 spans four. There is also one idle minute for each of Ticket 1 and Ticket 2, indicated by greyed squares. Now, by examining the temporal overlap between these two tickets, we see that two of Ticket 1's working minutes, indicated by blue squares, overlap with the first two working minutes of Ticket 2. Consequently, the effective portion of each of these four working minutes is ½. Adding up the effective time of each working minute for Ticket 1, we get 1 + 1 + ½ + ½ + 1 = 4; similarly, Ticket 2's total effort is ½ + ½ + 1 = 2.

Figure 3 An illustration of ticket effort estimation (see online version for colours). [Diagram: Ticket 1's working minutes carry weights 1, 1, ½, ½, 1 for a total effort of 4; Ticket 2's working minutes carry weights ½, ½, 1 for a total effort of 2.]

In summary, this approach essentially slices a ticket's time duration into many small slots; for each slot, it determines whether the consultant is working on the ticket, and with how much effort. Note that although we use one minute as the smallest slot unit here, any other integer number of minutes can be used (e.g., 30 minutes).
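The sketch below renders the five-step procedure in code for a single assignee. It is a simplified illustration, assuming one-minute slots and an externally supplied `is_working_minute` predicate that encapsulates the business-hours and timeline checks of step 2; it implements the equal-time-share behaviour described above.

```python
from collections import defaultdict
from datetime import timedelta

def estimate_efforts(tickets, is_working_minute):
    """A minimal sketch of the five-step effort estimation for one assignee.

    `tickets` is a list of (ticket_id, open_time, resolve_time) tuples for a
    single assignee; `is_working_minute(ticket_id, ts)` is an assumed helper
    wrapping the business-hours check. Returns estimated effort per ticket,
    in minutes.
    """
    # Steps 1-2: collect the working minutes of each ticket, discarding
    # non-business minutes from its open-to-resolve span.
    minutes_per_ticket = {}
    for tid, opened, resolved in tickets:
        ts, working = opened.replace(second=0, microsecond=0), set()
        while ts < resolved:
            if is_working_minute(tid, ts):
                working.add(ts)
            ts += timedelta(minutes=1)
        minutes_per_ticket[tid] = working

    # Step 3: count how many tickets share each working minute.
    share = defaultdict(int)
    for working in minutes_per_ticket.values():
        for ts in working:
            share[ts] += 1

    # Steps 4-5: a minute shared by M tickets contributes 1/M to each;
    # summing the fractions gives the ticket's effort.
    return {tid: sum(1.0 / share[ts] for ts in working)
            for tid, working in minutes_per_ticket.items()}
```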

There are three additional factors that could affect the ticket effort estimates.

1 The consultant's 'legitimate idle time', such as time spent on training, meetings, vacation, etc. Such information is usually recorded in the employees' labour claim database. If this data is provided, this type of idle time must be excluded from a ticket's time duration.

2 The consultant's other non-ticket work. Typically, a customer account has both ticket and non-ticket work, where non-ticket work includes application development and enhancement, and employees are normally shared between the two. Consequently, it is important to understand a consultant's time allocation between ticket and non-ticket work, as well as their priority order. Depending on the answers, we need to make different adjustments to the effort calculation algorithm. For instance, if ticket work always has a higher priority than non-ticket work, then the current algorithm works as-is; otherwise, we either need detailed information on when the consultant works on each item, or we prorate the effort with a given time allocation percentage.

3 The period during which a resource puts a ticket on hold while waiting for the client's feedback. In reality, unless such waiting time is explicitly logged in the ticketing tool, there is no way to gather this information. We address this issue in Section 5.

4 Leveraging the ticket effort data

Once the ticket effort is estimated, we can use it to derive deeper insights. For instance, we can examine how each resource is being utilised. We can also assess the existing ticket assignment process to see whether every ticket was assigned to the right person.

4.1 Resource utilisation analysis

By resource utilisation (U), we mean the percentage of time that each assignee A spends on ticket work (TW) out of his/her total legitimate work hours (or capacity, CP); that is, U = TW / CP. To measure TW, we simply sum up the efforts of all tickets handled by assignee A. To measure CP, we first identify A's active working period by setting its starting and ending points to the earliest and latest dates of all A's tickets, respectively; alternatively, if the employee-timeline data is provided, we can get more accurate period information from there. Then, we determine A's total amount of legitimate work hours within that active period. Again, a more accurate result can be obtained if we have the employee-timeline data; otherwise, we assume that each employee works 8 hours a day and five days a week. Such information can be refined based on the account's input; in fact, we have defined another metadata table to capture such data.

We would like to point out that this calculation only captures a resource's utilisation on ticket work. If the resource also worked on application development and enhancement, his/her 'real' utilisation would be higher.

Once the utilisation data is obtained for all assignees, the account manager can use it to balance workload, find out why certain people are poorly utilised, and plan cross-skill and/or up-skill training. We implemented this methodology in dozens of AMS accounts in various industries, and the feedback was positive. We also conducted a comparative analysis in which our utilisation estimates were compared to the actual results produced by the account teams themselves; the similarities and commonalities we identified demonstrate the effectiveness and accuracy of our approach. Figure 4 shows an example of the resource utilisation distribution of a particular account. The figure shows a couple of people who are over-utilised (possibly due to working overtime), while around a dozen people have utilisations below 50%.

Figure 4 An illustration of resource utilisation distribution (see online version for colours). [Bar chart: utilisation per assignee.]
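In code, the utilisation metric is a one-liner once effort and capacity are known; the sketch below states it explicitly under the paper's default 8-hours-a-day, five-days-a-week assumption. The argument names are illustrative.

```python
def utilisation(ticket_efforts_hours, active_days, hours_per_day=8.0):
    """U = TW / CP: total ticket effort over legitimate capacity.

    `ticket_efforts_hours` lists the per-ticket efforts (in hours) of one
    assignee; `active_days` is the number of working days in the assignee's
    active period, taken from the employee-timeline data when available,
    otherwise derived from the default 8-hours/5-days assumption.
    """
    tw = sum(ticket_efforts_hours)    # TW: total ticket work
    cp = active_days * hours_per_day  # CP: capacity
    return tw / cp if cp > 0 else 0.0
```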


A breakdown of this distribution is shown in Table 4, where we see that around 70% of resources have a utilisation rate above 80%. On the other hand, there are four employees whose utilisations are below 20%. Clearly, close attention should be paid to them to find out the reasons. The account manager should check whether these people mainly work on application development rather than tickets, or whether the applications/business streams they are involved in use a separate ticketing tool. In some cases, team leads have duties other than ticket handling, which can make their utilisation appear low. In other cases, these resources may be inexperienced, or may only have the skills for certain types of tickets; up-skilling and cross-skilling them would then be helpful.

Table 4 A breakdown of resource utilisation distribution

Utilisation range  Headcount  Percentage
>= 1               15         24.59%
(0.8, 1)           29         47.54%
(0.6, 0.8)         6          9.84%
(0.4, 0.6)         4          6.56%
(0.2, 0.4)         3          4.92%
(0, 0.2)           4          6.56%
Sum                61         100.00%

Once the individual resource utilisations are calculated, they can be aggregated to measure the utilisation of groups, countries, customer sectors, etc. Figure 5 shows an example where certain countries (e.g., Country B) have much higher average resource utilisation rates than others (e.g., Country E). In such a case, the account manager might want to find out the reasons for the unbalanced utilisation; some adjustment of the staffing in different countries might be necessary.

Figure 5 Resource utilisation per staffing country (see online version for colours)

4.2 Assessment of ticket assignment process

Let us first briefly describe the typical process for handling a SR. When a customer calls in to report a problem, the help desk (level 1 service) first attempts to resolve it online. If the problem remains, the help desk consultant files a ticket and assigns it to a group. The ticket then gets routed to the designated group, and the group leader reviews the ticket by reading its problem description. Based on this review, he/she either assigns the ticket to one of the resources in the group, or returns the ticket (to be reassigned) if he/she believes that it needs to be handled by another group. Note that the reassignment can continue for multiple rounds. It is also possible that the assigned consultant rejects the ticket due to a skill-problem mismatch or other unforeseen reasons. Consequently, it is important that every ticket is assigned to the right people with the right skills; incorrect assignment results in frequent ticket re-routing, extended resolution time, or poor resource utilisation.

In this work, we propose an open framework for evaluating the optimality of the ticket assignment process based on a set of assessment rules. Specifically, the input to the framework is historical ticket data and the effort data. The output includes an indication score for each ticket signifying whether assigning the ticket to the designated assignee was the best decision to make at that particular time instant. Figure 6 shows the high-level architecture of the proposed framework. Specifically, given a ticket T0, we first apply one of the following rules to find the best candidate E from the list of assignees who have the necessary skills to handle T0.

1 Leading-time-based, where the leading time is defined as the time that each assignee needs to finish the existing tickets in his/her queue, plus the expected time to complete ticket T0. The assignee with the shortest leading time is considered the optimal candidate.

2 Load-based, where the assignee with the lightest load in his queue is considered the optimal candidate.

3 Skill-based, where the best candidate has just enough skills to handle T0.

4 Cost-based, where the assignee with the least cost is considered the best candidate. In this case, the cost is measured as the assignee's hourly rate multiplied by the time that he/she needs to handle the ticket.

5 Availability-based, where whoever is available at that particular time is assigned to handle T0.

These rules can also be combined to make a joint decision. Note that any other rules specific to an account can also be easily incorporated into this open framework. Next, we check if the identified candidate E is the same assignee who actually handled ticket T0. If yes, we consider this ticket assignment as a good one; otherwise, it is marked as a poor assignment. Once all tickets have been assessed, we measure a metric called loss-of-efficiency to capture the overall efficiency of the task assignment process.

Figure 6 A high-level architecture of the proposed ticket assignment assessment framework (see online version for colours). [Flowchart: given ticket T0, find the best assignee candidate E based on the leading-time-based, load-based, skill-based, cost-based and availability-based rules; if candidate E is the same as the assignee that actually handled T0, mark the assignment as good, otherwise as poor.]

Below we give details about a specific assessment approach we implemented using the shortest-leading-time rule. As mentioned earlier, the assessment is an off-line process and takes all ticket data as input. We first collect the following information from the data for each assignee:

• the ticket categories that the assignee is able to handle, which, as mentioned earlier, are used to indicate the assignee's skills

• the average amount of effort that the assignee takes to handle each ticket category

• the active working period of the assignee, which is used to determine whether a specific person is active or available to handle tickets on a specific day.

Now, assume that we have a ticket T0 with category C0, assigned to assignee E0, which arrived on date D0. We assess this ticket assignment in the following seven steps.

1 Find all assignees that are capable of handling category C0. Denote them as set Φ.

2 Identify the assignees within set Φ that are not yet active, or that have already been deactivated, on date D0. Denote them as set O. Consequently, set Φ\O indicates the list of assignees that are both capable and available to handle T0 on date D0. Denote it as Φ\O = {E1, …, Ee}.

3 For every assignee Ei in set Φ\O, examine the workload accumulated by date D0; in other words, check the number of tickets in his/her queue by that date. If the queue is not empty, then calculate the time that he/she will need to complete all the backlogs. This is done as follows.
  a For each queued ticket, retrieve its ticket category and Ei's average amount of effort for handling that category.
  b Sum up Ei's efforts for all queued tickets, and denote the sum as LTi.
  c Measure the amount of time that he/she has already spent on handling the tickets in the queue, and denote it as ETi. This can be calculated as the amount of business time elapsed between the earliest-dated ticket in the queue and D0.
  d Subtract ETi from LTi. Now, LTi indicates the amount of time that Ei will take to clear his/her backlog.

4 Retrieve the average amount of effort that Ei will take to handle category C0, and add it to LTi. Now, LTi indicates the leading time that assignee Ei would need to complete T0.

5 Following the same process, measure the leading time of every assignee in set Φ\O for T0, and identify the one with the minimal value. That is, find EJ where

J = arg min_i (LT_i).

6 Check whether EJ is the same assignee as E0, the assignee who actually handled ticket T0. If yes, we consider this ticket assignment a good one; otherwise, it is a poor assignment. Furthermore, we measure the loss of ticket handling time for this assignment as (LT0 − LTJ), where LT0 and LTJ are E0's and EJ's leading times for completing T0, respectively. For the purpose of measuring the loss-of-efficiency in the end, we record LT0 and LTJ for all tickets, and term them the materialised ticket effort and desired ticket effort, respectively. Note that, theoretically, E0 should be one of the assignees in set Φ\O (i.e., {E1, …, Ee}), yet it is singled out here for convenience of illustration.

7 Once all tickets have been assessed, we sum up the numbers of good and poor assignments and use them as performance indicators of the ticket assignment process. Moreover, to measure the loss-of-efficiency, we first sum up the materialised ticket effort and desired ticket effort over all tickets, and denote the sums as MTE and DTE, respectively. Then we calculate

Loss-of-efficiency = (MTE − DTE) / DTE.

As we can see, the smaller the loss-of-efficiency, the more efficient or optimal the ticket assignment is; a minimal code sketch of this assessment is given after the list of caveats below. There are several dynamic factors that could potentially affect the accuracy of the proposed assessment approach. Some of them are listed below.


• The actual amount of time that each assignee spends on handling a ticket could vary substantially, even for tickets within the same category. Hence, using the average effort to measure the leading time might not reflect the true status.

• It is possible that some tickets in an assignee's queue are on hold (i.e., the assignee needs the customer's response in order to proceed). Such tickets should be disregarded when considering the assignee's availability or calculating his/her leading time on a given ticket.

• The ticket handling priority rule can play an important role in the ticket assignment process. For instance, if a ticket of higher severity is processed with a higher priority, then a newly arrived ticket could bypass existing tickets in the assignee's queue. Consequently, the proposed leading time calculation would need to be adjusted.

• The actual availability and capability of each assignee at time D0 could differ from our assumptions. For instance, a potential candidate might be on vacation around D0 although he is generally available. Moreover, assignees' skills could change over time due to cross-skill training.
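As promised above, here is a minimal sketch of the shortest-leading-time assessment and the loss-of-efficiency computation. The helper inputs (`assignees`, `avg_effort`, `queue_effort_remaining`) are assumptions standing in for the historical-data extraction described earlier, not part of the authors' implementation.

```python
def assess_assignment(ticket, assignees, avg_effort, queue_effort_remaining):
    """A minimal sketch of the shortest-leading-time assessment (steps 1-6).

    `assignees` maps name -> set of categories handled (activity filtering is
    assumed folded in); `avg_effort[(name, category)]` is the historical mean
    effort in hours; `queue_effort_remaining(name, date)` returns the backlog
    time after step 3 (queued effort minus time already spent). Per step 6,
    the actual assignee is assumed to be among the candidates.
    Returns (is_good, materialised effort LT_0, desired effort LT_J).
    """
    # Steps 1-2: capable and available candidates for the ticket's category.
    candidates = [a for a, cats in assignees.items() if ticket.category in cats]

    # Steps 3-4: leading time = remaining backlog + expected effort on T0.
    leading = {a: queue_effort_remaining(a, ticket.open_time)
                  + avg_effort[(a, ticket.category)]
               for a in candidates}

    # Step 5: the candidate with the minimal leading time.
    best = min(leading, key=leading.get)

    # Step 6: compare against the assignee who actually handled the ticket.
    return best == ticket.assignee, leading[ticket.assignee], leading[best]

def loss_of_efficiency(results):
    """Step 7: (MTE - DTE) / DTE over all assessed tickets."""
    mte = sum(lt0 for _, lt0, _ in results)
    dte = sum(ltj for _, _, ltj in results)
    return (mte - dte) / dte
```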

Nevertheless, what we propose here is a way to monitor the existing ticket assignment process. The methodology itself is open to incorporating as many business rules and as much information as possible: the more information we have, the better we can recover the real situation at the time ticket T0 arrives, and the more accurate the assessment will be. The proposed methodology has been applied to the same accounts as in the utilisation analysis to assess their current task assignment processes. Our analysis shows that, while the assessment accuracy could potentially be affected by the various factors discussed above, the assessment outcome has been acknowledged by the account teams. One example of such an assessment outcome is shown in Table 5. Here we see that around 56% of ticket assignments were considered poor. Moreover, the loss-of-efficiency is slightly above 1.0 [indeed, (10,252.8 − 4,951.23) / 4,951.23 ≈ 1.07], meaning that around half of the actually spent ticket effort could have been saved if all tickets had been assigned to the best possible candidate.

Table 5 An example output of the proposed ticket assignment assessment

Measurement                        Value
Number of good assignments         187
Number of poor assignments         244
Percentage of poor assignments     56.6%
Sum of desired ticket effort       4,951.23 (hours)
Sum of materialised ticket effort  10,252.8 (hours)
Loss-of-efficiency                 1.07

It is our hope that by monitoring the task assignment process through the methodology we introduced here, we can contribute to task handling performance improvements in AMS practice.

5 Mean effort estimation for a class of tickets

As shown in Figure 2, a consultant can put a ticket on hold if he needs further information from the client to resolve it. Such waiting time, however, is usually not consistently logged in ticketing tools; in fact, most of the time it is missing. Consequently, when there is indeed a waiting period during a ticket's life cycle, the effort estimated using the aforementioned approach is inaccurate, as it does not take the waiting time into account. We have therefore applied a statistical model to estimate the mean effort for a class of tickets. While this is not at the individual ticket level, such group-level effort information can be very useful for account leaders in planning their staffing optimally. Specifically, we propose a maximum likelihood method based on the following two assumptions:

1 a ticket's effort duration as estimated in Section 3 comprises two segments of unknown length, one of which is the true effort

2 the two time segments can be modelled as independent random variables, distributed according to certain probabilistic models with unknown parameters.

We describe the detailed algorithm below. Let the estimated effort duration of a ticket be denoted by a random variable $Z$, and let

$$Z = X + Y,$$

where $X$ indicates the true effort and $Y$ is the remaining time. We further assume that $X$ and $Y$ are independent random variables with probability densities $f(\cdot\,|\,\theta)$ and $g(\cdot\,|\,\eta)$, where $\theta$ and $\eta$ are unknown parameters. Now, given a class of $N$ tickets whose effort durations are denoted $z_1, \ldots, z_N$, we obtain the maximum likelihood estimator (MLE) of $\theta$ and $\eta$ (Casella and Berger, 2001) by maximising the likelihood function

$$\Lambda_N(\theta, \eta) = \prod_{i=1}^{N} h(z_i \,|\, \theta, \eta),$$

where $h(\cdot \,|\, \theta, \eta)$ is the convolution of $f(\cdot\,|\,\theta)$ and $g(\cdot\,|\,\eta)$, i.e.,

$$h(z \,|\, \theta, \eta) = \int_0^z f(x \,|\, \theta)\, g(z - x \,|\, \eta)\, dx.$$

Now, suppose that the true effort data of a small number of tickets is available, denoted $\{x_{N+1}, \ldots, x_{N+M}\}$ where $M \ll N$. Combining these $M$ true effort samples with the $N$ estimated effort durations gives us an MLE of $\theta$ and $\eta$ that maximises

$$\Lambda_{N,M}(\theta, \eta) = \prod_{i=1}^{N} h(z_i \,|\, \theta, \eta) \prod_{j=1}^{M} f(x_{N+j} \,|\, \theta).$$

A naive estimator of $\theta$, denoted $\tilde{\theta}$, can be obtained from the true effort samples alone by maximising $\prod_{j=1}^{M} f(x_{N+j} \,|\, \theta)$. The corresponding estimator of $\eta$, denoted $\tilde{\eta}$, can then be obtained by maximising $\Lambda_N(\tilde{\theta}, \eta)$. However, since $M$ is usually very small, these naive estimators are not expected to be accurate. Nonetheless, we can use them as initial values for the maximisation of $\Lambda_{N,M}(\theta, \eta)$. Finally, with $\hat{\theta}$ denoting the final estimate of $\theta$, the mean effort of the class of $N$ tickets can be estimated as

$$\hat{\mu} = \int x\, f(x \,|\, \hat{\theta})\, dx.$$

In the following, we present a simulation result to show the feasibility and superiority of the proposed approach. Let us assume that $f(x\,|\,\theta)$ and $g(y\,|\,\eta)$ are exponential densities; exponential models have been widely used in queuing models for performance evaluation of IT services (Gross et al., 2008). Specifically, we let $f(x\,|\,\theta) = \theta e^{-\theta x}$ and $g(y\,|\,\eta) = \eta e^{-\eta y}$, so that the mean of $X$ equals $1/\theta$ and the mean of $Y$ equals $1/\eta$. We then generate a set of simulated data with $N = 90$, $M = 10$, $\theta = 5$ and $\eta = 1$, and feed these data to the likelihood function $\Lambda_{N,M}(\theta, \eta)$. Figure 7 shows its contour plot, from which we see that the likelihood function has a well-defined maximum near the true values of $\theta$ and $\eta$ (i.e., 5 and 1, respectively). This suggests that a standard optimisation routine can be used to calculate the maximiser from suitable initial values.

Figure 8 shows the root mean-square error (RMSE), calculated on the basis of 5,000 Monte Carlo runs, of the proposed estimators of $\theta$ and $\eta$ for different sizes $M$ of the effort sample, expressed as a fraction of the total sample size $N + M$ (up to 20%). It also shows the corresponding RMSE of the naive estimates $\tilde{\theta}$ and $\tilde{\eta}$, where

$$\tilde{\theta} = \left( \frac{1}{M} \sum_{j=1}^{M} x_{N+j} \right)^{-1} \quad \text{and} \quad \tilde{\eta} = \left( \frac{1}{N} \sum_{i=1}^{N} z_i - \frac{1}{M} \sum_{j=1}^{M} x_{N+j} \right)^{-1}.$$

As we can see from Figure 8, the proposed estimator achieves a great improvement in accuracy over the naive alternative, especially when M is small relative to N. Note that, without additional information, estimating the mean effort from the estimated effort durations has an inherent ambiguity, because it is impossible to tell which of the two exponential distributions is associated with the true effort. The naive estimator helps establish that the parameter θ for the effort is greater than the parameter η for the remaining time, as the maximiser of the likelihood function is sought in its neighbourhood.

Figure 7 Contour plot of the likelihood function for a simulated dataset with exponential distributions (N = 90, M = 10, θ = 5 and η = 1) (see online version for colours)

Figure 8 Root mean-square error (RMSE) as a function of the fraction M/(N + M) for estimating the parameters of exponential distributions (N + M = 100, θ = 5, η = 1) (see online version for colours)

Notes: Dotted lines indicate naive estimates and solid lines indicate final estimates. The two lines on top are the RMSE for estimating θ, and the two lines on the bottom are the RMSE for estimating η.
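To make the estimation procedure concrete, the sketch below reproduces the simulation setting (N = 90, M = 10, θ = 5, η = 1) and maximises Λ_{N,M} numerically. It is a feasibility sketch under stated assumptions, not the authors' code: the convolution is evaluated by numerical quadrature (a closed form exists for distinct exponential rates), and Nelder-Mead stands in for the "standard optimisation routine", started from the naive estimates.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice

# Simulated data: Z = X + Y with X ~ Exp(theta) and Y ~ Exp(eta);
# N convolved durations plus M true-effort samples.
theta_true, eta_true, N, M = 5.0, 1.0, 90, 10
z = rng.exponential(1 / theta_true, N) + rng.exponential(1 / eta_true, N)
x = rng.exponential(1 / theta_true, M)

def conv_pdf(z_i, theta, eta):
    # h(z | theta, eta): convolution of the two exponential densities on [0, z].
    return quad(lambda u: theta * np.exp(-theta * u)
                          * eta * np.exp(-eta * (z_i - u)), 0, z_i)[0]

def neg_log_lik(params):
    theta, eta = params
    if theta <= 0 or eta <= 0:
        return np.inf
    ll = sum(np.log(conv_pdf(zi, theta, eta)) for zi in z)  # Lambda_N part
    ll += np.sum(np.log(theta) - theta * x)                 # true-effort part
    return -ll

# Naive initial values: reciprocals of sample means, as in the text.
theta0 = 1 / x.mean()
eta0 = 1 / max(z.mean() - x.mean(), 1e-6)
res = minimize(neg_log_lik, [theta0, eta0], method='Nelder-Mead')
theta_hat, eta_hat = res.x
# For Exp(theta), the mean effort estimate is simply 1/theta_hat.
print(theta_hat, eta_hat, 'mean effort:', 1 / theta_hat)
```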

We would like to point out that, in practice, the small set of M true effort samples needed for maximising $\Lambda_{N,M}(\theta, \eta)$ can be extracted from tickets with logged effort information, or approximated by an identifiable set of tickets that most likely required uninterrupted work (e.g., high-priority tickets). Also, to ensure that the mean effort estimate is meaningful, all effort samples should come from a ticket class with similar underlying complexity, so that the estimated effort durations and true effort data are well represented by the probabilistic models. Needless to say, the ticket class needs to be consistent with business requirements and constraints, such as the organisational structure and ticket type, for the mean effort to be useful in performance measurement and resource planning.

6 Discussion and future research

This paper presented our recent work on deriving deeper insights into operational performance in AMS delivery practice. In particular, we proposed an analytical method that estimates the effort spent on each AMS SR, calculates resource utilisations and assesses the effectiveness of the ticket assignment process. We further proposed a maximum likelihood approach to estimate the mean effort for a class of tickets for the purpose of resource planning. Our initial study with a few dozen accounts has received positive feedback.

We should mention a number of practical considerations regarding our approach. Due to various factors, such as the co-existence of ticket and non-ticket work and the potential lack of accurate daily schedules of employees, the measured resource utilisation may not capture the true utilisation of each employee. Likewise, the proposed ticket assignment assessment is a rough estimate, although it is our best effort. Nevertheless, in an AMS operation involving several hundred resources, a large number of groups and many applications, critical information and good opportunities can easily be missed by managers without the help of tools that use techniques such as ours. A simple report on the resource utilisation distribution can give account managers valuable insights about their resources; indeed, understanding resource utilisation and knowing the top and bottom performers and the reasons behind them can yield actionable insights. Likewise, some simple statistics from the ticket assignment evaluation help account managers assess and improve the existing ticket routing process. The ultimate decision rests with management, who bring their experience and irreplaceable human judgment into the process; tools such as the one proposed here, however, can make the puzzle much easier to solve. Moreover, they can also provide valuable information when account managers run cross-account analysis, which is another important topic in service operations and management.

For our future work, we hope to leverage more data sources (e.g., employees' labour claim data) to improve the accuracy of both the effort and the resource utilisation analysis. Moreover, we shall design ways to validate the mean effort results obtained with the maximum likelihood approach. We have already identified a few customer accounts that do provide ticket effort data, so a fair comparison between real effort and estimated effort can be conducted. We will report our progress in future publications.


References

Buco, M., Rosu, D., Meliksetian, D., Wu, F. and Anerousis, N. (2012) 'Effort instrumentation and management in service delivery environments', 8th International Conference on Network and Service Management.

Casella, G. and Berger, R.L. (2001) Statistical Inference, 2nd ed., Duxbury Press, Connecticut, USA.

Gross, D., Harris, C.M., Shortle, J.F. and Thompson, J.M. (2008) Fundamentals of Queueing Theory, 4th ed., Wiley-Interscience, New Jersey.

Naveh, Y., Richter, Y., Altshuler, Y., Gresh, D.L. and Connors, D.P. (2007) 'Workforce optimization: identification and assignment of professional workers using constraint programming', IBM J. Res. & Dev., Vol. 51, Nos. 3/4, pp.263–279.

Thomas, H. and Daily, J. (1983) 'Crew performance measurement via activity sampling', Journal of Construction Engineering and Management, Vol. 109, No. 3, pp.309–320.

Toigo, J.W. (2001) The Essential Guide to Application Service Providers, Prentice Hall PTR, Upper Saddle River.

Zandin, K.B. (2002) MOST Work Measurement Systems, 3rd ed., CRC Press, New York.

Zhang, L., Xia, C., Squillante, M. and Mills III, W.N. (2002) 'Workload service requirements analysis: a queueing network optimization approach', Proceedings of the 10th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), Fort Worth, Texas.
