Dr. M Rajasekhara Babu et al. / International Journal of Engineering Science and Technology (IJEST)

AUTOMATION OF REMEDY TICKETS CATEGORIZATION USING BUSINESS INTELLIGENCE TOOLS

DR. M RAJASEKHARA BABU
SCSE, VIT University, Vellore, Tamil Nadu, India
[email protected]

ANKITA TIWARI
SCSE, VIT University, Vellore, Tamil Nadu, India
[email protected]

Abstract: The work log of an issue is often the primary source of information for predicting its cause. Mining patterns from the work log is therefore an important issue management task. This paper develops an application that categorizes issues into problem areas using a clustering algorithm, which clusters the issues by mining patterns from the work log files. Standard reports can then be generated for root cause analysis. The whole process is automated using Business Intelligence tools. The approach can help minimize the recurrence of issues by informing technical decision makers about the impact of the issues on the system, so that a permanent fix can be provided.

Keywords: Clustering; Parser; Comma Separated Value (CSV); Business Intelligence (BI); Extract Transform and Load (ETL); Reporting Tools; Root Cause Analysis.

1. Introduction

In many organizations, IT-related issues or incidents faced by users are logged as Remedy Tickets using an Incident Management Tool. An incident is any event that is not part of the standard operation of a service and causes, or may cause, an interruption to or reduction in the quality of that service [1]. The Incident Management Tool enables users to record and manage the incidents within their area of operation. The logged tickets are routed to the respective teams, and the assigned team is then responsible for solving the issue. An incident can be a Bug Fix, a Service Request or a Change Request. Incidents should be resolved as quickly as possible, since this restores the normal operation of the system and minimizes the adverse impact on the business or the user.

A Remedy Ticket is created by providing details such as Ticket ID, Category, Type, Submitted By, Assigned To, Assigned Date, Resolved Date, Priority, Urgency, Description, Work Log and Summary to the Incident Management Tool.
Data fields such as Assigned To, Assigned Date, Submitted By, Resolved Date and Work Log are filled in by the person who finally solves the ticket. The data for the other fields is provided by the user when the Remedy Ticket is created. All the steps taken to resolve the ticket, including interaction with the user, are recorded in the Work Log. The Work Log, Summary and Description of the ticket can be used to find the root cause of the issue.

Repeat incidents cause unnecessary service disruptions, impact business operations and prevent an organization from maximizing its productivity. The underlying root cause of these recurring incidents therefore needs to be identified. The permanent fix for the issues is provided by the Problem Management team through a process called Root Cause Analysis.

This paper presents a clustering algorithm through which Remedy Tickets can be categorized automatically into identified buckets (problem areas) based on the type of issue, such as Performance, Application, Server, Interface or Database. A Remedy Tickets Clustering Application can be developed using BI tools, namely an ETL tool and reporting tools. This application helps to minimize the number of recurring incidents.

The remainder of this paper is organized as follows. Section 2 provides an overview of the related work on data clustering and feature mining. Section 3 describes the design of an automated parser which can cluster the

ISSN : 0975-5462

Vol. 4 No.06 June 2012


Remedy Tickets. Section 4 explains the implementation of the automated parser. Section 5 presents the results and analysis. Section 6 concludes the paper.

2. Related Work

Several strands of earlier work relate to our problem. According to [1], many organizations have developed multi-tiered IT support services delivered by help desks, network operations centers and engineering organizations. An incident or issue can cause an interruption to or reduction in the quality of the services. A problem is an unknown, underlying cause of one or more incidents, and a single problem may generate several incidents. The root cause of the problem should be identified in order to provide a permanent fix.

Our work in this paper has two main facets. The first is the categorization of Remedy Tickets using a clustering algorithm, and the second is the generation of standard reports to determine the root cause of the issue.

Text clustering has been studied extensively in many scientific disciplines, and over the years a variety of approaches have been developed [4] [6] [2]. In [3], a text clustering algorithm is described which constructs a vector space model and represents documents by feature vectors. First, a set of keywords (or significant terms) is extracted from the document to form the feature vector. Second, each document is represented by the feature vector, which consists of frequency and weight statistics of all significant terms. Finally, clustering proceeds by measuring the similarity (usually a function of Euclidean distance) between documents and assigning documents to appropriate clusters.

Following [5], the important key features or keywords of all the problem areas can be determined from past experience. Event logs play an important role in modern IT systems, since they are an excellent source of information [9]. The keywords are mined from event logs using a text parser [10].
Business Intelligence tools can be used to develop such an automated parser for text clustering [7].

3. Proposed System and Design

In this section, we propose a clustering algorithm which categorizes a large number of Remedy Tickets into a small number of meaningful clusters.

Clustering Algorithm

First, a set of features or keywords is extracted from the log files to form the feature vector. Second, each log file is represented by a column of keyword frequencies, and all the columns together form a “term frequency matrix”, say $M$. Specifically, the $(i,j)$-th entry, $M_{ij}$, is the number of occurrences of keyword $t_i$ in log file $d_j$ [3]. We use a simple text parser for mining the keywords. Finally, clustering proceeds by finding the keyword with the maximum frequency in each log file and creating the appropriate clusters.

Application Design

We can develop a Remedy Tickets Clustering Application based on the clustering algorithm described above. First, we determine the important keywords or patterns of all the problem areas from our past experience [5], so that the keywords belonging to each bucket are identified. For example, if a ticket's details contain keywords such as delay or production down, then the ticket belongs to the “Performance” bucket. The bucket names and their keywords are stored in a Comma Separated Value (.CSV) file. CSV files are popularly known as “flat files”, since a flat file contains a single table with a finite number of rows and columns. Second, we take a Remedy Ticket Dump containing all the details of more than 16000 logged tickets from the Incident Management Tool. This dump is also stored in flat file format.
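The term frequency matrix and the maximum-frequency rule described under Clustering Algorithm can be illustrated with a minimal sketch. It assumes plain case-insensitive substring counting as the text parser; the keyword list, the sample logs and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical keyword list; the paper derives the real one from past experience.
KEYWORDS = ["delay", "timeout", "deadlock", "login"]


def term_frequency_matrix(logs, keywords):
    """Return M as a list of rows, where M[i][j] is the number of
    occurrences of keywords[i] in logs[j] (case-insensitive substring count)."""
    matrix = [[0] * len(logs) for _ in keywords]
    for j, text in enumerate(logs):
        lowered = text.lower()
        for i, keyword in enumerate(keywords):
            matrix[i][j] = lowered.count(keyword)
    return matrix


def dominant_keyword(matrix, keywords, j):
    """Return the keyword with the maximum frequency in column j,
    or None when no keyword occurs. Ties go to the earliest keyword."""
    column = [row[j] for row in matrix]
    best = max(range(len(keywords)), key=lambda i: column[i])
    return keywords[best] if column[best] > 0 else None


# Toy work logs, invented for illustration only.
logs = [
    "Query hit a deadlock; retried after deadlock cleared, slight delay",
    "User reported login failure, login page timeout",
    "Printer out of paper",
]

M = term_frequency_matrix(logs, KEYWORDS)
clusters = [dominant_keyword(M, KEYWORDS, j) for j in range(len(logs))]
print(clusters)
```

Substring counting also matches inflected forms such as "delayed", which may or may not be desirable; a tokenizing parser would behave differently.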

Fig. 1. Sample Remedy Ticket Dump CSV file
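As a rough illustration of the Bucket Keywords flat file, the sketch below reads a small CSV into a mapping from bucket name to keywords. The column names `bucket_name` and `keyword`, the helper name and the "Database" row are assumptions made for this example; only the "Performance" keywords come from the text above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of the Bucket Keywords flat file.
# The header names and the Database row are assumptions for illustration.
BUCKET_KEYWORDS_CSV = """bucket_name,keyword
Performance,delay
Performance,production down
Database,deadlock
"""


def load_bucket_keywords(handle):
    """Read (bucket_name, keyword) rows and group the keywords by bucket."""
    buckets = defaultdict(list)
    for row in csv.DictReader(handle):
        bucket = row["bucket_name"].strip()
        buckets[bucket].append(row["keyword"].strip().lower())
    return dict(buckets)


bucket_keywords = load_bucket_keywords(io.StringIO(BUCKET_KEYWORDS_CSV))
print(bucket_keywords)
```

In practice the same function could be given an open file handle instead of the in-memory string used here.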


Once received, the data is processed and loaded into database tables using Extract Transform and Load (ETL) tools. Informatica PowerCenter, DataStage and SQL Server Integration Services are among the most commonly used ETL tools for data transformation and analysis [7]. The data from the Remedy Ticket Dump and Bucket Keywords flat files is extracted, transformed and loaded into the Remedy_Ticket and Bucket_Keywords Oracle database tables. The Remedy_Ticket table has an extra column, “Bucket Name”, which stores the bucket name of each Remedy Ticket.

A text parser is designed which reads its input line by line and counts the keywords. The text parser reads columns such as work log, summary and description from the Remedy_Ticket table and searches them for the keywords present in the Bucket_Keywords table. For each bucket, a score is calculated as the number of times the keywords of that bucket are found. Based on this score, the tickets are categorized into the identified buckets.

Reporting for Root Cause Analysis is built on top of this. Standard reports are generated showing the relation between each bucket and the number of tickets belonging to it. Oracle Business Intelligence Enterprise Edition (OBIEE), IBM Cognos and Hyperion Essbase are popular BI reporting tools.

4. Implementation

The implementation of the Remedy Tickets Clustering Application can be divided into two parts. The first is the categorization of Remedy Tickets into buckets or problem areas by implementing the clustering algorithm in Informatica PowerCenter (the ETL tool). The second is the generation of reports for the root cause analysis of the problems using OBIEE (the reporting tool). These reports can be viewed as tables, line graphs, pie charts etc.

In Informatica PowerCenter Designer, the source flat files, i.e. Remedy Ticket Dump and Bucket Keywords, are imported to Source Analyzer. Similarly, the target database tables i.e.
Remedy_Ticket and Bucket_Keywords are imported to Target Designer. Mapping Designer is then used to create mappings that extract data from the source and load it into the target after applying the required transformations. A transformation converts the source data according to the requirements of the target system and also ensures the quality of the data being loaded into the target. The following mappings are created:
(1) m_Populate_Bucket_Keywords: loads the keywords or patterns from the Bucket Keywords flat file into the Bucket_Keywords Oracle table.

Fig. 2. Mapping m_Populate_Bucket_Keywords

(2) m_Populate_Remedy_Ticket_Detail: loads the ticket data from the Remedy Ticket Dump into the Remedy_Ticket Oracle table.
(3) m_Populate_Remedy_Ticket_Detail_New: identifies the bucket name for each ticket using the clustering algorithm and stores it in the Bucket Name column of the Remedy_Ticket Oracle table.
(4) m_Remedy_Fact: creates the fact table, which contains measure columns such as the number of tickets (# of tickets).

To implement the clustering algorithm, the Oracle tables Remedy_Ticket and Bucket_Keywords are joined using an OUTER JOIN whose condition is that the keyword is found in the work log, summary or description of the ticket. The string function INSTR ( ) is used to search for the keywords. The Ticket Id and Bucket Name columns are then selected, and the result is grouped by Ticket Id and Bucket Name using GROUP BY. The aggregate function COUNT (*) then gives the score: for each output row, that is, for each combination of ticket and bucket, it is the number of keywords of that bucket found in the ticket. The tickets are clustered according to the highest score. For example, for Ticket Id 218780 the bucket “APPLICATION” has the highest score, so this ticket is assigned to the “APPLICATION” bucket.
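The join, INSTR search, GROUP BY and COUNT steps described above can be tried out on a small scale. The sketch below is only an approximation under stated assumptions: it uses an in-memory SQLite database as a stand-in for Oracle, the lowercase column names and all rows are invented, and the final choice of the highest-scoring bucket is done in Python rather than in a PowerCenter mapping. It counts `k.keyword` instead of `*` so that tickets with no keyword match keep a score of 0 after the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the Oracle database
conn.executescript("""
CREATE TABLE Remedy_Ticket (ticket_id INTEGER, work_log TEXT,
                            summary TEXT, description TEXT);
CREATE TABLE Bucket_Keywords (bucket_name TEXT, keyword TEXT);
""")

# Toy rows, invented for illustration only.
conn.executemany("INSERT INTO Remedy_Ticket VALUES (?, ?, ?, ?)", [
    (1, "restarted listener after deadlock", "db error", "tablespace full"),
    (2, "cleared cache", "page delay", "users report production down"),
    (3, "replaced toner", "printer", "no output"),
])
conn.executemany("INSERT INTO Bucket_Keywords VALUES (?, ?)", [
    ("DATABASE", "deadlock"), ("DATABASE", "tablespace"),
    ("PERFORMANCE", "delay"), ("PERFORMANCE", "production down"),
    ("APPLICATION", "cache"),
])

# One row per (ticket, bucket); score = number of that bucket's keywords
# found anywhere in the work log, summary or description.
SCORES_SQL = """
SELECT t.ticket_id, k.bucket_name, COUNT(k.keyword) AS score
FROM Remedy_Ticket t
LEFT OUTER JOIN Bucket_Keywords k
  ON INSTR(LOWER(t.work_log || ' ' || t.summary || ' ' || t.description),
           k.keyword) > 0
GROUP BY t.ticket_id, k.bucket_name
ORDER BY t.ticket_id, score DESC, k.bucket_name
"""

assigned = {}
for ticket_id, bucket, score in conn.execute(SCORES_SQL):
    # Rows arrive highest score first per ticket, so keep only the first one.
    assigned.setdefault(ticket_id, bucket)

print(assigned)
```

Note that this score counts distinct matching keywords per bucket, not the number of times each keyword occurs, and that ties between buckets are broken here by bucket name, which is an arbitrary choice made for the example.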


Fig. 3. Output generated by Clustering Algorithm

In Informatica PowerCenter Workflow Manager, workflows are created by defining a set of instructions to execute the created mappings.

The reports are created using OBIEE. These reports facilitate the root cause analysis of different issues and thus help in providing a permanent fix. A repository (rpd) is created from the fact and dimension tables in the OBIEE Administration Tool. In OBIEE Presentation Services, the reports are built using the Answers tool. These reports can be viewed as tables, line graphs, pie charts etc. Fig. 4 depicts a vertical bar graph of Number of Tickets (# of tickets) against Bucket Name. It shows that around 8500 of the 16000 tickets belong to the Database cluster. The technical decision makers of the organization can take action to reduce this number by rectifying the issues permanently.

Fig. 4. Vertical Bar Graph
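The measure behind such a report, the number of tickets per bucket, amounts to a simple grouped count. The following sketch shows that aggregation on invented data; the ticket IDs, assignments and function name are hypothetical, and it does not reproduce the OBIEE repository or the figures reported in this paper.

```python
from collections import Counter

# Hypothetical ticket-to-bucket assignments, invented for illustration.
assignments = {
    101: "Database", 102: "Server", 103: "Database",
    104: "Application", 105: "Database", 106: "Server",
}


def tickets_per_bucket(assignments):
    """Return (bucket, ticket count) pairs, largest count first."""
    return Counter(assignments.values()).most_common()


report = tickets_per_bucket(assignments)
for bucket, count in report:
    # Crude text rendering of a bar chart: one '#' per ticket.
    print(f"{bucket:<12} {'#' * count} {count}")
```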

5. Result and Analysis

Our Remedy Tickets Clustering Application supports the detection, diagnosis and resolution of existing or potential problems that prevent the organization from achieving its desired availability and performance goals. Earlier, the problem management team categorized Remedy Tickets manually, which is tedious and time consuming. This application automates the manual method and provides dynamic reporting for Root Cause Analysis of the problems.

We took the details of more than 16000 logged tickets and categorized them into identified problem areas: Application, Interface, Server, Database, Performance and User. The OBIEE reports show that the largest number of issues, around 8500 tickets, is caused by database errors. We also note that around 5500 and 2000 tickets belong to the Server and Application buckets respectively. These figures can help the problem management team take decisions on permanent problem resolution and prevent recurring tickets or incidents.

6. Conclusion and Future Work

Effective problem management is essential to any organization’s productivity and profitability. Repeat incidents cause unnecessary service disruptions, impact business operations and prevent the organization from maximizing its productivity. With an effective implementation of the approach described in this paper, incidents can be resolved at the root.

Our Remedy Ticket Clustering Application focuses only on mining the known patterns or keywords from the log file. This helps to find patterns that characterize the normal behavior of the system. However, the approach has several shortcomings. It is inefficient for mining longer patterns, and it only finds known patterns or keywords, ignoring patterns of other sorts. The mining of infrequent patterns is equally important,


since this might reveal anomalous events which represent unexpected behavior of the system, e.g., previously unknown fault conditions. In terms of future work, we are interested in implementing a clustering method based on both frequent and infrequent patterns.

References

[1] Victor Kapella, “A Framework for Incident and Problem Management”, International Network Services, April 2003, http://www.kwesthuba.co.za/downloads/02_ins_incident_management_0403.pdf
[2] Document Clustering, http://en.wikipedia.org/wiki/Document_clustering
[3] Bin He and Yongzheng Zhang, “Clustering Documents in Large Text Corpora”, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.119.8839.pdf
[4] Yu-Bao Liu, Jia-Rong Cai, Jian Yin and Ada Wai-Chee Fu, “Clustering Text Data Streams”, Journal of Computer Science and Technology, January 2008.
[5] Samiran Ghosh, Saptarsi Goswami and Amlan Chakrabarti, “Outlier detection from ETL Execution trace”, 3rd International Conference on Electronics Computer Technology, 2011.
[6] A.K. Jain, M.N. Murty and P.J. Flynn, “Data Clustering: A Review”, ACM Computing Surveys, Vol. 31, No. 3, September 1999.
[7] T.R. Gopalakrishnan Nair, Vithal J. Sampagar, Suma V and Ezhilarasan Maharajan, “A Scheme for Automation of Telecom Data Processing for Business Application”, Swarm, Evolutionary and Memetic Computing Conference (SEMCCO).
[8] Informatica Tutorial, http://www.learnbi.com/informatica.html
[9] Risto Vaarandi, “A Data Clustering Algorithm for Mining Patterns from Event Logs”, IEEE Workshop on IP Operations and Management, 2003.
[10] Risto Vaarandi, “Tools and Techniques for Event Log Analysis”, thesis, Tallinn University of Technology, June 2005.
[11] OBIEE Tutorial, http://www.oracle.com/technetwork/middleware/bi-enterprise-edition/tutorials/index.html
