DBSitter: An Intelligent Tool for Database Administration

Adriana Carneiro, Rômulo Passos, Rosalie Belian, Thiago Costa, Patrícia Tedesco and Ana Carolina Salgado
Universidade Federal de Pernambuco, Centro de Informática, Recife – PE, Brasil
Phone: +55 8132718430
E-mail: {apcc, ranop, rbb, tac, pcart, acs}@cin.ufpe.br

Abstract: Database administration routines involve manual or semi-automatic methods for monitoring the environment and solving problems that arise. In this work, we propose a novel alternative approach to this task, which unifies two Artificial Intelligence techniques: Case Based Reasoning and Intelligent Agents. Thus, DBSitter consists of a Multi-Agent system for automatic monitoring and fault correcting in a Database environment. In searching for better solutions, the system can also interact with the Database Administrator. In DBSitter problems are represented as Cases, and the main functionalities are failure prediction and adaptation of problem-solving capabilities through the learning mechanism.
1 Introduction
By analyzing the daily work of Database Administrators (DBAs), one can observe that various types of problems (e.g. performance, security, bad physical dimensioning) occur repeatedly. Moreover, professionals frequently forget the best solution last applied to a problem, especially when working under heavy pressure. Recalling the correct solution can take much longer than anticipated, irritating users and compromising the quality of service. Nowadays there are some alternatives that help DBAs in this decision-making process, such as specialized Help Desks and Expert Systems. Help Desk solutions, however, are limited by the fact that they are only a repository of problems and their respective solutions. They cannot predict problems, act efficiently or learn new solutions. Furthermore, in this approach the problem repository must be filled manually, either by the expert or by operators. Thus, we have not considered such solutions in our work. Expert Systems such as Oracle Expert and Progress Fathom have some functionalities that help the DBA's work. For example, they enable the definition of monitoring events together with scripts that can be executed to solve the failure being monitored. Furthermore, they can capture alert messages, enabling the DBA to be proactive.
The major drawback of these solutions is their inflexibility, together with the lack of a learning mechanism. They take actions or suggest solutions based on information previously stored in a Knowledge Base. Those systems are unable to, for example, solve unknown problems, combine previous solutions or even adapt catalogued ones. They also cannot add new information to their Knowledge Base.

DBSitter helps the DBA in the decision-making process by suggesting the most adequate solution(s) to each problem, taking advantage of its library of previously solved problems. To do so, it combines two well-known Artificial Intelligence techniques: Case Based Reasoning (CBR) [1] and Intelligent Agents [2]. Our novel approach uses intelligent agents to monitor the database environment and act on it, solving the problem or suggesting solutions to the DBA. By using CBR, we are able to adapt known solutions and enrich the case base. DBSitter was implemented in Java [3]. We have also used RMI for communication between agents and JEOPS [4] as the knowledge representation and inference mechanism.

This paper is organized as follows: section 2 explains the current state of the art in database monitoring systems; section 3 discusses our approach; section 4 presents a prototype, DBSitter; section 5 details a case study; and section 6 presents our conclusions and suggestions for further work.
2 Intelligent tools for Monitoring Databases
We carried out a survey of the main market tools for database monitoring. The tools were chosen for their completeness as well as their similarity to DBSitter, to allow a comparative analysis: IBM Tivoli Monitoring Databases [5], CA Unicenter Database Administration for Distributed RDBMS [6], HP-Openview Database PAK 2000 [7], Progress VSTMON [8], I/Watch™ [9], Progress Fathom [10] and Oracle Expert (from the Oracle Enterprise Manager Suite) [11]. These tools are used to set up, diagnose and manage database environments such as Microsoft SQL Server, DB2, Informix, Sybase, Progress and Oracle, as shown in Table 1. In our comparative study, the chosen evaluation criteria were: failure detection, automatic failure correction, database failure prevention capability, learning ability and the capacity to suggest solutions to the DBA.

Most of the analyzed tools offer failure detection and automatic failure correction. CA Unicenter has an event correlation feature for this purpose. Oracle Expert, in particular, has a mechanism known as “fix it job”, which consists of a series of corrective scripts. These scripts are defined in advance and are executed automatically when certain failures occur. The system then informs the DBA via email. Failure prediction is implemented by Progress Fathom, Progress VSTMON, I/Watch, CA Unicenter and HP-Openview. In Progress Fathom, this is done by capturing alert messages of imminent errors before they actually occur. The others have statistics-based tools for this purpose. Learning and the suggestion of case solutions are not implemented by any of the tools described above. I/Watch and Oracle Expert, however, each have a feature that suggests configuration changes to improve the database environment as a whole.
In this light, DBSitter stands out as a software tool capable of suggesting to the DBA more than one option to solve a given problem, showing for each its degree of success and implementation difficulties, based on solutions that were previously applied to similar cases. DBSitter's differential is its capability to learn. When a new solution is created or adapted from others, it is stored in the case repository (cases being the way problems are represented), enriching the system's knowledge. This feature is obtained through the use of CBR technology. DBSitter was implemented using Java-based technologies, which give it platform independence. Its architecture is based on distributed intelligent agents, which allows the monitoring of distributed databases in a network environment. This also gives the system scalability and, potentially, the flexibility to monitor any kind of database.

Table 1: Comparative Study of Intelligent Monitoring Systems
Tool
Progress Fathom Progress VSTMON I/Watch
Yes
Failure correction “fix it job” Yes
Yes
No
Yes
No
No
Progress
Yes
Yes
No
Configuration Changes
Oracle, MS SQL Server
IBM Tivoli Monitoring Databases CA Unicenter Database Administratio n for Distributed RDBMS HP-Openview Database PAK 2000
Yes
Yes
Yes (statistical analysis) No
No
No
Yes (event correlation)
Yes
Yes
No
No
DB2, Oracle, Informix and MS SQL Server DB2, Oracle, Sybase, Informix and MS SQL Server
Yes (analysis tool)
No
No
Oracle Expert
Failure detection Yes
Yes
Failure prevention No
Case learning No
Case similarity No
Database
Yes
No
No
Progress
Oracle
Oracle, Informix and Sybase

3 The proposed solution: DBSitter
In order to design a tool that helps the DBA to detect and prevent failures, we have used two AI techniques: Case Based Reasoning (CBR) [1] and Intelligent Agents [2]. The resulting prototype is called DBSitter. DBSitter is a Multi-Agent System [3], in which a set of intelligent agents is dispersed in the environment, collecting information for the system's decision-making. Agents use their sensors to capture changes in the environment (and thus find problems). They also execute, through their actuators, the actions deemed necessary to correct problems. DBSitter keeps a library of cases where well-known problems and their solutions are stored. Case retrieval, suggestion and learning are done via CBR.
3.1 Cases in DBSitter

A case in DBSitter corresponds to a problem description together with its suggested solution. The system is also able to store new cases, via the analysis of all sensors in the system. Table 2 shows an example case description in DBSitter.

Table 2. Example case representation

Category: 2 – Performance Problems
Class of problem: Object Fragmentation
Class of object: Table
Description of problem (symptoms): High number of extents (detected via SQL query in the database) – the table becomes divided into many pieces (extents).
Consequences: Waste of disk space and worse performance when accessing data from the table, since the various extents that make it up must all be accessed (reflected by slower queries and refreshes).
Existent solutions: Solution 1: Analyze the table, then export its data, delete it, recreate it with new and correct storage parameters, and finally import its data back.
Level of Solutions Success: Solution 1 (high)
Problem Gravity Level: Low

Legend:
• Category: type of database problem.
• Level of Success: indicates the efficiency of the solution (high, medium or low). It works as a “thermometer” to evaluate which is the best of the solutions available for a certain problem. When there are many solutions to a problem, (*) indicates the most efficient one.
• Gravity Level: Maximum, High, Medium or Low. The higher the gravity level, the worse the impact of the problem on the system's performance.
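As an illustration only, the fields of Table 2 could be held in a small Java data structure such as the sketch below. The paper does not show how DBSitter stores cases internally, so every class, field and method name here (`Level`, `Solution`, `FailureCase`, `bestSolution`) is hypothetical; the only parts taken from the text are the field list of Table 2 and the legend's idea that the success level ranks the available solutions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Ordered scale used for both success and gravity (MAXIMUM applies to gravity only). */
enum Level { LOW, MEDIUM, HIGH, MAXIMUM }

/** One catalogued solution and how well it has worked so far (hypothetical type). */
final class Solution {
    final String steps;
    final Level success;

    Solution(String steps, Level success) {
        this.steps = steps;
        this.success = success;
    }
}

/** A case as laid out in Table 2: problem description plus its known solutions. */
final class FailureCase {
    final String category;
    final String problemClass;
    final String objectClass;
    final String symptoms;
    final String consequences;
    final Level gravity;
    final List<Solution> solutions;

    FailureCase(String category, String problemClass, String objectClass,
                String symptoms, String consequences, Level gravity,
                List<Solution> solutions) {
        this.category = category;
        this.problemClass = problemClass;
        this.objectClass = objectClass;
        this.symptoms = symptoms;
        this.consequences = consequences;
        this.gravity = gravity;
        // Defensive copy so a stored case cannot be changed from outside.
        this.solutions = Collections.unmodifiableList(new ArrayList<>(solutions));
    }

    /** Returns the solution with the highest success level, or null if none is catalogued. */
    Solution bestSolution() {
        Solution best = null;
        for (Solution s : solutions) {
            if (best == null || s.success.compareTo(best.success) > 0) {
                best = s;
            }
        }
        return best;
    }
}
```

In this sketch `bestSolution()` plays the role of the (*) marker in the legend: when several solutions exist, it picks the one with the highest level of success.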
3.2 System Architecture

DBSitter's architecture (Fig. 1) separates the monitoring agents (Sensor and Actuator) from the internal agents, which process and reason about the environment status information. The internal modules are linked by one main process, which works as a service to the external agents (i.e., the monitoring agents). These external agents fill the repositories with the current states of the environment (i.e., the database system). For this communication, the system requires a Middleware layer that mediates between the internal and the external modules. The system depends on four repositories (Case, Sensor Registry, Actuator Registry and History), used as databases for queries and maintenance by the agents.
Agents

The Reasoning Agent (RA) perceives the occurrence of failures and predicts future problems based on the Case Repository. This perception is obtained by combining the information provided by the Sensor Agents with what is stored in the History. Once a problem situation is detected, the RA sends a message either to the relevant Actuator Agents or to the DBA. The Sensor Agents (SAs) and Actuator Agents (AAs) are reactive and specialize in specific tasks. An SA monitors the database system. SAs periodically gather information, checking whether a specific piece of data in the environment has changed. The AAs perform the actions necessary to correct faults in the system, or to display relevant information to the DBA. AAs perform specific actions when requested by the RA.

[Figure 1: architecture diagram. Components shown: Reasoning Agent, Failure Predictor, Failure Detector, Similarity Identifier, History, Case Repository, Logger Agent, History Recorder, Environment Changing Manager Agent, Timer, Coordinators, Sensors Manager Agent, Sensors Register, Actuators Manager Agent, Actuators Register, Application Manager, Middleware, Sensor Agents 1 to N, Actuator Agents 1 to N, DBA.]

Fig. 1. DBSitter's Architecture
The Log Agent stores all the states of the monitored system in the History at predefined time intervals. It perceives changes in the environment (based on the data given by the Sensor Agents) and acts by recording the current state in the History. The Environment Changing Manager Agent checks the states of the Sensor Agents for variations in order to detect changes in the environment. The Sensor and Actuator Manager Agents are responsible for keeping track of the Sensor Agents and Actuator Agents present in the environment, more precisely in the Sensor Registry and in the Actuator Registry.
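The reactive behaviour of a Sensor Agent described above (poll periodically, report only when the monitored value changes) can be sketched as follows. This is a minimal illustration under stated assumptions, not DBSitter's actual code: the class name `PollingSensor`, the `tolerance` parameter and the use of a callback in place of a message sent through the middleware are all hypothetical.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

/** Reactive sensor sketch: polls one value and reports it only when it changes. */
final class PollingSensor {
    private final DoubleSupplier probe;     // reads the monitored value (e.g. a query result)
    private final DoubleConsumer onChange;  // stands in for the message sent via the middleware
    private final double tolerance;         // smallest variation treated as a change
    private double last = Double.NaN;       // NaN means "nothing read yet"

    PollingSensor(DoubleSupplier probe, DoubleConsumer onChange, double tolerance) {
        this.probe = probe;
        this.onChange = onChange;
        this.tolerance = tolerance;
    }

    /** Performs one polling cycle; returns true if a change was reported. */
    boolean poll() {
        double current = probe.getAsDouble();
        if (Double.isNaN(last) || Math.abs(current - last) > tolerance) {
            last = current;
            onChange.accept(current);
            return true;
        }
        return false;
    }
}
```

A scheduler such as `java.util.concurrent.ScheduledExecutorService.scheduleAtFixedRate` could call `poll()` at the desired interval; the first reading is always reported so that the rest of the system starts from a known state.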
Processes

A few processes are also necessary:

The Similarity Identifier (SI) tries to establish a confidence ratio between the data acquired from the Sensor Agents and the cases in the Case Repository. This also lets the system consider cases that are only reasonably similar to what the sensors read.

The History Recorder stores in the History the current and complete state of the system, represented by the values of the sensors.

The Timer controls the frequency at which the History is refreshed. To define the recording frequency, it relies on a predefined reference time and on a real-time analysis of the variation in the Sensor Agents' states (done by the Environment Changing Manager Agent).

The Coordinators are responsible for managing the registry that tracks the entrance and exit of agents in the system. Sensor and Actuator Agents are remote and replicable processes; thus, a registry must store and unambiguously identify each reference to an agent instance. Once a new agent is created in the environment, it has to be registered to be able to communicate.

The Applications Manager connects external client applications that are related to the system.

3.3 DBSitter Functionalities

DBSitter presents the following functionalities:

• Failure Prevention: In order to prevent a failure, the system has to analyze the behavior of the values of the sensors involved in a certain case. This analysis is done on the logged values generated by the sensors (the history values of the environment). DBSitter uses the linear least squares method [13] (see footnote 1) with a degree-two polynomial approximation to estimate the sensor values over time. Once this is done, it is possible to forecast the behavior of the sensors. Fig. 2 shows an example of the values of the table fragmentation sensors over time. Once the system has them, it can estimate the polynomial's parameters and thus predict possible faults.

[Figure 2: plot titled "Behavior of an example table fragmentation". Vertical axis: Fragmentation (%); horizontal axis: Time unit. Series: captured values and estimated polynomial.]

Fig. 2. Graph showing table fragmentation values along time. (1) Values approximation curve (2) Prediction (3) Threshold value indicating table fragmented.

1 The least squares method is widely used for one-dimensional parameter estimation.
The failure-prevention functionality is realized by the Failure Predictor, via a macro analysis of the History of system states, prediction algorithms, verification against the Case Repository and the Similarity Identifier.

• Failure Detection: This is guaranteed by the interaction of intelligent agents and sensor agents, as the role of the intelligent agents is to monitor the various parameters captured by the sensor agents dispersed in the database environment. Expressions are built from those parameters, and they characterize the cases to be detected by the system. The Failure Detector process then matches the current system state against the cases through the Similarity Identifier, as a way to assess the possibility of a system fault.
• Failure Correction: Corresponds to the action taken by the actuator agent when a solution for a problem is found in the Case Repository. The failure is then corrected, or a message is sent to notify the DBA.
• Failure Cases Gathering: Specialized users (DBAs) can feed the Case Repository, adding new cases manually when required or when they come across an unknown, uncatalogued problem.
• Failure Seeking: Occurs when a DBA queries the Case Repository directly to search for information about a certain failure.
• Suggestion of solutions to failures: This happens when a certain problem has more than one solution; each solution comes with its degree of success. The DBA then chooses the most adequate solution.
• Solution Adaptation/Learning: Consists of improving the Case Repository when a new and unknown problem arises. The solution can be an adaptation of previously known solutions catalogued in the repository. For that purpose, we will use Decision Trees [12], which are simple to implement and generate new rules. These can, in turn, be used to implement new cases.

In our prototype, only the functionalities of Prevention, Detection and Correction of failures were fully implemented. The functionality of solution adaptation is partially implemented.
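To make the Failure Prevention step more concrete, the sketch below fits a degree-two polynomial to logged sensor values by linear least squares and then looks for the first future time unit at which the estimate reaches a threshold. It is only an illustration of the technique named in the text: the class `QuadraticTrend`, its method names, and the choice of solving the normal equations by Gaussian elimination are assumptions, not a description of DBSitter's Failure Predictor.

```java
/** Sketch: degree-two least-squares fit of sensor readings over time. */
final class QuadraticTrend {
    final double a0, a1, a2;   // fitted model: y(t) = a0 + a1*t + a2*t^2

    private QuadraticTrend(double a0, double a1, double a2) {
        this.a0 = a0;
        this.a1 = a1;
        this.a2 = a2;
    }

    /** Fits the polynomial by solving the 3x3 normal equations. */
    static QuadraticTrend fit(double[] t, double[] y) {
        if (t.length != y.length || t.length < 3) {
            throw new IllegalArgumentException("need at least 3 paired samples");
        }
        double[] s = new double[5];   // s[k] = sum of t^k
        double[] b = new double[3];   // b[k] = sum of y * t^k
        for (int i = 0; i < t.length; i++) {
            double p = 1.0;
            for (int k = 0; k < 5; k++) {
                s[k] += p;
                if (k < 3) b[k] += y[i] * p;
                p *= t[i];
            }
        }
        double[][] m = {
            {s[0], s[1], s[2], b[0]},
            {s[1], s[2], s[3], b[1]},
            {s[2], s[3], s[4], b[2]}
        };
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < 3; col++) {
            int pivot = col;
            for (int r = col + 1; r < 3; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            }
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            if (Math.abs(m[col][col]) < 1e-12) {
                throw new ArithmeticException("samples do not determine a quadratic");
            }
            for (int r = col + 1; r < 3; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c < 4; c++) m[r][c] -= f * m[col][c];
            }
        }
        // Back substitution.
        double[] a = new double[3];
        for (int r = 2; r >= 0; r--) {
            double sum = m[r][3];
            for (int c = r + 1; c < 3; c++) sum -= m[r][c] * a[c];
            a[r] = sum / m[r][r];
        }
        return new QuadraticTrend(a[0], a[1], a[2]);
    }

    /** Estimated sensor value at the given time. */
    double valueAt(double time) {
        return a0 + a1 * time + a2 * time * time;
    }

    /** First time unit in (from, from + horizon] whose estimate reaches the threshold, or -1. */
    int firstCrossing(int from, int horizon, double threshold) {
        for (int step = from + 1; step <= from + horizon; step++) {
            if (valueAt(step) >= threshold) return step;
        }
        return -1;
    }
}
```

With such a fit, a predictor could call `firstCrossing` with the last logged time unit and the fragmentation threshold of a case; a non-negative result would indicate how soon the threshold is expected to be reached, which is the situation Fig. 2 depicts.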
4 The Prototype
DBSitter was developed with the main objective of using technologies that were open source, that allowed the system to be integrated with, and to administer, any database system, and that enabled communication between intelligent agents. Hence, Java was the chosen programming language. It provides object orientation, which is useful for modularity and essential for extensibility, as we adopted the proposal of creating a framework for database solution developers. The Java virtual machine provides a multiplatform execution environment. For the communication middleware, we used Java's Remote Method Invocation (RMI). It is used both for communication between the system and the actuator and sensor agents and for communication between the application and the database administrator's interface. For the case descriptions, we used the JEOPS framework [4], with the aim of employing the logic paradigm integrated with the system's objects.
Through JEOPS it is possible to define rules that determine the conditions that trigger events indicating adverse situations in the database system. The system reasons about both run-time information and history data. The descriptions enable the AAs either to act directly on the system or to provide information for the DBA to take over. For the system's internal repositories, we have used: (1) MySQL, the internal database system, storing the sensor and actuator registries; (2) XML files, to save or load configuration parameters when the system shuts down or starts. The target databases for the implementation of sensors and for preliminary tests were Microsoft SQL Server and Oracle. A process call written in Object Pascal was used for the implementation of sensors. These processes are used for monitoring environment states (for example, processor or memory states), as well as for direct database queries that analyze database states. The learning functionality, which is one of the differentials of this work, is still being implemented.

4.1 A use example of DBSitter

As a sample of the proposed solution, we have built a Case Repository with some initial cases catalogued by expert DBAs who used an Oracle database to monitor problems during rush hours at Serpro (see footnote 2). In this example, we will consider that the problem described in Case 1, a table that has reached unacceptable fragmentation levels, has just occurred. The process is launched when a Sensor Agent detects that a specific table has reached a critical fragmentation level. This is done through the fragmentation monitoring event, which is basically a cyclic SQL query to the Oracle Data Dictionary looking for information about the table. The Sensor then sends a message to the Environment Change Manager Agent, which in turn interacts with the Reasoning Agent. The RA then searches the Case Repository for the case that best conforms to the problem.
When a similar case is found, the RA sends a message to the Coordinator, which notifies the Actuator Agent of the solution to be applied automatically and later sends a message to the DBA, stating that an action was taken to correct the problem. After the Actuator Agent acts, the table is defragmented, and the Log Agent is triggered to record the action in the History.
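The retrieval step in this example (finding the case that best conforms to what the sensors report) can be sketched as below. The paper does not define how the Similarity Identifier computes its confidence ratio, so the metric used here is an assumption made purely for illustration: a case is described by per-sensor thresholds, and its confidence is the fraction of those thresholds that the current readings reach. The class and method names, and the map-based representation, are hypothetical.

```java
import java.util.Map;

/** Sketch of case retrieval by a simple, assumed confidence ratio. */
final class SimilarityIdentifier {

    /** Fraction (0..1) of a case's sensor thresholds that the current readings reach. */
    static double confidence(Map<String, Double> caseThresholds, Map<String, Double> readings) {
        if (caseThresholds.isEmpty()) return 0.0;
        int matched = 0;
        for (Map.Entry<String, Double> e : caseThresholds.entrySet()) {
            Double value = readings.get(e.getKey());
            if (value != null && value >= e.getValue()) matched++;
        }
        return (double) matched / caseThresholds.size();
    }

    /** Name of the case with the highest confidence at or above minConfidence, or null. */
    static String bestMatch(Map<String, Map<String, Double>> repository,
                            Map<String, Double> readings, double minConfidence) {
        String best = null;
        double bestScore = minConfidence;
        for (Map.Entry<String, Map<String, Double>> c : repository.entrySet()) {
            double score = confidence(c.getValue(), readings);
            // Accept the first case that reaches the minimum, then only strictly better ones.
            if (score >= bestScore && (best == null || score > bestScore)) {
                best = c.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

Returning null when no case reaches `minConfidence` corresponds to the situation in which the problem is unknown and the DBA has to be consulted.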
5 A Case Study
We have carried out a series of tests to evaluate the efficiency of DBSitter. This section describes the main results, together with the scenario of the experiments. The platform adopted for the tests was a client-server environment using a Windows 2003 server with SQL Server 2000 and Oracle Enterprise Edition 8.1.7.0.0 installed. The client had Windows XP SP1 and the Oracle and SQL Server client software (representing the databases we tested DBSitter with). They were connected via Ethernet over TCP/IP. Table 3 below describes the test cases.

2 Serpro stands for Brazilian Federal Data Processing Services.
Table 3. Cases Evaluated

Test case: Database not available
How we simulated the error: The database service was stopped.
Actuator's action: Put the database online.
Database involved: Oracle and SQL Server

Test case: Table's fragmentation
How we simulated the error: A table partitioned into various small pieces was created. Then, various records were inserted until the table's fragmentation limit was reached (diagnosed by an SQL query).
Actuator's action: Defragment the table.
Database involved: Oracle and SQL Server

Test case: Insufficient memory
How we simulated the error: First, the Shared Pool Area was set to a small value. After that, a large procedure was invoked, which caused the memory hit ratio to fall below the acceptable 90%.
Actuator's action: Increase the SHARED_POOL_AREA parameter, which controls the memory area allocation.
Database involved: Oracle

Test case: Tablespace overflow
How we simulated the error: A tablespace with a 1M datafile was created. Then, a table with 100k extents was created. After that, an allocation of 10 extents was forced on the same table, which provoked the ORA-01653 error, indicating tablespace overflow.
Actuator's action: Increase the datafile of the overflown tablespace.
Database involved: Oracle

Test case: Preventing tablespace overflow
How we simulated the error: Similar to the above, except that the actuator intervened before the number of extents generated the error.
Actuator's action: Increase the datafile of the tablespace whose free space is close to overflowing.
Database involved: Oracle
It is worth remarking that the last case was included in the tests in order to evaluate the prevention functionality. The results obtained were promising, indicating that the solution works as predicted. An experienced DBA observed the results and concluded that the actuator's intervention was appropriate in all the test cases above. Both problem resolution and solution performance were considered in this evaluation.
6 Conclusions and Further Work
The main goal of this work is to provide the database professional with a tool that helps him/her in the difficult decision-making process that has to occur rapidly when there are database problems. The idea was not only to automate known solutions, but also to record novel approaches to problems. Thus, this paper has presented DBSitter, a tool that combines two well-known AI techniques in a novel approach. We emphasize that the main differentials of DBSitter relative to the Expert Systems for Database Administration are that it can be used to monitor any database and that it combines two AI technologies, Case Based Reasoning and Intelligent Agents, making it possible to implement detection, problem prevention, learning and adaptation of solutions. Furthermore, our prototype was implemented with open-source technologies, which makes it much easier for different developers to distribute and refine it. Another important contribution of this approach is the Case Repository (open to the DBA), which assists in the propagation of knowledge among DBAs, thus making their daily tasks easier and contributing to the enhancement of the level of service. The automatic detection of problems by the sensors and the prediction capability of the system help the DBA to act proactively, by detecting errors and acting on them earlier. From our point of view, this is another important advantage of our approach, as discussed earlier in the paper.

The platform was tested with two different databases (SQL Server and Oracle 8i). This first evaluation has yielded promising results in both cases, which not only indicates that the approach is an interesting one, but also reinforces the portability of the prototype. The next steps of our research and development include enhancing the implementation of the learning functionality and of the adaptation of solutions. It is worth remarking that we have already started this process, laying the necessary architectural foundations.
References

1. Aamodt, A.; Plaza, E. (1994): Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AI Communications. IOS Press, Vol. 7: 1, pp. 39-59.
2. Russell, S. J.; Norvig, P.: Artificial Intelligence: A Modern Approach (2nd Edition). Prentice Hall, December 2002.
3. The Source for Java Technology. http://java.sun.com. Last Access: 21 Sep. 2003.
4. Figueira Filho, C.; Ramalho, G.: JEOPS – Java Embedded Object Production System. In: Monard, M. C.; Sichman, J. S. (eds.): Advances in Artificial Intelligence, International Joint Conference, 7th Ibero-American Conference on AI, 15th Brazilian Symposium on AI, IBERAMIA-SBIA 2000, Proceedings. Lecture Notes in Computer Science 1952. Springer, 2000, pp. 53-62.
5. IBM 2004. http://www.ibm.com/br/products/software/tivoli/products/monitor-db/. Last Access: 16 Feb. 2004.
6. Computer Associates International 2004. http://www3.ca.com/Solutions/Product.asp?ID=1357. Last Access: 16 Feb. 2004.
7. Hewlett-Packard Company 1994-2003. http://www.openview.hp.com/products/dbpak2k/prod_dbpak2k_0003.html. Last Access: 16 Feb. 2004.
8. Progress Software Corporation 1993-2004. http://www.progress-software.com.br/htm/consultoria/pacotes_vstmon.htm. Last Access: 16 Feb. 2004.
9. Quest Software 2002. http://www.quest.com/i_watch/pdfs/IWatch.pdf. Last Access: 16 Feb. 2004.
10. Progress Software Corporation 1993-2004. http://www.progress.com/products/documentation/startffm/index.ssp. Last Access: 16 Feb. 2004.
11. Oracle Corporation: Oracle Enterprise Manager Oracle Expert User's Guide - Release 1.4.0 - A53653_01.
12. Mitchell, T.: Machine Learning. McGraw-Hill, 1997.
13. Ruggiero, Márcia A. Gomes; Lopes, Vera Lúcia da Rocha: Cálculo Numérico. Aspectos Teóricos e Computacionais. Makron Books, 1997.