Application of Machine Learning in Water Distribution Networks: An Initial Study

Luis M. Camarinha-Matos
New University of Lisbon and Uninova, Faculty of Sciences and Technology
Quinta da Torre, 2825 Monte Caparica, Portugal.

Fernando J. Martinelli
New University of Lisbon and Uninova, Faculty of Sciences and Technology
Quinta da Torre, 2825 Monte Caparica, Portugal.

Abstract
This paper describes ongoing work on the application of machine learning techniques to the domain of water distribution networks. The research is carried out in the framework of a European project called WATERNET, which aims to develop a system to control and manage water distribution networks. The system is composed of a supervision system, a distributed information management subsystem, an optimization subsystem, a water quality monitoring subsystem, a simulation subsystem, and a machine learning subsystem. This paper focuses on the machine learning subsystem, describing the approach followed and the difficulties found. The basic raw material for this work is the historical data of a Portuguese water distribution company that has 45 water stations, some of them with 6 years of data collected every 5 minutes. The paper also shows the first results obtained, discusses the difficulties found in the first experiments, and introduces an architecture based on qualitative models/causal relationships that eases both the extraction of knowledge from the historical data and the assessment of the extracted knowledge.

Keywords: Machine learning systems, water distribution networks

1 INTRODUCTION
Water distribution networks represent a promising application domain for machine learning. Such networks show a large heterogeneity in terms of control structures and management strategies, and a varying geometry, with continuous expansion and changes in demand throughout their life. Due to these characteristics, water distribution companies face the problem of integrating the data and knowledge related to control and optimal exploitation. In most cases there is no adequate model of the networks and their behavior and, therefore, the control and supervision strategies are based on manual procedures and some heuristic rules. There are, however, large quantities of data collected during the operation of the network, which suggest the opportunity to apply learning techniques in order to find better operation strategies. In the context of the European Esprit project WATERNET, an evolutionary knowledge capture approach for advanced supervision of water distribution is being developed. This paper introduces the preliminary experiences in evaluating machine learning techniques in this domain, as well as the methodology applied. The main difficulties found can be summarized as follows:
• Identification of potential learning tasks that can lead to useful knowledge from the users' point of view and that are feasible given the available historical data;
• Selection of the machine learning techniques to apply to each learning task;
• Preparation of the raw data to feed the learning algorithms. Due to the huge number of parameters and stored situations, this is a critical phase of the process. Additionally, some high-level features have to be defined and their extraction mechanisms implemented;
• Assessment of the generated knowledge. Due either to the limitations of the learning techniques or to the quality of the training data, not all extracted knowledge is adequate or meaningful, so a careful assessment procedure is necessary;
• Integration of the learning system with the supervision system.
The devised methodology tries to overcome these difficulties.
[Figure 1 diagram: the Supervision System, Learning Subsystem, Optimization Subsystem, Modelling and Simulation Subsystem and Water Quality Monitoring Subsystem, each connected through the Distributed Information Management System. Items exchanged between them: acquired data, acquired data and actions, acquired knowledge, qualitative causal model, operational model, instantiated model, optimized plan, and operational data constraints.]
Figure 1: WATERNET architecture

It is widely recognized that applying standard learning algorithms to real-world problems is quite an art. Some of these algorithms are unable to take background knowledge into account and, therefore, it is up to the engineer to perform a set of preparatory actions/transformations on the raw data in order to obtain useful rules or decision trees. The generated knowledge has to be assessed by a domain expert. This process is usually interactive and iterative, very time-consuming, and may require the addition/removal of training examples, modification of the example description language, or even modification of the parameters of the learning algorithm. These difficulties were also encountered in this application domain. One of the key points of the strategy followed is the use of a qualitative model of the water distribution system to guide the learning process and to help in the assessment phase.
The remainder of this paper is organized as follows. Section 2 introduces the WATERNET project, presenting the current situation of the water distribution companies and the data used by the learning subsystem. Section 3 introduces the Learning Subsystem being developed in the project, focusing on the adopted methodology. Section 4 presents some possible learning tasks in the context of water distribution networks and some preliminary experiments, discussing the difficulties found in carrying them out. Section 5 describes the use of qualitative models to guide the learning process. Finally, Section 6 presents some conclusions and open questions.
2 THE WATERNET PROJECT

2.1 PROJECT OVERVIEW
The WATERNET project is a two-year Esprit project that aims to design and develop an evolutionary knowledge and management system for the control, optimal operation and decision support of drinking water distribution networks, in order to minimize operating costs, guarantee a continuous water supply with better quality monitoring, save energy and minimize the waste of natural resources. The system being developed in this project is composed of:
• A Supervision System, which will supervise the network in a distributed way, monitoring its current status, identifying deviations from the desired states and deciding on the next control and management actions;
• A Distributed Information Management Subsystem, which will support cooperation and information exchange among sites and their activities (Afsarmanesh, Camarinha-Matos & Martinelli, 1997);
• An Optimization Control Subsystem, which will produce optimized operational strategies for control devices (pumps, valves, ...) in order to minimize operating costs while guaranteeing good service (flow, pressure, water quality) (Quevedo et al., 1988);
• A Water Quality Monitoring Subsystem, which will monitor the water quality in the network and guide the water treatment process to guarantee the sanitary safety of the water;
• A Modeling and Simulation Subsystem, which will be responsible for specifying the network models, deriving the information needed by the other subsystems and simulating the network behavior;
• And finally, a Learning Subsystem, containing multiple learning algorithms that represent different paradigms, to support programming by demonstration and data mining on historical operation databases.
Figure 1 illustrates the WATERNET architecture.

2.2 CURRENT SITUATION
One of the end users of this project is SMAS Sintra, a Portuguese water company whose system represents a typical scenario for a modern water supply control and management system. Its network is composed of several stations that perform operations on the hydraulic network (for instance, pumping and storage). Each station has a set of sensors that monitor the current status of the variables that influence, or are influenced by, the station operation, and actuators that perform actions to guide the distribution process. Actions can be started by a local human operator through a control panel, by a software module that controls the station, or by an operator at the remote central station. In all these situations, with the current operational structure, it is very difficult to take actions that consider what is happening at other stations, even when the stations are controlled by software: actions are based solely on knowledge about the local targets that have to be reached. Coordination with the other stations is therefore, in general, difficult to obtain without more knowledge about the network behavior.
The function of the human operators is to observe the current situation, watching the variable values and the incoming alarms. Based on their knowledge and experience of the water network, the operators may take actions depending on the observed occurrence. These actions can be as simple as ignoring the alarm or as drastic as stopping the station and calling a repair crew.

2.3 AVAILABLE HISTORICAL DATA
The sensor readings and the actions performed on the actuators are periodically stored in a local database at each remote station and later transferred to the central station. Readings are taken every 5 minutes; additional readings are collected whenever an operation is performed, an alarm is detected, or a communication is made to the central station. These data are stored in flat text files, one file per station per day. The data made available for this project refer to 42 stations, some of them with 6 years of collected data, amounting to about 500 Mbytes of compressed data.

3 THE LEARNING SUBSYSTEM

3.1 WHY A LEARNING SYSTEM?
The motivation to use a machine learning system in this project stems from the following factors:
• There are no adequate models of water distribution networks that satisfy the supervision requirements;
• The knowledge on how to operate the networks is based on the experience of the human operators, which is difficult to extract and sometimes difficult to model;
• One of the end-user partners has a historical database of data collected at its stations that seems appropriate for a detailed investigation using data mining and machine learning techniques;
• Water distribution networks are in constant expansion, which requires incremental learning of their functioning.

3.2 APPROACH AND METHODOLOGY
As there are no known cases of application of machine learning in this domain, some effort had to be put into devising an adequate methodology for this case. However, we believe the approach followed can easily be generalized and applied in other domains. The main steps of the adopted methodology are:

3.2.1 Understanding the problem domain
This phase requires extensive interaction with the domain experts in order to understand the specific concepts and ontology of the domain, a careful analysis of the historical data to understand their meaning, and some visits to the physical sites.

3.2.2 Identification and characterization of potential learning tasks
Through observation of the historical data, and as a result of phase 1, a set of potential learning tasks was identified and characterized. In parallel with this process, and in order to obtain some "illustrative examples" of learned knowledge to discuss with the end users, very simple learning experiments were performed. This approach proved quite useful, as it greatly facilitated the interaction with the water domain experts, who have no knowledge of machine learning. The last step in this phase is the assessment of the proposed set of learning tasks. This assessment is based on two aspects:
− The end users' opinion about the usefulness of the task;
− The feasibility cost (implementation effort).
This phase therefore consists of three steps:
• Identification and preliminary characterization of potential learning tasks;
• Realization of preliminary (illustrative) learning experiments;
• Assessment/validation of the proposed learning tasks and detailed characterization of the most important ones.
3.2.3 Selection of learning techniques
The most promising techniques for each selected learning task are identified, and a learning system containing algorithms that represent multiple paradigms is designed in this phase.

3.2.4 Raw data transformation
The current representation and organization of the historical data (flat, chronologically ordered text files) is not the most adequate for selecting subsets of training data. A transformation of these raw data into a more suitable database structure is necessary.

3.2.5 Elaboration of a qualitative model of the water distribution system
One of the big difficulties to address is the selection of appropriate subsets of training data out of the historical database. Several hundred variables are stored, both periodically and on the occasion of exception events. However, for a particular learning task, only a small subset of these variables and some of the readings are useful. The question, therefore, is how to select the adequate training data. The proposed approach is to use a simplified qualitative model (see Section 5) to guide this process.

3.2.6 Extraction of high-level features
For some learning tasks it is convenient to use features of a higher level than the raw data. For instance, the historical data include records of flows, whilst other features such as accumulated water volumes, average flows, derivatives of variables, etc., may be adequate for some tasks. A set of feature extraction procedures has to be implemented on top of the DBMS.

3.2.7 Guided learning phase
Application of one or more learning algorithms to the selected training data sets.

3.2.8 Assessment of extracted knowledge
This phase needs the intervention of water domain experts, but their task can be greatly simplified with the help of some automatic "pre-assessment" based on the qualitative models defined in phase 5. In any case, an interactive approach seems mandatory. Complementarily, a network simulation tool can be of great help in this assessment phase. For this reason, the WATERNET project is also developing a simulation subsystem that can be fed with knowledge extracted by the learning system in order to assess its value.

3.2.9 Integration into the supervision system
Finally, the extracted knowledge should be added to the knowledge base of the supervision system.

4 IDENTIFIED LEARNING TASKS

4.1 EXAMPLES OF LEARNING TASKS
Based on step 2, the following potential learning tasks were identified:
a) Support for Production Planning
These tasks aim to extract knowledge that could be useful for forecasting future water demand and to give insight into how the network is evolving.
• Identification and characterization of pattern repetitions over time (seasonal patterns);
• Identification of tendencies (changes) in consumption;
• Identification of the working periods of devices.
b) Identification of the factors influencing the actions
These tasks aim to discover the factors that influence the operation of some devices of the network, in order to learn how to operate them in the future.
• Actions as a function of physical system parameters;
• Actions as a function of time.
c) Monitoring and alarm handling
These tasks aim to let the system learn how to detect, diagnose and recover from some failures observed in the network. Observation of the human operators' actions will be relied on heavily.
• Alarm detection based on readings of system variables;
• Alarm detection as a function of time;
• Consequences of unprocessed alarms;
• Actions that cause alarms;
• Inter-relationships between alarms;
• Actions after alarms.
d) Preventive maintenance
These tasks will help the maintenance departments of the water distribution companies to manage their devices better, applying policies that extend device life.
• Characterization of device life cycles;
• Identification of behavioral changes in devices.
e) Improvement of user satisfaction
Evaluating user complaints could give the companies useful knowledge, especially about anomalies detected in the network.
• Identification of network anomalies.
f) Other possibilities
These other possibilities are related to the identification of the normal behavior of the network.
• Identification of the range of values of system variables in normal operation;
• Detection of abnormal variations of some variables.

Training Data:
**ATTRIBUTE AND EXAMPLE FILE**
LN-11: (FLOAT) -> Level
LC-6-1: (FLOAT) -> Flow
LC-6-2: (FLOAT) -> Flow
LP-1: (FLOAT) -> Pressure
Action: "AG_18(Off)" "AG_18(On)" "None";
@
1.5 4.2 0.4 2.8 "None";
1.3 ? ? ? "None";
? 4.2 0.1 2.7 "AG_18(On)";
? 4.5 0 2.8 "None";
1.3 13.5 0.7 2.8 "None";
1.5 13.1 1.1 2.8 "None";
1.5 12.4 1.1 2.9 "None";
1.5 12.6 1.1 2.9 "None";
1.8 13.1 1.1 2.7 "None";
1.8 13.1 1.1 3.6 "None";
2 ? ? ? "None";
? 13 0.9 3.2 "None";
? 1.9 0.1 3.4 "AG_18(Off)";
2 4.7 0 3.8 "None";
2 4.4 0.4 2.9 "None";
2 4.4 0.3 2.8 "None";
2 4.4 0.4 2.8 "None";
2 4.4 0.4 2.9 "None";
1.8 4.2 0.4 2.8 "None";
1.8 4.4 0.4 2.8 "None";
1.5 4.5 0.4 2.8 "None";
1.5 3.7 0.4 2.8 "None";
1.5 4.5 0.4 3 "None";
1.3 ? ? ? "None";
? 4.4 0.3 2.9 "AG_18(On)";
? 4 0 2.9 "None";
1.3 13.8 0.1 3 "None";
1.3 13.1 1.1 3 "None";
1.3 13.6 1.1 2.8 "None";
1.3 13.1 1.1 2.8 "None";
1.5 13 1.1 2.7 "None";
1.5 13.3 1.1 2.8 "None";
1.5 13.3 1.1 2.8 "None";
1.5 13 1.1 3 "None";
1.5 13 1.1 3 "None";
1.5 13.1 1.1 2.9 "None";
2 ? ? ? "None";
? 12.8 1.1 2.9 "None";
1.8 13.6 0 2.9 "None";
? 1.4 0.1 2.9 "AG_18(Off)";
1.8 4.2 0.3 2.9 "None";
1.5 4.4 0.4 3 "None";
•••
Figure 2a

Generated rules:
**RULE FILE**
@
Time: [ Mon Sep 16 19:47:07 1996 ]
Examples: g18p_cn2.txt
Algorithm: UNORDERED
Error_Estimate: LAPLACIAN
Threshold: 0.00
Star: 5
@
*UNORDERED-RULE-LIST*
IF LC-6-1 < 3.40 AND 0.05 < LC-6-2 < 0.15 AND LP-1 > 1.75 THEN Action = "AG_18(Off)" [7 0 1.25]
IF LN-11 > 1.90 AND LC-6-1 < 2.70 THEN Action = "AG_18(Off)" [4 0 3.75]
IF LN-11 > 1.90 AND 2.30 < LP-1 < 2.75 THEN Action = "AG_18(On)" [0 1 1.75]
IF 4.20 < LC-6-1 < 4.60 AND 0.05 < LC-6-2 < 0.25 AND LP-1 > 3.45 THEN Action = "AG_18(On)" [0 2 0.62]
IF LN-11 < 1.90 AND LC-6-1 > 2.70 AND 0.05 < LC-6-2 < 0.15 AND LP-1 < 1.55 THEN Action = "AG_18(On)" [0 1.50 0.62]
IF LN-11 > 1.90 AND LC-6-1 < 8.60 AND LP-1 < 1.55 THEN Action = "AG_18(On)" [0 0.50 2.25]
•••
Figure 2b

4.2 PRELIMINARY EXPERIMENTS
After the identification of the learning tasks, some experiments were designed to evaluate the feasibility of the suggested tasks. In addition, these tests were performed in order to:
• Better understand the structure and contents of the historical data repositories;
• Identify the basic data transformations needed before these data can be fed to standard learning algorithms; and
• Start testing the behavior of some algorithms with real data.
For this purpose, a subset of the available data, representing 1 month of one SMAS station (Pedra Furada), and a set of inductive algorithms generating decision trees or rule sets were selected. The following steps were followed in these experiments:
• Conversion of the data to a more convenient format;
• Handling of unknown values: due to the way the data were collected, many examples had some variables with unknown values. A simple way to deal with this problem is to use algorithms that handle this kind of information, but it was found that, even with such algorithms, unknown values may introduce many inaccuracies into the results. Therefore, simple strategies to estimate the missing values were developed;
• Classification of the examples based on information contained in the data;
• Directing the learning process to certain classes: in many cases, better results were obtained when the problem was focused on a reduced set of classes to learn, possibly due to overlapping classes in some cases;
• Evaluation of the useful knowledge.
For illustrative purposes, some of the results of the tests on the task "Actions as a function of physical system parameters" are shown in Figure 2. This example illustrates the use of the inductive learning algorithm CN2 (Clark & Niblett, 1989; Clark & Boswell, 1991) to extract operation rules. In this example only four variables (level, flow1, flow2, pressure) are used. The task is to learn under which conditions pump G_18 should be operated. Looking at the historical data, we found that pressures and flows were a consequence of the pumping process and not variables that directly influence the decision, so a refinement was made. Figure 3 shows this situation: for this test the parameters used are the reservoir level and the status of group 18 (On or Off).

Training Data:
**ATTRIBUTE AND EXAMPLE FILE**
LN-11: (FLOAT) -> Level
G_18_status: "On" "Off"
Action: "AG_18(Off)" "AG_18(On)" "None";
@
1.5 "Off" "None";
1.3 "Off" "None";
? "Off" "AG_18(On)";
? "On" "None";
1.3 "On" "None";
1.5 "On" "None";
1.5 "On" "None";
1.5 "On" "None";
1.8 "On" "None";
1.8 "On" "None";
2 "On" "None";
? "On" "None";
? "On" "AG_18(Off)";
2 "Off" "None";
2 "Off" "None";
2 "Off" "None";
2 "Off" "None";
2 "Off" "None";
1.8 "Off" "None";
1.8 "Off" "None";
1.5 "Off" "None";
1.5 "Off" "None";
1.5 "Off" "None";
1.3 "Off" "None";
? "Off" "AG_18(On)";
? "On" "None";
1.3 "On" "None";
1.3 "On" "None";
•••
Figure 3a

Generated rules:
**RULE FILE**
@
Time: [ Mon Sep 16 20:04:12 1996 ]
Examples: g18a_cn2.txt
Algorithm: UNORDERED
Error_Estimate: LAPLACIAN
Threshold: 0.00
Star: 5
@
*UNORDERED-RULE-LIST*
IF LN-11 > 1.90 AND G_18_status = "On" THEN Action = "AG_18(Off)" [4.50 0 16.50]
IF LN-11 < 1.40 AND G_18_status = "Off" THEN Action = "AG_18(On)" [0 4.50 10.50]
IF 1.40 < LN-11 < 1.90 AND G_18_status = "Off" THEN Action = "None" [0 2.25 173.75]
IF LN-11 < 1.90 AND G_18_status = "On" THEN Action = "None" [4.50 0 102.50]
•••
Figure 3b

The improvement obtained with this test showed that additional knowledge about the network (such as qualitative models) could improve the results. In the previous tests, unknown values were fed directly to the learning algorithm. Considering that the water level changes little during the sampling interval (5 min. in this case), a new test was performed replacing each unknown value ("?") with the previous known value. In this case, for the same training set, the following rules were generated:
IF LN-11 > 1.9 AND G_18_status = "On" THEN Action = "AG_18(Off)"
IF 1.65 < LN-11 < 1.90 AND G_18_status = "On" THEN Action = "AG_18(Off)"
IF LN-11 < 1.40 AND G_18_status = "Off" THEN Action = "AG_18(On)"
IF LN-11 > 1.40 AND G_18_status = "Off" THEN Action = "None"
IF LN-11 < 1.65 AND G_18_status = "On" THEN Action = "None"
(DEFAULT) Action = "None"

[Figure 4 plot: "Flow Q2 forecasting using time and last known value as inputs"; real and predicted values of the flow (axis 0 to 30) plotted against time (axis 0 to 23).]
Figure 4: Water demand forecast using neural networks
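The refinement described above, replacing each unknown value ("?") by the previous known value, and the use of the resulting rule list can be sketched in a few lines of Python. This is a minimal sketch, not the tool used in the experiments: the helper names `forward_fill` and `predict_action` are hypothetical, the rules are the ones listed above hand-copied into `if` tests, and the input is taken from the first five examples of Figure 3a. Because these particular rules do not overlap, testing them one after another gives the same class as CN2's unordered rule list would, with the (DEFAULT) class as the fallback.

```python
from typing import List, Optional


def forward_fill(readings: List[str]) -> List[Optional[float]]:
    """Replace each unknown reading ("?") by the previous known value.

    Unknown readings at the very start, with no earlier known value, stay None.
    """
    filled: List[Optional[float]] = []
    last_known: Optional[float] = None
    for reading in readings:
        if reading != "?":
            last_known = float(reading)
        filled.append(last_known)
    return filled


def predict_action(level: Optional[float], status: str) -> str:
    """Hand-encoded copy of the rules generated after filling unknown values.

    The rules cover disjoint level ranges for each G_18 status, so their
    order does not matter; uncovered cases fall through to the default.
    """
    if level is not None:
        if status == "On" and level > 1.90:
            return "AG_18(Off)"
        if status == "On" and 1.65 < level < 1.90:
            return "AG_18(Off)"
        if status == "Off" and level < 1.40:
            return "AG_18(On)"
        if status == "Off" and level > 1.40:
            return "None"
        if status == "On" and level < 1.65:
            return "None"
    return "None"  # (DEFAULT) Action = "None"


# First five examples of Figure 3a: reservoir level LN-11 and G_18 status.
levels = forward_fill(["1.5", "1.3", "?", "?", "1.3"])
statuses = ["Off", "Off", "Off", "On", "On"]
for level, status in zip(levels, statuses):
    print(level, status, predict_action(level, status))
```

Filling with the previous known value is only reasonable under the assumption stated in the text, namely that the level changes little within one 5-minute interval; for faster-changing variables a different estimate would be needed.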
Another example was the application of a neural network to the forecasting task. Figure 4 shows an example for the task "Identification and characterization of pattern repetitions over time (seasonal patterns)".
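The paper does not describe the network's architecture or training procedure, so the sketch below covers only the input construction suggested by the title of Figure 4, where time and the last known value are the inputs. It assumes one flow reading per hour, with `None` marking a missing reading; the function name `make_training_pairs` and the hour-of-day encoding are illustrative assumptions, and the resulting pairs could be fed to any regression model.

```python
from typing import List, Optional, Tuple

# ((hour of day, last known flow), flow to predict)
Sample = Tuple[Tuple[int, float], float]


def make_training_pairs(flows: List[Optional[float]], start_hour: int = 0) -> List[Sample]:
    """Turn an hourly flow series into (inputs, target) pairs for a forecaster.

    Missing readings (None) are never used as targets, and the "last known
    value" input skips over them to the most recent available reading.
    """
    pairs: List[Sample] = []
    last_known: Optional[float] = None
    for offset, flow in enumerate(flows):
        hour = (start_hour + offset) % 24
        if flow is None:
            continue
        if last_known is not None:
            pairs.append(((hour, last_known), flow))
        last_known = flow
    return pairs


print(make_training_pairs([10.0, None, 12.0, 15.0]))
# [((2, 10.0), 12.0), ((3, 12.0), 15.0)]
```

Using the last known value rather than the immediately preceding slot keeps a single missing reading from discarding the following example.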
5 MODEL-DRIVEN LEARNING
As mentioned before, one of the difficult aspects is deciding which variables to consider and which subset of readings to use in a particular learning task. A human expert with some knowledge of the station can suggest, with a certain degree of accuracy, which variables should be used in training. In our view, this is because the expert has, as background knowledge, the causal relationships present in those stations. Therefore, we adopted a strategy similar to that of Clark and Matwin (Clark & Matwin, 1993), who use qualitative models to guide inductive learning. A qualitative model tries to represent, in a simplified way, the "macroscopic" behavior of a physical system. In our approach we focus especially on the causal relations that can be derived from the stations. Our approach does not intend to extract or refine the model, as in (Nordhausen & Langley, 1993); we only intend to use a known model in order to ease the process of knowledge extraction. Another way to connect qualitative reasoning with machine learning can be found in (Bratko, 1994).
The methodology behind our approach consists of expressing qualitatively the influences detected between system variables, and using the knowledge of these influences to identify which variables have to be used in the training phase. The adopted causal relationships do not specify details about how the variables are related; they express statements like "if variable A increases, variable B increases too". Looking at a station diagram, it is in many cases easy to define the necessary relationships in this way. Figure 5 shows part of the Pedra Furada station diagram with the identified qualitative relationships. The elements used in this notation have the following meaning:
+→  Represents a positive influence;
−→  Represents a negative influence;
&  The logical connective AND;
|  The logical connective OR.
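One possible mechanical reading of this notation is sketched below in Python. The encoding is our own assumption, not part of the WATERNET model: qualitative changes are coded as -1 (decreases), 0 (steady or unknown) and +1 (increases); an AND condition is taken to change only when all its operands change in the same direction, and an OR condition follows whichever operands change, unless they change in opposite directions. Under these assumptions the sketch reproduces the reading given in the text for the first relation of Figure 5: if G1 or G2 decreases and V5 decreases, then N1 increases. The function names are hypothetical.

```python
from typing import Dict, Tuple, Union

# A condition is a variable name or (connective, operands), "|" = OR, "&" = AND.
Condition = Union[str, Tuple[str, tuple]]


def qualitative_change(cond: Condition, deltas: Dict[str, int]) -> int:
    """Direction of change of a condition: +1 up, -1 down, 0 no conclusion."""
    if isinstance(cond, str):
        return deltas.get(cond, 0)  # unlisted variables are assumed steady
    connective, operands = cond
    values = [qualitative_change(op, deltas) for op in operands]
    if connective == "|":
        # OR: follows the operands that change, unless they conflict.
        if 1 in values and -1 not in values:
            return 1
        if -1 in values and 1 not in values:
            return -1
        return 0
    if connective == "&":
        # AND: changes only when every operand changes the same way.
        if all(v == 1 for v in values):
            return 1
        if all(v == -1 for v in values):
            return -1
        return 0
    raise ValueError(f"unknown connective: {connective!r}")


def predicted_effect(sign: int, cond: Condition, deltas: Dict[str, int]) -> int:
    """Apply a positive (+1) or negative (-1) influence to the condition's change."""
    return sign * qualitative_change(cond, deltas)


# First relation of Figure 5: (G1 | G2) & V5 has a negative influence on N1.
N1_CONDITION = ("&", (("|", ("G1", "G2")), "V5"))
print(predicted_effect(-1, N1_CONDITION, {"G1": -1, "V5": -1}))  # 1: N1 increases
```

Returning 0 for conflicting or partial changes keeps the reading conservative: the model then draws no conclusion instead of guessing a direction.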
[Figure 5 diagram: part of the Pedra Furada station, showing Reservoir (lake) Level N1, Reservoir (hole), Reservoir Levels CF1, CF2, SA1, HP1, HP2 and N2, Groups G1, G2 and G7, Pumps B1, B2 and B3, Valves V4, V5, VC1, VC2, VC3 and VC4, Flow Q1 and Pressure P1.]

Relations (influencing condition, signed influence, affected variable):
(G1 | G2) & V5     −→  N1
(G1 | G2) & V5     +→  P1
(G1 | G2) & V5     +→  Q1
G7 & V4            +→  P1
G7 & V4            +→  Q1
VC1 & B1           −→  HP1
VC2 & B1           −→  HP2
(VC1 | VC2) & B1   +→  P1
(VC1 | VC2) & B1   +→  Q1
B2                 −→  SA1
B2                 +→  P1
B2                 +→  Q1
VC3 & B3           −→  CF1
VC4 & B3           −→  CF2
(VC3 | VC4) & B3   +→  N2
Q1                 +→  N2
P1                 +→  Q1

Figure 5a, Figure 5b: Part of the Pedra Furada station diagram and the identified qualitative relations
Taking the first relation as an example, it can be read as: IF G1 OR G2 decreases AND V5 decreases, THEN N1 increases.
This notation has another advantage in this application: many station variables can be expressed as binary values. For example, G1 and G2 represent the state of a group and take the values "On" or "Off"; V5 represents the state of a valve, which can be Open or Closed. By analyzing these relations, it is easy to create a causal graph representing the relationships among all the variables studied. The graph is built in a way similar to that used by Iri (Iri et al., 1979) and described by Dague (Dague, 1995). The causal graph created from the relations of Figure 5 is shown in Figure 6; the black circles mark the existence of a causal relationship between two variables. From this graph it is possible to decide which variables have to be used when learning a specific task. For instance, to learn something about HP1, one needs to consider information about VC1 and B1.
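The variable selection step just described can be sketched as a small graph computation. In this minimal Python sketch, each relation of Figure 5 is read as (affected variable, sign, influencing variables), the reading that matches the HP1 example above. The AND/OR structure and the signs are kept only for reference, since selection needs just the set of influencing variables. The names `build_causal_graph` and `select_variables` are hypothetical, and the `transitive` option, which also collects causes of causes, is our own addition rather than a step described in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Relations of Figure 5 as (affected variable, sign, influencing variables).
RELATIONS: List[Tuple[str, int, Set[str]]] = [
    ("N1", -1, {"G1", "G2", "V5"}),
    ("P1", +1, {"G1", "G2", "V5"}),
    ("Q1", +1, {"G1", "G2", "V5"}),
    ("P1", +1, {"G7", "V4"}),
    ("Q1", +1, {"G7", "V4"}),
    ("HP1", -1, {"VC1", "B1"}),
    ("HP2", -1, {"VC2", "B1"}),
    ("P1", +1, {"VC1", "VC2", "B1"}),
    ("Q1", +1, {"VC1", "VC2", "B1"}),
    ("SA1", -1, {"B2"}),
    ("P1", +1, {"B2"}),
    ("Q1", +1, {"B2"}),
    ("CF1", -1, {"VC3", "B3"}),
    ("CF2", -1, {"VC4", "B3"}),
    ("N2", +1, {"VC3", "VC4", "B3"}),
    ("N2", +1, {"Q1"}),
    ("Q1", +1, {"P1"}),
]


def build_causal_graph(relations: List[Tuple[str, int, Set[str]]]) -> Dict[str, Set[str]]:
    """Map each affected variable to the variables that directly influence it."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for effect, _sign, causes in relations:  # the sign is not needed for selection
        parents[effect] |= causes
    return dict(parents)


def select_variables(graph: Dict[str, Set[str]], target: str, transitive: bool = False) -> Set[str]:
    """Variables to include in the training data when learning about `target`."""
    selected: Set[str] = set()
    pending = [target]
    while pending:
        for cause in graph.get(pending.pop(), set()):
            if cause not in selected:
                selected.add(cause)
                if transitive:
                    pending.append(cause)
    selected.discard(target)
    return selected


graph = build_causal_graph(RELATIONS)
print(sorted(select_variables(graph, "HP1")))  # ['B1', 'VC1']
```

For a variable such as N2, the direct selection gives VC3, VC4, B3 and Q1, while the transitive variant also pulls in everything that influences Q1, which may or may not be wanted for a given learning task.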
[Figure 6: causal graph over the variables G1, G2, V5, N1, G7, V4, VC1, VC2, B1, HP1, HP2, B2, SA1, P1 and Q1.]
Figure 6: Causal graph derived from Pedra Furada station
[Figure 7 diagram, in flow order: Historic Data → Qualitative Filter (with Expert Guidance and the Qualitative Model) → Pre-processing & Learning Algorithms → Generated Knowledge → Consistency Checking → Extracted Knowledge → Expert Judgment.]
Figure 7: Applying qualitative models to the learning process
Figure 7 shows the main steps in applying these qualitative models/causal relationships to the learning process. The preceding discussion concerns the first application point shown in the figure: the knowledge about the model is used to select which variables will be used in the learning process. However, the intention is also to use this model to evaluate the knowledge generated by the algorithms, identifying possible inconsistencies and making the extracted knowledge more accurate. The final assessment will, however, be made by the human expert. The assessment phase has not yet been implemented in WATERNET.
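Since the assessment phase has not yet been implemented, the following is only a sketch, under stated assumptions, of one kind of consistency check a qualitative model could support. The check flags rule conditions that test a variable which the model marks as a consequence of the very device the rule operates, which is the situation found with the flow and pressure rules of Figure 2. The `EFFECTS` table is a hypothetical encoding of the observation in Section 4.2 (the Pedra Furada model of Figure 5 does not include pump G_18), and the function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of the observation in Section 4.2: the flows and the
# pressure are consequences of operating pump G_18 (cause -> effects).
EFFECTS: Dict[str, Set[str]] = {"G_18": {"LC-6-1", "LC-6-2", "LP-1"}}

# A generated rule reduced to (variables tested in its IF part, device acted on).
Rule = Tuple[List[str], str]


def consequences(effects: Dict[str, Set[str]], device: str) -> Set[str]:
    """All variables reachable from `device` along cause -> effect edges."""
    reached: Set[str] = set()
    pending = [device]
    while pending:
        for effect in effects.get(pending.pop(), set()):
            if effect not in reached:
                reached.add(effect)
                pending.append(effect)
    return reached


def suspicious_conditions(effects: Dict[str, Set[str]], rule: Rule) -> Set[str]:
    """Conditions that test a consequence of the action the rule decides."""
    tested, device = rule
    return set(tested) & consequences(effects, device)


rule_2b = (["LC-6-1", "LC-6-2", "LP-1"], "G_18")  # first rule of Figure 2b
rule_3b = (["LN-11", "G_18_status"], "G_18")      # first rule of Figure 3b
print(sorted(suspicious_conditions(EFFECTS, rule_2b)))  # ['LC-6-1', 'LC-6-2', 'LP-1']
print(suspicious_conditions(EFFECTS, rule_3b))          # set()
```

A flagged rule is not necessarily wrong; in line with the text, such a check would only pre-filter rules before the expert's final judgment.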
6 CONCLUSIONS
Water distribution networks represent a very interesting application case for machine learning techniques. A preliminary analysis has shown that there is a large number of possible learning tasks, suited to a diversified set of learning techniques, that could improve the way water distribution systems are operated. A methodology for applying learning techniques in this domain was sketched, and it can be generalized to other data mining applications. A very important lesson from this ongoing research work is the value of using qualitative models to drive the learning process, both when selecting training data subsets and when assessing the generated knowledge. As the work described is part of an ongoing research project, many questions remain open, namely the validation of the generated knowledge and its integration into the water supervision system.

Acknowledgements
This work is funded in part by the European Commission, via the Esprit WATERNET project. The authors also thank ESTEC and SMAS-Sintra for supplying historical data and for fruitful discussions. Fernando José Martinelli also thanks CNPq (Brazilian Council of Research and Development) for his scholarship.

References
Afsarmanesh, H., Camarinha-Matos, L.M. and Martinelli, F.J. (1997). Federated Knowledge Integration and Machine Learning in Water Distribution Networks, to appear in ISIP97 International Conference on Integrated and Sustainable Industrial Production.
Bratko, I. (1994). Machine Learning and Qualitative Reasoning - Extended Abstract, in Machine Learning Journal, n.14, p.305-312, Netherlands, 1994. Kluwer Academic Publishers.
Camarinha-Matos, L.M., Seabra Lopes, L. and Barata, J. (1996). Integration and Learning in Supervision of Flexible Assembly Systems, IEEE Transactions on Robotics and Automation, Vol. 12, N. 2, April 1996, p.202-219.
Clark, P. and Boswell, R. (1991). Rule induction with CN2: Some recent improvements, in Machine Learning EWSL-91 (ed. Y. Kodratoff), p.151-163, Berlin, 1991. Springer-Verlag.
Clark, P. and Matwin, S. (1993). Using Qualitative Models to Guide Inductive Learning, in Proceedings of the 10th International Machine Learning Conference (ML93) (ed. P. Utgoff), p.49-56, San Mateo (CA, USA), 1993. Morgan Kaufmann Publishers Inc.
Clark, P. and Niblett, T. (1989). The CN2 Induction Algorithm, in Machine Learning Journal, nrs.3/4, p.261-283, Netherlands, 1989. Kluwer Academic Publishers.
Dague, P. (1995). Qualitative Reasoning: A Survey of Techniques and Applications, AI Communications, v.8, nrs.3/4, p.119-192, Sept./Dec. 1995.
Iri, M. et al. (1979). An algorithm for diagnosis of system failures in chemical process, in Computers & Chemical Engineering, n.3, p.489-493.
Nordhausen, B. and Langley, P. (1993). An Integrated Framework for Empirical Discovery, in Machine Learning Journal, n.12, p.17-47, Netherlands, 1993. Kluwer Academic Publishers.
Quevedo, J. et al. (1988). A contribution to the interactive dispatching of water distribution system, in International Symposium on AI, Expert Systems and Languages in Modelling and Simulation, p.41-46, Barcelona, 1987.