Automating the raw data to model input process: an open Python™ package

C. De Mulder1*, T. Flameling2, J. Langeveld3, Y. Amerlinck1, S. Weijers2, I. Nopens1~

1 BIOMATH, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Gent, Belgium
2 Waterschap De Dommel, PO Box 10.001, 5280 DA Boxtel, The Netherlands
3 Delft University of Technology, Stevinweg 1, 2628 CN Delft, The Netherlands
* Presenting author: [email protected]
~ Corresponding author: [email protected]

Abstract: Good-quality influent data is a prerequisite for model calibration and validation, and therefore for the construction of a reliable model in general. All too often, however, such data is not used to the fullest because it contains noise or large gaps, making it impossible to use directly as model input. This contribution presents a tool that makes the use of online high-frequency data as model input reproducible, reliable and transparent.

Keywords: Wastewater treatment; data analysis; Python package

Introduction
The availability of data is of great importance for any modelling study, as it is used for model calibration and validation, thereby increasing the predictive power of the model. Data collection and analysis are, however, often the most time-consuming aspects of a modelling effort [1]. Moreover, even when high-frequency online data is available, much of the information captured in the data is lost because data reconciliation takes up too much time and data gets thrown away. Martin & Vanrolleghem [2] showed in their review that completing a dataset, instead of generating it entirely, is also possible. The premise of this contribution is exactly that: instead of omitting noisy or unavailable data, an attempt is made to fill gaps in long-term data so that as much of the data as possible can be used. A standardised workflow is proposed and a Python™ package that saves time in data reconciliation is showcased.

Materials and Methods
The online influent data used in this contribution was acquired in 2013 at the wastewater treatment plant of Eindhoven, The Netherlands, operated by Waterboard De Dommel. The code used for the data visualisation, analysis and reconciliation was written in Python™ (freely available, Python Software Foundation, Oregon, USA) and made into a package that will be made openly available through GitHub (github.com) by the date of the conference. The analysis itself was performed in a Jupyter Notebook environment (Project Jupyter, jupyter.org), ensuring a clear and reproducible workflow. The influent model providing the output used to substitute large parts of the data was described in [3] and calibrated on the available online data to be representative of the influent composition. Next to using an influent model, the presented tool offers other ways to fill large gaps in the dataset (e.g. correlations, daily averages, …).
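To make the gap-filling idea concrete, the sketch below shows how large gaps in an influent time series could be filled with an average daily profile using pandas. It is a minimal, hypothetical illustration: the function name, the interpolation limit and the implementation details are assumptions and do not represent the actual API of the presented package.

```python
# Hypothetical sketch (not the package's actual API): short gaps are bridged by
# interpolation, long gaps are filled with the mean daily profile of the series.
import pandas as pd

def fill_with_daily_profile(series, max_interp_gap="2H"):
    """series: pandas Series with a regular DatetimeIndex (NaN = missing)."""
    # Maximum number of consecutive samples that may be bridged by interpolation
    step = series.index[1] - series.index[0]
    limit = int(pd.Timedelta(max_interp_gap) / step)

    # Short gaps: linear interpolation in time, up to `limit` samples
    filled = series.interpolate(method="time", limit=limit)

    # Long gaps: substitute the average value observed at the same time of day
    # over the whole campaign (assumes every time of day has at least one
    # valid measurement somewhere in the record)
    profile = series.groupby(series.index.time).mean()
    still_missing = filled[filled.isna()].index
    filled.loc[still_missing] = profile.loc[still_missing.time].to_numpy()
    return filled
```

Where a correlated measured variable is available, a regression against that variable could be substituted in the last step in the same way, which is what the "correlations" option mentioned above refers to.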

Results and Conclusions
The online data obtained from the Waterboard was of (very) good quality for the influent flow rate. For other parameters, either noise or long periods of missing data made it necessary to replace some of the data points before the data could be used as influent input for the available model (see also Figure 1, top graph). Figure 1 illustrates the complete workflow, in this case for ammonia influent data. The second graph from the top indicates the data points that were filtered out by the implemented filter algorithms, the third one the data points that were filled using linear interpolation, and the fourth one the completed dataset, obtained by additionally making use of the influent model output. For comparison, the influent flow is also plotted at the bottom of the figure, supporting the observation that realistic (i.e. diluted) ammonia concentrations are present in the final dataset, also during rain events.
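The filter-interpolate-fill sequence described above can be summarised in a few lines of pandas code. The sketch below is illustrative only: the filter (a rolling-median outlier test), the thresholds and the variable names are assumptions, not the implementation used in the presented package.

```python
# Minimal sketch of the three-step reconciliation, assuming pandas Series `raw`
# (measured NH4) and `modelled` (influent model output) on the same DatetimeIndex.
import pandas as pd

def reconcile(raw, modelled, window=11, n_mad=4, max_gap=12):
    # 1. Noise filtering: drop points that deviate strongly from a rolling median
    med = raw.rolling(window, center=True, min_periods=1).median()
    mad = (raw - med).abs().rolling(window, center=True, min_periods=1).median()
    filtered = raw.where((raw - med).abs() <= n_mad * mad)

    # 2. Small gaps: linear interpolation, here limited to `max_gap` samples
    interpolated = filtered.interpolate(limit=max_gap)

    # 3. Large gaps: substitute the (calibrated) influent model output
    completed = interpolated.fillna(modelled)
    return completed
```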

Figure 1: Plots of the complete filtering and filling procedure, here shown for 20 days of ammonia measurements. Top to bottom: original data; original data points and points filtered out by noise filtering and sensor failure detection; original and filtered data points, along with points replaced by interpolation; original data points and points filled by interpolation or with modelled values; the influent flow rate at the WWTP.
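An overview figure of this type can be drawn with standard matplotlib. The sketch below is not the plotting code of the package; it assumes that the intermediate Series from the reconciliation steps (raw, filtered, interpolated, completed) and the flow rate series are available on a shared DatetimeIndex.

```python
# Illustrative only: a five-panel overview in the spirit of Figure 1.
import matplotlib.pyplot as plt

def plot_overview(raw, filtered, interpolated, completed, flow):
    fig, axes = plt.subplots(5, 1, sharex=True, figsize=(8, 10))
    axes[0].plot(raw.index, raw.to_numpy(), ".", label="original data")

    mask = filtered.isna().to_numpy()  # points removed by the filter step
    axes[1].plot(raw.index[mask], raw.to_numpy()[mask], "x", label="filtered out")
    axes[1].plot(filtered.index, filtered.to_numpy(), ".", label="retained")

    axes[2].plot(interpolated.index, interpolated.to_numpy(), ".",
                 label="after interpolation")
    axes[3].plot(completed.index, completed.to_numpy(), ".",
                 label="after filling with model output")
    axes[4].plot(flow.index, flow.to_numpy(), "-", label="influent flow rate")

    for ax in axes:
        ax.legend(loc="upper right")
    axes[-1].set_xlabel("time")
    fig.tight_layout()
    return fig
```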

All of the above shows that, by making use of a streamlined workflow, a good influent model and a well-considered replacement of missing or noisy data, a long-term influent dataset can be obtained with a minimum of effort and time. Cases where different gap-filling methods are used will also be shown at the conference.

References
[1] H. Hauduc, S. Gillot, L. Rieger, and S. Winkler. Activated sludge modelling in practice: an international survey. Water Science and Technology, 60(8):1943–1951, 2009.
[2] C. Martin and P. A. Vanrolleghem. Analysing, completing, and generating influent data for WWTP modelling: A critical review. Environmental Modelling & Software, 60:188–201, 2014.
[3] J. Langeveld, R. Schilperoort, P. Van Daal, L. Benedetti, Y. Amerlinck, J. de Jonge, T. Flameling, I. Nopens, and S. Weijers. A new empirical sewer water quality model for the prediction of WWTP influent quality. In Proceedings of the 13th Conference on Urban Drainage, 2014.
