Technological platform for complex processing of large data within early warning systems
Denis A. Nasonov, Timofey N. Tchurov, Alexandr S. Zagarskih, Sergey V. Kovalchuk
ITMO University, Saint Petersburg, Russian Federation

Abstract
The paper presents a technological platform for large data processing within Early Warning Systems (EWS). The core idea of the general-purpose EWS platform is abstract data processing performed with the use of domain-specific imperative (procedures) and declarative (semantic structure) knowledge. The platform is based on the CLAVIRE cloud computing environment and exploits the BigData approach for the integration and management of diverse data services. Various visualization and interaction facilities are supported within the platform for complex data visualization and analysis.
Keywords: early warning system, big data, cloud computing.

1 Introduction
Today, the development and maintenance of early warning systems (EWS) that can predict, efficiently and within limited time, extreme events that destabilize economic and social processes, and that can reduce the damage such events cause, is a topical and complex problem. EWS are applicable in crucial areas such as the detection of natural catastrophes (tsunamis, floods, earthquakes), the detection of economic changes, and the spread of epidemics. Examples of research in this field can be found in Refs. [1,2]. How the infrastructure for an EWS is organized is of special importance. A common approach to this issue is based on Service-Oriented Architecture (SOA). For example, it was proposed for the UrbanFlood FP7 EWS project, with loosely coupled, replaceable components as one of its fundamental advantages [3,4]. One of the most important issues of the EWS service environment is related to integration

WIT Transactions on Information and Communication Technologies, Vol. 56, © 2014 WIT Press www.witpress.com, ISSN 1743-3517 (on-line) doi:10.2495/ICCTS140711

626 Advances in Communication Technology and Systems

and processing of various data services (including the implementation of procedures for data processing, transformation, and presentation). As a rule, the set of data services incorporated into an EWS provides a large amount of diverse data. Today such solutions are usually developed within the scope of the BigData approach [5]. One of the core issues in this area is the integration and aggregation of data from diverse data sources and the implementation of a high-level, domain-specific, expressive toolbox for data analytics. Moreover, contemporary complex scientific solutions usually require integration of data analytics tasks with computational services within Grid or Cloud environments (see, for example, [6,7]).
On the other hand, the use of diverse services for visual data analysis is important for a deep understanding of the situation in the observed field and for convenient interactive support of decision making. In most cases, a system should provide different tools for geo-informational data representation with support for dynamic changes [4]. Such a subsystem could show not only raw data but also data obtained after analysis, aggregation, and simulation [8,9]. In addition, specific interactive tools (such as multi-touch tables, 3D projectors, and even brain-computer interfaces (BCI) [10]) often allow us to extend the capabilities of an EWS.
In this paper, we propose a concept of a technological platform for complex processing of large data within EWS, developed on the basis of a Cloud Computing environment as a service platform for simulation-based solutions.

2 EWS aspects
To accomplish the main EWS objectives, we assume that the operation of an EWS can be represented by three main phases: the monitoring phase, the urgent phase, and the recovery phase. The monitoring phase has two main aims. First, computational scenarios that run automatically in long-running mode [11] should process heterogeneous external data, which is retrieved from sensors and then transformed, filtered, and saved in the storage. Second, the scenarios should detect upcoming hazards using the provided external data. In addition, supporting activity directed toward scientific research and system improvement is also necessary. The urgent phase contains the most complex scenarios, which are aimed at resisting the adverse effects of a hazard, whereas the recovery phase returns all system processes to the normal state. In general, all these cases need a technological base that can satisfy high functional, reliability, and time-dependent requirements. In this paper, the CLAVIRE platform is used for these needs [12]. This cloud computing platform provides the following principal features:
i) Support of composite applications (abstract workflows)
ii) IWF technology [11]
iii) Unification of models (software packages)
iv) Heterogeneity of computational resources
v) Integrated classes of workflow scheduling algorithms with variable characteristics, including reliability
vi) Hot deployment of resources and packages
vii) Distributed storage with a supported BigData mechanism.


Using workflows as a formalism for scenario representation is a traditional approach due to its simplicity and utility. In our case, an abstract workflow is represented as a directed acyclic graph with models (software packages) in the nodes and with edges that are treated as data dependences.
2.1 Data aspects
An EWS needs a flexible technology to manage the large amount of data coming from various sources (services). A more detailed analysis of the requirements reveals a close correspondence between EWS specifics and the defining characteristics of BigData (see, for example, Ref. [5]). Namely:
i) Volume. A typical EWS should dynamically process a large amount of data to monitor the environment, discover significant events, and exploit data-based models.
ii) Variety. The data sources involved often differ in data format, semantics, access protocols, policies, etc. In addition, the computational and visualization services have their own requirements for input data formats, which need to be taken into account.
iii) Velocity. The monitoring phase of an EWS includes management of data sources that often provide real-time data (observations, sensors, monitoring services, etc.). To react to extreme events properly and promptly, the EWS needs to process this data continuously.
iv) Veracity. The data sources used by an EWS often vary in trust level and origin. For example, observations, sensor results, forecasts, and simulation results can be combined within a single solution.

To manage the set of data services, a technology for data collection, integration, and analysis should be developed as an important part of the basic EWS platform.
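To make the workflow formalism of this section more concrete, the following is a minimal sketch of an abstract workflow stored as a directed acyclic graph, with models in the nodes and data dependences on the edges. The model names are hypothetical and are not taken from CLAVIRE; the ordering relies on the standard-library `graphlib` module (Python 3.9 or later), which is only one possible way to obtain a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical abstract workflow: each key is a model (software package),
# each value is the set of models whose output data it depends on.
workflow = {
    "read_sensors": set(),
    "filter_data": {"read_sensors"},
    "store_data": {"filter_data"},
    "detect_hazard": {"filter_data"},
}


def execution_order(dependencies):
    """Return one valid run order in which every model follows its inputs.

    graphlib raises CycleError if the dependencies do not form a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(workflow)
print(order)
```

Any order produced this way places `read_sensors` before `filter_data`, and `filter_data` before both `store_data` and `detect_hazard`; the relative order of the last two is not fixed by the graph.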

3 EWS platform
The architecture of the EWS platform is oriented toward processing large data and is described in this section. The platform provides several subsystems:
i) A service environment based on the CLAVIRE platform, which enables composition of available services within interactive composite applications.
ii) A data management system, which allows integration of various data services within general-purpose composite applications in the cloud computing environment.
iii) A data visualization system, which enables high-level integration of the data available within the dynamic service environment with the interactive facilities of the EWS.

3.1 General architecture
The general concept of the EWS architecture is presented in Figure 1. It consists of three traditional tiers: the presentation tier, the service tier, and the data tier, which is divided into two subtiers: interpretation and storage. The interpretation subtier provides the functionality necessary for processing data from external sources. The storage subtier makes it possible to filter, interpret, and store data in a unified form using databases and distributed storage.

Figure 1: General EWS architecture.

The service tier contains the core platform logic that provides mechanisms for scenario life cycle management using cloud communication and computing facilities. The whole life cycle includes the following main steps: the scenario description is parsed by the ScenarioManagement service; the composite applications (CAs) that are found are passed to the CAInterpetation module, which forms their object representations; the Execution service takes ready-to-run CAs and extracts all model parameters and the available computational resources from the bases; Execution then verifies access permissions, builds a scheduling plan using the Multischeduler service, and submits all of this to the ResourceControllerFarm module; finally, the Execution service monitors the execution progress and returns the results through the chain of services to ScenarioManagement.
The presentation tier, which is the level of user interaction, includes IDEs for scenario and composite application development, which have an essential impact on the EWS design and enhancement stages. The EWS Community and life cycle modules are also included in the presentation tier to provide a workspace for operational staff and expert groups.
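The scenario life cycle of the service tier described above can be pictured with a small sketch. It is an illustration under stated assumptions, not the CLAVIRE implementation: the one-line scenario format, the round-robin scheduling rule, and all function and class names are hypothetical stand-ins for the ScenarioManagement, CAInterpetation, Execution, Multischeduler, and ResourceControllerFarm components named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class CompositeApp:
    """Object representation of a composite application (CA)."""
    name: str
    models: list
    plan: dict = field(default_factory=dict)


def parse_scenario(description):
    """ScenarioManagement stand-in: find CAs in a scenario description.

    Assumed toy format: one "ca_name: model1, model2" entry per line.
    """
    entries = []
    for line in description.strip().splitlines():
        name, models = line.split(":", 1)
        entries.append((name.strip(), [m.strip() for m in models.split(",")]))
    return entries


def interpret(entries):
    """CA interpretation stand-in: build object representations."""
    return [CompositeApp(name, models) for name, models in entries]


def schedule(app, resources):
    """Multischeduler stand-in: assign models to resources round-robin."""
    return {m: resources[i % len(resources)] for i, m in enumerate(app.models)}


def execute(app, resources, allowed_users, user):
    """Execution stand-in: check permissions, plan, submit, return results."""
    if user not in allowed_users:
        raise PermissionError(f"{user} may not run {app.name}")
    app.plan = schedule(app, resources)
    # Placeholder for submission to the resource farm and progress monitoring.
    return {m: f"finished on {r}" for m, r in app.plan.items()}


def run_scenario(description, resources, allowed_users, user):
    """Run every CA of a scenario and collect results per CA name."""
    apps = interpret(parse_scenario(description))
    return {app.name: execute(app, resources, allowed_users, user) for app in apps}


results = run_scenario(
    "flood_monitoring: level_reader, surge_model",
    ["node-a", "node-b"],
    {"operator"},
    "operator",
)
print(results)
```

The point of the sketch is the order of responsibilities: parsing, object construction, permission check, scheduling, and submission are separate steps, so each can be replaced independently.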


3.2 Monitoring of data sources
The development of an efficient EWS monitoring phase involves the following entities: scenario, composite application, and adapter (a specialized template package). The steps below represent the general concept of monitoring development for external data sources:
i) A high-level design of all scenarios is made, including those responsible for processing and monitoring external data sources.
ii) For each data processing scenario, all composite applications are determined in addition to the logical connectives.
iii) For each composite application, the main logic is defined, which mainly consists of models (adapters are also included) and their transitions.
iv) For each adapter, the required parameters (such as URI, update frequency, source data format, etc.) are defined.
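Step iv) above can be illustrated with a small parameter record for an adapter. This is a sketch only: the field names, the example URI, and the polling rule are assumptions made for illustration and do not reflect the actual adapter description format of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterConfig:
    """Hypothetical parameter set of one adapter for an external source."""
    uri: str                 # where the external data is read from
    update_frequency_s: int  # how often the source should be polled, in seconds
    source_format: str       # format of the source data, e.g. "html" or "csv"

    def is_due(self, last_update_s, now_s):
        """Return True when the source should be read again."""
        return now_s - last_update_s >= self.update_frequency_s


# Example values are placeholders, not a real data source.
level_adapter = AdapterConfig(
    uri="http://example.org/water-level.html",
    update_frequency_s=600,
    source_format="html",
)
print(level_adapter.is_due(last_update_s=0, now_s=600))
```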

Adapters have two conceptual semantic levels: the data extraction level and the data interpretation level. The data extraction level contains already implemented data transfer protocols and mechanisms for parsing and preprocessing raw data of different types. The data interpretation level, in turn, maintains complex data processing methods, such as filtering, transformation, and multiplexing.
3.3 Data system interpretation
The developed EWS platform includes a technological solution for managing large, diverse sets of data from different sources (see Figure 2). The technology supports the integration of data sources of the following classes: (a) distributed data storages; (b) external services for accessing stored or real-time data. The main goal of the solution is to provide high-level support for the following procedures:
i) Selection of data from large arrays stored in distributed data storages and available through external services.

Figure 2: Data processing technology.


ii) Distributed data processing using the code-to-data approach [13], which is appropriate for processing large amounts of data. Within an EWS, this procedure can include simulation, forecasting, and other time-consuming actions.
iii) Monitoring of event signals, which can be considered a special kind of service data source.
iv) Aggregation of distributed and diverse data, which may include statistical analysis, composition of high-level data structures, etc.
v) Interpretation of aggregated data for identifying the occurrence of events and assessing the situation.
All these procedures should support integration into a composite solution representing the core logic of the EWS being implemented. To support the integration process, the request for data processing and the response to it should be interpretable within the CLAVIRE cloud computing environment. The data sources are registered and described in the data sources base, which defines the references between data sources and domain-specific abstract data types. To support semantic integration, the core data processing functions are performed using abstract data types, which are linked with the following artifacts that can be considered knowledge formalized by experts:
i) Domain-specific semantics. The core semantics of the abstract data types is linked with domain-specific objects. These objects and the related domain-specific simulation procedures (methods, models) are described using Virtual Simulation Objects (VSO) technology [14].
ii) A set of data-specific procedures, which allow reading different data formats and represent abstract data structures with a certain interface. They are defined within a set of domain-specific libraries used by the request processor.
iii) Software-specific data structures and procedures, which are defined within PackageBase, which describes the software services available in CLAVIRE.
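The link between registered data sources, abstract data types, and data-specific reading procedures can be sketched as follows. The registry layout, the source identifiers, the `WaterLevelSeries` type name, and the two reader functions are all hypothetical; they only illustrate how sources in different formats can be read into one abstract structure.

```python
import csv
import io
import json


def read_csv_levels(text):
    """Data-specific procedure: read CSV rows into the abstract structure."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"time": r["time"], "level_cm": float(r["level_cm"])} for r in rows]


def read_json_levels(text):
    """Data-specific procedure: read a JSON list into the same structure."""
    return [{"time": r["t"], "level_cm": float(r["h"])} for r in json.loads(text)]


# Hypothetical library of readers, keyed by source data format.
READERS = {"csv": read_csv_levels, "json": read_json_levels}

# Hypothetical data sources base: each source refers to an abstract data type.
DATA_SOURCES = {
    "gauge_a": {"abstract_type": "WaterLevelSeries", "format": "csv"},
    "gauge_b": {"abstract_type": "WaterLevelSeries", "format": "json"},
}


def load(source_id, payload):
    """Resolve a source to its abstract type and read its payload."""
    entry = DATA_SOURCES[source_id]
    return entry["abstract_type"], READERS[entry["format"]](payload)


from_csv = load("gauge_a", "time,level_cm\n2014-01-01T00:00,112\n")
from_json = load("gauge_b", '[{"t": "2014-01-01T00:00", "h": 112}]')
print(from_csv == from_json)
```

Because both sources resolve to the same abstract type and structure, downstream processing does not need to know which format the data arrived in.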

Integrated together, the mentioned artifacts (VSO description, domain-specific libraries, PackageBase) form the basis for processing abstract data types. To complete the integration process, a dynamically constructed domain-specific language (DSL) is provided as a tool for describing the requests processed by the system. The knowledge-based artifacts allow us to dynamically extend the basic syntax of the DSL with keywords and structures specific to the particular data sources, problem domain, and services involved in the composite application.
3.4 Data visualization
An EWS needs special formats for the visualization of data and processes. Within the development of the EWS platform, a general approach has been developed that enables integration of (a) complex data arrays available during the execution of a composite application; (b) visual images corresponding to this data; (c) interactive facilities that enable visual analysis and interaction with the visualized scene.


Being based on semantic integration of the abstract data types provided by the data management system, this approach facilitates dynamic interconnection of all subsystems implemented within the EWS platform.
One of the usual means of data visualization within an EWS is a geographic information system (GIS). Raw data and first warning information should be localized on a map immediately, which makes geo-informational data representation the most important one for the user of an EWS. A layer-based GIS framework could be the most advantageous solution for these purposes, since it supports the attachment of many standard layers from different data sources and of custom layers predefined by users or prepared automatically by the EWS. The standard layers provide a graphical representation of raster maps, roads, buildings, etc. User-defined layers could contain domain-specific static data such as power lines, rescue team positions, and other points of interest. The auto-generated layer is intended for analytical and simulation data obtained during the second phase of EWS operation. All these layers could be presented on usual visualization tools, such as monitors and screens, as well as on touch tables for collaboration and on tablets and mobile phones for mobility. An example of the described type of visualization, related to surge flood prevention in St. Petersburg, is shown in Figure 3.

Figure 3: Example of EWS scenario for flood prediction.
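The layer-based organization described in this section can be illustrated with a short sketch. The three layer kinds follow the text (standard, user-defined, auto-generated), while the drawing order, the class, and the layer names are assumptions chosen for illustration rather than properties of any particular GIS framework.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One map layer; `kind` is "standard", "user", or "auto"."""
    name: str
    kind: str
    features: list


# Assumed drawing order: base maps first, generated results on top.
DRAW_ORDER = {"standard": 0, "user": 1, "auto": 2}


def compose(layers):
    """Return the layers sorted bottom-to-top for rendering."""
    return sorted(layers, key=lambda layer: DRAW_ORDER[layer.kind])


stack = compose([
    Layer("flood forecast", "auto", []),
    Layer("rescue teams", "user", []),
    Layer("roads", "standard", []),
])
print([layer.name for layer in stack])
```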


Another interactive device that could be useful for presenting simulation results is a 3D projector. With this instrument, the users of an EWS could directly watch and evaluate the potential effects of an extreme situation. Finally, it is possible to use a brain–computer interface (BCI) for deep and smooth interaction between users and the EWS. For example, a BCI could select the fields that are of most interest to experts and automatically readjust the parameters of simulation and visualization, show additional information, etc. Moreover, a BCI could automatically evaluate the physical and psychoemotional states of users during a hard situation and warn other participants in the case of exhaustion [15].

4 Example
The main example, an EWS monitoring scenario for surge flood prevention in St. Petersburg, is presented as a concept in Figure 3. Panel (a) shows the designed chain of adapters for extracting water level data in Neva Bay. First, the raw information is extracted from a public HTML page by HTMLContentParser. Then it is parsed by the ImageParser adapter, which transfers the collected data in string format to the interpretation block, where it is transformed into the unified form and saved in the storage (by the ObjectConstructor and StorageDriver adapters). After the data appears in the system, the main computational composite application for surge flood prevention receives an event signaling the appearance of new data. Panel (b) shows a completed iteration of the CA. The adapter schema from (a) and the CA from (b) are connected through the WF block LevelDataLocator. The last panel (c) shows the visualization of the surge flood prevention plan for experts (the last block in (b), PlanVisualizer, corresponds to it). As a concept, the presented scenario of the EWS monitoring phase shows how the proposed technological platform can be applied to solve EWS issues efficiently.
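The adapter chain of panel (a) can be approximated by a simple function pipeline. This is a sketch under strong assumptions: the HTML fragment, the regular expression, and the fields of the unified record are invented for illustration, and the image-parsing step is replaced by a plain text stand-in because the real adapter's input is not described here. Only the adapter names and their order come from the text.

```python
import re
from functools import reduce


def html_content_parser(page):
    """Extraction level: pull the raw fragment out of a public HTML page."""
    match = re.search(r'<td class="level">([^<]*)</td>', page)
    if match is None:
        raise ValueError("no water level cell found")
    return match.group(1)


def image_parser(fragment):
    """Stand-in for ImageParser: here the fragment is already text."""
    return fragment.strip()


def object_constructor(text):
    """Interpretation level: convert the string to an assumed unified form."""
    return {"station": "Neva Bay", "level_cm": float(text)}


def make_storage_driver(storage):
    """Build a StorageDriver stand-in that appends records to a list."""
    def storage_driver(record):
        storage.append(record)
        return record
    return storage_driver


def run_chain(adapters, data):
    """Pass the data through the adapters in order."""
    return reduce(lambda value, adapter: adapter(value), adapters, data)


storage = []
chain = [html_content_parser, image_parser, object_constructor,
         make_storage_driver(storage)]
record = run_chain(chain, '<table><tr><td class="level"> 112 </td></tr></table>')
print(record, storage)
```

Keeping each adapter as a separate callable mirrors the two semantic levels: the first two steps only extract data, while the last two interpret and persist it.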

5 Conclusion
In this paper, we proposed a general concept of an EWS technological platform focused on large data processing and visualization. The generalization of the solutions becomes possible through the support of domain-specific semantics and abstract data processing based on cloud computing and communication principles. As a result, we have demonstrated the application of the proposed approach to the development of a surge flood prevention EWS in St. Petersburg with the use of the CLAVIRE platform.

Acknowledgments
The research work was partly financially supported by the Government of the Russian Federation, Grant 074-U01. Data management facilities were developed within the project “Big data management for computationally intensive applications” (project #14613).


References
[1] A. Berg, E. Borensztein, C. Pattillo, Assessing early warning systems: how have they worked in practice? International Monetary Fund Working Paper No. 04/52, 2004.
[2] Y. Hong, R.F. Adler, Towards an early-warning system for global landslides triggered by rainfall and earthquake, International Journal of Remote Sensing, pp. 3713–3719, 2007.
[3] B. Balis, M. Kasztelnik, M. Bubak, T. Bartynski, T. Gubała, P. Nowakowski, J. Broekhuijsen, The urbanflood common information space for early warning systems, Procedia Computer Science, 4, pp. 96–105, 2011.
[4] V.V. Krzhizhanovskaya, G.S. Shirshova, N.B. Melnikova, R.G. Belleman, F.I. Rusadi, B.J. Broekhuijsen, B.P. Gouldby, J. Lhomme, B. Balis, M. Bubak, A.L. Pyayt, I.I. Mokhov, A.V. Ozhigin, B. Lang, R.J. Meijer, Flood early warning system: design, implementation and computational modules, Procedia Computer Science, 4, pp. 106–115, 2011.
[5] M.D. Assuncao, R.N. Calheiros, S. Bianchi, M.A.S. Netto, R. Buyya, Big data computing and clouds: challenges, solutions, and future directions, arXiv preprint, arXiv:1312.4722, 2013.
[6] Y. Gil, V. Ratnakar, R. Verma, A. Hart, P. Ramirez, C. Mattmann, A. Sumarlidason, S.L. Park, Time-bound analytic tasks on large datasets through dynamic configuration of workflows, Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science, pp. 88–97, 2013.
[7] M. Baranowski, A. Belloum, M. Bubak, MapReduce operations with WS-VLAM workflow management system, Procedia Computer Science, 18, pp. 2599–2602, 2013.
[8] Zhi-xiang Xing, Wen-li Gao, Xiao-fang Zhao, De-zhi Zhu, Design and implementation of city fire rescue decision support system, Procedia Engineering, 52, pp. 483–488, 2013.
[9] D. Balcan, B. Goncalves, H. Hu, Ramasco, V. Colizza, A. Vespignani, Modeling the spatial spread of infectious diseases: the GLobal epidemic and mobility computational model, Journal of Computational Science, 1, pp. 132–145, 2010.
[10] M. van Gerven, J. Farquhar, R. Schaefer, R. Vlek, J. Geuze, A. Nijholt, N. Ramsey, P. Haselager, L. Vuurpijl, S. Gielen, P. Desain, The brain–computer interface cycle, Journal of Neural Engineering, 5, 2009.
[11] K.V. Knyazkov, D.A. Nasonov, T.N. Tchurov, A.V. Boukhanovsky, Interactive workflow-based infrastructure for urgent computing, Procedia Computer Science, 18, pp. 2223–2232, 2013.
[12] K.V. Knyazkov, S.V. Kovalchuk, T.N. Tchurov, S.V. Maryin, A.V. Boukhanovsky, CLAVIRE: e-Science infrastructure for data-driven computing, Journal of Computational Science, 3(6), pp. 504–510, 2012.
[13] A. Manjunatha, A. Ranabahu, P. Anderson, A. Sheth, Getting code near the data: a study of generating customized data intensive scientific workflows with domain specific language, IEEE International Conference on Cloud Computing Technology and Science, 2010.
[14] S.V. Kovalchuk, P.A. Smirnov, S.S. Kosukhin, A.V. Boukhanovsky, Virtual simulation objects concept as a framework for system-level simulation, IEEE 8th International Conference on E-Science, pp. 1–8, 2012.
[15] S.V. Kovalchuk, D.M. Terekhov, A.A. Bezgodov, A.V. Boukhanovsky, Visual exploration of complex network data using affective brain–computer interface, International Journal of Advanced Computer Science and Applications, 4(7), pp. 21–27, 2013.
