Way to make ourselves redundant: A Semantic Framework for Automated Workflow Generation for IoT

T. Chattopadhyay, S. Banerjee, S. Maiti, S. Dey, D. Jaiswal, B. Barik
Innovation Lab, Kolkata, Tata Consultancy Services Limited, Kolkata, India
+91 33 66884701

(t.chattopadhyay, snehasis.banerjee, santa.maiti, sounak.d, dibyanshu.jaiswal, biswanath.barik) @tcs.com

ABSTRACT
With the rapid deployment of sensors across the world in various sectors, there is a growing demand for smart applications and services that can leverage this boom of the Internet of Things (IoT). However, developing analytical applications for IoT is difficult: such applications tend to be cross-domain, and it is unreasonable to expect a single developer to possess all the relevant skills, such as domain knowledge, sensor signal processing knowledge, coding skill, and knowledge of the deployment infrastructure. This paper presents an analytical method that assists a developer equipped with only coding skill by (i) recommending algorithms, reducing the effort required from a signal processing expert, and (ii) capturing the domain expert's knowledge in a rule base, reducing the domain expert's involvement. Together these reduce development cost (mainly the cost of hiring professionals with a niche skill set) and time (by capturing the signal processing expert's knowledge and ratifying it with the help of the web). We have evaluated our method by comparing, for a typical IoT application, the accuracy obtained using the algorithms chosen by signal processing experts against the accuracy obtained using the algorithms recommended by our method.

1. INTRODUCTION
The Internet of Things (IoT), often positioned as the next-generation Internet, has the potential to bring disruptive changes in business models, with potentially 1 trillion connected devices across the world. This unprecedented level of connectivity mandates new ideas and innovations encompassing several domains such as e-Governance, Health Care, Transportation, and Utilities, and also calls for the development of cross-domain solutions/services.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

TACTiCS – TCS Technical Architects’ Conference 2015

Copyright 2015

However, developing such applications is inherently difficult, and it is not reasonable to expect application developers to be equipped with the diverse set of skills and knowledge spanning the domain, sensor signal processing, algorithms and their usage, deployment infrastructure, and more. For the industry, this poses a resource bottleneck and also raises the cost of development.

Figure 1 depicts a knowledge pyramid, similar to that used in [2], for an IoT-based use case involving diagnosis of heart-related disease by gathering physiological data using non-invasive sensors and applying a series of transformations to traverse the data → information → knowledge → wisdom path. The transformations applied at each layer are tightly coupled with the knowledge acquired from domain expertise.

Figure 1: Knowledge Pyramid

In an end-to-end system, a domain expert provides the requirement; in the TCS scenario, an ISU receives the requirement from the client's domain expert. In terms of the pyramid of [2], the domain expert provides an objective, which belongs to the topmost level. In a typical use case the objective might be to monitor tachycardia. The objective (or, in this use case, the disease) is mapped to symptoms using an ontology or linked knowledge such as PubMed; we do not discuss this mapping here, as we assume it is done with an existing tool. Since symptoms are mostly written in English, we are developing an NLP-based tool that parses a symptom into a qualifier and a measurable sensor property, using domain knowledge and the sensor space. The domain knowledge can be obtained in three ways: (i) populated manually by the domain expert using our UI, (ii) populated by web search and ratified by the domain expert, or (iii) populated by the domain expert and enhanced with data mined from the web. The qualifier part is then registered as a rule in the rule base, and the measurable property is mapped to a sensor.

The primary motivation of this work is to remove the dependency on the availability of such niche skill sets and to automate the process of application development by using knowledge models covering all these aspects. As a part of this final vision, we have created a method that is capable of recommending suitable algorithms for sensor signal processing once the application goal is defined. It also recommends suitable sensor data to the application developer and assists him or her in selecting a suitable algorithm by providing relevant information about the algorithm from the web. This assistance removes the need for the application developer to possess niche skills such as signal processing algorithm selection.
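The symptom-parsing step described above can be illustrated with a minimal sketch. The class name, regular expression, and qualifier keywords below are hypothetical illustrations only; the actual NLP-based tool uses domain knowledge and the sensor space rather than a simple pattern:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: split a symptom phrase into a measurable property
// and a qualifier (threshold condition), as done by the NLP module.
public class SymptomParser {
    // e.g. "heart rate above 100 bpm" -> measurable="heart rate", qualifier="above 100 bpm"
    private static final Pattern SYMPTOM =
        Pattern.compile("(?<measurable>[a-zA-Z ]+?)\\s+(?<qualifier>(above|below|over|under)\\s+\\d+.*)");

    public static String[] parse(String symptom) {
        Matcher m = SYMPTOM.matcher(symptom.trim());
        if (!m.matches()) {
            return null; // fall back to manual entry by the domain expert
        }
        return new String[] { m.group("measurable").trim(), m.group("qualifier").trim() };
    }

    public static void main(String[] args) {
        String[] parts = parse("heart rate above 100 bpm");
        System.out.println("measurable: " + parts[0] + ", qualifier: " + parts[1]);
    }
}

The qualifier part would then be registered as a rule in the rule base, and the measurable part would be mapped to a sensor.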


State of the art: As this is a new field of research, we could find only a small number of works related to analytics for IoT. In [5], the authors use a database that stores the prior performance and complexity of different algorithms; raw data is first converted into a set of features that can be mined and then used to automatically select appropriate algorithms for the problem data set. However, this proposal has no means to capture the various levels of knowledge, nor a method to define the goal or the sensors to be used. In [6], the authors present a survey that provides useful insights into automating the outlier removal process when the data type can be properly classified.

2. PROPOSED METHODOLOGY
Our analysis, based on a questionnaire filled in by the IoT application developers in our lab, shows that the niche skill of signal processing is mostly required in the feature selection phase of the signal processing chain shown in Figure 2. Algorithm recommendation also has economic value, as reported by Gartner (http://blogs.gartner.com/peter-sondergaard/the-internet-of-things-will-give-rise-to-the-algorithm-economy/).


Figure 2: Signal Processing Chain (blocks: Prepare, Resampling, Outlier Detection, Feature Extraction, Dimensionality Reduction, Machine Learning; output: reduced feature set)

For the rest of the phases of the signal processing chain, we provide an annotated list of algorithms and contextual help for each algorithm by mining related applications, usage, and other relevant information from the web. In this section we describe the basic analytical steps we have followed to develop our system for generating information from raw sensor signals, namely (i) the Algorithm Recommendation system and (ii) web-based knowledge enhancement.

Algorithm Recommendation system: The recommendation works through the following steps:
- Initially, an annotated repository of algorithms related to signal processing, statistical processing, machine learning, and outlier detection is constructed.
- The annotations involve fields like the Application Program Interface (API), an overview of the method, memory requirement, and computational complexity, stored as metadata in an OWL file. The repository currently consists of the algorithms used in the Kolkata Lab only, but it is planned to be enriched by experts from other labs using the crowd-sourcing platform proposed by Innovation Labs, Pune.
- A superset of features used in 1-D signal processing is constructed based on a survey of the signal processing application papers published in major venues, namely Ubicomp, SenSys, MobiCom, AAAI, and WWW, over the last 5 years. This superset of features is shown in Table 1.
- The dimension of the feature set is reduced using MIC, a dimension reduction method, based on the ground truth dataset and the data set (an illustrative sketch of this step is given after Table 1).
- The reduced set of features is recommended to the application developer.

TABLE 1: LIST OF SUPERSET OF FEATURES USED IN SIGNAL PROCESSING

Time domain features: RMS, zero crossing rate, low energy frame rate, running average of amplitude, sum of absolute differences, intensity/energy/mean, variance, signal peaks/mean crossing rate, mean, max, min, median, amplitude, high-pass filtered values of signal intensity and jerk, ratios, difference, squared sum of peaks.

Frequency domain features: spectral entropy, spectral flux, spectral roll-off, bandwidth, phase deviation, mean and standard deviation of DFT power, FFT principal component analysis (PCA), spectral correlations, frequency domain entropy.
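An illustrative sketch of the feature-reduction step follows. The actual method uses MIC; the snippet below substitutes a simple absolute-correlation score purely as a stand-in ranking criterion, and the class and method names are hypothetical:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: rank candidate features from the Table 1 superset by how
// strongly each correlates with the ground truth, then keep the top-k features.
// The real system uses MIC; absolute Pearson correlation is used here only as a
// simple stand-in scoring function.
public class FeatureRecommender {

    static double pearson(double[] x, double[] y) {
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length; my /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** featureValues[f][i] = value of feature f on sample i; groundTruth[i] = target value. */
    static List<Integer> recommend(double[][] featureValues, double[] groundTruth, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int f = 0; f < featureValues.length; f++) indices.add(f);
        // sort feature indices by descending absolute correlation with the ground truth
        indices.sort(Comparator.comparingDouble(
                f -> -Math.abs(pearson(featureValues[f], groundTruth))));
        return indices.subList(0, Math.min(k, indices.size()));
    }
}

The returned indices correspond to the features that are finally recommended to the application developer.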

Web based knowledge Enhancement: Figure 3 describes our knowledge enrichment process, and the major modules of the system are described below.

Associated Entity Search: Entity extraction from textual web data requires a considerable amount of text processing, such as text normalization, part-of-speech (POS) tagging, entity recognition, and sense disambiguation, and such a pipelined NLP process suffers from cascading errors. We therefore obtain the associated entities (which are candidates for enrichment) through the publicly available semantic network BabelNet [8].

Entity Ranking: Since BabelNet is a generic semantic network, many irrelevant concepts are associated with a particular concept and have to be removed. For example, for the concept 'Fast Fourier Transform', BabelNet lists 'Mona Lisa' as an associated concept, which should be removed when the associated concepts are searched against 'Fast Fourier Transform' from the perspective of 'signal processing'. Therefore, an entity ranking method is designed to select only the relevant associated concepts.

Figure 3: Framework of web-based knowledge enrichment

Relation Extraction: Once the relevant concepts are selected, this module finds the most frequent relation between each pair <concept, associated_concept_i>.

Meta-Tag Generation: This module generates a small description of the concept using the relevant associated concepts and their relationships with the concept.

Ontology Enrichment: The relevant associated concepts can be connected to the existing ontology using these relations, upon (optional) expert validation.
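A minimal sketch of the entity ranking idea follows. The class and method names are hypothetical; the actual module ranks the BabelNet-associated entities with a TF-IDF score, as reported in Section 3:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: score candidate associated entities by TF-IDF over a set of
// context documents (e.g., signal-processing texts), so that domain-relevant
// concepts rank above unrelated ones such as 'Mona Lisa'.
public class EntityRanker {

    static double tfIdf(String entity, String doc, List<String> corpus) {
        double tf = countOccurrences(doc, entity);
        long docsWithEntity = corpus.stream().filter(d -> countOccurrences(d, entity) > 0).count();
        double idf = Math.log((1.0 + corpus.size()) / (1.0 + docsWithEntity));
        return tf * idf;
    }

    static int countOccurrences(String doc, String term) {
        int count = 0, idx = 0;
        String d = doc.toLowerCase(), t = term.toLowerCase();
        while ((idx = d.indexOf(t, idx)) != -1) { count++; idx += t.length(); }
        return count;
    }

    /** Score each candidate entity against the concatenated context documents. */
    static Map<String, Double> rank(List<String> candidates, List<String> contextDocs) {
        String context = String.join(" ", contextDocs);
        Map<String, Double> scores = new HashMap<>();
        for (String c : candidates) {
            scores.put(c, tfIdf(c, context, contextDocs));
        }
        return scores; // higher score = more relevant to the domain context
    }
}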

Rule Based knowledge Extraction: There are two modules that take part in the reasoning process. The first module, the Data Processor, works on the data generated by sensors and derives meaningful information from the underlying data; data is usually processed one reading at a time or in groups/snapshots. The Data Processor is written in Java, with simple templates for rule entry and processing. The second module is a Reasoner based on Description Logic, with the capability of performing procedural computations, and compliant with Semantic Web formats and principles. The Reasoner is based on an extension of the widely used Apache Jena (https://jena.apache.org), developed in Java. Both modules have interfaces through which domain experts can enter rules.

Data Processor: It supports simple rules (defined through pre-defined, drop-down-box-guided templates) that run on the sensed data. The generic format is (Condition → Action / Inference). To enable meaningful rule writing, the data from sensors are given unique IDs so that a common set of rules is applied only to the intended sensed data; for example, rules meant for cardiovascular monitoring should not be applied to the domain of power-plant monitoring. Hence rules are defined based on the sensor and the sensed data. Rules are editable, as they are kept in a database with an easy lookup facility. There are also checks for self-contradicting rules, based on a simple analysis of the condition and action parts of the rules.
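A minimal sketch of what such a (Condition → Action) rule bound to a sensor ID might look like is given below. The class name and representation are illustrative assumptions, not the actual Data Processor code:

import java.util.function.DoublePredicate;

// Illustrative sketch of a (Condition -> Action / Inference) rule bound to a sensor ID,
// so that rules written for one domain are never applied to another sensor's data.
public class SensorRule {
    final String sensorId;            // e.g. "heartSensor"
    final DoublePredicate condition;  // e.g. value > 100
    final String inference;           // e.g. "high heart rate"

    public SensorRule(String sensorId, DoublePredicate condition, String inference) {
        this.sensorId = sensorId;
        this.condition = condition;
        this.inference = inference;
    }

    /** Returns the inference if the rule's sensor matches and the condition holds, else null. */
    public String apply(String readingSensorId, double value) {
        if (sensorId.equals(readingSensorId) && condition.test(value)) {
            return inference;
        }
        return null;
    }

    public static void main(String[] args) {
        SensorRule highHeartRate = new SensorRule("heartSensor", v -> v > 100, "high heart rate");
        System.out.println(highHeartRate.apply("heartSensor", 120));      // prints "high heart rate"
        System.out.println(highHeartRate.apply("powerPlantSensor", 120)); // prints "null" (rule not applicable)
    }
}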





Figure 5: Data Processor


The above figure shows a simple data processor. Its capabilities are listed below:
- Transforming: converting the unit of sensed data into another form to maintain the uniformity needed by higher-level reasoning. An example rule maps body temperature read in Fahrenheit to Celsius: bodyTemp:data -> unit:FahrenheitToCelsius
- Filtering: used to filter out anomalies and erroneous sensor readings, such as negative or excessive heart rate: heartSensor:data > 250 -> remove
- Range Mapping: the basic mapping of quantitative values to corresponding qualitative terms, as required by higher levels of reasoning; too much granularity is often not needed, and coarser values speed up reasoning. An example rule for high heart rate: heartSensor:data > 100 -> high heart rate
- Aggregation: single sensor readings may sometimes lead to erroneous information, so if a small snapshot of sensor data / past history is kept in time-ordered form, aggregation operations can be applied: (heartSensor:data, 10s) > 100 -> high heart rate
- Semantic Mapping: since the higher-level semantics in our case is based on description logic (Semantic Web), it becomes necessary to assign a URI to each piece of information derived from sensor readings so that proper linking and reasoning can take place. An example: sensor1:data -> sem:sensor1/value (where 'sem' is a namespace such as http://heartrate.net#), so a heart-rate reading of 120 gets mapped to a URI under that namespace for reasoning (a brief Jena sketch follows this list).
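As a brief sketch of the semantic-mapping idea using the Jena API (the property name below is an illustrative assumption in the spirit of the 'sem' example, not necessarily the one used by the system):

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

// Illustrative sketch: wrap a raw sensor reading into an RDF statement with a URI,
// so that the Description Logic Reasoner can link and reason over it.
public class SemanticMapper {
    static final String SEM = "http://heartrate.net#"; // the 'sem' namespace from the example

    public static Model mapReading(String sensorId, double value) {
        Model model = ModelFactory.createDefaultModel();
        Resource sensor = model.createResource(SEM + sensorId);
        Property hasValue = model.createProperty(SEM + "value"); // assumed property name
        sensor.addLiteral(hasValue, value);
        return model;
    }

    public static void main(String[] args) {
        Model m = mapReading("sensor1", 120.0);
        m.write(System.out, "TURTLE"); // the reading now has a URI under the 'sem' namespace
    }
}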



Reasoner:

Figure 6: Reasoning Module

As seen in Figure 6, the Reasoner module is based on Description Logic and follows the Semantic Web framework, where:
- Facts are in <subject, predicate, object> triple format (RDF, http://www.w3.org/RDF/), with URIs assigned to resources (entities); for example, facts may use a namespace 'm' for a URI such as http://medical.tcs.com.
- Concepts and relationships are represented as ontologies in OWL format (http://www.w3.org/TR/owl-features/).
- Rules are represented as Jena rules; for example, a rule infers that a person has Stress based on the sensed heart-rate and blood-pressure data associated with that person's sensors (an illustrative sketch of such a rule appears at the end of this subsection).
- Queries on the combined data (facts and ontologies) are run in SPARQL (http://www.w3.org/TR/sparql11-query/), the standard query language for RDF data and OWL ontologies; an example query returns the list of persons affected by Stress.

The working principle of the Reasoner is as follows. Background knowledge in the form of facts (RDF) and ontologies (OWL) is pre-loaded into the working memory. Queries (SPARQL) are registered for execution at intervals to detect patterns in the working memory. Rules (in Jena format) are bound to the working memory and fire when data matching the rule patterns enters it. Data (usually dynamic sensor readings in RDF form) enters the system and is put into a Data Queue, where it waits for its turn to enter the working memory. The Data Handler manages the queue and groups the data into semantically linked, meaningful groups for optimal insertion into the working memory; this saves rule-firing time by limiting the missing data needed to satisfy a rule pattern. For fast incremental reasoning, the Rete algorithm [10] is used, which trades memory for speed. Initial experimental results show that the Reasoner is capable of handling acceptable loads of sensor data and ontologies. The inferences drawn from the Reasoning module (the actionable knowledge) can be used to take appropriate actions, which may in turn enable the development of the Wisdom layer in the future.
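To make this flow concrete, the following is a minimal, self-contained sketch using the Apache Jena API. The 'm' namespace, the property names, the thresholds, and the Stress rule below are illustrative assumptions, not the exact URIs or rules used in the system:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

// Illustrative sketch: load facts, bind a Jena rule that infers Stress from sensed
// heart-rate and blood-pressure values, and query the inferred model with SPARQL.
public class StressReasonerSketch {
    static final String M = "http://medical.tcs.com#"; // assumed 'm' namespace

    public static void main(String[] args) {
        Model facts = ModelFactory.createDefaultModel();
        Resource person = facts.createResource(M + "person1");
        Resource sensor = facts.createResource(M + "sensor1");
        facts.add(person, facts.createProperty(M + "observedBy"), sensor);
        facts.addLiteral(sensor, facts.createProperty(M + "heartRate"), 120);
        facts.addLiteral(sensor, facts.createProperty(M + "systolicBP"), 150);

        // Assumed rule in Jena rule syntax: high heart rate and high blood pressure imply Stress.
        String rule = "[stress: (?p <" + M + "observedBy> ?s) "
                + "(?s <" + M + "heartRate> ?hr) greaterThan(?hr, 100) "
                + "(?s <" + M + "systolicBP> ?bp) greaterThan(?bp, 140) "
                + "-> (?p <" + M + "condition> <" + M + "Stress>)]";
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rule));
        InfModel inf = ModelFactory.createInfModel(reasoner, facts);

        // SPARQL query over the inferred model: list persons affected by Stress.
        String sparql = "SELECT ?person WHERE { ?person <" + M + "condition> <" + M + "Stress> }";
        try (QueryExecution qe = QueryExecutionFactory.create(sparql, inf)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                QuerySolution sol = rs.next();
                System.out.println("Stressed: " + sol.getResource("person"));
            }
        }
    }
}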

3. RESULT AND DISCUSSION
In this section we present and discuss the results we have obtained, along the following aspects.

Data set: We have tested the accuracy against a dataset that contains 316 PPG files captured using different mobile devices, namely Samsung S-Duos, iPhone 4, Nokia Xolo, Sony Xperia, and LG.

Results of web-based knowledge mining:


The implementation of the web-based knowledge mining framework proposed in Figure 3 is a work in progress. So far, we have implemented the entity ranking method. For ranking entities, we use TF-IDF (term frequency-inverse document frequency), a popular document ranking method, to rank the associated entities extracted from BabelNet, and we have achieved good results. We have extracted entities such as 'Fast Fourier Transform', 'Matrix Transpose', 'Upsampling', 'High Pass Filter', and 'Band Pass Filter' from the algorithm repository and are able to rank their associated entities extracted from BabelNet; Table 2 shows examples of the resulting related/unrelated classification.


TABLE 2: ASSOCIATED ENTITY RELATED-UNRELATED CLASSIFICATION

Fast Fourier Transform
  Related concepts: Fourier transform, complexity, frequency, Fourier series, spectrum analyzer
  Unrelated concepts: John Tukey, asteroid, Princeton University, Visual Basic, MP3, Huffman coding, Mona Lisa

Matrix Transpose
  Related concepts: matrix, vector, determinant, dot product, eigenvalue
  Unrelated concepts: random access memory, Frobenius normal form, apoptosis, Scilab

High-pass filter
  Related concepts: filter, frequency, resistor, electronic filter, amplitude, attenuate
  Unrelated concepts: imaginary number, Kirchhoff's circuit laws, edge detection, electronics, electric circuit

Comparison of accuracy of recommended algorithms: We have evaluated our result against the state-of-the-art method described in [7], and we have found that our proposed method, which uses an automatically generated feature set, produces almost the same result. The comparison with the ground truth and with the state-of-the-art work reported in [7] is presented in Table 3. It is evident from the table that the predicted feature set works with almost the same accuracy. The results in Table 3 show that our result agrees closely with the ground truth; noise in some signals that was not detected by our automated outlier detection step is the reason for the remaining error with respect to the ground truth. The features we use in this work are a superset of the features used in [7], and thus our result is also close to that work.

TABLE 3: COMPARISON WITH STATE-OF-THE-ART WORK AND GROUND TRUTH

User           1    2    3    4    5    6    7    8
Actual        54   66   84  106   80  105  105   80
Method [7]    53   63   84   98   88  102  104   81
Recommended   54   65   84   98   89  101  104   80
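To quantify the closeness reported in Table 3, the mean absolute error of each method against the ground truth can be computed directly from the table values (simple arithmetic on the numbers above, not an additional experiment):

// Mean absolute error of each method against the ground truth, using the Table 3 values.
public class Table3Error {
    public static void main(String[] args) {
        double[] actual      = {54, 66, 84, 106, 80, 105, 105, 80};
        double[] stateOfArt  = {53, 63, 84,  98, 88, 102, 104, 81};  // Method [7]
        double[] recommended = {54, 65, 84,  98, 89, 101, 104, 80};  // our recommendation

        System.out.printf("MAE state of the art: %.2f%n", mae(actual, stateOfArt));
        System.out.printf("MAE recommended:      %.2f%n", mae(actual, recommended));
    }

    static double mae(double[] truth, double[] pred) {
        double sum = 0;
        for (int i = 0; i < truth.length; i++) sum += Math.abs(truth[i] - pred[i]);
        return sum / truth.length;
    }
}

On these values, the recommended pipeline's mean absolute error (about 2.9) is essentially the same as that of the state-of-the-art method (about 3.1), consistent with the discussion above.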

Gain in time of development: The development time for the work in [7] was three months to develop the algorithm chain, whereas a similar workflow was generated using our tool in only 3 days.

4. CONCLUSION
In this paper we have presented a method that can assist an application developer in writing a signal processing application without detailed knowledge of signal processing or algorithms. The developer needs only some domain knowledge, which is captured in a question-and-answer manner to populate an ontology. We also claim that the feature set to be used does not depend on the type of sensor but rather on the type of signal. This work will therefore help future developers of signal processing applications for IoT. The solution can be improved by ranking the recommended algorithms, which is left as future work.

5. REFERENCES

[1] Amit Sheth, Pramod Anantharam, Cory Henson, "Physical-Cyber-Social Computing: An Early 21st Century Approach," IEEE Intelligent Systems, vol. 28, no. 1, pp. 78-82, 2013.
[2] A. Pal, A. Mukherjee and Balamuralidhar P., "Model Driven Development for Internet of Things: Towards Easing the Concerns of Application Developers," International Conference on IoT as a Service, IoT360 Summit, Rome, 2014.
[3] Balamuralidhar P., Prateep Misra, Arpan Pal, "Software Platforms for Internet of Things and M2M," Journal of the Indian Institute of Science, vol. 93, no. 3, Jul.-Sep. 2013, ISSN: 0970-4140.
[4] Prateep Misra et al., "A computing platform for development and deployment of sensor data based applications and services," Patent No. WO2013072925 A2.
[5] "Automatic mapping from data to preprocessing algorithms," US Patent No. WO2002073529.
[6] M. Gupta, J. Gao, C. C. Aggarwal, J. Han, "Outlier Detection for Temporal Data: A Survey," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, January 2014.
[7] Arpan Pal et al., "A robust heart rate detection using smart-phone video," Proceedings of the 3rd ACM MobiHoc Workshop on Pervasive Wireless Healthcare, ACM, 2013.
[8] R. Navigli and S. Ponzetto, "BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network," Artificial Intelligence, vol. 193, pp. 217-250, Elsevier, 2012.
[9] M. Compton, P. Barnaghi, L. Bermudez, R. García-Castro, O. Corcho, S. Cox, J. Graybeal, M. Hauswirth, C. Henson, A. Herzog, et al., "The SSN ontology of the W3C semantic sensor network incubator group," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 17, pp. 25-32, 2012.
[10] Charles L. Forgy, "Rete: A fast algorithm for the many pattern/many object pattern match problem," Artificial Intelligence, vol. 19, no. 1, pp. 17-37, 1982.
