daily activity for analyzing routine behaviour and its deviations. Keywords â body ... routine for efficient behaviour profiling in a home healthcare environment.
Pattern Mining for Routine Behaviour Discovery in Pervasive Healthcare Environments Raza Ali, Mohamed ElHelw, Louis Atallah, Benny Lo and Guang-Zhong Yang
Abstract— Pervasive sensing is set to transform the future of patient care by continuous and intelligent monitoring of patient well-being. In practice, the detection of patient activity patterns over different time resolutions can be a complicated procedure, entailing the utilisation of multi-tier software architectures and processing of large volumes of data. This paper describes a scalable, distributed software architecture that is suitable for managing continuous activity data streams generated from body sensor networks. A novel pattern mining algorithm is applied to the pervasive sensing data to obtain a concise, variable-resolution representation of frequent activity patterns over time. The identification of such frequent patterns enables the observation of the inherent structure present in a patient’s daily activity for analyzing routine behaviour and its deviations. Keywords – body sensor networks, frequent pattern mining, activity recognition, behaviour profiling I. INTRODUCTION
T
HE recent emergence of pervasive healthcare has enabled the monitoring of chronically ill patients or elderly in their own home environments [1]. This can result in lower costs, reduced strain on healthcare, and more convenience for patients and their social carers [2, 3]. Pervasive sensing is also valuable for measuring postoperative recovery, which is particularly important for minimally invasive surgery as early signs of post-operative complication can only be captured in a home environment because patients are usually discharged much more quickly compared to that of conventional surgery [3]. Current designs of pervasive sensing are also working towards integrating ambient and wearable sensing for continuous monitoring of key physiological and activity indices of patients. Alteration of daily activities and deviations from normal behaviour can manifest the onset or progression of disease. Pervasive healthcare deployments entail a diverse set of software needs including interoperability with legacy sensors, web services requirements, varying levels of security expectations and visualization for different system users. In addition, the need to continuously gather and process large volumes of sensor data, coupled with repeated querying of the database, would introduce significant Manuscript received 29 Feb 2008 R. Ali, M. ElHelw, L. Atallah, B. Lo and G.Z. Yang are with the Department of Computing, Imperial College, London. E-mail:{smrali, me, latallah,benlo,gzy}@doc.ic.ac.uk.
computational and storage loadings. This problem is exacerbated when a large number of users need to be catered for at the same time. As a result, algorithmic and system- and functional-level complexities comprise significant challenges to pervasive healthcare system developments. In this case, rigid software architectures designed to be applicationspecific cannot cope with the diverse and evolving requirements of this rapidly evolving field of research and development. Recently, a number of light-weight software methodologies suitable for scalable data processing, transmission and storage have been introduced [4,5]. Synopsis structures [4,6], for example, can act as substantially smaller visualization and querying surrogates for the actual data for a specific set of queries. Techniques such as wavelets, histograms, sketches and sub-sampling can also reduce the resource utilisation of data dramatically, thus freeing up main memory, lowering access rates at the database, and improving responsiveness for web clients. For these reasons, there has been an increasing interest in efficient methods for synopses for data streams [4]. Behaviour profiling is an important area in pervasive sensing [7]. The onset or complication of a life threatening episode may be marked by changes in behaviour and activity patterns. For example, in the elderly population, prostatism, degenerative joint disease, bursitis, and gastro-esophageal reflux are common causes of frequent awakening episodes and disturbed sleep, along with congestive heart failure, coronary artery disease, and chronic obstructive pulmonary disease. Sophisticated algorithms can recognize an increasing range of user’s activities from wearable and ambient sensor data [3]. While there is significant variability in human activity, there is typically a repeating, structure over the long term. Finding this routine pattern from activity information is an important step towards the understanding of the general behaviour of the patient. To this end, frequent pattern mining [8,9,10,11] constitutes a promising technique for the discovery of rich routine information and has been successfully applied to a number of applications [9,11]. This paper proposes a framework for routine pattern discovery based on activity data obtained from the e-AR (ear-worn activity recognition) sensor developed at Imperial College London [2] and demonstrates its use to efficiently mine and update a concise variable-resolution synopsis routine for efficient behaviour profiling in a home healthcare environment.
II. A SOFTWARE ARCHITECTURE FOR PERVASIVE HEALTHCARE The proposed system is based on the mediated push model [12] which has been used for many industry frameworks such as J2EE [13] and research frameworks for pervasive systems [8]. It uses an intermediary connector, called the broker, to facilitate communication between distributed software components. Components can publish data under a topic or subscribe to a particular topic, or a category of topics. The broker delivers data to registered subscribers. Data producing components can provide metadata in schema-conformant XML, which can subsequently be queried from the broker. A uniform API is provided in Java, C++, AJAX and Web Services. Communication can be through sockets, secure sockets, Remote Method Invocation (RMI), and Java Messaging Service (JMS). Multiple copies of the broker can be used for load balancing.
Figure 1: A Scalable Distributed Architecture for Pervasive Healthcare.
An example deployment of our system in a pervasive homecare scenario is shown in Figure 1. Data from the e-AR sensor is collected and communicated to the server using an authorized gateway device. The server side hosts a database cluster that archives data streams on relatively slow storage. A dedicated cluster machine buffers and mines data to generate synopsis structures that are subsequently stored on limited fast storage in the database. For optimal utilization of resources and responsiveness, most queries and visualizations are serviced by data stored in the fast storage. Each of the data producing, processing, visualization and storage components are linked by the broker. Such decoupling of components, which never explicitly refer to each other, enables for more flexible and scalable systems. III. MINING ACTIVITY LEVELS When a user is wearing an e-AR sensor, an activity level is streamed every 4 seconds. This activity level is the output of the classifier as described in [2], and relies on the variance and range of the accelerometer data. The output of the classifier can take one of four values. The lowest level indicates almost no activity (during sleeping or sitting for
example) and the highest level indicates an activity involving a lot of movement, such as running. While some activities may be described by a single activity level, many activities may produce a sequence of activity levels, for example, exercising and working. Frequent pattern mining can discover such repeating combinations of activity levels. In the proposed pattern mining framework, a transaction database D is a set of transactions. Each transaction T is a subset of items I = {i1 , i2 ,..., in } and is associated with time t . The items, in our problem, are the activity levels. A transaction is a list of distinct activity levels occurring in a given time window. A pattern is a non-empty subset of I . A transaction T is said to contain a pattern P if P ⊆ T . The number of transactions containing a pattern is called its support. A pattern is considered to be frequent if its support is above a user specified threshold. Furthermore, a pattern is regarded as persistent if it has the highest support. Closed frequent pattern mining improves on this by limiting its search to patterns that are not subsets of any other pattern with the same support. This results in fewer redundant patterns. FP-Stream [8] is an example of a frequent pattern mining algorithm that incorporates temporal information by mining a data stream at different granularities. It constructs a prefixtree of patterns, where each node specifies support for a pattern over time windows. Time windows grow in sequences such as t , 2t , 4t , 8t and so on, where t is a unit of time. While this is a powerful technique for general data streams, it is not an intuitive data structure for describing human routine. Other fast algorithms have been investigated for finding frequent patterns. Closet+ [10], which we have used in this work, is an example of an algorithm which can quickly find closed frequent patterns using optimized search strategies, and provides advantages over existing mining algorithms in terms of runtime and memory usage [10]. IV. ROUTINE TREE Based on methods such as FP-Stream and Closet+, we propose a data structure called the routine tree designed to describe behaviour patterns. The routine tree is a variable resolution mapping of time periods within a day to frequent patterns of activity levels. Whereas FP-Stream associates pattern-nodes with a table of time-windows, each node in a routine tree is a time-period associated with a table of frequent patterns. Each tree represents one day, and nodes on one level, from left to right, show sequentially increasing, non-overlapping periods. The criterion for this division of time is the persistent activity pattern: adjacent nodes show a change in the persistent pattern. Figure 2 shows an example of such a tree. This tree presents an intuitive view of the composition of a user's routine. Finer resolution information about a user's activities is obtained by moving down the tree and studying coarse grain information from nodes nearer the root. The data structure can be used to answer a range of queries about
the user’s activities. In this respect, it can be seen as a synopsis structure for user activity. 0:00 – 23:59
0:00 – 12:00
12:00 – 23:59
Figure 3: Pruning a Routine Tree –an example of a merge operation.
0:00 – 8:00
8:00 – 12:00
Fig 2: Routine Tree with support shown between 12:00 and 23:59 for capturing subject activities.
V.
CONSTRUCTING A ROUTINE TREE
We construct the routine-tree in two phases. In the first phase, we construct a tree by mining activity data in progressively finer granularity. This tree is then pruned to produce a more compact representation. The first phase is described in Algorithm 1. We begin with the complete day as the root node then systematically construct a binary tree by executing Closet+ on the node. Closet+ yields a table of frequent patterns and their supports, which is associated with that node. The node is split if it meets splitting criteria that takes into account the number of frequent patterns found, and the minimum duration at which the data is to be mined. Algorithm 1: Generate-Tree Inputs: Database D of Activity Levels indexed by time, Node N representing a period [N start , N end ] , split _ threshold , min _ duration .
Outputs: A Routine Tree R rooted at N 1. 2.
Apply Closet+ on all transactions in D with time in [N start , N end ] . Store patterns found in Table(N ) If number of patterns in Table(N ) is greater than split _ threshold and duration of N is above min _ duration a. Split N into equal duration nodes, N left and N right adding them as the children of N in the tree R b. call Generate-Tree with N left c. call Generate-Tree with N right
The tree is then made compact by merging nodes where possible. If two adjacent leaves have the same maximal frequency pattern, they are candidates for merging. If they are siblings, they can be removed from the tree. Otherwise they are merged, which involves restructuring the tree, and updating the node periods and pattern tables. Figure 3 shows an example of this operation.
By combining routine-trees across several days, we can find patterns that typify the user’s routine. As before, a tree is constructed, then pruned. Instead of using Closet+ to generate the pattern-tables for each time period, patterns occurring during the period are collected from each tree. Patterns below the minimum support are then removed. VI. RESULTS The methods explained above are applied to two datasets. The first dataset (Simulated Data) is a simulated dataset from activity data compiled by using a certain known pattern. This data was gathered in the lab using the e-AR sensor for different household activities, such as sleeping, eating, walking, etc. Data for one day is then created by concatenating these activities together in a sequence relatable as a typical daily routine. The second dataset (Real Data) is obtained by asking a normal subject to wear the e-AR sensor and roughly label his activities throughout the day. To obtain a quantitative measure of accuracy, we have compared the base dataset against the tree. Each transaction in the dataset is compared against the patterns stored in the tree for that time at the highest resolution. A transaction and pattern are compared by finding the size of their intersection and scaling it by the size of the larger of the two sets. The best match is the accuracy of the tree for that transaction. Table 1 shows the accuracy of the proposed method averaged over the transactions in each dataset. In this study, we define compression factor to be the ratio of the size of the base dataset to the number of rows needed to represent the tree in a database, which includes the nodes of the tree and the patterns associated with them. Table 1 shows compression factors of the two datasets. Trees generated by the algorithm can be substantially smaller than the base data, by up to 12 times. Higher compression is obtained on the simulated data because it is a larger dataset, has extended periods of sleep and work that are suitable for compression, and has less variation in activity levels. TABLE I ACCURACY AND COMPRESSION RESULTS Simulated Data Accuracy Compression
96.4 12.2
99.2 9.5
98.1 9.3
98.1 10.8
Real Data 97.2 9.6
88.9 2.2
In order to effectively visualise the data, a software toolkit has also been developed. The first view, shown in Figure 4, provides a condensed view of routine by plotting the leaves of the tree against a time grid. This shows the tree in the highest resolution. The persistent pattern is shown for each period. Patterns are associated with a bar colour and height, shown in Figure 4. Increasing bar size indicates a more strenuous activity. The second view, in Figure 5, shows the complete tree, level by level against a time grid. Figure 4 shows the leaf-view of the trees from 5 days in the first dataset. The structure of each day in the simulated data can be seen, with periods of sleep, followed by exercise on some days, an 8 hour work period with a break for lunch, evening exercise, rest, then sleep. The “Week” tree is a result of combining the five trees, which captures the general picture of the simulated routine.
been demonstrated on data gathered from the e-AR sensor. In this study, visualizations that provide a multi-resolution view of the user’s day based on the routine-tree are developed. Future work could be conducted to further improve the system performance. This can be through online methods for mining streaming data. Bottom-up tree construction approaches could also be explored to re-use mining results for a time period in higher resolutions.
Figure 5: Example results for a subject wearing the sensor during late evening hours of the day.
REFERENCES [1] [2]
[3]
Figure 4: Results on simulated behaviour dataset showing the behaviour pattern changes during the week.
The set of real data was collected during the lateevening hours, during which the user worked, commuted home, rested and finally slept. Figure 5 is the treevisualization for this dataset. Each phase of the user’s day is distinguishable as labelled in the figure. Increasing levels show more detail of these phases. The first level merely summarizes the recorded session as one of moderately intense activity. The second level shows three phases in the day, which we know to be work, commute, and rest at home. In the third and fourth levels, more detail can be seen, such as a period of sleep at the end of the day, a break during work hours, and the composition of the commute: walk to the station, stand in the train, and walk home.
[4]
[5]
[6]
[7]
[8]
[9]
VII. CONCLUSIONS To cater for variable requirements and high resource demands of pervasive sensing applications, a flexible and scalable software architecture is proposed. A mediatory component is used to stream data to processing, storage and visualization units. Significant resource costs are avoided by summarizing behaviour information in a proposed binary tree structure called the Routine-Tree, which maps time periods to frequent patterns of activity. This tree is constructed by systematically mining activity data to make it more compact. A high accuracy and compression ratio has
[10]
[11]
[12]
[13]
G.Z. Yang, Body Sensor Networks, ed. G.Z. Yang. 2006, London: Springer-Verlag. B. Lo, L. Atallah., O. Aziz, M. ElHew, A. Darzi and G.Z. Yang, “Real-Time Pervasive Monitoring for Postoperative Care,” in Proc. 4th International Workshop on Wearable and Implantable Body Sensor Networks, Aachen, 2007, pp. 122 -127. O. Aziz, L. Atallah, B. Lo, M. Elhelw, L. Wang and G. Z. Yang, A. Darzi, "A Pervasive Body Sensor Network for Measuring Postoperative Recovery at Home," in Surgical Innovation, vol. 14 (2), 2007, pp. 83-90. C. C. Agarwal and P. S. Yu, “A Survey of Synopsis Construction Algorithms in Data Streams,” in Data Streams: Models and Algorithms, Springer Verlag, 2007, pp. 169 – 202. M. Modahl, I. Bagrak, M. Wolenetz, P. Hutto and U. Ramachandran, “Mediabroker: An architecture for pervasive computing,” in Proc. of the 2nd IEEE International Conference on Pervasive Computing and Communications, Orlando, 2004, pp. 253 – 262. P. B. Gibbons and Y. Matias, "Synopsis Data Structures for Massive Data Sets," In DIMACS: Series in Discrete Mathematics and Theoretical Computer Science: Special Issue on External Memory Algorithms and Visualization, A, 1999. L. Atallah, M. ElHelw, J. Pansoit, D. Stoyanov, L. Wang, B. Lo and G.Z. Yang, “Behaviour Profiling with Ambient and Wearable Sensing,” in Proc. 4th International Workshop on Wearable and Implantable Body Sensor Networks, Aachen, 2007, pp. 133 -138. C. Giannella, J. Han, J. Pei, X. Yan and P.S. Yu, “Mining Frequent Patterns in Data Streams at Multiple Time Granularities,” H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining, AAAI/MIT Press, 2004, pp. 191 – 210. F. Pan, G. Cong, A.K.H. Tung, J Yang and M Zaki, “CARPENTER: finding closed patterns in long biological datasets,” in Proc. of the 9th ACM SIGKDD , Washington, 2003, pp. 637 – 642. J. Wang, J. Han and J. Pei, “CLOSET+: searching for the best strategies for mining frequent closed itemsets,” in Proc. of the 9th ACM SIGKDD, Washington, 2003, pp. 236 – 245. W. Lee, S. Stolfo, P. Chan, P. Eskin, E. Wei Fan, M. Miller, S. Hershkop and J. Zhang, “Real time Data Mining-based Intrusion Detection,” in Proc. DARPA Information Survivability Conference & Exposition, Vol. 1, 2001, pp. 89 – 100. F. Buschmann; R. Meunier, H. Rohnert, P. Sommerlad and M. Stal, “Pattern-Oriented Software Architecture, Volume 1: A System of Patterns," John Wiley & Sons Ltd, 1996. P. J. Perrone, V. S. R. R. Chaganti, T. Schwenk, “J2EE Developer’s Handbook,” Prentice Hall, 2003.