Recent Researches in Communications, Automation, Signal Processing, Nanotechnology, Astronomy and Nuclear Physics (ISBN: 978-960-474-276-9)

Toward active sensor placement for activity recognition

PAUL M. YANIK1, JOE MANGANELLI2, LINNEA SMOLENTZOV3, JESSICA MERINO1, IAN D. WALKER1, JOHNELL O. BROOKS3, KEITH E. GREEN2
Clemson University
1 Department of Electrical and Computer Engineering
2 School of Architecture
3 Department of Psychology
Clemson, South Carolina USA
{pyanik, jmanganelli, linneas, jmerino, iwalker, jobrook, kegreen}@clemson.edu

Abstract: The development of ubiquitous sensing strategies in home environments underpins the promise of adaptive architectural design, assistive robotics, and services which would support a person's ability to live independently as they age. In particular, the ability to infer the actions, behavioral patterns, and preferences of the individual from sensor data is key to the effective design of such components for aging in place. Very often, sensing for recognition of human activity relies on vision-based sensors. However, many home users find the presence of cameras invasive. Hence, we seek to develop a sensing system which uses non-vision-based sensors to discreetly discern occupant position, activity, and user context in the home environment. This paper describes initial experimentation to determine optimal sensor placement for detection of specific activities. Three essential reaching motions typical of individuals lying in bed are examined. Action data was collected using IR motion sensors positioned at an array of vantage points on a virtual sphere surrounding the motion space. Histograms of Oriented Gradients (HOGs) are used to extract motion representations from Self-Similarity Matrices (SSMs) for each action. It is shown that mean HOGs can serve as exemplars and allow us to choose a preferred sensor position for each motion type. Using these exemplars, motions can be classified with a promising level of accuracy (> 75% for our data set), with improved outcomes observed through aggregation of sensor readings.
Key-Words: activities of daily living, activity recognition, aging in place, architectural robotics, motion sensing, sensor placement

1 Introduction

1.1 Motivation

Mobility decreases as we age. Such reductions, whether gradual or sudden, may ultimately impair one's ability to perform essential Activities of Daily Living (ADLs). It is often vital to ensure that older adults retain the ability to perform ADLs independently [20]. For those wishing to age in place, a reduced capacity to conduct ADLs may be associated with diminished quality of life, decreased independence, increased caregiver burden, or even institutionalization [5]. This paper discusses our initial efforts to sense and characterize user activity through active sensor placement. Successful application of these results will form the basis of a comprehensive system of adaptive robotic and architectural components to support independent living for individuals whose capabilities and needs are changing.

The loss of dexterity in the hand is a key factor affecting performance of certain ADLs, including precision and grip tasks. Further, decreased manual dexterity is often coupled with pathological conditions such as osteoporosis and Parkinson's disease, making this an important area of research [3]. For the current study, a selection of representative tasks was examined: reaching (e.g., for a cup), grabbing (e.g., the bed rail) and pressing (e.g., a call button). These are all hand functions that have been used in studies exploring hand and finger mobility in aging adults [11],[17]. Architects and environmental designers attempt to accommodate those with reduced mobility through the use of Universal Design Principles (UDP) and smart home technologies. UDPs ensure that the environment does not confound an individual's efforts to complete tasks. UDPs make the environment safe, clean, legible and barrier-free [10],[13],[18] for all occupants, regardless of ability. These strategies facilitate resident mobility and independence. However,



the majority of current implementations are static and of low fidelity, with accommodation solely the result of the form and placement of furniture and fixtures.

Smart homes extend awareness, increase control over systems, and enhance the security, healthfulness, and safety of the environment through sensing, inference, communication technologies, decision-making algorithms, and appliance control [4],[9],[12],[16]. These efforts are mostly focused on building systems because the real-time processing of occupant activity is expensive and often relies on technologies considered intrusive of people's privacy. As a result, most occupant sensing in smart homes remains low fidelity.

Smart home technology would benefit greatly from the capacity to sense motion with high fidelity. In particular, this would facilitate the development of robotic components to actively support user needs while preserving privacy. Such robotic components would allow for learned inference of user action and intention through persistent monitoring. Further, degradation in the abilities of the user could be tracked over time so as to adaptively inform the robot's assistive action plans. For example, a robotic bedside table might keep an inventory of personal items, provide automated reminders, control lighting and music, or provide critical care alerts to medical personnel. With knowledge of typical user motion patterns, the robot could respond to gestured commands or detect infrequent needs such as assistance with reach, weight transference, or ambulation.

A significant body of work exists in the area of automated recognition of human activities. Clearly, detection of user activity and inference of user context and intention are central to action planning by system software and robotic components. Despite the development of many promising techniques, the goal of robust activity recognition remains elusive.
Due to such factors as changes in lighting and camera position, variations in anthropometry, and speed of execution, the problem remains largely unsolved. Frequently, activity recognition research revolves around the use of visual images to perform a kinematic analysis based on the structural geometry of human participants (limb dimensions, joint angles, etc.) [2]. However, many users find the presence of cameras intrusive in certain situations [1],[8]. Hence, this paper explores a non-structural approach which creates a representation of human motion based on self-similarity. The claim that motion classification is invariant to changes in viewing angle [15],[19] is also examined. Analysis was performed using IR motion sensors for a non-vision-based approach. These sensors were selected to allow for greater user privacy while also being insensitive to variations in lighting.
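The self-similarity idea just introduced can be conveyed with a minimal sketch. This is our own illustration, not code from the paper: scalar sensor readings are assumed, SSMs are compared directly for brevity (the paper compares HOG descriptors of the SSMs), and all function names are ours. The nearest-exemplar rule uses the Frobenius norm, as in the classification step described later.

```python
import numpy as np

def self_similarity_matrix(readings):
    """Self-Similarity Matrix of a 1-D sensor sequence:
    S[i, j] = ||I_i - I_j||, which reduces to an absolute difference
    for scalar readings. S has a zero main diagonal and is symmetric."""
    I = np.asarray(readings, dtype=float)
    return np.abs(I[:, None] - I[None, :])

def classify(ssm, exemplars):
    """Assign an SSM to the class whose exemplar SSM is nearest
    under the Frobenius (matrix) norm."""
    return min(exemplars, key=lambda k: np.linalg.norm(ssm - exemplars[k]))

# Toy illustration: a periodic reading versus a monotone one.
t = np.linspace(0, 4 * np.pi, 20)
exemplars = {
    "periodic": self_similarity_matrix(np.sin(t)),
    "ramp": self_similarity_matrix(np.linspace(0, 1, 20)),
}
probe = self_similarity_matrix(np.sin(t + 0.1))  # slightly shifted repeat
assert classify(probe, exemplars) == "periodic"
```

In the paper's setting, the readings would be IR sensor samples from one vantage point and the exemplars would be per-class means; the sketch conveys only the shape of the computation.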

1.2 Related Work
Activity recognition can be roughly divided into two main components. First, a representation of the motion in some compact form is computed. The representation is then subjected to a pattern classification technique so that it may be assigned to one of a known gallery of activities. Representations may be further broken down into feature-based (parametric) versus holistic (nonparametric) forms. Parametric representations extract features related to the physical geometry and kinematics of the actor. Classification may take advantage of known characteristics of motion to improve accuracy. Holistic representations utilize image statistics of the motion performed in (x, y, t) space. Hence, with regard to the frequently employed visual images of motion, these can also be characterized as pixel-based representations [2]. In this paper, a holistic representation is used. However, our analysis is performed with non-visual motion data.

Histograms of Oriented Gradients (HOGs) are used in [7] to generate regional descriptors of still images for human detection. Using this technique, object appearance may be characterized by the distribution of image intensity gradients without specific information about where the gradients or image edges occur. For this method, an image is divided into cells. A normalized histogram of gradient directions for the pixels in each cell is constructed. Concatenating the histograms over the collection of cells forms an image representation which may be used for classification. Gradients are computed using the Prewitt operator, which has been seen to yield favorable results over other kernels. Smoothing was shown to decrease discriminative performance.

It has been shown that periodic motions such as walking or running may be recognizable solely from the movement of lighted feature points placed on the actor's body [14]. This phenomenon is exploited by [2],[6] through the concept of self-similarity.
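The HOG construction of [7] summarized above can be sketched as follows. This is a simplified illustration of our own: square cells are used rather than the log-polar cells employed later in this paper, and all function names are ours.

```python
import numpy as np

# Prewitt kernels, the gradient operator suggested in [7].
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
PREWITT_Y = PREWITT_X.T

def filter2d(img, k):
    """Minimal 'valid'-mode 2-D filtering (cross-correlation), to
    avoid a SciPy dependency."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def hog(img, cell=8, bins=8):
    """Per-cell histograms of unsigned gradient orientation, weighted
    by gradient magnitude, normalized and concatenated."""
    gx = filter2d(img, PREWITT_X)
    gy = filter2d(img, PREWITT_Y)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    H, W = mag.shape
    feats = []
    for r in range(0, H - cell + 1, cell):
        for c in range(0, W - cell + 1, cell):
            a = ang[r:r + cell, c:c + cell].ravel()
            m = mag[r:r + cell, c:c + cell].ravel()
            # Zero-magnitude pixels contribute zero weight, so their
            # (undefined) orientation is harmless.
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            norm = np.linalg.norm(hist)
            feats.append(hist / norm if norm > 0 else hist)
    return np.concatenate(feats)

# Demo: a vertical step edge yields purely horizontal gradients, so all
# histogram mass falls into the first orientation bin of each active cell.
img = np.zeros((19, 19))
img[:, 9:] = 1.0
d = hog(img)
assert d.shape == (32,)  # 2 x 2 cells of 8 bins each
```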
In this approach, the locations of features in an image sequence are seen to generate a repeating pattern from which a motion descriptor may be generated. The set of features is tracked through the course of an image sequence. The summed distances of features between image pairs are computed. Performing this summation exhaustively across all image pairs forms a Self-Similarity Matrix (SSM). The main diagonal of the SSM is composed of zeros (since the entries represent image distances from themselves). Diagonals parallel to the main diagonal represent periodicity of motion, while diagonals perpendicular to the main diagonal represent symmetrical patterns in the observed motion (e.g., key body poses while walking). With the SSM, the periodic motion of a moving person is observable across image sequences as in Figure 1.

Figure 1: (a) Walking action and (b) the associated SSM showing periodicity [6].

Rao et al. [19] explore the importance of dynamic instants (extrema of changes in velocity) as keys to human perception of motion. Further, they argue that such instants are view-invariant; they are preserved under 2D projection regardless of vantage point. This work is extended in [15], where motion descriptors are constructed from the HOG of the SSM. The authors argue that SSMs are approximately stable in appearance at varying camera angles (view-invariant), and thus their HOGs may be used as robust classifiers. The intuition of this approach centers on the idea that image features in a periodic sequence will attain a similar spatial orientation regardless of the placement of the camera. None of the approaches described above, however, examines the possibility that the sensors need not be cameras. The experimentation described in Section 2 makes use of the approach of [15] to classify selected motions. Our results, given in Section 3, show that the approach yields promising classification accuracy with motion sensor data and that outcomes improve as sensor inputs are aggregated.

2 Method

This section describes the laboratory fixture used to collect motion sensor data as well as the analysis technique used to generate descriptors and classify motions. Data was collected for three activities which were chosen for their fundamental importance to a person lying in bed, as in a healthcare setting. These activities included:

1. (Reach) Bringing a cup to the mouth.
2. (Press) Pressing a nurse call button.
3. (Grab) Grabbing the bed rail.

Data Collection. Data sets were collected at 17 Hz over seven-second intervals using Panasonic AMN23112 analog IR motion sensors. For repeatability, the motions were performed by a PUMA robot acting as a stand-in for the human arm. Sensor data was recorded from an array of points over the surface of a virtual sphere surrounding the workspace of the robot. A mock-up of the scenario as it might exist in a hospital environment is shown in Figure 2.

Figure 2: Hospital room scenario with continuum sensor surface.

A rotating arc fixture was used to sweep the surface of the sphere in order to facilitate precise positioning of the sensor (Figure 3).

Figure 3: PUMA robot with fixture for spherical sensor positioning.

Sensor vantage points (r, θ, φ) were selected such that r = 30", θ ∈ {0°, 15°, 30°, ..., 180°} and φ ∈ {0°, 15°, 30°, ..., 255°}. For our purposes, the angle θ is measured downward from 0° at the vertical axis. This array of 234 points is depicted in Figure 4. Sensor readings were collected with a motion sensor positioned at each of these points for all candidate motions.

Descriptor Calculation. The Self-Similarity Matrix S(I) for each sequence of motion sensor readings I = {I1, I2, ..., IN} is calculated using (1)


       ⎡ 0     d12   d13   ...   d1N ⎤
       ⎢ d21   0     d23   ...   d2N ⎥
S(I) = ⎢  .     .     .     .     .  ⎥     (1)
       ⎣ dN1   dN2   dN3   ...   0   ⎦

where the elements of S(I) represent the Euclidean distance between pairs of sensor readings in I such that

di,j = ||Ii − Ij||2     (2)

Assumptions implicit in the distance calculation of (2) are that, for a given sequence, the sensor does not move and that the background does not change. Hence, any change in output from the motion sensor represents the action being observed. Thus, the total movement of the actor can be represented as the difference between the sensor reading pairs.

Figure 4: Sensor vantage points.

A local (overlapping) HOG descriptor is calculated for each point i = 1 ... N on the main diagonal of S(I), where N = 116. The descriptor consists of a histogram of m = 8 gradient direction bins for each of j = 11 log-polar cells, as shown in Figure 5. Gradients are computed using the Prewitt operator, as suggested in [7]. Bin entries are weighted by the associated gradient magnitudes. Since S(I) is symmetric, only the entries above the diagonal are included in the descriptor computation. Descriptors for all points are concatenated to form a composite descriptor H for the action sequence. Hence, for a given motion's data set, H is an (8 × 11) × 116 = 8 × 1276 matrix.

Figure 5: HOG descriptor format [15].

Action Classification. Class exemplars for each of the three candidate actions were calculated as the mean HOGs for a specified percentage of the available data. These HOGs were selected randomly and constituted the training data. The remainder of the data points were used as test data. Each test data HOG was compared with each of the exemplars and classified by the exemplar to which it was nearest according to (3)

i = arg min_j DE(Htest, Htrain^j)     (3)

where Htest is a test data point, Htrain^j is one of the j = 3 candidate action classes, DE is the distance to the exemplar using the Frobenius norm, and i is the classification. The percentages of data points used as training data were varied from a single vantage point up to 50% in 10% increments. In this way, it was possible to determine whether a descriptor from any given vantage point resembled that of its class exemplar so as to validate or invalidate the claim that the stability of the SSM allows for robust view invariance.

3 Experimental Results

SSMs for motion sensor data readings taken from the vantage points used above are given in Figure 6. It can be seen that, although there is a nominal resemblance between SSMs for a given class, the similarity does not allow for obvious discrimination. Classification results for the motion sensor readings are given in Table 1. Results are poor when only a single view is used to generate exemplars - no better than a random guess for the reach and press motions. The grab motion shows the greatest accuracy, owing to its inherent dissimilarity from the other motion classes. Results improve as the percentage of data used to train the classifier is increased (reaching 75%), although not to a level that can be considered reliable for our proposed application. Clearly, motion sensor data does not carry the richness of information found in video data. However, results with motion sensor data are promising. To increase the amount of information available for activity classification through motion sensing, a surface contour encompassing an array of sensor vantage points is



envisioned (as depicted in Figure 2). Data from such a contour may be emulated by averaging sensor inputs over regional subsets of the virtual sphere. Table 2 shows several scenarios for such arrays. The table assumes 20% of data points were used to calculate class exemplars. Using this scheme, results significantly exceed those for single sensor inputs for arrays of 2 × 2 and larger while requiring substantially fewer data points for training. It can be assumed, however, that the mean HOG calculated for any given array will occasionally encompass sensor inputs used to calculate the exemplars themselves, since exemplars are calculated over a random set of data points. Since both the arrays and the percentage of data points used to calculate exemplars are relatively small compared to the number of test data points, the effect is considered to be minimal.

Figure 6: SSMs for motion sensor input taken from orthogonal views: reach (a,b,c), press (d,e,f) and grab (g,h,i).

Table 1: Motion sensor classification results.

                  Classification Accuracy (%)
Training Points   Reach   Press   Grab
1                 37.50   35.82   88.60
10%               65.73   65.90   97.16
20%               72.39   73.30   98.99
30%               75.00   75.40   98.84
40%               75.04   74.18   99.08
50%               74.87   76.37   98.93

Table 2: Motion sensor array classification results.

             Classification Accuracy (%)
Array Size   Reach   Press   Grab
1 × 1        72.39   73.30   98.99
1 × 2        83.46   79.28   98.96
2 × 2        91.62   89.95   100.00
3 × 3        97.74   95.88   100.00

4 Conclusions and Future Work

In this paper, the use of SSMs to generate HOG classifiers for activity recognition has been explored. Further, we have used non-video motion data to see whether a holistic activity model might be useful in privacy-sensitive applications. It has been shown that motion sensor readings of basic actions can be classified by this method with a promising degree of accuracy. Where single sensor inputs are used as class exemplars, classification accuracy is poor and highly sensitive to vantage point. Where multiple descriptors are averaged to produce exemplars, classification improves but is still subject to the choice of vantage point for best outcomes. We interpret these findings as supportive of view invariance in that the appearance of SSMs for a given class is stable enough over the range of vantage points to collectively form a useful discriminant. Experimentation with single sensor inputs (test data) also yielded poor results. However, when multiple sensor views are combined into a single average reading for a small contour surrounding a vantage point, results improve significantly. Hence, the use of motion sensor data for the purpose of activity recognition appears to be a viable area for exploration. Future work is foreseen in the addition of new motion classes with more complex kinematics and the use of HOGs to predict optimal vantage points for hard-to-recognize action classes. Because the classification technique used computes a distance to HOG exemplars, the vantage point descriptor closest to an exemplar can be considered optimal for that action. This prospect gives rise to the notion that relocation of sensors to an optimal view may yield improved recognition. And, since human motions begin and end at arbitrary times and occur at varying speeds, we hypothesize a moving


temporal window of action which might be sensed and classified continuously. Active sensor positioning to best observe a scene as it unfolds is seen as the problem formulation for our ongoing work.

Acknowledgements: We wish to thank Megan Milam for her assistance in collecting our data set. This research was supported in part by the U.S. National Science Foundation under grants IIS-0904116 and IIS-1017007.

References:
[1] S. Beach, R. Schulz, J. Downs, J. Matthews, B. Barron, and K. Seelman. Disability, age, and informational privacy attitudes in quality of life technology applications: Results from a national Web survey. ACM Transactions on Accessible Computing (TACCESS), 2(1):1–21, 2009.
[2] C. BenAbdelkader, R.G. Cutler, and L.S. Davis. Gait recognition using image self-similarity. EURASIP Journal on Applied Signal Processing, 2004:572–585, 2004.
[3] E. Carmeli, H. Patish, and R. Coleman. The aging hand. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, 58(2):M146, 2003.
[4] D.J. Cook, M. Youngblood, E. Heierman, K. Gopalratnam, S. Rao, A. Litvin, and F. Khawaja. MavHome: An Agent-Based Smart Home. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications, pages 521–524, 2003.
[5] K.E. Covinsky, R.M. Palmer, R.H. Fortinsky, S.R. Counsell, A.L. Stewart, D. Kresevic, C.J. Burant, and C.S. Landefeld. Loss of Independence in Activities of Daily Living in Older Adults Hospitalized with Medical Illnesses: Increased Vulnerability with Age. Journal of the American Geriatrics Society, 51(4):451–458, 2003.
[6] R. Cutler and L.S. Davis. Robust real-time periodic motion detection, analysis, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):781–796, 2002.
[7] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), pages 886–893, 2005.
[8] G. Demeris, B.K. Hensel, M. Skubic, and M. Rantz. Senior residents' perceived need of and preferences for "smart home" sensor technologies. International Journal of Technology Assessment in Health Care, pages 120–124, 2008.
[9] B. DeRuyter and E. Pelgrim. Ambient Assisted-Living Research in CareLab. interactions, XIV(4):30–34, 2007.
[10] A. Friedman. The Adaptable House: Designing Homes for Change. McGraw-Hill Professional, 2002.
[11] M.E. Hackel, G.A. Wolfe, S.M. Bang, and J.S. Canfield. Changes in hand function in the aging adult as determined by the Jebsen Test of Hand Function. Physical Therapy, 72(5):373, 1992.
[12] S.S. Intille, K. Larson, and E.M. Tapia. Designing and Evaluating Technology for Independent Aging in the Home. In International Conference on Aging, Disability and Independence, 2003.
[13] S. Iwarsson and A. Stahl. Accessibility, Usability and Universal Design - Positioning and Definition of Concepts Describing Person-Environment Relationships. Disability & Rehabilitation, 25(2):57–66, 2003.
[14] G. Johansson. Visual perception of biological motion and a model for its analysis. Perception and Psychophysics, 14(2):201–211, 1973.
[15] I. Junejo, E. Dexter, I. Laptev, and P. Pérez. Cross-View Action Recognition from Temporal Self-Similarities. In European Conference on Computer Vision (ECCV 2008), pages 293–306, 2008.
[16] C.D. Kidd, R. Orr, G.D. Abowd, C.G. Atkeson, I.A. Essa, B. MacIntyre, E. Mynatt, T.E. Starner, W. Newstetter, et al. The Aware Home: A Living Laboratory for Ubiquitous Computing Research. Lecture Notes in Computer Science, pages 191–198, 1999.
[17] H.B. Olafsdottir, V.M. Zatsiorsky, and M.L. Latash. The effects of strength training on finger strength and hand dexterity in healthy elderly individuals. Journal of Applied Physiology, 105(4):1166, 2008.
[18] W.F.E. Preiser and E. Ostroff. Universal Design Handbook. McGraw-Hill Professional, 2001.
[19] C. Rao, A. Yilmaz, and M. Shah. View-Invariant Representation and Recognition of Actions. International Journal of Computer Vision, 50(2):203–226, 2002.
[20] J.M. Wiener, R.J. Hanley, R. Clark, and J.F. Van Nostrand. Measuring the Activities of Daily Living: Comparisons Across National Surveys. The Journal of Gerontology, 45(6):S229, 1990.