MULTI-OBJECT TRACKING AND IDENTITY MANAGEMENT IN WIRELESS SENSOR NETWORKS

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Jaewon Shin December 2004

© Copyright by Jaewon Shin 2005. All Rights Reserved.


I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Leonidas J. Guibas (Principal Adviser)

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Abbas El Gamal

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Sebastian Thrun

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Feng Zhao


Approved for the University Committee on Graduate Studies.


Abstract

The multi-object tracking problem is hard because sensor data has to be associated with specific objects. This is the well-known data association problem, which is NP-hard. As a result, all extant algorithms are heuristics, and there always exists a finite probability that the tracking system will be confused about the identities of objects after a data association algorithm is applied. We call this issue identity swapping. To address the problem, we propose a mathematical framework whose goal is to augment sub-optimal data association algorithms by maintaining a probability distribution over the set of possible identities, thus providing an identity management framework. In identity management, the probability distribution over identities is updated by two events – object mixing, caused by objects being in proximity, and local evidence incorporation from sensor nodes. Maintaining the full distribution, however, can be computationally infeasible and is often unnecessary, considering that in practice the information provided by this distribution is accessed only in certain stylized ways, such as asking for the identity of a given track, or for the track with a given identity. Exploiting this observation, we propose two approximate representations, called the marginal belief matrix and the information matrix, and introduce update operations for both approximations for the mixing and local evidence events. We analyze and compare the computational complexities of the proposed approximations, and show that these methods provide efficient approximations and exhibit a tradeoff between the two update operations. For their distributed implementation in a wireless sensor network, we propose an agent-based architecture and demonstrate its feasibility with a discrete event-driven simulator we designed for wireless sensor network applications.

Based on experimental results from a real-time people tracking system, we conclude that the proposed methods can efficiently fix artifacts of the standard sub-optimal data association algorithms.


Acknowledgements

First of all, I would like to thank my advisor, Leo Guibas. With his amazing mathematical intuition and knowledge of broad areas of computer science and mathematics, Leo has always helped me, at the right moment, find the right tools for the problems at hand, which I would otherwise have approached using hacks and easy solutions. He has also taught me the importance of presenting mathematical and engineering ideas well, through numerous talks at our group meetings. Without his encouragement and guidance, this dissertation would not have been possible. I also would like to thank my dissertation committee members – Abbas El Gamal, Sebastian Thrun and Feng Zhao. Abbas has a broad area of expertise ranging from sensor hardware to information theory, and it has been a great pleasure to collaborate with him and his students on the SNRC camera network project. Sebastian's amazing insight into probabilistic systems provided the key ideas in developing the information matrix based approach. It was Feng who introduced me to the exciting area of wireless sensor networks when he had me as an intern at PARC (Palo Alto Research Center, formerly known as Xerox PARC) in 2000. Since then, I have been very fortunate to collaborate with him and other PARC folks on many different projects as a research intern and consultant, and I learned a lot through the unique experience. It was his encouragement and insight that helped me define my thesis topic, which grew out of a summer project at PARC. Next I would like to thank my colleagues at Stanford. Prof. Persi Diaconis kindly spent his precious time introducing me to the wonderful world of representation theory for probability and statistics, which eventually helped me tie a few loose ends in my dissertation. I also thank the Guibas lab members for all the fun we had together

and things that I learned from them. Danny and Anthony have been great officemates, friends and colleagues since we moved into the Clark building. Jie and Qing have always been there for discussions. Afra, An, Daniel and Niloy helped me a great deal in revising my defense talk and this dissertation. I will really miss the geometry seminars, the jokes, the great 2003 Thanksgiving party and the dinners with all of you. My friends at Stanford have been a great source of emotional support during this period. I will not attempt to mention all the names, but I want to acknowledge Kiyoung, Wonjoon, Donghyun, Youngjune, Nahmsuk, Sangeun, the Stanford EE Yonsei alumni (Akaraka!!!) and friends from Daesung Presbyterian Church and Cornerstone Community Church. God bless all of them. Finally, I would like to thank my mom and dad for their love, sacrifice and encouragement. They have been the greatest teachers and role models in my life, and have always supported me with their endless love. This dissertation is a result of their constant faith and love in me over the years. My beloved wife, Jungmin, has been by my side through all these years; nothing would have been possible without her love and support, and I also thank my parents-in-law for their love and encouragement for Jungmin and me. My cute little daughter, Jamie, arrived in June 2003 and has been such a joy in my life ever since. Lastly, I dedicate this Ph.D. dissertation to the memory of my grandfather, who, I am sure, must be proud of me looking down from above.


Contents

Abstract . . . v

Acknowledgements . . . vii

1 Introduction . . . 1
  1.1 Wireless Sensor Networks (WSN) . . . 1
  1.2 Multi-Object Tracking Problem . . . 3
  1.3 Related Work . . . 8
  1.4 Main Contributions . . . 10
  1.5 Thesis Organization . . . 11

2 Object Localization . . . 13
  2.1 A Tracking Scenario . . . 15
  2.2 IDSQ: Information-Driven Sensor Querying . . . 17
  2.3 Sensor Selection . . . 19
  2.4 Information Utility . . . 22
  2.5 Experimental Results . . . 27
    2.5.1 Localizing a Stationary Object . . . 28
    2.5.2 Tracking a Moving Object with Non-Gaussian Distribution . . . 29
  2.6 Discussion . . . 38
    2.6.1 Representation of Belief State . . . 38
    2.6.2 Sequential vs. Concurrent Information Exchange . . . 38
    2.6.3 Query Types . . . 39
    2.6.4 Bias in Sensor Selection . . . 40
    2.6.5 Tracking Robustness . . . 40

3 Identity Management Framework . . . 41
  3.1 Problem description . . . 42
    3.1.1 Mixing . . . 45
    3.1.2 Local Evidence . . . 48
    3.1.3 Identity Management as a discrete Bayesian filtering . . . 50

4 Marginal Belief Matrix . . . 52
  4.1 Marginal Belief Matrix B . . . 52
    4.1.1 Definition . . . 52
    4.1.2 Relation between B and p(X) . . . 54
  4.2 Mixing . . . 54
  4.3 Local Evidence . . . 57
    4.3.1 Ideal Solution . . . 57
    4.3.2 Local Evidence Incorporation as Optimization . . . 59
    4.3.3 Solution 1: Quadratic Programming . . . 60
    4.3.4 Solution 2: The Sinkhorn Algorithm . . . 62
  4.4 Distributed Implementation . . . 65
    4.4.1 Assumptions . . . 65
    4.4.2 Agent-based Architecture . . . 65
  4.5 Simulation . . . 67
  4.6 Conclusion . . . 69

5 Information Matrix . . . 75
  5.1 Information Matrix . . . 75
  5.2 Local Evidence for Information Matrix . . . 78
  5.3 Mixing for Information Matrix . . . 78
  5.4 Inference for Information Matrix . . . 81
    5.4.1 Metropolis Algorithm . . . 82
  5.5 Conclusion . . . 83

6 Comparison of the Two Approximations . . . 86
  6.1 Storage and Computation . . . 86
  6.2 Experiments . . . 88
    6.2.1 Simulation . . . 88
    6.2.2 Experimental Setup . . . 90
    6.2.3 Experimental Results . . . 92
  6.3 Conclusion . . . 93

7 Group Management Protocol . . . 97
  7.1 Related Work . . . 98
  7.2 Distributed Collaboration Graph (DCG) . . . 98
    7.2.1 Acquaintance Group . . . 98
    7.2.2 Construction of the Distributed Collaboration Graph . . . 99
  7.3 Maintaining Communication Tree on DCG . . . 102
    7.3.1 Mixing Event . . . 103
    7.3.2 Crossing Event . . . 105
    7.3.3 Relay Event . . . 106
  7.4 Simulation . . . 107
  7.5 Conclusion . . . 108

8 Conclusions and Future Work . . . 111
  8.1 Summary . . . 111
  8.2 Future Work . . . 112

Bibliography . . . 114

List of Tables

2.1 Statistics on tracker performance for the different information utilities obtained from 80 runs per information utility. . . . 37
6.1 Comparison of the two approximations of the joint identity distribution p(X) . . . 87
7.1 A routing table in a node of a collaboration graph . . . 100

List of Figures

1.1 Wireless sensor nodes with different sizes and capabilities . . . 2
1.2 Wireless sensor nodes measure temperature in a vineyard in British Columbia, Canada. Each sensor records temperature data periodically and reports back to a central server. Information from sensors is used to control the amount of water, cut the use of pesticide and determine harvest time. . . . 3
1.3 Wireless sensors measure the amount of chemical plume in their sensing range. This application is an example of how local knowledge can be used to update global state information in WSN applications. . . . 4
1.4 Locations and identities of players provide contextual keys to understand the status of a soccer game even without seeing the ball position. . . . 4
1.5 An example of how p(S_k|Z_k) is updated in a Bayesian filtering framework: the left figure is the posterior p(S_{k-1}|Z_{k-1}) at time k-1, the middle figure is the predicted posterior p(S_k|Z_{k-1}) at time k, and the right figure is the posterior p(S_k|Z_k) after incorporating a measurement at time k. . . . 6
1.6 Data association problem: Given N states S_k and N measurements Z_k, there are N! ways to associate the measurements with the states. . . . 6
1.7 Identity swapping example: Two objects moved along straight lines and a sub-optimal data association algorithm can conclude incorrectly that the bottom object is the car and the upper one is the bus. Using local evidence at a bottom sensor node, however, we can fix this problem. This is the main idea behind identity management on how to use local information to update global identity information. . . . 7
1.8 How MHT works: The 'X' marks denote new measurements. . . . 9
2.1 A toy WSN application: A 1-D sensor network tracks two objects moving in the direction of the arrows. Scenario 1 uses less energy to localize the objects, while Scenario 2 uses more energy to localize them with greater accuracy. In determining "Who's ahead?" from the two scenarios, however, there is practically no difference. . . . 14
2.2 A tracking scenario illustrating how the decision of sensor collaboration is accomplished using a measure of information utility as well as a measure of cost. . . . 16
2.3 Sensor querying and data routing by optimizing an objective function of information gain and communication cost, whose iso-contours are shown as the set of concentric ellipses. . . . 20
2.4 Sensor selection based on information gain of individual sensor contributions. The information gain is measured by the reduction in the error ellipsoid. . . . 21
2.5 The expected likelihood function for each sensor (i or j) is a weighted sum of the marginal likelihood function conditioned at each grid point in the predicted belief distribution. The expected posterior can then be computed from this likelihood function. . . . 26
2.6 Example of a likelihood function of an acoustic sensor: The likelihood is symmetric around the position of the sensor and is peaked along a circle, which is the most likely position for the sound source. . . . 28
2.7 Sensor selection based on the nearest neighbor method. The estimation task here is to localize a stationary object labeled '*' at the center. (a) Select the nearest sensor; (b) Incorporate the new measurement from the selected sensor. . . . 29
2.8 Sensor selection based on the Mahalanobis distance measure of information utility. The localization problem is the same as that in Figure 2.7. . . . 30
2.9 Tracking a moving object using the information-driven approach. (a) Current belief distribution at time t. (b) New posterior distribution at time t + 1, after incorporating a measurement from the selected sensor. . . . 31
2.10 Tracking error: Nearest neighborhood algorithm . . . 32
2.11 Tracking error: Nearest neighborhood with the heuristic . . . 32
2.12 Mahalanobis distance . . . 33
2.13 Mahalanobis distance with the heuristic . . . 33
2.14 Entropy . . . 34
2.15 Entropy with the heuristic . . . 34
2.16 KL-divergence . . . 35
2.17 KL-divergence with the heuristic . . . 35
2.18 Snapshot of a typical simulation using the KL-divergence as the information utility. The yellow dots are sensor nodes and the gray grids are current estimates on the object position. The pink-square-enclosed node is the current leader, and the green-square-enclosed nodes are its current neighbors. In this example, the sensor density is 100, and the node communication range is set to 30. . . . 36
2.19 Tracking results for different choices of the utility functions. The (red) straight lines are the actual tracks, and the (blue-gray) curves are the estimated ones. . . . 37
2.20 Representation of belief: parametric (e.g., Gaussian), non-parametric (e.g., grid samples, particle samples) . . . 39
3.1 Quiz: Five different vehicles in five different garages . . . 41
3.2 An example of 'swapping order' from the host: Two vehicles in garages 2 and 3 could be swapped with some probability. . . . 42
3.3 The host peeks at one of the garage doors and tells what is inside, although he could be lying. . . . 42
3.4 The joint identity distribution p(X) evolves through two events – mixing and local evidence . . . 45
3.5 A diagram illustrating a mixing event between the ith and jth objects. . . . 46
3.6 Mixing as a convolution – q(X) = (p ★ m)(X): x and y axes represent permutations and probabilities, respectively. . . . 47
3.7 Identity sensing model used in this thesis . . . 49
3.8 Continuous vs. Discrete Bayesian Filtering . . . 50
4.1 An example of how a marginal belief matrix B is updated when three objects are moving in a WSN. Both mixing events have mixing probabilities of 0.5. Note that only two columns of B are updated after each mixing. . . . 56
4.2 Proof idea for Property 2 . . . 58
4.3 Convergence of the Sinkhorn algorithm with three different matrix sizes: 10×10, 100×100, 1000×1000. Each matrix is randomly generated. For a practical error ε = 10^-5, 5 or 6 iterations seem to be enough regardless of the size of the matrices. . . . 63
4.4 Performance comparison of the two distance measures for local evidence incorporation in the marginal belief matrix representation . . . 64
4.5 A wireless sensor network with communication links and sensing range: Each sensor can sense locations and identities of objects within its sensing range and exchange messages with nodes within its communication range. . . . 65
4.6 Basic idea for distributed implementation: Each agent carries a column of the belief matrix. . . . 66
4.7 Flow chart of the distributed algorithm that each node runs to implement the marginal belief matrix approach in wireless sensor networks . . . 68
4.8 A snapshot of the simulator: Four objects are moving in a WSN – three of them are depicted as (yellow) dots and one of them is drawn as a tank. The objects follow (gray) straight lines and four agents, denoted as solid squares in four different colors, are assigned to track the four objects. The bar graphs show the marginal distribution (a column of the belief matrix) maintained by the agents. Initially, all the agents know exactly the identities of the objects they are tracking. . . . 70
4.9 A snapshot of the simulator after four mixings – the bar graphs on the right show that the marginal probabilities are mixed and the agents are no longer certain about the identities of what they are tracking. . . . 71
4.10 A snapshot of the simulator after local evidence incorporation – two pieces of local evidence on the 'tank' object are sensed and all the marginal probabilities are re-normalized using the Sinkhorn algorithm at the cost of group communication. The agents are almost sure about the identities of their objects after the re-normalization. . . . 72
4.11 An example of how uncertainties of the marginal distributions are affected by the two events during the simulation run. Object numbers are the same as the track numbers in the previous figure. It is apparent that mixing events increase uncertainties, while local evidence decreases them. . . . 73
5.1 Metropolis random walk on an associahedron for N = 3 . . . 84
6.1 Comparison of the three approaches for N = 3: Marginal belief matrix, Information matrix with exact inference and Information matrix with approximate inference . . . 89
6.2 Comparison for N = 4 . . . 90
6.3 Comparison for N = 5 . . . 91
6.4 Comparison for N = 6 . . . 92
6.5 Experimental setup in the Stanford AI lab . . . 93
6.6 Example of two ground-truth tracks . . . 94
6.7 Typical errors by tracking systems: Two tracks at the top are the correct ones, although our tracking system has found the three tracks at the bottom. . . . 95
6.8 Experiment scenario: Three people walking in the lab over an 87.79-second period. Their individual tracks are shown in different colors. . . . 95
6.9 Uncertainties of object identities after many mixing and evidence incorporation events from the data in Figure 6.8 . . . 96
7.1 A simple agent interaction for the multi-object tracking application . . . 99
7.2 Idea of using agent trajectories for group member discovery and group communication: The agent trajectories after the mixing can be used for member discovery and group communication in this example. . . . 100
7.3 Information stored at relay nodes (left) and junction nodes (right) . . . 101
7.4 Mobile agent paths . . . 101
7.5 Distributed collaboration graphs for two identities from Figure 7.4 . . . 102
7.6 A multi-object tracking scenario. Objects are labeled a, b, c, d, e. The arrows indicate the trajectories of objects. Each square represents a mixing event (see Section 7.3.1). . . . 103
7.7 Mixing Event: The updated tree does not contain a cycle. . . . 105
7.8 The various communication trees for object a as time evolves. Edges that are declared inactive are marked with crosses. . . . 106
7.9 Crossing Event . . . 107
7.10 A typical scenario with five agents . . . 108
7.11 Group communication performance comparison: Flooding on a whole DCG and multicasting on its spanning tree . . . 109
7.12 A simple heuristic to deal with node/link failures and its effect on the probability of communication failure . . . 110

Chapter 1

Introduction

In this dissertation, we study the problem of tracking multiple moving objects in wireless sensor networks – the goal is to design a set of distributed algorithms that compute the locations and identities of moving objects using multiple sensor nodes wirelessly connected to one another. In this chapter, we introduce the definition of wireless sensor networks and explain the challenges of designing algorithms for such systems. We then summarize the related work and give an overview of the rest of this thesis.

1.1 Wireless Sensor Networks (WSN)

A wireless sensor network (WSN) is a large-scale distributed system of small, untethered, low-power nodes capable of sensing, processing and communication. WSNs are unique in their ability to monitor phenomena widely distributed in space and time, such as microclimate variations in a forest, earthquake vibrations in buildings, machine control and diagnosis in factories, traffic monitoring on highways, etc. Figure 1.1 shows some of the sensor nodes available on the market as of publication, and Figure 1.2 shows an example of a WSN, where wireless sensor nodes are used to measure temperatures at various locations of a vineyard.

In many of these scenarios it is infeasible to have the WSN simply collect all potentially relevant data. Instead, it is far more efficient to be able to query the network as the need for particular kinds of information arises. Such queries can often be formulated as distributed inference problems, where the goal is to estimate a global state of interest, X, given local pieces of evidence provided by the sensor nodes. Furthermore, this inference should be probabilistic in nature, due to the inherent noise in sensor readings and the uncertainty associated with physical phenomena. In this setting, it is very important to capture the information structure of the problem and the dependencies between the problem's local and global variables. For example, say we are tracking a large chemical plume in a region R using a WSN, and assume that we know the total amount T of the chemical involved. If a sensor locally determines the quantity of the chemical, say t, in a subregion r, then we also know that there is T − t of the chemical in the rest of the region R − r, thus updating the global information as shown in Figure 1.3.

Figure 1.1: Wireless sensor nodes with different sizes and capabilities – motes, RFID tag, SPOT watch, PDA, Sensoria node


Figure 1.2: Wireless sensor nodes measure temperature in a vineyard, in British Columbia, Canada. Each sensor records temperature data periodically and reports back to a central server. Information from sensors is used to control the amount of water, cut the use of pesticide and determine harvest time.

1.2 Multi-Object Tracking Problem

Locations and identities of moving objects, e.g., people, vehicles, and laptops, provide important contextual information that can be useful in many applications. In ubiquitous computing, location-based services [6, 8, 28] exploit this information to provide personalized services to users. For surveillance applications, this information provides clues about what happened in the region of interest. For multi-player sports like soccer or American football, the movements of the players provide enough information to understand the status of the game even without seeing the ball, as in Figure 1.4.

Figure 1.3: Wireless sensors measure the amount of chemical plume in their sensing range. This application is an example of how local knowledge can be used to update global state information in WSN applications.

Figure 1.4: Locations and identities of players provide contextual keys to understand the status of a soccer game even without seeing the ball position.

The tracking problem can be formulated as a stochastic estimation problem, where the goal is to estimate the most likely state S_k at time t = kΔ, given a history of sensor measurements {Z_1, · · · , Z_k}. (Examples of a state vector S_k for single-object tracking are S_k = [x y]^T for position and S_k = [x y v_x v_y]^T for position plus velocity; for multi-object tracking, S_k is a concatenation of the single-object states, e.g., S_k = [x_1 y_1 x_2 y_2]^T for the positions of two objects.) This formulation requires two models – the


dynamic model S_k = f(S_{k−1}, n_{k−1}) (also called a prediction model or a stochastic transition model) and the measurement model Z_k = g(S_k, w_k), where n_k and w_k are the stochastic transition noise and the measurement noise, respectively. The seminal work by Kalman [29] provides an optimal, iterative solution of this formulation under the rather strict assumption of linear, Gaussian measurement and dynamic models:

    S_k = A_k S_{k−1} + n_{k−1}
    Z_k = H_k S_k + w_k

For sensing modalities like acoustic sensors or laser range sensors, however, the Gaussian assumption is not satisfied and a more general mathematical framework is required to deal with general noise characteristics. Bayesian filtering [27] is such a framework, in which one can iteratively compute the posterior distribution given a history of measurements. The main theorem of Bayesian filtering says that the posterior density p(S_k | Z_k) can be computed recursively as

    p(S_k | Z_k) ∝ p(Z_k | S_k) ∫ p(S_{k−1} | Z_{k−1}) p(S_k | S_{k−1}) dS_{k−1}          (1.1)

where p(S_0) is a prior distribution, p(Z_k | S_k) is a measurement model given as a likelihood function, and p(S_k | S_{k−1}) is a dynamic model. Figure 1.5 illustrates the Bayesian filtering framework. Nonparametric methods based on Monte-Carlo sampling have been successfully used to solve the above Bayesian filtering equation in many different applications [7, 26, 32, 51].

It is relatively straightforward to apply the aforementioned Bayesian filtering framework to the single-object tracking problem and, as a result, it is considered more or less a solved problem in the tracking community. The multi-object tracking problem, however, is not a simple extension of the single-object tracking problem, due to the data association problem – which measurements are associated with which states in the multi-object setting? Figure 1.6 illustrates the main challenge of the data association problem for a simple case of two objects. One can see that, in the worst case, there are exponentially many ways to associate measurements with states, and there does not exist any computationally efficient algorithm for this problem. This problem is known to be NP-hard, since one can formulate the data association problem as the multi-dimensional assignment (MDA) problem [12, 21, 40].

Figure 1.5: An example of how p(S_k | Z_k) is updated in a Bayesian filtering framework: the left figure is the posterior p(S_{k−1} | Z_{k−1}) at time k − 1, the middle figure is the predicted posterior p(S_k | Z_{k−1}) at time k, and the right figure is the posterior p(S_k | Z_k) after incorporating the measurement at time k.
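To make the recursion in Eq. (1.1) concrete, the following is a minimal sketch of one prediction–correction step on a discretized one-dimensional state space. The five-cell grid, the move-right motion model and the measurement likelihood are invented for illustration and are not the models used later in this dissertation.

```python
def bayes_filter_step(prior, transition, likelihood):
    """One step of Eq. (1.1) on a discrete grid of states.

    prior[s]         : p(S_{k-1}=s | Z_{k-1})
    transition[s][t] : p(S_k=t | S_{k-1}=s)
    likelihood[t]    : p(Z_k | S_k=t) for the measurement just received
    """
    n = len(prior)
    # Prediction: integrate (here: sum) the dynamic model against the old posterior.
    predicted = [sum(prior[s] * transition[s][t] for s in range(n)) for t in range(n)]
    # Correction: multiply by the measurement likelihood and renormalize.
    posterior = [likelihood[t] * predicted[t] for t in range(n)]
    z = sum(posterior)
    return [p / z for p in posterior]

# Hypothetical 1-D example with five grid cells; the object tends to move one cell right.
transition = []
for s in range(5):
    row = [0.0] * 5
    if s + 1 < 5:
        row[s], row[s + 1] = 0.2, 0.8
    else:
        row[s] = 1.0                      # stay put at the right boundary
    transition.append(row)

prior = [0.1, 0.6, 0.2, 0.1, 0.0]
likelihood = [0.05, 0.1, 0.7, 0.1, 0.05]  # a noisy reading that says "around cell 2"
print(bayes_filter_step(prior, transition, likelihood))
```

Particle-filter implementations replace the grid sum with a Monte-Carlo sample, but the prediction and correction steps play the same roles.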

Figure 1.6: Data association problem: Given N states S_k and N measurements Z_k, there are, in general, N! ways to associate the measurements with the states.
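The combinatorial nature of the problem is easy to see in code. The sketch below scores every one of the N! assignments with a toy Gaussian-style log-likelihood – a stand-in for a real measurement model, not the scoring used by any particular tracker – and picks the best one; the exhaustive loop over permutations is exactly what becomes infeasible as N grows.

```python
import math
from itertools import permutations

def best_association(states, measurements, sigma=1.0):
    """Exhaustively score all N! ways of assigning measurements to states."""
    n = len(states)
    best, best_score = None, -math.inf
    for perm in permutations(range(n)):                # N! candidate assignments
        score = sum(-((states[i] - measurements[perm[i]]) ** 2) / (2 * sigma ** 2)
                    for i in range(n))                 # toy log-likelihood of this assignment
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

# Two 1-D object states and two measurements: 2! = 2 possible associations.
print(best_association([0.0, 5.0], [5.2, -0.1]))       # -> ((1, 0), ...): the "swap" wins
```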

7

CHAPTER 1. INTRODUCTION

system being confused about the identities of objects after a data association algorithm is applied, which we call identity swapping. Figure 1.7 illustrates the identity swapping. In the figure, two objects, a car and a bus, are moving along straight lines. A sub-optimal data association algorithm, however, could make a wrong association and concludes that the lower object is the car and the upper object is the bus. When there are many objects, these could lead to many instances of identity swapping and it is not possible to design reliable high-level applications based on the outputs of the tracking systems.

?


CHAPTER 1. INTRODUCTION

8

In this simple case, this local evidence implies that the upper object is the car and we can effectively fix the identity swapping problem using only local evidence on object identity. We use the exclusion among the two identities to update both of the identities – global information – using only local evidence. The above discussion leads to the main question of this dissertation. Can we use this local evidence of object identities to fix the identity swapping problem caused by the sub-optimal data association algorithms in a WSN? Instead of coming up with another data association algorithm, we propose a new framework called identity management, where the goal is to maintain the global object identities using only local information while objects are moving in a WSN and sensors are sensing identity information.

1.3

Related Work

Multi-object tracking algorithms depend on a single object tracking algorithm when objects are well-separated from one another. When objects are close to one another, the data association problem need to be solved – there are exponentially many ways to associate measurements to known states. There are two well-known methods for dealing with the data association problem – the Multiple Hypothesis Tracker (MHT) in [41] and the Joint Probabilistic Data Association Filter (JPDA) in [3, 4, 12]. Figure 1.8 illustrates the main idea of MHT – whenever new measurements are available from sensors, MHT generates hypotheses. In the worst case, the number of hypotheses can grow exponentially and MHT uses heuristics such as gating3 , pruning4 and N-scan-back5 . The main advantage of MHT is its ability to deal with the unknown number of objects even in a very cluttered environment, where there could be many false and missing measurements. The flexibility of MHT comes at the cost of exponential storage and running time, and it is crucial to have good heuristics to prune unlikely hypotheses. Despite its worst case complexities, it has been used 3

A measurement can be associated with a state only when it is within some distance from the state. 4 One can compute the probabilities of the hypotheses and eliminate unlikely hypotheses. 5 This heuristic considers only measurements within the recent N time steps.

9

CHAPTER 1. INTRODUCTION

extensively in military applications due to its near-optimal performance and has been successfully implemented in the visual tracking community [10, 11].

New measurements

Hypotheses generation

Correct!

Prunning heuristics

Wrong!

Figure 1.8: How MHT works: The ‘X’ marks denote new measurements. JPDA is inherently a sub-optimal solution for the data association problem, since it considers only the most recent set of measurements. Given a state Ski at time k, JPDA combines all the measurements {Zk1 , · · · , ZkN } in a probabilistically consistent manner as follows, p(Ski |Zk ) =

X

αj p(Zkj |Sk )p(Ski |Zk−1 )

j

where αij is a probability of associating the jth measurements Zkj to the ith state Ski P and j αij = 1. As one can see from the above equation, JPDA has the polynomial

running time and can be used with general dynamic models – JPDA does not specify

how to compute a predicted posterior p(Ski |Zk−1 ). For example, JPDA has been successfully implemented in a people tracking system [45] and used extensively for tracking aircrafts with different flight modes [2, 34]. In [39, 38, 44], the authors proposed a formal probabilistic approach called Identity Uncertainty to reasoning about identity under the framework of first-order logics of probability. Their setting is very general in that their framework allows one to

CHAPTER 1. INTRODUCTION

10

pose identity questions with varying number of objects, properties and relations. Approximate inference for their formal language can be done using Markov Chain Monte Carlo (MCMC) algorithms [37], where the Markov chain is defined on a state space consisting of the relational models of the first-order language. There are some notable differences between their approach and what we propose in this dissertation. First, their approach does not assume unique identities – in fact, the number of unique identities is an identity question posed in their language, while our approach assumes unique identities and exploits the exclusion among them to update global identity information. Second, their method does not seem to be easily distributable, while one of the focus of this dissertation is to design a framework that can be implemented in a purely distributed fashion in a WSN. Finally, the two approaches are fundamentally different in their goals – their approach concerns inference on identity, while we use identity information to update probabilities of different identities in a WSN. In fact, their approach can used to implement a local identity sensing module in our framework, whose outputs can then be used by our approach to update global identity information.

1.4

Main Contributions

The two main contributions of this thesis are as follows. 1. We introduce a mathematical framework called identity management to overcome the identity swapping problem: In this framework, we formulate identity management as a state estimation problem, where we track the probability distribution p(X), where X is one of N ! permutations representing identity assignments. We also define two events in a WSN that update the state X, namely mixing and evidence. The precise mathematical definition of the two events are presented. The probability distribution p(X) and the two events define the optimal identity management, which requires exponential running time and storage. 2. We provide two practical approximations of the identity management framework

CHAPTER 1. INTRODUCTION

11

that can be implemented in wireless sensor networks – Marginal belief matrix approach and Information matrix approach: These two approaches are based on totally different ideas – one maintaining N 2 marginal probabilities and the other maintaining N 2 log-likelihoods. Both of them have polynomial running times for the two operations and can be implemented in a purely distributed manner in a WSN. Neither of the methods is a clear winner in all situations and one must take into account their relative strength in making a decision on which to use for his/her own application. 3. We also given an information-driven sensor selection framework for localizing a moving object, which can be implemented in a distributed manner in a WSN. 4. We design a group management protocol: a light-weight distributed protocol for maintaining multicasting tree among moving agents in a WSN.

1.5

Thesis Organization

The remainder of this dissertation is organized as follows. Chapter 3, 4, 5 and 6 introduce the mathematical framework of the identity management and their approximations. Chapter 2 and 7 explain other necessary algorithms and protocols to support the multi-object tracking in a WSN. Chapter 2 introduces an information-driven sensor selection framework for an energy-efficient single-object tracking in a WSN. This chapter is based on a joint work [51] with Feng Zhao and Jim Reich, where I contributed as the second author. My main contributions are i) proposing the information-theoretic utilities based on the expected posterior and ii) running simulations to verify the efficiency and accuracy of the tracker with the proposed sensor selection utility. Chapter 3 formulates the problem of (the optimal) identity management in a centralized setting by precisely defining all the related mathematical quantities and operations on them, and explains how its solution can fix the identity swapping problem. Chapter 4 and 5 present the two practical approximations of the optimal identity management. Their representations, two operations, running time and distributed

CHAPTER 1. INTRODUCTION

12

implementation are discussed in detail. Chapter 6 compares the two approximations in terms of running time, storage, inference and distributed implementation. These three chapters are based on [47] and [48], where I proposed the whole mathematical framework, designed algorithms and ran simulations as the first author of both papers. Collaborators includes Leonidas Guibas, Nelson Lee, Sebastian Thrun and Feng Zhao. Nelson Lee implemented the experimental setup used in Chapter 6. Chapter 7 explains a group communication protocol required by the marginal belief matrix approach and is based on a joint work [49] with Anthony Mancho So and Leonidas Guibas. As the first author, I proposed the group communication framework, designed a distributed protocol for building the distributed collaboration graph (DCG) and co-designed a distributed protocol for maintaining a communication tree among agents in a DCG. Finally, Chapter 8 provides conclusions and suggests future work.

Chapter 2 Object Localization Although the single object tracking problem is considered as a solved problem, there are still some issues when it comes to its distributed implementation in a WSN. For example, which nodes will be responsible for solving the Bayesian sequential equation 1.1? Which nodes will provide measurements? Which nodes will store the location estimate? How is a query on a particular object handled in a distributed fashion? All these issues are caused by the following fundamental constraints of a WSN – i) a WSN is a resource-constrained system, and ii) all the information available are local. In other words, applications of a WSN might require significant communication among nodes to extract useful high-level information, since each sensor can only sense phenomena within its sensing range. Inter-node communication, however, is a major source of energy consumption in a WSN, and needs to be carefully scheduled to avoid draining node energy quickly. The problem we are addressing in this chapter is how to dynamically query sensors and route data in a network so that information gain is maximized while latency and bandwidth consumption is minimized. In general, the better information quality at a node could be obtained by incorporating more measurements from other sensor nodes. In practice, however, most WSN applications require information quality to be just good enough to answer queries by users. Let us consider a following toy example where a WSN tracks two objects, A and B, moving along a line as in Figure 2.1. In the figure, two scenarios of position estimates are shown – Scenario 1 shows two position estimates represented as 13

14

CHAPTER 2. OBJECT LOCALIZATION

A

B Scenario 1

A

B Scenario 2

Figure 2.1: A toy WSN application: A 1-D sensor network tracks two objects moving toward the same direction as the arrows. Scenario 1 uses less energy to localize the objects, while Scenario 2 uses more energy to localize them with greater accuracy. In determining “Who’s ahead?” from the two scenarios, however, there is practically no difference. Gaussian distribution with larger variances, and Scenario 2 shows the two estimates with smaller variances by incorporating more measurements from sensors, thus using more energy. Now, suppose a query “Which object is the front runner?” is injected in the WSN. Although Scenario 2 has much more accurate position estimates, there is not any practical difference in determining that B is ahead of A with high confidence. In fact, Scenario 2 has wasted the additional amount of energy it used from the application point of view. To exploit the above observation, each node should be able to evaluate the quality of information and to select a good (subset of) sensor(s) among its neighboring nodes in terms of quality of information. Given these capabilities, each node can decide whether to incorporate more measurements or not from its neighbors. Locally evaluating information quality is not a difficult task. For example, one can use the variance of a position estimate as a quality measure, when the estimate is represented by a Gaussian distribution, or can use the statistical entropy for general representations. However, finding a good sensor that would increase the estimate quantity the most is not a simple matter, since a node should only use information locally available – the meta data1 about its neighbors, like sensor positions and their modalities, and the 1

We assume the meta data about the neighboring sensors are made available during the network

CHAPTER 2. OBJECT LOCALIZATION

15

current estimate – we call this strategy information-driven approach. In this chapter, we propose an information-driven sensor selection strategy for the single-object tracking problem in a WSN. We first present a typical application scenario and describe different type of sensor selection heuristics based on informationtheoretic measures like entropy and KL divergence2 .

2.1

A Tracking Scenario

To illustrate the main idea of the information-driven approach, we consider a task of tracking a moving object through a two-dimensional sensor field as in Figure 2.2. A user initiates the following query: “report the position of the object every 5 seconds”. A few interesting features of the problem are worth noting. There is no road constraint, and therefore no prior knowledge of possible object trajectories can be exploited. Second, the vehicle can accelerate or decelerate, in between the nearest sensors. Both of these render traditional closest-point-of-approach (CPA) [20] based trackers difficult to apply. Third, many sensors can potentially make simultaneous observations and flood the network with the information. This requires the network to make intelligent decisions about who should sense and who should communicate and at what time. For the sake of simplicity, we focus on the sensor collaboration during the tracking phase, ignoring the detection phase and glossing over the details of routing the query into regions of interest. We further assume there is one leader 3 node active at any moment, and its task is to select and route tracking information to the next leader. Throughout this dissertation, we will use super-scripts to denote sensor identifiers and sub-scripts to denote time index, unless stated otherwise. 1. A user query enters the sensor network at node Q. 2. Meta knowledge guides the query towards a region of potential events. initialization phase. 2 We will use ‘KL divergence’ and ‘relative entropy’ interchangeably throughout this dissertation 3 We will use ‘leader’ and ‘agent’ interchangeably throughout this dissertation.

CHAPTER 2. OBJECT LOCALIZATION

16

Figure 2.2: A tracking scenario illustrating how the decision of sensor collaboration is accomplished using a measure of information utility as well as a measure of cost.

CHAPTER 2. OBJECT LOCALIZATION

17

3. Node a computes an initial estimate of vehicle state Sˆa , determines the next best sensor N EXT (Sˆa , λi ), i ∈ neighbor(a), and hands off the state information to b. is sensor characteristics for node i. 4. Node b computes a new estimate by combining its measurement Z b with the previous estimate Sˆa using, say a Bayesian filter: Sˆb = Sˆa ⊕ Z b ; N EXT = c 5. Node c computes: Sˆc = Sˆb ⊕ Z c ; N EXT = d 6. Node d computes: Sˆd = Sˆc ⊕ Z d ; N EXT = e 7. Node d sends current estimate back to the querying node Q. 8. Node e computes: Sˆe = Sˆd ⊕ Z e ; N EXT = f 9. Node f computes: Sˆf = Sˆe ⊕ Z f ; N EXT = · · · 10. Node f sends current estimate back to querying node Q. As the above tracking scenario illustrates, sensor selection is a local decision. The decision must be based on a measure of information utility and cost, which can be locally evaluated and updated. The following section will overview the informationdriven sensor querying (IDSQ) approach.

2.2

IDSQ: Information-Driven Sensor Querying

We formulate the problem of distributed tracking as a sequential Bayesian estimation problem. Assuming that the state of an object we wish to estimate is S. A new measurement Z j from sensor j is combined with the current estimate p(S|Z 1 , · · · , Z j−1 ), hereafter called belief state, to form a new belief state about the object being tracked. Selecting a sensor j that is likely to provide the greatest improvement to the estimation at the lowest cost can be formulated as an optimization problem. The objective function for this optimization problem can be defined as a mixture of both information gain and cost: M (p(S|Z 1 , · · · , Z j )) = αϕutility (p(S|Z 1 , · · · , Z j )) − (1 − α)ϕcost (Z j )


where ϕutility (·) is the information utility measure, ϕcost (·) is the cost of communication and other resources, and α is the relative weighting of the utility and cost. We will refer to sensor l, which holds the current belief, as the leader node. This node might act as a relay station to the user, in which case the belief resides at this node for an extended time interval, and all information has to travel to this leader. In another scenario (such as in Figure 2.2), the belief itself travels through the network, and leadership is transferred from node to node through the network. Depending on the network architecture and the tracking task, either of these cases or a combination thereof can be implemented. The first term in the objective function M (·) characterizes the usefulness of the data provided by the sensor j. For example, when the sensor (e.g., a microphone measuring acoustic amplitude) provides a range constraint, the usefulness of the sensor data can be measured by how close the sensor is to the mean of the belief state under the Mahalanobis metric. We will return to this in some detail when we describe different criteria for sensor selection later. The second term measures the cost of obtaining the information, characterized by link bandwidth, transmission latency, node battery power reserve, etc. In the case of a moving leader node, this is the cost of handing the current belief state off to sensor j, acquiring data at sensor j, and combining the data with the current belief. In the case of a stationary leader node, this is the cost of requesting data from sensor j, acquiring the data and returning it to the leader to be incorporated into the belief state. In this case, the communication cost may be a function of the distance between sensor l and sensor j, as a crude measure of the amount of energy required to transmit the data from sensor j to sensor l. For example, with Mahalanobis distance as an information utility measure and Euclidean distance as an energy cost measure, the objective function becomes: ˆ −1 (S j − Sˆr ) − (1 − α)(S j − S l )T (S j − S l ) M (S j ) = −α(S j − Sˆr )T Σ ˆ S j and S l are the mean of the object position, its covariance, the position where Sˆr , Σ, of queried sensor l, and the position of querying sensor j, respectively. An example of using this objective function to query sensors and route data for a localization problem is illustrated in Figure 2.3. The task here is to determine which

CHAPTER 2. OBJECT LOCALIZATION

19

sensors have the most useful information and ship the information back to a fixed querying node, denoted by ‘?’ in the figure. It is important to note that incremental belief update during the routing dynamically changes both the shape and the offset ˆ at every node of the objective function according to the updated values of SˆT and Σ ˆ are passed on to the next along the routing path. As the updated values of SˆT and Σ node, all routing decisions are still made locally. The plotted objective function in the figure represents a snapshot of the objective function that an active routing node locally evaluates at a given time step.

2.3

Sensor Selection

Given the current belief state, we wish to incrementally update the belief by incorporating measurements of other nearby sensors. However, among all available sensors in the network, not all provide useful information that improves the estimate. Furthermore, some information might be useful, but redundant. The goal is to select an optimal subset and to decide on an optimal order of how to incorporate these measurements into our belief update. We emphasize again that, due to the distributed nature of a WSN, this selection has to be done without explicit knowledge of the measurement residing at each individual sensor, in order to avoid communicating less useful information. Hence, the decision has to be made solely based upon the sensor characteristics such as the sensor position or sensing modality, and the predicted contribution of these sensors. Figure 2.4 illustrates the basic idea of optimal sensor selection. The illustration is based upon the assumption that estimation uncertainty can be effectively approximated by a Gaussian distribution, illustrated by uncertainty ellipsoids in the state space. In the figure, the solid ellipsoid indicates the belief state at time t, and the dashed ellipsoids are the incrementally updated belief after incorporating an additional measurement from a sensor, S1 or S2, at the next time step. Although in both cases, S1 and S2, the area of high uncertainty is reduced by the same amount, the residual uncertainty in the case of S2 maintains the longest principal axis of the distribution. If we were to decide between the two sensors, we might favor case S1


Figure 2.3: Sensor querying and data routing by optimizing an objective function of information gain and communication cost, whose iso-contours are shown as the set of concentric ellipses. This figure illustrates how a user query on the location of the object is being routed towards the maximum of the objective function – the center of the co-centric ellipses – along the routing path. The circled dots are the sensors being queried for data along the path. T represents the object position, with its covariance shown as a small red ellipse. ? denotes the position of the query origin.

CHAPTER 2. OBJECT LOCALIZATION

21

Figure 2.4: Sensor selection based on information gain of individual sensor contributions. The information gain is measured by the reduction in the error ellipsoid. In the figure, reduction along the longest axis of the error ellipsoid produces a larger improvement in reducing uncertainty. Sensor placement geometry and sensing modality can be used to determine the potential information gain by the two sensors S1 and S2.

CHAPTER 2. OBJECT LOCALIZATION

22

over case S2, based upon the underlying measurement task. Although details of the implementation depend on the network architecture, the fundamental principles introduced in this chapter hold for both, the selection of a remote sensor by a leader node (e.g., a cluster head), as well as the decision of an individual sensor to contribute its data and to respond to a query traveling through the network. The task is to select the sensor that provides the best information among all available sensors that have not been incorporated. As has been shown in [7], this provides a faster reduction in estimation uncertainty, and usually incurs a lower communication overhead for meeting a given estimation error requirement, compared to blind or nearest-neighbor sensor selection schemes.

2.4

Information Utility

Information utilities play a key role in the information-driven approach to sensor selection. In this section, we first introduce an information-theoretic definition of the utility measure. We then describe several approximations to the measure that prove to be practically useful. Our goal is to predict the information utility of a piece of non-local sensor data before obtaining the data. In practice, the prediction must be based on the currently available information: the current belief state, the characteristics of the sensor of interest which includes information such as the sensor position and sensing modality that can be established beforehand. We assume there are n sensors labeled from 1 to n and the corresponding measurements of the sensors are Z 1 , · · · , Z n . Let U ⊂ {1, · · · , n} be the set of sensors whose measurements have been incorporated into the belief. That is, the current belief is p(S|{Z i }i∈U ). The sensor selection task is to choose a sensor whose data has not been incorporated into the belief yet and which provides the most information. To be precise, let us define an information utility function that assigns a value to each probability distribution. For the moment, we ignore the cost term in the objective function. The best

CHAPTER 2. OBJECT LOCALIZATION

23

sensor, defined by the earlier objective function, is given by ˆj = argmaxj∈V ϕutility (p(S|{Z i }i∈U ∪ {Z j })) where V is the set of sensors whose measurements are potentially useful. The following are possible information utility functions. Mahalanobis Distance Measure For a particular belief state and sensing modality, the Mahalanobis distance can be used as an information utility for sensor selection. Its main idea is illustrated in Figure 2.4. In the figure, the solid squares labeled S1 and S2 are sensors whose measurements can potentially improve the current belief state. Suppose each sensor provides a range measurement, e.g., an object is 10 meters away from here with some uncertainty. A new belief state, whose uncertainty is shown as dashed ellipse, is computed by incorporating the measurement into the current belief state using the Bayes rule. In the sensor configuration shown, S1 would provide better information than S2 because S1 lies close to the longer axis of the uncertainty ellipse, thus decreasing uncertainty more along the longer axis. To favor sensors along the longer axes of an uncertainty ellipsoid, we use Mahalanobis distance – a distance measure normalized by the uncertainty covariance, defined as follows ˆ Σ) ˆ = −(S j − S) ˆΣ ˆ −1 (S j − S) ˆ , ϕ(S j , S, where S j is the position of sensor j, and Sˆ is the mean of the belief (object position estimate). The Mahalanobis distance-based utility works well when the current belief can be well approximated by a Gaussian distribution or the distribution is very elongated, and sensors provide range measurements. For general uncertainty distributions or sensor modalities other than range sensors, we need to develop alternative information utility measures.

CHAPTER 2. OBJECT LOCALIZATION

24

Information-theoretic Measures: Entropy and KL-divergence An information utility function ϕ(·) evaluates the compactness of the belief state distribution. A natural choice of ϕ(·) is the statistical entropy, which measures the uncertainty of a given random variable. Mathematically, the entropy is defined as Hp (S) = −

X

p(S) log p(S) ,

S∈Ω

for a discrete random variable S. The equivalent definition for a continuous random variable is Hp (S) = −

Z

p(S) log p(S)dS , Ω

where Ω denotes the support of the random variable S in both definitions. Generally speaking, the smaller the entropy is, the more certain we are about the value of the random variable. Considering this meaning, the entropy-based utility measure can be simply defined as a negative entropy, that is, ϕ(p(S|{Z i }i∈U ∪ {Z j })) = −Hp (S) . Another information-theoretic quantity that can be used as an information utility is the KL-divergence, which is a distance measure4 between two probability distributions f (S) and g(S) d(f ||g) =

X

f (S) log(f (S)/g(S))

S∈Ω

Given the definition, the KL-divergence-based utility can be defined as follows, ϕ(p(S|{Z i }i∈U ∪ {Z j })) = d(f (S)||g(S)) , where f (S) = p(S|{Z i }i∈U ) is a prior distribution before incorporating a measurement from sensor j and g(S) = p(S|{Z i }i∈U ∪ {Z j })) is a posterior distribution after the measurement incorporation. 4

KL-divergence is not a metric – triangle inequality does not hold.

25

CHAPTER 2. OBJECT LOCALIZATION

Information-theoretic measures on Expected Posterior Distribution Note that both information-theoretic utilities discussed above are practically infeasible – how can we evaluate an utility of a posterior probability distribution without a measurement? To deal with this issue, we propose an expected posterior distribution, as a practical alternative of the real posterior distribution. The main idea is to guess the likelihood function at each neighbor based on the predicted posterior and the sensor position. We repeat the Bayesian filtering equation for the sake of completeness. p(Sk+1 |Zk+1 ) ∝ p(Zk+1 |Sk+1 ) where p(Sk+1 |Zk ) =

R

Z

p(Sk+1 |Sk )p(Sk |Zk )dSk ,

p(Sk+1 |Sk )p(Sk |Zk )dSk is a predicted posterior and p(Zk+1 |Sk+1 )

is a likelihood function. For simplicity, we assume that the belief state is represented as a discrete set of samples on a grid in the state space. This non-parametric representation of the belief state allows us to deal with general multi-modal distributions and nonlinear dynamics. Figure 2.5 shows an example of the grid-based state representation. The gray squares represent the likely position of the object as specified by the current belief. The brighter the square is, the higher the probability of the position of an object being there is. i Given the observation model Zk+1 = hi (Sk+1 ) of sensor i, we can guess the meai i surement Zk+1 from the predicted belief – the measurement Zk+1 conditioned on a

state Sk = sl is given as hi (Sk+1 = sl ), where l is an index for a grid point in the representation. Then, the expected likelihood function can be computed as a weighted sum of the marginal likelihood function conditioned at each grid point in the predicted belief distribution as follows. i pˆ(Zk+1 |Sk+1 ) =

X

Lli (Sk+1 , wkl )p(Sk+1 = sl |Zk ) ,

sl ∈Ω i where Lli (Sk+1 , wkl ) = p(Zk+1 = hi (sl )|Sk+1 ), Ω is the support of the grid representa-

tion of Sk+1 , and the predicted posterior p(Sk+1 |Zk ) can be computed as follows X p(Sk+1 |Sk = sl )p(Sk = sl |Zk ) . p(Sk+1 |Zk ) = sl ∈Ω

CHAPTER 2. OBJECT LOCALIZATION

26

Figure 2.5: The expected likelihood function for each sensor (i or j) is a weighted sum of the marginal likelihood function conditioned at each grid point in the predicted belief distribution. The expected posterior can then be computed from this likelihood function.

CHAPTER 2. OBJECT LOCALIZATION

27

i Using the expected likelihood function pˆ(Zk+1 |Sk+1 ) from sensor i, the expected

posterior belief pˆ(Sk+1 |Zk+1 ) can be computed as follows. pˆ(Sk+1 |Zk+1 ) ∝ pˆ(Zk+1 |Sk+1 )p(Sk+1 |Zk ) .

(2.1)

Given the expected posterior, the aforementioned information-theoretic utilities can be actually implemented. In Section 2.5, their performances will be compared with other utilities through extensive simulations. This approach can be applied to non-Gaussian belief states, since the discrete approximation of the belief state assumes a general form. In fact, one can easily extend the equation for the expected posterior 2.1 to a general continuous representation by replacing the summation over the discrete grids with an integral over a continuous supports of Sk+1 .

2.5

Experimental Results

In this section, we present computational results from applying the information utility measures introduced in the previous section to the single-object localization and tracking problems. The following assumptions are made for simulations. • An object is a point sound source and sound propagation is lossless and isotropic. • There are two kinds of sensors – acoustic sensors and bearing sensors. • For an acoustic sensor, a root-mean-squared (RMS) amplitude measurement Z is related to the sound source position S as follows Zi =

a +w, ||S − S i ||

where a is the RMS amplitude of the sound source, S j is the position of the sensor j, || · || is l2 -norm, and w is RMS measurement noise [30]. For simplicity, we assume w is a zero-mean Gaussian random variable and a is a uniform random variable. Figure 2.6 shows a likelihood function p(Z i |S) derived from the above model.

CHAPTER 2. OBJECT LOCALIZATION

28

Figure 2.6: Example of a likelihood function of an acoustic sensor: The likelihood is symmetric around the position of the sensor and is peaked along a circle, which is the most likely position for the sound source.

2.5.1

Localizing a Stationary Object

We compare the information-driven sensor selection with the nearest neighborhood (NN) selection in the context of localizing a stationary object with acoustic sensors. Figure 2.7 shows two snapshots of the tracking algorithm based on the nearest neighborhood criterion. Figure 2.7(a) shows the posterior distribution after combining the data from the sensor at the middle of the linear array with the data from its two nearest neighbors. The updated posterior distribution still remains as a bimodal distribution (Figure 2.7(b)) until the sensor at the upper-left corner is finally selected. In Figure 2.8, the sensor selection is based on the Mahalanobis distance measure. Figure 2.8(a) shows the posterior after combining the measurements from the same three sensors near the middle as in Figure 2.7(a). The residual uncertainty, however, is elongated and thus the upper-left sensor is selected as the next sensor according to the Mahalanobis distance. The new measurement from that sensor reduces the uncertainty to a very small region (Figure 2.8(b); also compare Figure 2.8(b) with Figure 2.7(b)).

29

CHAPTER 2. OBJECT LOCALIZATION

(a)

(b)

Figure 2.7: Sensor selection based on the nearest neighbor method. The estimation task here is to localize a stationary object labeled ‘*’ at the center. (a) Select the nearest sensor ; (b) Incorporate the new measurement from the selected sensor.

2.5.2

Tracking a Moving Object with Non-Gaussian Distribution

We present experimental results on how the information utility measures can be applied to a tracking problem with a general belief state representation. In our simulation, we assume a leader node (the square-enclosed dot in Figure 2.9) carries the current belief state. The leader chooses a sensor with the highest utility from its neighbors according to the information measure, and then hands off the current belief to the chosen sensor (the new leader). As discussed earlier, the information-based approach to sensor querying and data routing selectively invokes sensors to minimize the number of sensing actions needed for a given accuracy and hence, latency and energy usage. It can optimize the use of multi-sensing-modality information (e.g., range and bearing sensing) to improve tracking accuracy. It can also handle non-constant object dynamics and is more general than the CPA-based method. Figures 2.10, 2.11, 2.12, 2.13, 2.14, 2.15, 2.16 and 2.17 show the performance of various trackers based on four different sensor selection criteria (information utilities): nearest neighborhood (a)-(b), minimizing Mahalanobis distance (c)-(d), minimizing

30

CHAPTER 2. OBJECT LOCALIZATION

(a)

(b)

Figure 2.8: Sensor selection based on the Mahalanobis distance measure of information utility. The localization problem is the same as that in Figure 2.7 entropy (e)-(f), and maximizing KL-divergence (g)-(h). For each criterion, we also examine the effect of a simple heuristic that prevents each node from being selected more than C times, where C is a constant determined by various factors – frequency of measurement incorporation, sensing noise and etc. Since the position estimate of an object provided by a single range sensor is under-constrained, it is desirable that the same sensor not be selected over and over again in order to maintain a certain amount of spatial diversity. In practice, the repeated use of information from a single sensor may also lead to an over-confident estimate due to correlations between consecutive measurements from the same sensor. Figure 2.10, 2.12, 2.14 and 2.16 are the results without the heuristic and Figure 2.11, 2.13, 2.15 and 2.17 are ones with the heuristic. These empirical results indicate that both information-theoretic utilities without the heuristic provide excellent tracking accuracy and outperform the other utilities by a huge margin, while the Mahalanobis distance and the nearest neighborhood select the same set of sensors repeatedly, thus forcing the position estimate to be biased as a result. For the Mahalanobis distance and the nearest neighborhood utilities, the heuristic seems very useful by preventing the same set of sensors from being selected repeatedly, while it

31

CHAPTER 2. OBJECT LOCALIZATION

150

150

100

100

50

50

0

0

50

100

(a)

150

0

0

50

100

150

(b)

Figure 2.9: Tracking a moving object using the information-driven approach. A object is moving from the bottom of the field to the top. As the object moves through the field of sensors denoted by the dots, a subset of the sensors are activated to carry the belief state. Each new sensor is selected according to an information utility measure on the expected posterior distribution of the new state. (a) Current belief distribution at time t. (b) New posterior distribution at time t+1, after incorporating a measurement from the selected sensor.

32

CHAPTER 2. OBJECT LOCALIZATION

Euclidean Distance without the Heuristic 150

Estimation Error(meter)

100

50

0 30

40

50

60 70 Sensor Density

80

90

100

Figure 2.10: Tracking error: Nearest neighborhood algorithm

Euclidean Distance with the Heuristic 150

Estimation Error(meter)

100

50

0 30

40

50

60 70 Sensor Density

80

90

100

Figure 2.11: Tracking error: Nearest neighborhood with the heuristic

33

CHAPTER 2. OBJECT LOCALIZATION

Mahalanobis Distance without the Heuristic 150

Estimation Error(meter)

100

50

0 30

40

50

60 70 Sensor Density

80

90

100

90

100

Figure 2.12: Mahalanobis distance

Mahalanobis Distance with the Heuristic 150

Estimation Error(meter)

100

50

0 30

40

50

60 70 Sensor Density

80

Figure 2.13: Mahalanobis distance with the heuristic

34

CHAPTER 2. OBJECT LOCALIZATION

Entropy without the Heuristic 150

Estimation Error(meter)

100

50

0 30

40

50

60 70 Sensor Density

80

90

100

80

90

100

Figure 2.14: Entropy

Entropy with the Heuristic 150

Estimation Error(meter)

100

50

0 30

40

50

60 70 Sensor Density

Figure 2.15: Entropy with the heuristic

35

CHAPTER 2. OBJECT LOCALIZATION

Relative Entropy without the Heuristic 150

Estimation Error(meter)

100

50

0 30

40

50

60 70 Sensor Density

80

90

100

90

100

Figure 2.16: KL-divergence

Relative Entropy with the Heuristic 150

Estimation Error(meter)

100

50

0 30

40

50

60 70 Sensor Density

80

Figure 2.17: KL-divergence with the heuristic

CHAPTER 2. OBJECT LOCALIZATION

36

Figure 2.18: Snapshot of a typical simulation using the KL-divergence as the information utility. The yellow dots are sensor nodes and the gray grids are current estimates on the object position. The pink-square-enclosed nodes is the current leader, and the green-square-enclosed nodes are its current neighbors. In this example, sensor density is 100, and the node communication range is set to 30. actually hurts in the case of the information-theoretic measure. The larger error for the low sensor density in the Mahalanobis and the nearest neighborhood utility is because there are not enough sensors to work when the heuristic is used. Figure 2.18 shows a snapshot of a simulation run using the KL-divergence as the information utility. Table 2.1 summarizes the statistics from the simulation results in Figures 2.10 – 2.17. A track is considered ‘lost’ when the final estimated position of the object is more than 15 meters away from the actual position. Figure 2.19 shows the estimated tracks for each of the information utilities, ordered in the same way as in Table 1, for sample runs with sensor density 60.

37

CHAPTER 2. OBJECT LOCALIZATION

Figure 2.19: Tracking results for different choices of the utility functions. The (red) straight lines are the actual tracks, and the (blue-gray) curves are the estimated ones.

(a) Nearest neighbor (b) Nearest neighbor with the heuristic (c) Mahalanobis distance (d) Mahalanobis with the heuristic (e) Entropy (f) Entropy with the heuristic (g) KL-divergence (h) KL-divergence with the heuristic

# lost tracks 75 37 70 32 0 0 0 2

Mean error 34.39 44.79 24.86 44.20 5.13 5.05 8.09 10.79

Table 2.1: Statistics on tracker performance for the different information utilities obtained from 80 runs per information utility.

CHAPTER 2. OBJECT LOCALIZATION

2.6 2.6.1

38

Discussion Representation of Belief State

In our tracking example, the belief state is being passed over from one leader to another. To efficiently implement the information-driven tracking algorithm, it is important to design a compact representation of belief states so as to minimize the communication bandwidth. While the parametric representation is the most compact one, it is limited to special classes of distributions and is not suitable for many practical sensing modalities. On the other extreme, one can approximate an arbitrary distribution with a set of discrete samples, as we have used in the experimental results. This forms the basis of many successful Monte Carlo based algorithms such as sequential Monte Carlo [16]. One drawback of the non-parametric approaches is the large numbers of samples needed to approximate the underlying belief state well, and addressing this problem is an active topic of current research. Somewhere between the parametric and particle sample based approaches lies the grid-based representation of beliefs. In this representation, each grid approximates the value of the belief at the grid location. When the state space is sparse, meaning many parts of the space have negligible probability masses, the grid can be efficiently encoded in a sparse representation to minimize storage requirement. Figure 2.20 provides a pictorial description of the three representations.

2.6.2

Sequential vs. Concurrent Information Exchange

In our discussion thus far, we have primarily focused on the case where a leader node selects the next leader to hand off the belief state information. While the idea is simple, this single node-to-node handoff may suffer information loss when the current leader/link incurs a failure. A more robust scheme uses a zone-to-zone hand-off where a group of sensors (in a zone) elect a leader and collectively maintain the belief state. The leader performs the normal node-to-node handoff to the next chosen leader of a cluster of sensors. But when the leader node fails, another sensor in the same cluster

CHAPTER 2. OBJECT LOCALIZATION

39

Figure 2.20: Representation of belief: parametric (e.g. Gaussian), non-parametric (e.g., grid samples, particle samples) will step in as an acting leader, and the handoff continues. Pushing this idea to the extreme, every sensor may exchange information with another sensor in parallel, perhaps at the cost of consuming greater overall communication bandwidth. There are several possible types of information to exchange. Each sensor can send its own belief to a chosen next sensor, it can send its measurement to a neighbor sensor to whom the data is likely to be useful, or it can request other’s belief or measurement. The selection of which style of information exchange to deploy for a sensor net depends on how the information will be extracted and used (e.g. query types) and the level of desired robustness to failure and tolerance for latency.

2.6.3

Query Types

In our straw-man tracking scenario (Figure 2.2), we assumed that a user initiates a query about the location of a vehicle as a function of time. In other cases, one might expect the network to possess some low-level awareness of the objects. When an event was detected and classified, the network can initiate a tracking task. As the network begins to activate sensors in a local region to maintain active belief states, the information about the current active regions and their event logs can help to guide user queries, which may enter the network at any point of the network, into the region of high information relevance. Beyond the single object tracking scenario, one might be interested in tracking

CHAPTER 2. OBJECT LOCALIZATION

40

a group of objects or relations among the objects [24]. In these cases, a user initiated high-level query may, upon entering the network, break into several sub-queries, some of which are routed into regions where individual objects are present, and some of which coordinate the routing or interpretation of the first group of sub-queries. Complexities arise as objects merge, split, or cross-over. Sub-queries may have to reconcile with each other periodically to maintain consistency. The taxonomy of query types, the nature of physical phenomenon being observed, and their corresponding in-network processing styles are future topics of investigation.

2.6.4

Bias in Sensor Selection

The information utility measures we have introduced are approximate in nature. Mahalanobis distance is a heuristic for measuring the utility of range sensing data. In the presence of the finite-precision representation of probability distribution and possibly nonlinear utility functions, the sensor selection based on the expected posterior computed from expected likelihood function may be strongly biased by the prior distribution. We have observed that some of the cases where the track is lost are actually due to this bias. A poorly approximated prior may produce incorrect utility values, leading to the selection of less useful sensors eventually causing the tracking error to explode. Similar to the sequential Monte Carlo method (also known as particle filters), one way to reduce the effect of the poorly approximated priors on sensor selection is to design proposal distributions that draw on multiple information sources.

2.6.5

Tracking Robustness

The quality of tracking is a complex function of several parameters: sensor placement density, sensing range, communication range, spatial extent of the physical phenomenon being observed, object dynamics, signal-to-noise ratio (SNR). Another critical issue for robustness is handshake during information handoff from node to node. We expect future research will seek to understand the effect of these parameters on the behaviors of trackers and design robust communication protocols.

Chapter 3 Identity Management Framework In Chapter 2, we introduced the information-driven approach to efficiently implement a single-object tracking system in a WSN and demonstrated that the proposed framework provides excellent tracking quality while mitigating the cost of communication and other resource usage in a WSN. In the upcoming chapters, we turn our attention to the main focus of this dissertation, the multiple-object tracking problem, and present a mathematical framework where the goal is to augment sub-optimal data association algorithms by maintaining a probability distribution over the set of possible identities, thus providing an identity management framework.

1

2

3

4

5

Figure 3.1: Quiz: Five different vehicles in five different garages

Let us imagine we are solving a quiz in a TV show. In the beginning, the host shows five different vehicles in five different garages as shown in Figure 3.1 and closes all the doors. He then gives the following orders to his crew members behind the state – “Swap the two vehicles in the second and third garages.” However, the crew 41

42

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

1

2

3

4

5

Figure 3.2: An example of ‘swapping order’ from the host: Two vehicles in garages 2 and 3 could be swapped with some probability.

members do not always listen to the host and could ignore the orders at times. This information is illustrated in Figure 3.3. In-between these ‘swapping orders’, the host takes a peek at one of the doors and tells us what is inside, although he too could be lying. Figure 3.3 illustrates this ‘hint’ from the host. After giving many ‘swapping

1

2

3

5

Figure 3.3: The host peeks at one of the garage doors and tell what is inside, although he could be lying.

orders’ and ‘hints’, the host finally points out a particular garage and asks “What’s inside?” Can we answer this question? If yes, then with how much confidence? Mathematically, the above quiz is equivalent to the problem of the identity management for tracking multiple moving object in a WSN. The ‘swapping order’ by the host corresponds to object mixing caused by objects being in proximity, and the ‘hint’ from the host corresponds to local evidence by sensors. Given the uncertainties in both events, one can expect that the above question – and the identity management – can be best handled by a probabilistic formulation. In the upcoming section, we will precisely define the identity management and the related mathematical quantities.

3.1

Problem description

We first define the notion of the join identity state X of the N objects.

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

43

Definition 1. The joint identity state of N objects is defined as X = (x1 , · · · , xN ) , where xj is the marginal identity state for the jth object. xj takes a value xj = i ∈ {1, · · · , N }, indicating that the jth physical object has an identity i. No two different objects can have the same identity. In the context of the quiz described in the previous section, X is a particular ordering of the vehicles in the garages, and the total number of the possible states is N ! = 1 × · · · × N . In general, X can take on N ! different permutations – X ∈ SN , where SN is the symmetric group of N elements.1 SN can be represented by the set of all N × N permutation matrices – 0-1 N × N matrices with exactly one 1 in each row and column, each of which represents an identity assignment between physical objects and identities. For example, two joint identity states X = (x1 = 1, x2 = 2, x3 = 3), X = (x1 = 2, x2 = 1, x3 = 3) of a N = 3 case can be represented as the following permutation matrices. 

1 0 0

 X = (x1 = 1, x2 = 2, x3 = 3) →  0 0  0  X = (x1 = 2, x2 = 1, x3 = 3) →  1 0



 1 0 =I 0 1  1 0  0 0  = I(1,2) 0 1

Definition 2. X(i,j) is a N × N matrix obtained by swapping ith and jth columns of X and is called an (i, j) transposition of X. The relation between X and X(i,j) is given by the following equation. X(i,j) = XI(i,j) , where I is an N × N identity matrix and I(i,j) is a matrix obtained by swapping the ith and jth columns of I. 1

The symmetric group SN is the set of all permutations on N objects under permutation composition.

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

44

As discussed before, the setting calls for a probabilistic formulation on the identities of objects, and we propose to maintain a joint probability distribution p(X) over all N ! identity states X ∈ SN , which we call the joint identity distribution. The joint identity distribution p(X) encodes all the information about the identities of objects. For example, the following are a few queries that can be answered from the joint identity distribution p(X). • What is the most likely identity assignment X? max p(X) X

• What is the probability of an object j having a particular identity i? (Marginal probability) p(xj = i) =

X

p(X) ,

X∈K(i,j)

where K is a set of all permutation matrices, whose (i, j) entry is 1. • Which two objects, i and j, are most likely to be a particular identity k? max p(xi = k) + p(xj = k) i,j

Our goal is to maintain the joint identity distribution, while objects are moving. The joint identity distribution is updated through two types of events: mixing events and local evidence events, as shown in Figure 3.4. Intuitively, a mixing event occurs when two object locations are so close that their identities are no longer distinguishable by nearby sensors. This will increase the uncertainty of identity assignments in p(X). A local evidence event happens when a sensor node makes measurements on the identity of a specific object2 and updates p(X) using Bayes rule. Local evidence events generally reduce the entropy of p(X). Our goal is to maintain p(X) on-line while these two types of events are occurring. The two event types will be precisely defined in the following sections. 2

In a WSN setting, objects may pass near sensor nodes which can then determine their identity, either through signal classification techniques or directly, as in the case of RFID tag readers.

45

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

p(X)

?

Mixing

Evidence This is a car!

? Identity confusion p(X)

Figure 3.4: The joint identity distribution p(X) evolves through two events – mixing and local evidence

3.1.1

Mixing

A mixing event is caused by two objects being in proximity and thus their identities are no longer distinguishable by sensors. For simplicity, we assume that only a single pair of objects can be mixed at the same time. Figure 3.5 illustrates how the joint identity distribution p(X) can be updated after a mixing between the ith and jth objects. The circles on the left side are the position estimates at time k and the ones on the right are the position estimates at time k + 1. Suppose the ith and jth position at time k and k + 1 are very close and it is not certain how to associate them, that is, we have a mixing event between the ith and jth objects. Let p(X) and q(X) be the joint identity distributions before and after the mixing, respectively. Let α be a probability of the swapping of the ith and jth objects. The, we can write the joint identity distribution q(X) as follows. q(X) = (1 − α)p(X) + αp(X(i,j) )

(3.1)

According to the representation theory on the symmetric group [13], the mixing equation above is a convolution operation between the joint identity distribution p(X) and a mixing distribution m(X), which is defined as follows.

46

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

x1 ...

...

xi

xi

...

...

p(X)

x1

xj

or

q(X)

xj

...

...

xN

xN

Figure 3.5: A diagram illustrating a mixing event between ith and jth object.

Definition 3. Mixing between the ith and jth objects is defined as a convolution operation of p(X) and m(X) as follows. X p(s)m(Xs−1 ) , p ? m (X) =

(3.2)

s∈Sn

where s

−1

is an inverse permutation of s, and   1−α    m(X) = α    0

m(X) is defined as follows. X=I X = I(i,j) otherwise

where α is a mixing probability, I is the N × N identity matrix and I(i,j) is its (i, j) transposition. Figure 3.6 shows an example of a convolution operation for N = 3. The x and y axes represent identity permutations and their probabilities, respectively. Comparing the joint identity distribution before and after the mixing, p(X) and q(X), it is apparent that the values of q(X) are smaller and more uniform than p(X) – the uncertainty in the joint identity distribution has increased. Lemma 1. The statistical entropy of the joint identity distribution can only increase after a mixing. H(q(X)) ≥ H(p(X))

47

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

1−a a

x1

x2

0.5

0.5

I=x 1

I (i,j) m(X)

p(X)

0.5(1−a) 0.5a

x1

0.5(1−a) 0.5a

x2

x 1 I (i,j) x 2 I (i,j) q(X)

Figure 3.6: Mixing as a convolution – q(X) = p ? m (X): x and y axes represent permutations and probabilities, respectively.

Proof. The above convolution operation can be represented as the following matrix multiplication → − − q = M→ p , − − where → p and → q are N ! × 1 vectors representing the two distributions and M is a N ! × N ! mixing matrix, in which each row (and column) has only two non-zeros values α and 1 − α. According to the Birkhoff theorem [5], any doubly-stochastic matrix can be represented as a convex sum of permutation matrices of the same size. P !)! P Replacing M with (N i ai = 1, ai ≥ 0 and Πi is the ith (N ! × N !) i=1 ai Πi , where permutation matrix, and taking the entropy on both sides of the above equation, we

48

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

have the following. − H(→ q ) = H(

(N !)!

X

− ai Πi → p)

i=1

(N !)!

X



− ai H(Πi → p)

i=1

(N !)!

=

X

− ai H(→ p)

i=1

− = H(→ p) The inequality comes from the concavity of the entropy. This concludes the proof. The claim of the above lemma, that the uncertainty never decreases with mixing, agrees with our intuition that mixing only adds uncertainty. After repeated mixings, the identity state distribution converges to the uniform distribution.

3.1.2

Local Evidence

A local evidence event is a sensor measurement on the identity of an object, and is represented as a likelihood function in the probabilistic formulation. Definition 4. A local evidence Z = (i, j) is a sensor measurement stating that the jth has the ith identity. Given inherent uncertainty in sensing, we will use the following probabilistic sensor measurement model for identity sensing. Definition 5. The likelihood of an identity measurement Z on jth object at a sensor node is defined as follows. p(Z = (i, j)|X) = where 0 ≤ β ≤ 1.

 β

(1 − β)/(N − 1)

xj = i xj 6= i

(3.3)

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

49

The above probabilistic measurement model can be interpreted as follows. Suppose a sensor says the object within its sensing range is the bike as in Figure 3.7. It is very likely that the object is indeed the bike, although it is also possible that the object is something else due to sensor noise.

a less likely cause

?

the most likely cause a less likely cause a less likely cause

This object is the bike!

a less likely cause

Figure 3.7: Identity sensing model used in this thesis

Given a local evidence Z = (i, j), we can update the joint identity distribution p(X) by incorporating the likelihood function p(Z = (i, j)|X) using the Bayes rule as follows. p(X|Z = (i, j)) ∝ p(X)p(Z = (i, j)|X) , where we use ∝ to indicate that the right side of the equation is proportional to the P left – the normalization constant c = 1/ X p(X)p(Z = (i, j)|X) on the right side of the equation is omitted for simplicity.

Under the conditional independence assumption3 , multiple local evidence can be incorporated as follows. p(X|Z = Z 1 , · · · , Z k ) ∝ p(X)p(Z 1 |X) · · · p(Z k |X) , where Z i is a local evidence at sensor i. 3

Sensor measurements are statistically independent given the identity state X.

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

Continuous prior

Prediction

Likelihood incorporation

Discrete prior

Convolution

Likelihood incorporation

50

Figure 3.8: Continuous vs. Discrete Bayesian Filtering

3.1.3

Identity Management as a discrete Bayesian filtering

The two operations on p(X) define a discrete Bayesian filter on p(X), completely analogous to usual continuous Bayesian filtering – the mixing (convolution) corresponds to the prediction step in the continuous filtering and the local evidence corresponds to the observation or likelihood incorporation step. The only difference is that the prediction step in discrete Bayesian filtering happens only at discrete-time mixing events, while the prediction step in the continuous case happens at every time step. Now, we consider a case where we have both local evidence and mixing together. For example, say we are given events m1 (X), Z1 , Z2 , m2 (X) in that order (here m denotes mixing and Z local evidence events). The posterior can be computed as4 p(X) ∝ [{(p0 (X) ? m1 (X)) ? m2 (X)}p(Z1 |X)p(Z2 |X)] ? m3 (X) The above computation, unfortunately, is practically infeasible due to the exponential complexity of the two operations – convolution from a mixing requires O(N !) operations and Bayesian normalization from local evidence requires O(N !). Therefore, we need to efficiently approximate the joint distribution p(X) and the 4

The real computation is done in an iterative fashion – whenever there is a mixing or a local evidence event, we update p(X) accordingly.

CHAPTER 3. IDENTITY MANAGEMENT FRAMEWORK

51

two update operations, so that we can implement them in a WSN. In the upcoming sections, we will introduce two practical approximations based on marginal probabilities and log-likelihoods, the so-called marginal belief matrix approach and information matrix approach, respectively.

Chapter 4 Marginal Belief Matrix In this chapter, we present the first practical approximation, called the marginal belief matrix approach. Here the main idea is to maintain N 2 marginal probabilities, since they are the information most users want eventually. We first introduce the definition of the marginal belief matrix and explain how the two operations of the optimal identity management can be re-defined for this representation. The issues related to the distributed implementation are discussed and experimental results from simulation is presented.

4.1 4.1.1

Marginal Belief Matrix B Definition

In Chapter 3, we defined the joint identity state X as X = (x1 , · · · , xN ), where xj = i (i, j ∈ {1, · · · , N }) indicates the marginal event that the jth object has the ith identity. In other words, the joint random variable X is composed of N marginal random variables xj , j ∈ {1, · · · , N }, which are statistically dependent on each other due to exclusion – if a marginal variable xj has an identity i, then no other marginal variables xk , x 6= j can have the identity i. The marginal probabilities are typically what users want to know, since the most natural question on object identities is “which object is this?”. There are total N 2

52

CHAPTER 4. MARGINAL BELIEF MATRIX

53

marginal events, which can be encoded as an N × N matrix B, called the marginal belief matrix. Definition 6. The marginal belief matrix B is an N ×N matrix, whose (i, j)th element bij is given as follows. bij = p(xj = i)

(4.1)

where i, j ∈ {1, · · · , N }. The marginal belief matrix B encodes information on who’s who with what probability. The jth column bj of B encodes the marginal probabilities of the jth object having identity 1 to N – a marginal probability distribution p(xj ). Since each column is a probability distribution, its elements add up to one. Each row of B represents how probability mass of a particular identity is distributed over different physical objects, and all the elements of each row also add up to one. Definition 7. A doubly-stochastic matrix is a N × N non-negative matrix, whose rows and columns sum to one. The doubly-stochasticity of B guarantees that the marginal probabilities are legal – there exists a joint identity distribution p(X), whose marginal probabilities are B. It also provides the consistency of information. Let us consider a simple example to understand this. Suppose the marginal belief matrix representation is used to track Alice and Bob in a WSN. The doubly-stochasticity of the marginal belief matrix implies the following two statements are true. 1. If a person is more likely to be Alice, then the person is less likely to be Bob. 2. If a person is more likely to be Alice, then the other person is less likely to be Alice. The second statement has special meaning in a WSN – if a sensor measures local evidence on an object, this local information could globally affect probabilities of all the other objects. This observation defines a key problem on how to maintain a marginal belief matrix B only given local evidence, which we will discuss in detail in section 4.3.

CHAPTER 4. MARGINAL BELIEF MATRIX

4.1.2

54

Relation between B and p(X)

The marginal belief matrix B can be computed from the joint identity distribution p(X) by summing over permutations as follows. B=

X

p(X = Πi )Πi

(4.2)

i

where Πi is the ith N × N permutation matrix. The following is known about the above relation [13, 14]. Fact 1. The right side of Equation 4.2 is a Fourier transform of p(X) at a permutation matrix representation. The above fact will provide a key to define a mixing operation in the marginal belief matrix representation.

4.2

Mixing

We know that a mixing event between the ith and jth objects for the joint identity distribution p(X) is defined as a convolution with a mixing distribution m(x), which is given as follows.   1−α X =I    m(X) = α X = I(i,j)    0 otherwise

Since the marginal belief matrix is a Fourier transform of p(X), the mixing operation for the marginal belief matrix representation can be easily derived from the well-known Convolution theorem in the Fourier analysis of functions on the symmetric group [13], which we state here without proof. Convolution Theorem. Let p(X) and m(X) be two functions on the symmetric group SN and F(·) denote a Fourier transform of a function on SN . Then, the following is true. F{p(X) ? m(X)} = F{p(X)}F{m(X)}

(4.3)

55

CHAPTER 4. MARGINAL BELIEF MATRIX

From the Convolution theorem and Fact 1, the mixing rule for the marginal belief matrix representation is a simple matrix multiplication. Theorem 1. Let Bbefore and Bafter be the marginal belief matrices before and after a mixing event between the ith and jth objects. Then, the relation of the two matrices is given as follows. Bafter = Bbefore M

(4.4)

where M is a mixing matrix given in Definition 8. Definition 8. The mixing matrix M is a N × N doubly-stochastic matrix defined as follows. M = F{m(X)} = αI + (1 − α)I(i,j) , where I is an N × N identity matrix and I(i,j) is its (i, j) transposition. Let us illustrate how a marginal belief matrix is updated using a simple example in Figure 4.1, where three objects are moving in a WSN and two mixing events occur. The first mixing occurs between object 1 and object 3 and the second mixing occurs between object 2 and object 3.1 Note that only two columns are updated after a mixing and we thus can further simplify the mixing operation as follows. biafter = (1 − α)bibefore + α bjbefore

(4.5)

bjafter = α bibefore + (1 − α)bjbefore where bi is the ith column of B. Equation 4.5 implies that the mixing operation for the marginal belief matrix representation is a local average of two columns and has a linear computational complexity O(N ) instead of O(N 3 ) from a matrix multiplication. Lemma 2. The mixing between object i and j for the marginal belief matrix representation simply updates the two marginal distributions p(xi ) and p(xj ) as their local averages given in (4.5). 1

The object number is arbitrary – it does not mean that we know the identities of objects.

56

CHAPTER 4. MARGINAL BELIEF MATRIX

Mixing (1,2)

B=

1.0

0.0

0.0

0.0

1.0

0.0

0.0

0.0

1.0

M=

B=

0.5

0.5

0.0

0.5

0.5

0.0

0.0

0.0

1.0

Mixing (1,3)

0.5

0.5

0.0

0.5

0.5

0.0

0.0

0.0

1.0

M=

B=

0.25

0.5

0.25

0.25

0.5

0.25

0.50

0.0

0.50

0.5

0.0

0.5

0.0

1.0

0.0

0.5

0.0

0.5

Figure 4.1: An example of how a marginal belief matrix B is updated when three objects are moving in a WSN. Both mixing events have mixing probabilities of 0.5. Note that only two columns of B are updated after each mixing.

CHAPTER 4. MARGINAL BELIEF MATRIX

4.3 4.3.1

57

Local Evidence Ideal Solution

Before we discuss a local evidence incorporation operation for the marginal belief matrix representation, we first study the properties of the ideal solution by comparing Bb ef ore and Bafter , which are the marginal belief matrix before and after the ideal Bayesian update, respectively. Bbefore =

X

pbefore (X)X

Bafter

X

pafter (X)X ,

X

=

X

where pafter (X) is the posterior joint identity distribution after the optimal Bayesian update as follows pafter (X) ∝ pbefore (X)p(Z = (i, j)|X) Property 1. In the Bayesian solution, zero elements in the marginal belief matrix B stay zero after a normalization. Proof. Let T be a set of index pairs (i, j), where bij = 0, and π(i, j) be a set of all N × N permutation matrices, whose (i, j)th element is one. The marginal probability bij = p(xj = i) is the sum of the joint probabilities p(X = Πk ), where Πk ∈ π(i, j). Therefore, bij = 0 ∈ Bbefore for (i, j) ∈ T means that p(X) = 0 for X ∈ π(i, j). Let pafter (X) be the joint identity distributed after the normalization by local evidence Z = (i, j). Since the optimal Bayesian update is a simple point-wise multiplication, pafter (X) will be zero for the same set of permutation matrices X ∈ π(i, j). Therefore, bij ∈ Bafter for (i, j) ∈ T is zero after the normalization. The following property further assumes that local evidence provides perfect information, i.e., Z = (i, j) indeed means the jth object has the ith identity with probability one. Property 2. Under Bayesian normalization, given perfect local evidence Z = (i, j), all the columns that have zero at the ith element and all the rows that have zero at the jth remain unchanged after the normalization.

58

CHAPTER 4. MARGINAL BELIEF MATRIX

x1

x1

1

x2

x3

x4

0.50 0.25 0.125 0.125 B= M2

x2

0.25 0.25 0.25

0.25

0.25 0.25 0.25

0.25

0.00 0.25 0.375 0.375 2

M1

M4 M3

3

Local evidence: Z=(4,4) M5 x3 Mi x4

4

or

probabilistic switch 1/2

1/2

Figure 4.2: Proof idea for Property 2 Proof. We only prove the case for columns, since the case for rows follows from the duality. Figure 4.2 illustrates the main idea for the proof. Lines represent paths of objects and initial identities are given as integers from 1 to 4. There are five mixing events {M1 , · · · , M5 }, which act as probabilistic switches – each becomes a X-shape connector with probability 1/2 and a =-shape connector with probability 1/2. Therefore, one can interpret the diagram as a graph with four sources (identities) and sinks (objects), where probability mass flows from the sources to the sinks and split at Mi ’s according to their mixing probabilities, which is 1/2 in this case. The marginal belief matrix after these mixing events is shown on the right side and b ij represent how much probability mass flows from identity i to object j. The last element of the first column is zero, which means that there is no path from identity 4 to object x1 regardless of states of these switches. Given local evidence Z = (4, 4), we update the fourth column of B to be [0 0 0 1]T . This evidence fixes the possible states of switches (M3 , M4 , M5 ) enclosed by a solid line, but does not affect M1 and M2 . Since M1 and M2 are not affected by the evidence Z = (4, 4), the first column will not be updated. This concludes the example. The same reasoning works in general. The second property maintains that not all the columns are affected by a local evidence and thus will be exploited to reduce the cost of communication in distributed implementation discussed later. These two properties effectively reduces the number of elements of the marginal

59

CHAPTER 4. MARGINAL BELIEF MATRIX

belief matrix given a local evidence and can be incorporated in the optimization formulation discussed in the following section.

4.3.2

Local Evidence Incorporation as Optimization

Let us first consider a simple case for N = 3 to understand how a local evidence can be used to update the marginal belief matrix B. Suppose the current marginal belief matrix B is given as follows. 

.6 .2 .2



   B= .3 .6 .1   .1 .2 .7

A sensor observes that the first object (first column) is more likely to have identity 1, that is, Z = (1, 1). The sensor updates the first object’s marginal distribution to [0.96 0.03 0.01]T using the following equation. p(x1 |Z = (1, 1)) ∝ p(x1 )p(Z = (1, 1)|X) , where the likelihood p(Z = (1, 1)|X) is defined as follows.  0.9 x1 = 1 p(Z = (1, 1)|X) = 0.05 x 6= 1 1

After the first column is updated, the marginal belief matrix is no longer doubly-

stochastic and we call this matrix Bp a perturbed marginal belief matrix.   .96 .2 .2    Bp =  .03 .6 .1   .01 .2 .7 To restore the double-stochasticity and, thus, the consistency, we need to update the second and third columns of Bp . Since we have 6 variables and 5 equations, there are infinitely many solutions. Let us  .96  Be =  .03 .01

consider the following two solutions.    .02 .02 .96 .02 .02    .03 .03 .94 or .94 .03    .04 .95 .01 .95 .04

60

CHAPTER 4. MARGINAL BELIEF MATRIX

Although both matrices are valid solutions, the left one seems to be a better solution, since it is more consistent with the original marginal belief matrix. Exploiting the above observation, we formulate local evidence incorporation as an optimization problem, where the goal is to find a new doubly-stochastic matrix B e that satisfies new local evidence, and is close to the original matrix. We will consider the Frobenius norm and the KL-divergence as possible measures of closeness. The optimization problem can be written as follows. argminBe

dist(Be , Bp )

(4.6)

subject to Be 1 = 1 BeT 1 = 1 Be ≥ 0 where 1 is an N × 1 vector of all 1’s, Bp is a perturbed marginal belief matrix, and Be ≥ 0 indicates that all the elements of Be are non-negative. Bp is the same as the original marginal belief matrix B except for the jth column, which is computed as follows given a local evidence Z = (i, j).  c B(k, j) β Bp (k, j) = c B(k, j) 1−β

N −1

where c = 1/

P

k

k=i otherwise ,

B(k, j)p(Z = (i, j)|X) is a normalization constant, and β is a

probability of observing Z = (i, j) when the jth object has indeed the ith identity. In the following two sections, we discuss in detail how to solve the above optimization problem with the two different distance measure – Frobenius norm and KL-divergence.

4.3.3

Solution 1: Quadratic Programming

Let us first introduce the definition of the Frobenius norm. Definition 9. Let A be an N × N matrix and aij be the (i, j)th element of A. The

CHAPTER 4. MARGINAL BELIEF MATRIX

61

Frobenius norm || · ||F of A is defined as follows. v u N N uX X ||A||F = t a2ij , i=1 j=1

We now define the first distance measure dist(Be , Bp ) in the optimization problem (4.6) as follows. dist(Be , Bp ) = ||Be − Bp ||2F Given the above distance measure, the optimization problem (4.6) can be rewritten as follows. − → − → − → − → → (Be − Bp )T (Be − Bp ) argmin− Be − → ABe = 1 − → I Be ≥ 0 ,

(4.7)

− → − → where Be and Bp are N 2 × 1 vectors obtained by concatenating all the columns of Be and Bp , I is the N 2 × N 2 identity matrix, 1 is a 2N × 1 vector of all one’s, 0 is a N 2 × 1 zero vector, and A is a N 2 × 2N matrix, whose the (i, j)th element is given as follows.

A(i, j) =

     for 0 < i ≤ N   

     for N < i ≤ 2N  

 1 N · (i − 1) + 1 ≤ j ≤ N · i 0 otherwise  1 j = (k − 1)N + (i − N ) , k = {1, · · · , N } 0 otherwise

The above formulation based on the Frobenius norm turns out to be the convex quadratic programming [18, 22], where there always exists a unique solution. Furthermore, the above problem is an instance of the separable quadratic transportation problem, where an efficient algorithm, whose running time is linearly proportional to the input size, is known [35]. Unfortunately, the Frobenius norm does not capture closeness in probabilities well. In fact, the accuracy of the solution compared with that of the ideal Bayesian approach shows that the Frobenius norm is not a good distance measure to compare the doubly-stochastic matrices, as will be seen in the next section.

62

CHAPTER 4. MARGINAL BELIEF MATRIX

4.3.4

Solution 2: The Sinkhorn Algorithm

We first define the KL-divergence [9] between the two matrices. Definition 10. The KL-divergence between two non-negative m × n matrices A and B is defined as follows. D(B||A) =

m n X X

bij log

j=1 i=1

bij aij

Based on the above definition, we now introduce another distance measure for the optimization problem 4.6 as follows. dist(Be , Bp ) = D(Be ||Bp ) In a recent work [1], it was shown that the following iterative algorithm, known as the Sinkhorn algorithm, always minimizes the aforementioned objective function and its convergent matrix is the solution of the optimization problem 4.6 with the above distance measure. Sinkhorn Algorithm. Given a non-negative m × n matrix A and specified vectors for the row sums (r ∈ Rm ) and column sums (c ∈ Rn ), we iterate the following until (0)

convergence, with initial value aij = aij , k = 1: 1. [Row normalization] (k−1)

Multiply every element element aij P (k−1) the actual row sum nj=1 aij (k−) aij

by the ratio of the desired row sum ri to

(k−1)

ri aij = Pn (k−1) j=1 aij

2. [Column normalization] (k−)

Multiply every element element aij

from step 1 by the ratio of the desired Pm (k−) column sum cj to the actual row sum i=1 aij (k−)

cj aij (k) aij = Pm (k−) i=1 aij

63

CHAPTER 4. MARGINAL BELIEF MATRIX

10 10 X 10 100 X 100 1000 X 1000

5

0

−5

log|B(n) − B(n−1)|F

−10

−15

−20

−25

−30

−35

−40

0

10

20

30

40 50 60 Number of Iterations

70

80

90

100

Figure 4.3: Convergence of the Sinkhorn algorithm with three different matrix sizes: 10 × 10 100 × 100 1000 × 1000. Each matrix is randomly generated. For a practical error ² = 10−5 , 5 or 6 iterations seem to be enough regardless of the size of the matrices. It is known that the Sinkhorn algorithm converges uniquely [43]. In [19], they show that each step in the Sinkhorn algorithm is a contraction map in the Hilbert projective metric and show that the number of iterations is bounded by O(L(A)·1/²), where L(A) is the binary input length – the log of the ratio of the largest to smallest non-zero elements of A and ² is the desired accuracy in some metric. In other words, the number of iterations is not a function of the matrix size. Our simulation results in Figure 4.3 also confirm the result – about ten iterations are enough for three different size matrices 10 × 10, 100 × 100 and 1000 × 1000 given a practical error margin ². When compared to the true marginal distribution obtained from the joint identity

64

CHAPTER 4. MARGINAL BELIEF MATRIX

2.5 KL−divergence Frobenius norm

2

Error

1.5

1

0.5

0

3

3.5

4

4.5 Number of objects

5

5.5

6

Figure 4.4: Performance comparison of the two distance measures for local evidence incorporation in the marginal belief matrix representation distribution with the optimal Bayesian evidence incorporation, the Sinkhorn algorithm outperforms the quadratic programming formulation as shown in Figure 4.4 by a huge margin. In the figure, x-axis represents the number of objects and y-axis represents an error between the solution of either optimization formulation and the ideal solution from the Bayesian update.

65

CHAPTER 4. MARGINAL BELIEF MATRIX

sensor node:

active node:

sensing range:

communication range:

communication link:

Figure 4.5: A wireless sensor network with communication links and sensing range: Each sensor can sense locations and identities of objects within its sensing range and exchange messages with nodes within its communication range.

4.4 4.4.1

Distributed Implementation Assumptions

In the upcoming discussion on distributed implementation, we will use the following assumptions. A sensor can directly sense the positions of objects within its sensing range. The number of objects N is known in advance and fixed, although this does not have to be the case for a real implementation2 . Figure 4.5 shows an example of a WSN that tracks five moving objects under these assumptions.

4.4.2

Agent-based Architecture

We extend the agent-based architecture introduced in Chapter 2 to the multi-object case. The basic idea is that only a small number of nodes, called agents, are active²

Refer to Chapter 7 for details.



and each of them is responsible for maintaining and updating the information about a single object.

Figure 4.6: Basic idea for distributed implementation: Each agent carries a column of the belief matrix.

Definition 11. An agent is a process running on a node that maintains and updates the information about a single object, which includes its estimated position, identity, signal attributes and so on.

Agents can hop to neighboring nodes in an attempt to stay close to an object, while carrying all the information with themselves. In this approach, the whereabouts of the information is easily maintained, but at the risk of reduced robustness and increased difficulty in querying agents, as discussed in Chapter 2. In the context of the marginal belief matrix representation, each column bj of B is maintained by a separate agent, as in Figure 4.6. When a mixing occurs, the two corresponding agents communicate with each other so as to update their marginal probabilities accordingly. When an agent node observes a local evidence, say Z = (i, j), then the agent needs to talk to some of the other agents, who may also believe that


what they are tracking can have the same ID. This type of multicast communication in a network is usually handled by a group management protocol [49], which maintains and updates the different groups in the network according to a predefined membership. We will discuss the group management protocol in detail in Chapter 7. In our case, the ith group³ is the set of agents that have non-zero probability at the ith entry of their belief vector bj, i.e., those who have a non-zero belief that they are tracking the ith ID. We assume there exists a good (or optimal) group management protocol for our purpose.
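As a rough illustration of the per-agent bookkeeping described above, the following sketch shows how an agent holding one column of the belief matrix might update it at a mixing event with mixing probability 1/2 (the local-average rule used in this chapter). The class and method names are illustrative only; communication, agent hopping, and the group-based Sinkhorn normalization are omitted.

```python
import numpy as np

class TrackAgent:
    """A mobile agent that carries one column b_j of the marginal belief
    matrix B, i.e., a distribution over the N possible identities of the
    object it is tracking (illustrative sketch)."""

    def __init__(self, n_objects, initial_identity):
        self.belief = np.zeros(n_objects)
        self.belief[initial_identity] = 1.0   # identity known at detection time

    def mix_with(self, other):
        """Mixing event with mixing probability 1/2: the two objects may have
        swapped, so both agents end up with the average of the two columns."""
        mixed = 0.5 * (self.belief + other.belief)
        self.belief = mixed.copy()
        other.belief = mixed.copy()

# Example: two agents that start with distinct identities become uncertain
# after a mixing event.
a, b = TrackAgent(3, 0), TrackAgent(3, 1)
a.mix_with(b)
print(a.belief, b.belief)   # both [0.5, 0.5, 0.0]
```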

4.5

Simulation

We have designed and implemented an event-driven distributed algorithm simulator for WSN applications in Matlab. The distributed version of the marginal belief matrix based method (shown in Figure 4.7) is simulated. We make the following assumptions for the simulation.

• Each node can sense the positions of the objects within its sensing range.
• Each node can talk to all the nodes within its communication range.
• Initially, the number of the objects is known to the initial agents.
• Each node has a signal processing module for signal classification.
• The positions of sensor nodes are known – each node knows the positions of all the sensor nodes within its communication range.

The initial agents are manually selected for the simulation, although it is possible to detect them as long as their initial positions are well separated. Figures 4.8, 4.9 and 4.10 show three screen shots from the simulation of the algorithm. Four objects – one tank and three dots – are moving along straight lines for 10 seconds. The tank has distinct signal attributes, so it can be identified with high probability when it is sufficiently away from the other objects. This local evidence

In Chapter 7, we call this an acquaintance group.


Figure 4.7: Flow chart of the distributed algorithm that each node runs to implement the marginal belief matrix approach in wireless sensor networks


about the tank is the only available information used to normalize the beliefs bj of all the other agents. The four agents are colored differently and their corresponding beliefs are displayed in the corresponding color. Figure 4.8 shows the initial configuration of the objects and their associated agents. Figure 4.9 shows that the uncertainty of the marginal distribution bi of each agent increases after four mixing events and the agents are no longer sure of the identities of the objects. Figure 4.10, however, shows that the uncertainties decrease after a normalization using the Sinkhorn algorithm, given local identity evidence. Figure 4.11 shows how the identity uncertainty, measured as a statistical entropy, of each object evolves during the simulation. The increases in the uncertainty are caused by mixing events and the decreases are caused by two local evidence incorporation events.

4.6

Conclusion

In this chapter, we proposed a practical approximation of the identity management framework, called the marginal belief matrix-based approach. The marginal belief matrix B is a collection of the N² marginal probabilities of the jth object having the ith identity. Due to the convolution theorem [13], the mixing operation for the marginal belief matrix is shown to be local averages of the two marginal distributions involved in the mixing event. The local evidence incorporation operation is formulated as an optimization problem, whose solution can be computed by an iterative matrix scaling procedure called the Sinkhorn algorithm. We have also proposed an agent-based architecture to efficiently implement the marginal belief matrix representation in a purely distributed fashion. Simulation results have demonstrated the efficiency, accuracy and effectiveness of the proposed approximation of the identity management framework.

In the WSN setting, the main weakness of the belief matrix representation lies in its need to continuously re-normalize its marginal distribution after each local evidence incorporation event. Furthermore, since this operation requires extensive communication throughout the WSN, this representation is energy-consuming. This


Figure 4.8: A snapshot of the simulator: Four objects are moving in a WSN – three of them are depicted as (yellow) dots and one of them is drawn as a tank. The objects follow (gray) straight lines and four agents, denoted as solid squares in four different colors, are assigned to track the four objects. The bar graphs show the marginal distribution (a column of the belief matrix) maintained by the agents. Initially, all the agents know exactly the identities of objects they are tracking.


Figure 4.9: A snapshot of the simulator after four mixings – the bar graphs on the right show that the marginal probabilities are mixed and the agents are no longer certain about the identities of what they are tracking.


Figure 4.10: A snapshot of the simulator after local evidence incorporation – two pieces of local evidence on the ‘tank’ object are sensed and all the marginal probabilities are re-normalized using the Sinkhorn algorithm at the cost of group communication. The agents are almost sure about the identities of their objects after the re-normalization.


[Figure 4.11 plot: entropy of each object's marginal distribution versus time, shown per object number.]

Figure 4.11: An example of how uncertainties of the marginal distributions are affected by the two events during the simulation run. Object numbers are the same as the track numbers in the previous figure. It is apparent that mixing events increase uncertainties, while local evidence decreases them.


observation led us to consider an alternative approach that does not require continuous normalization given local evidence – the information matrix-based approach, which we will discuss in the next chapter.

Chapter 5 Information Matrix

In the previous chapter, we have proposed a practical approximation of the optimal identity management based on the marginal belief matrix. The main disadvantage of this approach is that the proposed normalization requires significant communication among nodes in the WSN in the vicinity of the objects, thus draining the energy of these nodes. To save energy and prevent wasteful renormalization, we might prefer a method that computes probabilities only at a user's request. In this chapter, we seek to maintain different quantities, which ultimately can be converted to probabilities at a user's request, but which do not require continuous normalization otherwise. In other words, we seek to accumulate information in a lazy fashion, from which the desired probabilities can be derived on demand. This is the crux of the identity management approach in this chapter.

5.1

Information Matrix

The new approximation we are proposing here is based on the idea of information filtering in [33, 25]. When sensor measurements are represented as log-likelihoods log(p(Z^i|X)), incorporating them into the known prior distribution p(X) can be done by simple addition as follows:

$$\log(p(X|Z)) \propto \log(p(X)) + \sum_i \log(p(Z^i|X)),$$


which can be easily derived from the Bayesian update rule and the conditional independence assumption. The equation needs to be normalized only when inference on the posterior p(X|Z) is required, as opposed to the normal Bayesian update where the posterior is constantly normalized regardless of inference requests.

To apply the idea of information filtering to identity management, we define an N × N matrix called the information matrix, whose (i, j)th element lij is a log-likelihood of the jth object having the ith identity. Intuitively, one can consider lij as a piece of information suggesting that the jth object has the ith identity, and the strength of this information can only be estimated by comparing it with the other log-likelihoods.

Definition 12. The information matrix L is an N × N matrix, whose (i, j)th element is defined as follows:

$$l_{ij} = \sum_k \log(p(Z_k = (l, j) \mid x_j = i)), \quad l \in \{1, \cdots, N\}$$

Since the elements of the information matrix L are log-likelihoods, it can be maintained simply by adding log(β) to the (i, j)th element and log((1 − β)/(N − 1)) to all the other elements in the jth column given a measurement Z = (i, j). L is initialized as a zero matrix.

A striking property of the information matrix L, which is just a collection of N² log-likelihoods, is that the N! joint likelihoods l(X = Πk) can be recovered from it. For example, suppose L and X = Πk are given as follows,

$$L = \begin{pmatrix} l_{11} & l_{12} & l_{13} \\ l_{21} & l_{22} & l_{23} \\ l_{31} & l_{32} & l_{33} \end{pmatrix}, \qquad \Pi_k = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Then the following is true¹

log(l(X = Πk)) = log(p(Z = (2, 1)|Πk)) + log(p(Z = (1, 2)|Πk)) + log(p(Z = (3, 3)|Πk)) = l21 + l12 + l33

We assume measurements are conditionally independent, i.e., p(Z1 , Z2 |X) = p(Z1 |X)p(Z2 |X).


Using matrix algebra, we can simplify the above equation for the joint likelihoods as follows:

$$l(X = \Pi_k) = \exp(\mathrm{Tr}(\Pi_k^T L)) \qquad (5.1)$$

where Πk is the kth permutation matrix and Tr(·) is the matrix trace operation – the sum of the diagonal elements. If the prior distribution is uniform, then the joint identity distribution p(X) is defined simply by the normalized joint likelihoods:

$$p(X = \Pi_k) = \frac{l_k}{\sum_i l_i} = \frac{\exp(\mathrm{Tr}(\Pi_k^T L))}{\sum_l \exp(\mathrm{Tr}(\Pi_l^T L))} \qquad (5.2)$$

Another interesting property of the information matrix is that there can be infinitely many information matrices that encode the same joint identity distribution.

Property 3. Adding a constant to any row or column of an information matrix does not affect the underlying joint identity distribution.

Proof. Suppose C is a matrix whose ith column (or row) is filled with a constant c and all the other elements are zero. Then,

$$l_k = \exp(\mathrm{Tr}(\Pi_k^T (C + L))) = \exp(\mathrm{Tr}(\Pi_k^T C)) \exp(\mathrm{Tr}(\Pi_k^T L)) = \exp(c) \exp(\mathrm{Tr}(\Pi_k^T L)).$$

The joint identity distribution p(X) does not change when all the lk are scaled by the same factor exp(c) – as they will be normalized anyway. This concludes the proof.

Due to the above property, we can simplify lij as follows:

$$l_{ij} = n_{ij} \left( \log(\beta) - \log\!\left(\frac{1-\beta}{N-1}\right) \right),$$

where nij is the number of Z = (i, j) measurements thus far – the counts of Z = (i, j). This gives another interpretation of the information matrix – a collection of evidence counts.
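The following minimal sketch illustrates this bookkeeping: evidence is folded into L by a single addition, and any joint likelihood can be recovered afterwards as exp(Tr(Πk^T L)). The function names and the measurement parameter β are illustrative assumptions, and the brute-force enumeration over all N! permutations is practical only for small N.

```python
import numpy as np
from itertools import permutations

def add_evidence(L, i, j, beta, N):
    """Incorporate local evidence Z = (i, j): add log(beta) - log((1-beta)/(N-1))
    to the (i, j)th entry of the information matrix, with no re-normalization."""
    L[i, j] += np.log(beta) - np.log((1.0 - beta) / (N - 1))
    return L

def joint_likelihood(L, perm):
    """l(X = Pi_k) = exp(Tr(Pi_k^T L)); perm[j] = i means object j has identity i."""
    return np.exp(sum(L[perm[j], j] for j in range(len(perm))))

def joint_distribution(L):
    """Normalize the N! joint likelihoods into p(X), assuming a uniform prior."""
    perms = list(permutations(range(L.shape[0])))
    lik = np.array([joint_likelihood(L, p) for p in perms])
    return perms, lik / lik.sum()

# Example: two pieces of evidence that object 0 carries identity 1.
N = 3
L = np.zeros((N, N))
add_evidence(L, 1, 0, beta=0.9, N=N)
add_evidence(L, 1, 0, beta=0.9, N=N)
perms, p = joint_distribution(L)
```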


5.2

Local Evidence for Information Matrix

As we have seen before, incorporating local evidence Z = (i, j) into an information matrix is easy – we just add log(β) − log((1 − β)/(N − 1)) to the (i, j)th element of L without re-normalizing.

5.3

Mixing for Information Matrix

Mixing is not simple in the information matrix representation. In fact, mixing is not well-defined for the information matrix, since the convolution operation on the joint identity distribution introduces non-zero probabilities. Therefore, no matter how you change the information matrix, the information matrix alone cannot represent the mixing in a Bayesian framework. When the prior is uniform, however, the information matrix has all the information about the joint identity distribution and it might be possible to define a mixing operation for the information matrix.

Let Lp and Lq be the information matrices before and after the mixing event between the ith and jth objects, and let p(X) and q(X) = (1 − α)p(X) + αp(X I(i,j)) be the corresponding joint identity distributions obtained from Equation 5.2. We will use the mixing probability α = 1/2 in the sequel for simplicity, so after mixing q(Πk) = q(Πk I(i,j)):

$$q(\Pi_k) = \frac{\exp(\mathrm{Tr}(\Pi_k^T L_q))}{\sum_l \exp(\mathrm{Tr}(\Pi_l^T L_q))} = \frac{1}{2}\left[\, p(\Pi_k) + p(\Pi_k I_{(i,j)}) \,\right] \qquad (5.3)$$

where the right side of the equation is constant and the only unknown is Lq. We need to solve the following system of $\binom{N!}{2}$ equations, given below, to take into account the normalization constraint:

$$\frac{q(\Pi_m)}{q(\Pi_n)} = \frac{\exp(\mathrm{Tr}(\Pi_m^T L_q))}{\exp(\mathrm{Tr}(\Pi_n^T L_q))} = \frac{p(\Pi_m) + p(\Pi_m I_{(i,j)})}{p(\Pi_n) + p(\Pi_n I_{(i,j)})},$$

where m ≠ n and m, n ∈ {1, · · · , N!}. If we further simplify the above equation by taking logs on both sides, we get

$$\mathrm{Tr}\!\left((\Pi_m - \Pi_n)^T L_q\right) = \log\!\left( \frac{p(\Pi_m) + p(\Pi_m I_{(i,j)})}{p(\Pi_n) + p(\Pi_n I_{(i,j)})} \right).$$


The left side of the above equation is just a linear combination of elements of Lq and the right side is a constant. Therefore, we can write a matrix equation with proper vectorization,

$$\Phi \vec{l} = \vec{\eta},$$

where Φ is a $\binom{N!}{2} \times N^2$ matrix, $\vec{l}$ is an $N^2 \times 1$ vector, and $\vec{\eta}$ is a $\binom{N!}{2} \times 1$ vector. However, there is no exact solution in general for this overdetermined set of equations. The least-squares solution can be used as an approximate solution, although it is not practical due to the prohibitive amount of computation – the pseudo-inverse of a $\binom{N!}{2} \times N^2$ matrix must be computed.

The above discussion suggests that we need more constraints in order to facilitate

the derivation of the information matrix after mixing. To this end, we make the following two assumptions on the information matrix after the mixing event between the ith and jth objects.

• Only the ith and jth columns are updated.
• The ith and jth columns will be identical after mixing.

The first assumption seems intuitive – why change the other columns when mixing involves only two objects? Before discussing the second assumption, we consider another property of the information matrix.

Property 4. For an information matrix L, whose ith and jth columns are equal, its joint likelihoods l(Πk) and l(Πk I(i,j)) are also equal.

Proof. The log-likelihood of Πk is the sum of the lmn entries at positions where there are 1's in Πk. Since Πk and Πk I(i,j) are the same except that their ith and jth columns have been swapped, the difference of the two log-likelihoods is given as follows,

log(l(Πk)) − log(l(Πk I(i,j))) = lmi + lnj − lmj − lni = 0,

since the ith and jth columns of L are the same.


Considering the above property, the second assumption suggests that we consider information matrices whose joint likelihoods l(Πk) and l(Πk I(i,j)) are identical, as they are after a convolution. Under these two assumptions, the number of unknowns in the updated information matrix after mixing is only N. Let us consider the simple case of N = 3 to see how these assumptions can simplify the computation of L after a mixing. L*p and L*q, whose elements are the exponentiated versions of those of Lp and Lq respectively (l*ij = exp(lij)),

are given as follows (note that only the di's are unknowns):

$$L^*_p = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}, \qquad L^*_q = \begin{pmatrix} d_1 & d_1 & c_1 \\ d_2 & d_2 & c_2 \\ d_3 & d_3 & c_3 \end{pmatrix}.$$

From Equation (5.3), the following hold, assuming that the mixing is between the ith and jth columns.²

d1 d2 c3 = (a1 b2 c3 + a2 b1 c3)/2 ,
d1 d3 c2 = (a1 b3 c2 + a3 b1 c2)/2 ,
d2 d3 c1 = (a2 b3 c1 + a3 b2 c1)/2 .

Taking logs on both sides, we get

log(d1) + log(d2) = log((a1 b2 + a2 b1)/2) ,
log(d1) + log(d3) = log((a1 b3 + a3 b1)/2) ,
log(d2) + log(d3) = log((a2 b3 + a3 b2)/2) .

The above set of equations always has a unique solution [d1 d2 d3]^T, so we now have a perfect local mixing rule for N ≤ 3 that involves only the ith and jth columns of the information matrix.

Let us extend the above solution to the general case N > 3. Suppose that the ith and jth columns of L*p are [a1 · · · aN]^T and [b1 · · · bN]^T, and d = [d1 · · · dN]^T is the

² This set of equations satisfies the normalization constraint in equation (5.3).


new merged column of L*q for both i and j. Now consider the following equation for merging a and b for all $\binom{N}{2}$ pairs of (m, n) combinations:

$$\log(d_m) + \log(d_n) = \log((a_m b_n + b_m a_n)/2).$$

The above equation can be rewritten as a matrix equation as follows:

$$P \cdot \eta = \gamma, \qquad (5.4)$$

where

• η = [log(d1) · · · log(dN)]^T is an N × 1 vector,
• γ = [· · · log((am · bn + bm · an)/2) · · · ]^T is a $\binom{N}{2}$ × 1 vector,
• P is a $\binom{N}{2}$ × N matrix where each row has two ones, at the mth and nth positions, and zeros elsewhere.

In general, the system (5.4) is an overdetermined set of equations and does not have a solution. Therefore, we propose to use a least-squares approach to obtain an approximate solution, which will be our mixing rule for the information matrix:

$$\eta = P^{\dagger} \gamma, \qquad (5.5)$$

where $P^{\dagger} = (P^T P)^{-1} P^T$ is the pseudo-inverse of P. Thus, the computational complexity of the mixing operation for the information matrix representation is O(N⁴).

Theorem 2. The information matrix approach is optimal for N ≤ 3. For N ≥ 4, the information matrix approach is sub-optimal due to the approximate mixing (5.5).
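A minimal sketch of this mixing rule is given below: it builds the matrix P and the vector γ of equation (5.4) from the exponentiated ith and jth columns and solves the least-squares problem (5.5), writing the merged column back into both positions. The function name is illustrative and numerical safeguards (e.g., for zero entries) are omitted.

```python
import numpy as np
from itertools import combinations

def mix_information_matrix(L, i, j):
    """Approximate mixing rule (5.5): replace columns i and j of the
    information matrix by a single merged column obtained from a
    least-squares solution of P * eta = gamma."""
    N = L.shape[0]
    a = np.exp(L[:, i])                      # exponentiated column i
    b = np.exp(L[:, j])                      # exponentiated column j
    pairs = list(combinations(range(N), 2))
    P = np.zeros((len(pairs), N))
    gamma = np.zeros(len(pairs))
    for row, (m, n) in enumerate(pairs):
        P[row, m] = P[row, n] = 1.0
        gamma[row] = np.log((a[m] * b[n] + b[m] * a[n]) / 2.0)
    # Least-squares solution eta = P^+ gamma, i.e., eta = log(d).
    eta, *_ = np.linalg.lstsq(P, gamma, rcond=None)
    L = L.copy()
    L[:, i] = eta
    L[:, j] = eta
    return L
```

For N = 3 the matrix P is square and invertible, so the sketch reproduces the exact local mixing rule derived above; for N ≥ 4 it returns the least-squares approximation.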

5.4

Inference for Information Matrix

For tracking applications, we are mostly interested in the marginal probabilities p(xj = i) of the jth object having the ith identity. To compute marginal probabilities from an information matrix L, the N! joint probabilities need to be computed first. Computing the joint probabilities, however, requires O(N³ · N!) operations, which is not feasible for larger N. Therefore, we use the Metropolis sampling algorithm [15] as a heuristic to estimate these marginal probabilities. Simulation results in Chapter 6 confirm that the Metropolis algorithm approximates the marginal probabilities well.


5.4.1

Metropolis Algorithm

The Metropolis algorithm [15, 37] is a procedure for drawing samples from a finite set Ω according to a probability distribution p(ω), ω ∈ Ω. In our case, the finite set is SN and its elements are permutations X.

Metropolis Algorithm. One starts from an arbitrary state X and generates the sequence by repeating the following iteration, with Xk being the previously selected point at each iteration:

1. Select a new trial state X*, chosen according to a symmetric proposal distribution q(X*|Xk).

2. Calculate the acceptance probability

$$A(X^*, X_k) = \min\!\left[\,1,\; \frac{\tilde{p}(X^*)}{\tilde{p}(X_k)}\,\right]$$

3. Accept X* with probability A(X*, Xk), i.e.
   • if p̃(X*) ≥ p̃(Xk), then accept X*;
   • if p̃(X*) < p̃(Xk), then accept X* with probability p̃(X*)/p̃(Xk).

4. If the trial state is accepted, then Xk+1 = X*. Otherwise, Xk+1 = Xk. Go to step 1.

We need to consider the following in applying the Metropolis algorithm in our setting.

• In the above description of the Metropolis algorithm, the probability distribution over SN is denoted as p̃(X) to emphasize that the Metropolis algorithm only requires the ratio of the probabilities. Therefore, we can use a joint likelihood l(X) from an information matrix L as p̃(X) in the procedure.

• The Metropolis algorithm assumes a topology defined on the finite set. Although it is possible to assume the full connectivity among the N! permutations, it seems natural to assume a pair of permutations (X, Y) are connected only when


they are a single transposition apart from each other, i.e., there exist (i, j) such that X = Y I(i,j). This topology defines a polyhedron, called the associahedron, whose vertices are permutations, as shown in Figure 5.1.

• Given an associahedron, we need to define a proposal distribution q(X*|Xk) that proposes one of the neighbors given a current state Xk. We use the uniform distribution as our proposal distribution:

$$q(X^*|X_k) = \begin{cases} 1/\binom{N}{2} & X^* \in \mathrm{neighbor}(X_k) \\ 0 & \text{otherwise.} \end{cases}$$

We use the Metropolis algorithm to sample permutations with high probabilities, from which we compute the marginal probabilities. Suppose the set of sampled permutations is Γ and each permutation X ∈ Γ is sampled nX times. We can then compute the marginal probability matrix P as follows:

$$P \approx \sum_{X \in \Gamma} \frac{n_X}{|\Gamma|} X.$$

Our simulation results in Chapter 6 show that the above approximation is good enough with a small sample size in practice, especially when the underlying distribution is sharply peaked around a permutation. This agrees with the known result [15] that only N log N samples are enough to represent a distribution with a very sharp peak centered at a permutation.
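A compact sketch of this sampler is shown below, using the object-indexed permutation representation from earlier (perm[j] is the identity of object j): the unnormalized target is the joint likelihood from the information matrix, proposals are single transpositions, and the visited permutations are accumulated into an estimate of the marginal matrix. The function names and the sample count are illustrative only.

```python
import numpy as np

def log_joint(L, perm):
    """Unnormalized log-likelihood of a permutation: Tr(Pi^T L)."""
    return sum(L[perm[j], j] for j in range(len(perm)))

def metropolis_marginals(L, n_samples=2000, rng=None):
    """Estimate the marginal probability matrix P from an information matrix L
    by a Metropolis random walk over permutations (single-transposition moves)."""
    rng = np.random.default_rng() if rng is None else rng
    N = L.shape[0]
    perm = list(range(N))                       # arbitrary starting state
    counts = np.zeros((N, N))
    cols = np.arange(N)
    for _ in range(n_samples):
        m, n = rng.choice(N, size=2, replace=False)
        proposal = perm.copy()
        proposal[m], proposal[n] = proposal[n], proposal[m]
        # Accept with probability min(1, p~(X*) / p~(X_k)).
        if np.log(rng.random()) < log_joint(L, proposal) - log_joint(L, perm):
            perm = proposal
        counts[perm, cols] += 1                 # accumulate the visited permutation
    return counts / n_samples
```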

5.5

Conclusion

In this chapter, we proposed another practical approximation of the identity management framework, called the information matrix-based approach. The information matrix L is a collection of N² log-likelihoods of the jth object having the ith identity, from which the marginal probabilities can be recovered at a user's request. Unlike the marginal belief matrix representation, more general inferences can be made, including the joint identity probabilities. The local evidence incorporation operation is a simple addition and does not require global communication as in the marginal



Figure 5.1: Metropolis random walk on an associahedron for N = 3


belief matrix-based approach. For the mixing operation, we propose to use a least-squares approximation of the system of linear equations obtained from the convolution operation. The flexibility of the information matrix-based approach in dealing with local evidence comes at an increased cost of performing the inference necessary to retrieve the approximate marginal probabilities. However, we overcome this added computational cost by performing approximate inference using the Metropolis sampling method. Thus, inference costs using the information matrix representation are significantly reduced. The performance comparison of the two approximations, including the simulation results, will be presented in the next chapter.

Chapter 6 Comparison of the Two Approximations

In this chapter, we compare and summarize the two proposed approximations in terms of computational complexity, inference accuracy and experimental results. Table 6.1 summarizes the results.

6.1

Storage and Computation

In terms of representation, both approaches require O(N²) storage since they use N × N matrices as their data structures. For mixing operations, the belief matrix approach requires O(N) computation, while the information matrix approach requires O(N⁴). For incorporating local evidence, the information matrix approach requires O(1) computation, while the belief matrix approach requires O(N²) computations in practice. One can see there is a tradeoff between the two approaches in terms of computational complexity – the belief matrix has a much simpler mixing operation, while the information matrix has a simpler evidence incorporation operation. Note that the aforementioned computational complexities do not represent the realistic costs of these operations in an actual WSN implementation due to the additional communication cost incurred by the local evidence incorporation operation of the belief matrix approach.

Table 6.1: Comparison of the two approximations of joint identity distribution p(X)


                                  | Belief matrix B                       | Information matrix L
Information contents              | N² marginal probabilities             | N² marginal log-likelihoods
Mixing operation                  | Local average of two columns (exact)  | Pseudo-inverse (approximate)
Evidence operation                | Sinkhorn algorithm (approximate)      | Log-likelihood addition (exact)
Mixing complexity                 | O(N)                                  | O(N⁴)
Evidence incorporation complexity | O(N²)                                 | O(1)
Marginal inference complexity     | O(1)                                  | O(N³ · N!) (exact)
Feasibility of joint inference    | No                                    | Yes
Evidence requires communication   | Yes                                   | No


6.2 Experiments

6.2.1 Simulation

Figures 6.1, 6.2, 6.3 and 6.4 summarize simulation results where we compare three different approaches – using the marginal belief matrix, the information matrix with exact inference and the information matrix with approximate inference. Each representation is used for the tracking and identity management of objects as mixing and local evidence incorporation events take place. In each of our simulations, we process fifty events where the ratio of the number of mixing to evidence incorporation events is fixed. We then record the difference between the inferred marginal distribution of identities to objects from our approximation method and the true marginal probability distribution summed out over the true joint (N!)-sized distribution. Figure 6.1 shows simulations for a system that manages the identities of three objects. The x-axis represents the ratio of mixing to local incorporation events, and each data point corresponds to the average difference of the true marginal probabilities to the inferred marginal probabilities over one hundred random simulations. Figures 6.2, 6.3 and 6.4 show the results when our system manages four, five and six identities and objects, respectively. Observing the results of our simulations, the information matrix with exact inference and the belief matrix perform comparably, meaning their relative errors are approximately equal when compared to the true marginal probabilities. The information matrix using the Metropolis sampling algorithm performs worse, but only by a little. Both of these observations are consistent when we increase the number of identities and objects tracked. However, it is important to note that the information matrix approach with approximate inference performs much better when the ratio of the number of mixing to local incorporation events is small. This is consistent with the well-known result in the Markov Chain Monte Carlo community [15] that a sharply peaked distribution can be accurately represented using N log N samples for a probability distribution over SN. Thus, when the ratio of evidence incorporation events to mixing events is high, our joint distribution over SN is likely to be sharply peaked, which explains why our sampling method performs better with a


[Figure 6.1 plot, 3 objects: error in Frobenius norm versus the ratio of # mixings to # evidences, for the belief matrix, the information matrix, and the information matrix with Metropolis sampling.]

Figure 6.1: Comparison of the three approaches for N = 3: Marginal belief matrix, Information matrix with exact inference and Information matrix with approximate inference



Figure 6.2: Comparison for N = 4

low ratio of mixing to evidence incorporation events. However, as the graphs indicate, as the ratio of mixing to evidence incorporation events approaches 1, the Metropolis sampling for inference begins to perform better. This appears counter-intuitive but in actuality is not. As the ratio of the number of mixing to evidence incorporation events approaches 1, the underlying joint distribution approaches the uniform distribution since uncertainty increases after each mixing event. Since our proposal distribution while sampling is the uniform distribution, it is not surprising that as the joint distribution we are approximating approaches the uniform distribution, the difference in marginals decreases.
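For reference, the "true" marginals used in these comparisons can be computed by brute force for small N, by maintaining the full joint distribution over all N! permutations, applying the exact mixing and Bayesian evidence updates, and then summing out. The sketch below is an illustrative reconstruction of that computation (α = 1/2 mixing and the β measurement model are assumed), feasible only for the small N used in the simulations.

```python
import numpy as np
from itertools import permutations

def exact_joint(N):
    """Initial joint identity distribution: the identity assignment is known exactly."""
    perms = list(permutations(range(N)))
    p = np.array([1.0 if perm == tuple(range(N)) else 0.0 for perm in perms])
    return perms, p

def mix(perms, p, i, j, alpha=0.5):
    """Exact mixing of objects i and j: q(X) = (1-alpha) p(X) + alpha p(X I_(i,j))."""
    index = {perm: k for k, perm in enumerate(perms)}
    q = np.zeros_like(p)
    for k, perm in enumerate(perms):
        swapped = list(perm)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        q[k] = (1 - alpha) * p[k] + alpha * p[index[tuple(swapped)]]
    return q

def evidence(perms, p, i, j, beta):
    """Exact Bayesian update for local evidence Z = (i, j)."""
    N = len(perms[0])
    lik = np.array([beta if perm[j] == i else (1 - beta) / (N - 1) for perm in perms])
    q = p * lik
    return q / q.sum()

def marginals(perms, p):
    """True marginal matrix: P[i, j] = Pr(object j has identity i)."""
    N = len(perms[0])
    P = np.zeros((N, N))
    for perm, prob in zip(perms, p):
        for j in range(N):
            P[perm[j], j] += prob
    return P
```

The error reported in the figures can then be obtained as the Frobenius norm of the difference between this true marginal matrix and the one produced by either approximation.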

6.2.2

Experimental Setup

We tested the two different representations discussed in the previous sections using a people tracking system. We will discuss the experimental setup in the Stanford Artificial Intelligence Lab and the experiments conducted.



Figure 6.3: Comparison for N = 5

Two SICK laser range finders are mounted in the Stanford AI Lab. The laser range finders return range measurements over a 180 degree field of view. We cluster adjacent range readings with similar ranges into groups. We then perform background filtering to detect motion. Once motion is detected, the system continues to track the motion of the group. Particle filtering is used to track motion under uncertainty such as when the object being tracked is occluded. The output of the system is a sequence of timestamped Cartesian points representing the trajectory of the tracked object. Furthermore, we have augmented our people tracking system with a radio frequency identification (RFID) system. The RFID system has eight readers that detect the presence of unique tags for identity information. The readers, when activated, send out radio messages to detect the presence of tags within its radio range. Four readers are mounted in the hallways to detect all identities entering and exiting the lab. The remaining four readers are mounted in the lab each activated by motion sensors. Figure 6.5 is a map of the lab area annotated with where the laser range finders and RFID readers are.



Figure 6.4: Comparison for N = 6

6.2.3

Experimental Results

Figure 6.7 shows two typical errors in our tracking system – i) split tracks: A single track can be split into two (or more) due to occlusion or mixing events and ii) merged tracks: Two (or more) tracks can be merged into one due to wrong data association. These errors are common in all tracking systems causing erroneous data associations and our goal is to demonstrate the power of identity management to fix these errors. In our experiments we had three individuals, starting from different hallways connected to the lab, walk into the lab, interact with one another, and leave the lab. Each person was carrying an RFID tag that was used to trigger a nearby RFID reader. Figure 6.8 shows three ground-truth tracks. Figure 6.9 shows results after applying the identity management to the data from the scenario in Figure 6.8. The two graphs on the left show how the uncertainties of track identities measured as a statistical entropy evolve through the mixing and


RFID reader

93

Laser sensing range

local evidence incorporation events. The x, y and z axes represent tracks, events and uncertainty in track identities, respectively. The events axis has a sequence of events [Initial M E E E M M E], where M stands for mixing and E stands for evidence. The two matrices on the right are the marginal probabilities of the two approaches after all the events, where the (i, j)th entry of each matrix represents the probability that object j has identity i. As the results show, we observe that the information matrix estimates the true probabilities more accurately than the belief matrix. This is expected since the information matrix exactly represents the marginals for N ≤ 3, as we proved earlier.

Figure 6.5: Experimental setup in the Stanford AI lab

6.3

Conclusion

From the analysis and experiments, we come to the conclusion that there is a trade-off when choosing between the two proposed methods for approximating an exponentially-sized distribution through mixing and local evidence incorporation events. In approximating the true marginal probabilities, the information matrix-based approach with exact inference and the belief matrix-based approach perform comparably, while the information matrix using the Metropolis sampling algorithm performs worse, but only


marginally. When applied to the real-world data from a people tracking system augmented by an RFID system in the Stanford AI Lab, the two approaches have successfully dealt with the typical artifacts of the sub-optimal data association algorithms – split tracks and merged tracks from wrong data associations.

Figure 6.6: Example of two ground-truth tracks



Figure 6.8: Experiment scenario: Three people walking in the lab over 87.79 second period. Their individual tracks are shown in different colors.

CHAPTER 6. COMPARISON OF THE TWO APPROXIMATIONS

96

0.4763 0.0545 0.4692 1.5 1

0.0473 0.8911 0.0616

0.5 0 3 2

Z

1

0

1

2

3

4

5

6

7

0.4763 0.0545 0.4692

Belief matrix approach Y

X

0.9046 0.0952 0.0002

1.5 1

0.0476 0.8571 0.0952

0.5 0 3 2 1

0

1

2

3

4

5

6

7

0.0478 0.0476 0.9046

Information matrix approach

Figure 6.9: Uncertainties of object identities after many mixing and evidence incorporation events from the data in Figure 6.8

Chapter 7 Group Management Protocol

In Chapter 4, we saw that the local evidence incorporation operation for the marginal belief matrix representation requires group communication among agents, which we will formally define as an acquaintance group in this chapter. Supporting communication within an acquaintance group is critical for multi-object tracking and is also useful for many other WSN applications that might use mobile agents to collect information. To this end, we propose two distributed data structures and show how to maintain them in an efficient manner as agents are moving in a WSN. Specifically, we propose:

• Distributed Collaboration Graph (DCG): The DCG is a graph obtained from the agent trajectories and can be used for member discovery.

• Communication Tree on the DCG: In principle, the DCG itself can be used as a communication graph by a simple flooding protocol, although it is more desirable to maintain a subtree spanning all the agents in the DCG for efficient multicasting.

We develop light-weight distributed protocols that construct and maintain the aforementioned data structures. Moreover, we demonstrate their efficiency through analysis and simulation.


7.1 Related Work

In [31], the authors introduced the notion of collaboration group, which is a set of nodes – or processes – that collaborate to achieve a common goal. Most of the collaboration groups discussed in [31] can be supported by well–known geographic routing protocols like Geocasting [36] or GEAR [50], since nodes in those groups have well-defined geometric or topological relations, for example, a group of nodes within 1-hop communication distances or a group of nodes within [∆X, ∆Y ] region. However, this is not the case for the acquaintance group (AG), in which membership is not defined as geometric/topological primitive, but by their logical relationships, for example, a group of agents who observed an elephant before. In [17], the authors proposed a general group communication framework in a WSN. The basic idea is to maintain a moving backbone, which is a set of nodes used for exchanging messages among agents while they are moving. Their method, however, requires a separate member discovery mechanism for a specific group definition – or else agents could receive large number of unwanted messages.

7.2 Distributed Collaboration Graph (DCG)

7.2.1 Acquaintance Group

Figure 7.1 illustrates a typical tracking scenario with three agents A, B and C following car, bike and bus, respectively. For simplicity of discussion, we assume each agent maintains two pieces of information about its object – position and ID. At the left mixing, the two agents communicate with each other to update their ID as the union of the previous IDs {car, bike}. Later at the right mixing, bike and bus are mixed and their corresponding agents update their IDs as the union {car, bike, bus}. In general, after a mixing, two agents update their IDs as the union of the two previous IDs. Now, we define the notion of an acquaintance group (AG) in our context. Definition 13. An acquaintance group AG(i) is a group of agents, whose ID set contains i.



Figure 7.1: A simple agent interaction for a multi-object tracking application

According to the above definition, AG(bike) = AG(car) = {A, B} and AG(bus) = {A, B, C} in Figure 7.1. Intuitively, AG(bike) can be interpreted as the group of agents who share information on bike – no one outside this group knows anything about bike. Figure 7.2 illustrates that agent trajectories might be used for membership discovery and group communication – considering that groups are always formed at the intersections of agent trajectories, a graph formed by agent trajectories must connect all the members of a group. We will call this graph a distributed collaboration graph (DCG) and the details will be discussed in the next section.

7.2.2

Construction of the Distributed Collaboration Graph

The construction of the DCG can be done in the following way. Each agent that belongs to more than one group¹ must leave a routing table² at the node it resides on before it

¹ This is equivalent to the set of agents whose objects have multiple identities.
² The routing table here should not be confused with that of an IP network, where the routing table contains all the destination addresses.


Figure 7.2: Idea of using agent trajectories for group member discovery and group communication: The agent trajectories after the mixing can be used as member discovery and group communication in this example. hops to another node. A routing table contains information about node class, previous node(s), next node(s), group membership and current time stamp as summarized in table 7.1. Table 7.1: A routing table in a node of a collaboration graph Information stored at nodes in DCG Node class: isJunction Previous node #1 : prevN ode1 Previous node #2 : prevN ode2 Next node #1 : nextN ode1 Next node #2 : nextN ode2 Group membership (ID): G Current time: t There are two main classes of nodes in DCG; one is a junction node, which make some routing decision between four (three) nodes - two (one) input nodes and two output nodes - , and the other is a relay node, which just relays packets between two nodes. Junction nodes are formed at mixings; two agents involved in a mixing select


a node within the intersection of the two communication ranges as a junction node. There is a special junction, which has no previous node IDs and is called the root. All the current agent nodes are called terminals. The information stored at the nodes can be visualized as in Figure 7.3.
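As an illustration only, a routing-table record of the kind summarized in Table 7.1 could be represented roughly as follows; the field and class names are hypothetical and simply mirror the table.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class RoutingTable:
    """Per-node entry of the distributed collaboration graph (cf. Table 7.1)."""
    is_junction: bool = False                              # junction node vs. relay node
    prev_nodes: List[int] = field(default_factory=list)    # up to two previous node ids
    next_nodes: List[int] = field(default_factory=list)    # up to two next node ids
    groups: Set[str] = field(default_factory=set)          # group membership (IDs)
    timestamp: float = 0.0                                  # time the table was written

# Example: a relay node on the trajectory of an agent tracking {"car", "bike"}.
entry = RoutingTable(is_junction=False, prev_nodes=[17], next_nodes=[42],
                     groups={"car", "bike"}, timestamp=12.5)
```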


Figure 7.3: Information stored at relay nodes (left) and junction nodes (right)

Figure 7.4: Mobile agent paths

Let us take the example of five agents with five objects in a WSN as in Figure 7.4. Figure 7.5 shows two collaboration graphs corresponding to two different object identities. We emphasize that these graphs are just for visualization and no single node contains global information about a DCG. For example, the left DCG in Figure 7.5, denoted as DCG(bike), is a collection of nodes (and edges) whose routing table(s) contain bike as group membership. It is easy to see why group member discovery is automatic given a DCG – all the terminals of DCG(bike) are equal to all the members of AG(bike).


Figure 7.5: Distributed collaboration graphs for two identities from Figure 7.4

Proposition 1. Discovery of all the members of AG(ID) can be done by simply visiting all the terminals in DCG(ID).

Although a simple flooding on a DCG can be used as a group communication protocol, it is not efficient; packets will visit all the nodes in the DCG. In the next section, we will present a protocol for maintaining a communication tree among the agents (a subgraph of a DCG) in a distributed fashion.

7.3

Maintaining Communication Tree on DCG

We will now show how to maintain the DCGs and communication trees for each of the acquaintance groups as the objects move. As we shall see, a major feature of our protocol is that all the computations can be done in a distributed and local manner. Informally, the DCG for object i, which we shall denote by DCG(i), keeps track of the positions as well as trajectories of those agents whose identity sets contain i. In other words, DCG(i) can be viewed as the acquaintance group AG(i) augmented with trajectory information. The communication tree for AG(i) is a Steiner tree in DCG(i) that spans the terminals. In particular, we will use this tree as a data structure for supporting group communication. We call an edge in the DCG active if it belongs to the communication tree. Figure 7.6 shows a typical scenario in this setting. Initially, there has been no interaction among the agents. Hence, the DCG for each object is empty. As the objects and their tracking agents move around in the network, we would have to update the corresponding DCGs. The updating of the


DCGs is triggered by three types of events – mixing event, crossing event, and relay event. We now describe each of these in turn.

Figure 7.6: A multi-object tracking scenario. In the above figure, objects are labeled a, b, c, d, e. The arrows indicate the trajectories of objects. Each square represents a mixing event (see Section 7.3.1 below).

7.3.1

Mixing Event

Suppose that two agents, one carrying identity set I1 and the other carrying identity set I2, come to close proximity of each other at the point p (we shall assume that p is a node in the network). We call such an event a mixing event. As mentioned before, we need to update the identity sets carried by the agents. Moreover, we need to update DCG(i) and its communication tree for each i ∈ I to reflect the mixing. We now describe each of the updating tasks in turn.

Updating the Identity Sets

As mentioned in Section 7.2.1, the agents in the two outgoing edges will carry the identity set I = I1 ∪ I2. Clearly, this update step can be carried out in a local manner.


Updating the DCG and the Communication Tree

For simplicity of discussion, let us fix our attention on DCG(i), where i ∈ I. To update DCG(i), we first declare the crossing point p a junction node for i. As the agents hop to the next nodes, we declare both of the outgoing edges active for i. It is important to note that an edge may belong to many DCGs, and hence we need to specify to which group the edge is active. Now, since two agents are meeting at the common point p, the active edges in the updated DCG(i) may contain a cycle. However, note that if i ∉ I1 ∩ I2, then the active edges in the updated DCG(i) would not contain a cycle (see Figure 7.7). On the other hand, if i ∈ I1 ∩ I2, then the active edges in DCG(i) will form a cycle after the update, since both agents have information about i if they were both tracking i earlier. To remedy this situation, we first declare one of the active cycle-edges, say e, inactive. Then, starting from the edge e, we traverse around the cycle and check whether there are dangling active edges. An active edge is said to be dangling if there are no incident active edges on either one of its endpoints, and none of its endpoints are terminals. We would like to deactivate as many dangling edges as possible, since their deactivation should not affect the connectivity of the resulting communication tree. Indeed, we have the following proposition.

Proposition 2. Suppose that initially the active edges in DCG(i) form a tree that spans the terminals. If a mixing event creates a cycle of active edges in DCG(i), then upon applying the cycle–canceling algorithm described above, the resulting active edges will still form a tree that spans the terminals. In particular, the subgraph induced by the active edges at the end of the algorithm is connected.

Proof. The first step of the algorithm declares one of the active cycle-edges inactive, and this step clearly preserves connectivity and the spanning property. Moreover, the active edges form a tree after this step. Now, observe that a dangling edge is an edge that leads to a non–terminal leaf node in this tree. Hence, its removal would neither destroy the connectivity nor the spanning property. We remark that an optimal set of edges to delete from the cycle (i.e. a set that


yields the maximum number of edges) while preserving connectivity can be found in O(l²) time, where l is the length of the cycle. Moreover, the graph induced by the remaining active edges is a tree that spans the terminals. Thus, it follows by induction on the number of update steps that our scheme has the desired properties.
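A minimal sketch of this cycle-canceling step is given below: one active edge of the cycle is deactivated, and edges whose leaf endpoint is a non-terminal are then repeatedly pruned. The graph representation and function name are illustrative, and the sketch deliberately ignores the distributed aspects (it assumes the cycle and the incident active edges are known locally).

```python
def cancel_cycle(active_edges, cycle_edges, terminals):
    """Break a cycle of active edges and prune dangling edges.

    active_edges : set of frozenset({u, v}) currently marked active
    cycle_edges  : list of edges (u, v) forming the active cycle
    terminals    : set of nodes where agents currently reside
    """
    active = set(active_edges)
    active.discard(frozenset(cycle_edges[0]))     # step 1: deactivate one cycle edge

    def degree(node):
        return sum(1 for e in active if node in e)

    # Step 2: repeatedly deactivate dangling edges, i.e. edges whose leaf
    # endpoint (degree 1 in the remaining active subgraph) is not a terminal.
    changed = True
    while changed:
        changed = False
        for e in list(active):
            u, v = tuple(e)
            if (degree(u) == 1 and u not in terminals) or \
               (degree(v) == 1 and v not in terminals):
                active.remove(e)
                changed = True
    return active
```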


Figure 7.7: Mixing Event: The updated tree does not contain a cycle.

To perform the cycle-canceling operation, we need only the information stored in the nodes along the cycle. Thus, all the above computations can be done in a distributed and local manner. To illustrate our protocol, consider the example in Figure 7.6. The various communication trees for object a, as time evolves, are shown in Figure 7.8.

7.3.2

Crossing Event

Observe that not every intersection of the trajectories corresponds to a mixing event. It could be the case that one agent arrives at p at an earlier time than the other, but that both agents carry information about object i. In this case, we would still need to update DCG(i) in order to ensure the correctness of the data structure. We call such an event a crossing event. Suppose that one agent, say x, reaches p earlier. As x hops to the next node, it would leave the set of identities Ix at p. Now, when the other agent y reaches p, it could check whether Ix ∩ Iy = ∅. If I ≡ Ix ∩ Iy ≠ ∅, then we mark p as a junction of DCG(i) for each i ∈ I. Again, let us focus on one particular identity i ∈ I. (The


marking of p as a junction is mainly for keeping track of the entire DCG(i).) After marking p as a junction, we need to update the communication tree in DCG(i). To do so, let ty be the terminal where agent y resides. If p is incident upon an active edge along the trajectory of x, then we update the communication tree using the mechanism described in the preceding section (see Figure 7.9). Otherwise, we simply add the edge (p, ty) to the communication tree (and hence (p, ty) is an active edge). By induction on the number of update steps, we see that such a scheme maintains group connectivity, i.e. every object in the same group is connected by the communication tree. Moreover, in order to perform the update, we need only the information stored at p. Hence, the update step is completely local.

Figure 7.8: The various communication trees for object a as time evolves. Edges that are declared inactive are marked with crosses.

7.3.3

Relay Event

As an agent follows an object, it will hop from node to node. If the agent carries an identity set I with more than one element (which indicates that the agent has been involved in a mixing event before), then it will leave a routing table at the node it resides on (call it u) before hopping to another node v (see Section 7.2.2). In this case, we add the node v to DCG(i), where i ∈ I, and declare the edge (u, v) active.



Figure 7.9: Crossing Event

7.4

Simulation

To validate the correctness and efficiency of the proposed protocols, we have performed a simulation. Approximately 1200 nodes are placed in a 640 × 640 region with a communication range of 25 (no unit). For a given number of agents, their directions are randomly chosen, but their velocities are carefully chosen so that there are a few junctions. To focus on verifying the correctness of our protocols, control packets, link-level acknowledgments and other low-level parameters are not simulated, or are ignored for simplicity. Figure 7.10 shows a typical scenario with five agents. We compare the performances of two group communication protocols: flooding on a whole DCG and multicasting on a communication tree. For given agent trajectories, we compare the two protocols in terms of the total number of hops from a randomly selected agent to the rest of the agents. For a fair comparison, we assume that the flooding scheme maintains a queue at each node so packets do not travel forever. For each number of agents, we generate 50 different scenarios (trajectories), over which the number of hops is averaged. Figure 7.11 shows the comparison results; the overhead of maintaining the communication tree is justified, especially when there are many agents.


Figure 7.10: A typical scenario with five agents

7.5

Conclusion

In this chapter, we have studied how to support a group communication among interacting agents (acquaintance group) in a WSN and proposed two distributed data structures and two corresponding protocols that support i) group membership discovery and ii) efficient multicasting in a purely distributed manner. Simulation shows the feasibility, correctness and efficiency of the proposed protocols. There is room for improvement on both theoretical and practical aspects of the current protocols. In the current setting, if one of the nodes/links in the communication graph fails – which can happen quite often in wireless systems – then the whole group communication would be broken. Our preliminary study, however, shows that a simple local replication scheme can remedy the situation. Whenever an agent leaves a routing table, the agent also broadcasts the table to its one–hop neighbors, and at node/link failures, the redundant information at the neighboring nodes can be used


instead.³ Figure 7.12 shows that this heuristic can effectively reduce the probability of link/node failure for different node densities.

Figure 7.11: Group communication performance comparison: Flooding on a whole DCG and multicasting on its spanning tree

Another question we plan to address is the quality of the communication tree (in terms of hop-count or some other metric) produced by our algorithm. Observe that the communication tree is simply a Steiner tree in the DCG that spans the terminals. The problem of computing an optimal Steiner tree in a general graph is NP-hard, and there have been many approximation algorithms for this problem, see e.g. [42]. However, none of these algorithms is local, and they do not address the issue of moving terminals. We plan to address these issues in the future using the kinetic data structure (KDS) framework developed in [23]. In particular, it would be interesting to develop a local, distributed approximation algorithm for the Steiner tree problem under the KDS framework.

A DCG can also be useful for supporting queries on the identities of objects. Suppose a user is only interested in information about a specific object, say bike, in a WSN; then DCG(bike) can be used as an information aggregation tree – it is not exactly a tree,

³ This is a good example of exploiting diversity in wireless communication.


Figure 7.12: A simple heuristic to deal with node/link failures and its effect on the probability of communication failure

but it can be converted into one using a cycle-handling protocol similar to the one described here.

Chapter 8 Conclusions and Future Work

8.1 Summary

In this dissertation, we have presented a set of algorithms to efficiently implement multi-object tracking systems in a wireless sensor network in a fully distributed manner. We have proposed to decompose a tracking problem into two separate problems of location estimation and identity estimation, and shown that local evidence on object identities from sensors can be exploited to overcome the artifacts of sub-optimal data association algorithms in this framework.

Identity management and two approximations

Uncertainty in sensor data, coupled with the intrinsic difficulty of the data association problem, suggests probabilistic formulations over the set of possible identities, which we call identity management. While optimal identity management requires exponential storage and computation, in practice the information provided by this distribution is accessed only in certain stylized ways, as when asking for the identity of a given track, or the track with a given identity. Exploiting this observation, we have proposed two practical solutions, the marginal belief matrix approach and the information matrix approach, and compared their tradeoffs through analysis and real experiments.


Information-driven sensor selection utility

Localization of a moving object can be formulated as a sequential Bayesian estimation problem, and we have proposed an information-driven sensor selection framework to solve the estimation problem in an energy-efficient manner in a WSN. In the framework, we proposed the expected posterior derived from the current posterior and demonstrated that two information-theoretic utility measures on the expected posterior, the entropy and the relative entropy, are good heuristics for sensor selection.

Group management protocol

We have studied the problem of maintaining communication paths among a group of moving agents that have interacted with one another. As its solution, we propose a data structure called the Distributed Collaboration Graph (DCG), which is a communication graph obtained from the agent trajectories. The DCG can be constructed in a purely distributed fashion with very little cost and can be used for group discovery and for multicasting/broadcasting among agents. We also propose a distributed protocol which maintains a communication tree among the agents within the DCG while the agents are moving. This allows us to maintain group connectivity and provides the infrastructure for routing among the moving agents. One application of this protocol is the normalization operation of the marginal belief matrix approach, where an agent needs to communicate with all the members of its group to update the marginal probabilities.

8.2 Future Work

Hybrid approach
Considering the tradeoffs between the marginal belief matrix approach and the information matrix approach, one might wonder whether it is possible to combine the two. For example, we can initially use the marginal belief matrix approach when objects are detected. As mixing events occur, the belief matrix is updated using its simple mixing operation. At some point, the uncertainty encoded


by the marginal belief matrix becomes very large, and we might want to switch to the information matrix approach, where evidence incorporation can be done very easily. Once enough evidence is incorporated, we can switch back to the marginal belief matrix representation to facilitate the mixing operation. The main problem is that we do not know how to convert a marginal belief matrix B into an equivalent information matrix L, since there are infinitely many information matrices that encode the same underlying probability distribution. In practice, however, it seems reasonable to use an N × N zero matrix as the information matrix after many mixings, since it is not as critical to approximate the underlying distribution accurately when that distribution is close to uniform (a possible switching test along these lines is sketched at the end of this section).

Query framework
The information matrix approach is an example of how a user query can be exploited to reduce resource use in a WSN. We have done a preliminary study [46] on a general query framework for multi-object tracking in a WSN, where we classify queries into two classes: identity queries and location queries. Identity queries ask for the identities of objects based on locations, e.g., "Which objects are in the area [∆X, ∆Y]?" Location queries seek location information based on identities, e.g., "Where is the object with ID i?" Supporting these queries while minimizing resource usage in a distributed manner is an important problem in both theory and practice, and we would like to see more research activity in this direction.
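As an illustration of the hybrid idea, the sketch below tests whether the marginal belief matrix has drifted close to uniform (high average row entropy) and, if so, hands off to an all-zero information matrix as suggested above. The threshold fraction is purely an illustrative tuning knob, not something we have analyzed.

import numpy as np

def average_row_entropy(B, eps=1e-12):
    """Mean entropy (nats) of the rows of the marginal belief matrix."""
    rows = B / B.sum(axis=1, keepdims=True)
    return float(-np.mean(np.sum(rows * np.log(rows + eps), axis=1)))

def maybe_switch_to_information_matrix(B, threshold_fraction=0.9):
    """If the belief matrix is close to uniform (row entropy near log N), hand
    off to the information-matrix representation, initialized to all zeros as
    suggested in the text.  Returns (matrix, switched_flag)."""
    n = B.shape[0]
    if average_row_entropy(B) >= threshold_fraction * np.log(n):
        return np.zeros((n, n)), True
    return B, False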

Bibliography

[1] H. Balakrishnan, I. Hwang, and C. J. Tomlin. Polynomial approximation algorithms for belief matrix maintenance in identity management. In Proc. 43rd IEEE Conference on Decision and Control, December 2004.
[2] Y. Bar-Shalom, K. C. Chang, and H. A. Blom. Tracking a maneuvering target using input estimation versus the interacting multiple model algorithm. IEEE Trans. on Aerospace and Electronic Systems, 25(2):296–300, 1989.
[3] Y. Bar-Shalom and T. E. Fortmann. Tracking and Data Association. Academic Press, 1989.
[4] Y. Bar-Shalom and X. R. Li. Multitarget-Multisensor Tracking: Principles and Techniques. YBS Publishing, Storrs, CT, 1995.
[5] G. Birkhoff. Three observations on linear algebra. Univ. Nac. Tucumán Rev. Ser., pages 147–151, 1946.
[6] G. Chen and D. Kotz. A survey of context-aware mobile computing research. Technical Report TR2000-381, Dept. of Computer Science, Dartmouth College, November 2000.
[7] M. Chu, H. Haussecker, and F. Zhao. Scalable information-driven sensor querying and routing for ad hoc heterogeneous sensor networks. International Journal of High-Performance Computing Applications, 16(3):90–110, 2002.


[8] D. O. Chyi. An Infrastructure for a Mobile-Agent System that Provides Personalized Services to Mobile Devices. Technical Report TR2000-370, Hanover, NH, 2000.
[9] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons, Inc., New York, 1991.
[10] I. J. Cox and S. L. Hingorani. An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking. IEEE Trans. on PAMI, 18(2):138–150, February 1996.
[11] I. J. Cox and M. L. Miller. A comparison of two algorithms for determining ranked assignments with applications to multi-target tracking and motion correspondence. IEEE Trans. on Aerospace and Electronic Systems, 33(1):295–301, January 1997.
[12] S. Deb, K. R. Pattipati, and Y. Bar-Shalom. A new algorithm for the generalized multidimensional assignment problem. In Proc. IEEE International Conference on Systems, Man and Cybernetics; Emergent Innovations in Information Transfer Processing and Decision Making, pages 249–254, 1992.
[13] P. Diaconis. Group Representations in Probability and Statistics (Lecture Notes Vol. 11). Institute of Mathematical Statistics, Hayward, CA, 1988.
[14] P. Diaconis. A generalization of spectral analysis with application to ranked data. The Annals of Statistics, 17(3), 1989.
[15] P. Diaconis and L. Saloff-Coste. What do we know about the Metropolis algorithm? In Proc. 27th Annual ACM Symposium on Theory of Computing, pages 112–129, Las Vegas, Nevada, 1995. ACM Press.
[16] A. Doucet, N. De Freitas, and N. Gordon, editors. Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer, 2001.


[17] Q. Fang, J. Liu, L. J. Guibas, and F. Zhao. RoamHBA: Maintaining group connectivity in sensor networks. In Proc. 3rd International Symposium on Information Processing in Sensor Networks (IPSN 2004), 2004. Submitted.
[18] R. Fletcher. Practical Methods of Optimization. John Wiley & Sons, Inc., New York, 2nd edition, 1987.
[19] J. Franklin and J. Lorenz. On the scaling of multidimensional matrices. Linear Algebra and its Applications, 114/115:717–735, 1989.
[20] D. S. Friedlander and S. Phoha. Semantic information fusion for coordinated signal processing in sensor networks. International Journal of High Performance Computing Applications, 16:235–242, 2002.
[21] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
[22] P. E. Gill and W. Murray. Practical Optimization. Academic Press, New York, 1981.
[23] L. J. Guibas. Kinetic data structures: a state of the art report. In Proc. Workshop Algorithmic Found. Robot., pages 191–209. A. K. Peters, Wellesley, 1998.
[24] L. J. Guibas. Sensing, tracking and reasoning with relations. IEEE Signal Processing Magazine, 19(2):73–85, 2002.
[25] D. Hähnel, W. Burgard, B. Wegbreit, and S. Thrun. Towards lazy data association in SLAM. In Proceedings of the 11th International Symposium of Robotics Research (ISRR'03), Siena, Italy, 2003. Springer.
[26] M. Isard and A. Blake. Condensation - conditional density propagation for visual tracking. International Journal of Computer Vision, 1998.
[27] A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.


[28] R. Jose and N. Davies. Scalable and flexible location-based services for ubiquitous information access. In Proc. International Symposium on Handheld and Ubiquitous Computing (HUC 99), September 1999.
[29] R. E. Kalman. A new approach to linear filtering and prediction problems. Trans. of ASME, Journal of Basic Engineering, 82D(3):34–45, 1960.
[30] L. E. Kinsler, A. R. Frey, A. B. Coppens, and J. V. Sanders. Fundamentals of Acoustics. John Wiley & Sons, Inc., 1999.
[31] J. Liu, M. Chu, J. Liu, J. Reich, and F. Zhao. State-centric programming for sensor-actuator network systems. IEEE Pervasive Computing, 2(4):50–62, October-December 2003.
[32] J. J. Liu, J. Reich, and F. Zhao. Collaborative in-network processing for target tracking. EURASIP Journal on Applied Signal Processing, 2003(4):378–391, March 2003.
[33] P. Maybeck. Stochastic Models, Estimation, and Control, Volume I. Academic Press, Inc., 1979.
[34] E. Mazor, A. Averbuch, Y. Bar-Shalom, and J. Dayan. Interacting multiple model methods in target tracking: A survey. IEEE Trans. on Aerospace and Electronic Systems, 34(1):103–123, January 1998.
[35] N. Megiddo and A. Tamir. Linear time algorithms for some separable quadratic programming problems. Operations Research Letters, 13:203–211, 1993.
[36] J. C. Navas and T. Imielinski. GeoCast – geographic addressing and routing. In Mobile Computing and Networking, pages 66–76, 1997.
[37] R. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto, 1993.


[38] H. Pasula. Identity Uncertainty. Ph.D. thesis, University of California, Berkeley, 2003.
[39] H. Pasula, S. Russell, M. Ostland, and Y. Ritov. Tracking many objects with many sensors. In Proc. IJCAI-99, pages 1160–1171, 1999.
[40] A. B. Poore. Multidimensional assignment formulation of data association problems arising from multitarget and multisensor tracking. Computational Optimization and Applications, 3:27–57, 1994.
[41] D. B. Reid. An algorithm for tracking multiple targets. IEEE Trans. on Automatic Control, 24(6):843–854, 1979.
[42] G. Robins and A. Zelikovsky. Improved Steiner tree approximation in graphs. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2000), pages 770–779, 2000.
[43] U. Rothblum and H. Schneider. Scaling of matrices which have prespecified row sums and column sums via optimization. Linear Algebra and its Applications, 114/115:737–764, 1989.
[44] S. Russell. Identity uncertainty. In IFSA/NAFIPS 2001, July 2001.
[45] D. Schulz, W. Burgard, D. Fox, and A. B. Cremers. Tracking multiple moving objects with a mobile robot. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
[46] J. Shin and L. J. Guibas. Location and identity query support for multi-object tracking in wireless sensor networks. Unpublished manuscript, 2004.
[47] J. Shin, L. J. Guibas, and F. Zhao. A distributed algorithm for managing multitarget identities in wireless ad-hoc sensor networks. In Proc. 2nd International Workshop on Information Processing in Sensor Networks (IPSN03), pages 223–238, Palo Alto, CA, April 2003. Springer.


[48] J. Shin, N. Lee, S. Thrun, and L. J. Guibas. Lazy inference on object identities in wireless sensor networks. In Proc. 4th International Symposium on Information Processing in Sensor Networks (IPSN 2005), 2005. Submitted.
[49] J. Shin, A. Man-Cho So, and L. J. Guibas. Supporting group communication for interacting agents in wireless ad-hoc sensor network applications. In Proc. IEEE Wireless Communications and Networking Conference (WCNC 2005), 2005.
[50] Y. Yu, R. Govindan, and D. Estrin. Geographical and energy aware routing: A recursive data dissemination protocol for wireless sensor networks. Technical Report UCLA/CSD-TR-01-0023, UCLA Computer Science Department, May 2001.
[51] F. Zhao, J. Shin, and J. Reich. Information-driven dynamic sensor collaboration. IEEE Signal Processing Magazine, 19(2):61–72, 2002.
