
Signal Localization, Decomposition and Dictionary Learning on Graphs

arXiv:1607.01100v1 [cs.SI] 5 Jul 2016

Siheng Chen, Student Member, IEEE, Yaoqing Yang, Student Member, IEEE, José M. F. Moura, Fellow, IEEE, Jelena Kovačević, Fellow, IEEE

S. Chen, Y. Yang, J. M. F. Moura and J. Kovačević are with the Dept. of Electrical and Computer Engineering; Moura and Kovačević are also with the Dept. of Biomedical Engineering (by courtesy), Carnegie Mellon University, Pittsburgh, PA 15213 USA. Emails: {sihengc,yaoqingy,moura,jelenak}@andrew.cmu.edu. The authors gratefully acknowledge support from the NSF through awards CCF 1421919, 1563918 and 1513936.

Abstract—Motivated by the need to extract meaning from large amounts of complex data that can be best modeled by graphs, we consider three critical problems in graph signal processing: localization, decomposition and dictionary learning. We are particularly interested in piecewise-constant graph signals that efficiently model local information in the vertex-domain; for such signals, we show that decomposition and dictionary learning are natural extensions of localization. For each of the three problems, we propose a specific graph signal model, an optimization problem and a computationally efficient solver. We then conduct an extensive empirical study to validate the proposed methods on both simulated and real data including the analysis of a large volume of spatio-temporal Manhattan urban data. Using our methods, we are able to detect both everyday and special events and distinguish weekdays from weekends from taxi-pickup activities. The findings validate the effectiveness of our approach and suggest that graph signal processing tools may aid in urban planning and traffic forecasting.

Index Terms—Signal processing on graphs, signal localization, signal decomposition, dictionary learning.

I. INTRODUCTION

Data is being generated at an unprecedented level from a diversity of sources, including social networks, biological studies and physical infrastructure. The need to analyze such complex data has led to the birth of signal processing on graphs [1], [2], [3], which generalizes classical signal processing tools to data supported on graphs; the data is the graph signal indexed by the nodes of the underlying graph. Recent additions to the toolbox include sampling of graph signals [4], [5], [6], [7], recovery of graph signals [8], [9], [10], representations for graph signals [11], [12], [13], [14], uncertainty principles on graphs [15], [16], graph-based transforms [17], [18], [19], [20] and community detection and clustering on graphs [21], [22], [23].

We consider three graph signal problems: localization, decomposition and dictionary learning, each of which builds on the previous one. Localization identifies an activated piece in a single graph signal, decomposition uses such pieces to represent or approximate a graph signal and dictionary learning identifies shared activated pieces in multiple graph signals.

Localization. The aim is to identify one set of connected nodes where the graph signal switches values—we call such a set an activated piece. For example, given the graph signal in Figure 1(a), we look for an underlying activated piece as in Figure 1(b). This is relevant in many real-world applications, from localizing virus attacks in cyber-physical systems, to activity in brain connectivity networks, to traffic events in city road networks. As the original signal is noisy, this task is related, but not equivalent, to denoising: denoising aims to obtain a noiseless graph signal (see Figure 1(c)), which is not necessarily localized. In this paper, we focus on localizing activated pieces in noisy piecewise-constant signals, in which case localization is in fact equivalent to denoising.

[Fig. 1 panels: (a) signal; (b) activated piece; (c) noiseless signal.]

Fig. 1: Signal localization on graphs. Given a signal (a), the aim is to identify an activated piece (b), while denoising aims to obtain a noiseless signal (c). When the smooth background is ignored, localization is equivalent to denoising.

Decomposition. The aim here is to represent or approximate a signal as a linear combination of building blocks—activated pieces. For example, given the graph signal in Figure 2(a), we decompose it into the two activated pieces in (b) and (c).

[Fig. 2 panels: (a) signal; (b) piece 1; (c) piece 2.]

Fig. 2: Signal decomposition on graphs. Given a signal (a), we aim to decompose it into two activated pieces (b) and (c).

Dictionary learning. The aim here is to learn a graph dictionary of one-piece atoms from multiple graph signals. For example, given the three graph signals in Figures 3(a)–(c), we look for the underlying shared activated pieces shown in Figures 3(d) and (e). This is relevant in many real-world applications where we seek to localize common patterns in a large data volume. For example, on weekday evenings, the area near Penn Station in Manhattan is notoriously crowded. We can model the number of passing vehicles at all the intersections


by a graph signal on the Manhattan street network and use dictionary learning techniques to analyze traffic activities at various moments and automatically localize the patterns.

[Fig. 3 panels: (a)–(c) signals 1–3; (d)–(e) pieces 1–2.]

Fig. 3: Dictionary learning on graphs. Given signals (a)–(c), dictionary learning aims to find the shared activated pieces (d) and (e).

Related Work. We now briefly review related work on anomaly detection, decomposition and dictionary learning.

a) Anomaly detection: This is a classical task that identifies items, events or observations that do not conform to an expected pattern or to other items in a dataset [24]. Anomaly detection has been widely studied in different fields, including signal processing, statistics, machine learning, computer vision and data mining. Example tasks include detecting fake reviews on Yelp, purchased followers on Twitter and inflated trust on eBay [25], [26]. Within the context of graphs, much of the literature considers detecting smooth or piecewise-constant graph signals from signals corrupted by Gaussian noise [27], [28], [29], [30], [31]. A recent work [32] considers detecting piecewise-constant graph signals under Bernoulli noise. In this paper, we localize anomalies (activated pieces) on graphs.

b) Decomposition: Signal decomposition/representation is at the heart of most signal processing tasks [33], as signals are represented or approximated as linear combinations of basic building blocks. Fourier analysis can be seen as the spectral decomposition of signals into sinusoidal components and is widely used in communication systems and digital devices. Wavelet and multiresolution techniques decompose signals into components at different scales and are widely used in image and video compression. Within the context of graphs, a number of counterparts to classical transforms have been found: the graph Fourier transform [1], [3], the windowed graph Fourier transform [19] and the wavelet transform [18]. Several works consider representations based on the graph Fourier domain [1], [34], [35], designing graph filters in the graph Fourier domain and achieving representations that are particularly localized in that domain. Others consider representations based on the vertex domain [36], [37], [30], [14], decomposing a graph into multiple subgraphs of various sizes and achieving representations that are particularly localized in the graph vertex domain. In this paper, we consider representations based on the graph vertex domain in a data-adaptive fashion.

c) Dictionary learning: Dictionary learning finds a sparse representation of the input data as a linear combination of basic building blocks. This task is widely studied in image processing, computer vision and machine learning. Popular algorithms include the method of optimal directions [38], K-SVD [39] and online dictionary learning [40]. Within the context of graphs, [12] considers learning a series of polynomial graph filters as a graph dictionary, emphasizing representations in the graph Fourier domain. In this paper, we are concerned with learning a graph dictionary that is sensitive to local changes and uses representations in the graph vertex domain.

Contributions. We start with a basic localization problem whose goal is to use a set of connected nodes to approximate a noisy one-piece graph signal with unit magnitude, propose an efficient and effective solver, and extend it to arbitrary magnitude. Based on this, we further consider signal localization, signal decomposition and dictionary learning on graphs. We conduct extensive experiments to validate the proposed methods. The results show that cut-based localization is good at localizing the ball-shaped class and path-based localization is good at localizing the elongated path class. We further apply our algorithms to analyze urban data, where our methods automatically detect events and explore mobility patterns across the city from a large volume of spatio-temporal data. In summary, the main contributions of the paper are:

• a novel and efficient solver for graph signal localization;
• a novel and efficient solver for graph signal decomposition;
• a novel and efficient solver for graph dictionary learning;
• extensive empirical studies of the proposed solvers on both simulated and real data (the Minnesota road and Manhattan street networks).

Notation. Consider an undirected graph $G = (V, E)$, where $V = \{v_1, \ldots, v_N\}$ is the set of nodes and $E = \{e_1, \ldots, e_M\}$ is the set of edges. A graph signal $x$ maps the graph nodes $v_n$ to the signal coefficients $x_n \in \mathbb{R}$; in vector form, $x = [x_1 \; x_2 \; \cdots \; x_N]^T \in \mathbb{R}^N$. Let $C \subseteq V$ be a subset of nodes; we represent it by the indicator vector

$(1_C)_i = \begin{cases} 1, & v_i \in C; \\ 0, & \text{otherwise}; \end{cases}$  (1)

that is, $1_C$ is a graph signal with 1s on $C$ and 0s on the complement node set $\bar{C} = V \setminus C$. We say that the node set $C$ is activated. When the node set $C$ forms a connected subgraph, we call $C$ a piece and $1_C$ a one-piece graph signal. We define a piecewise-constant graph signal as

$x = \sum_{i=1}^{K} \mu_i 1_{C_i}$,

where each $C_i$ is a piece, $\mu_i$ is a constant and $K$ is the number of pieces.

Outline of the paper. We present the three problems, graph signal localization, decomposition and dictionary learning, in Sections II, III and IV, respectively. Section V concludes the paper and provides pointers to future research directions.
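To make the notation concrete, here is a minimal Python sketch (our illustration, not code from the paper; the grid graph, piece radius, noise level and random seed are arbitrary choices) that builds a one-piece indicator signal and a noisy observation of it:

```python
# A minimal sketch of the notation: an indicator (one-piece) graph signal
# on a small grid graph, plus a noisy piecewise-constant observation.
import networkx as nx
import numpy as np

G = nx.grid_2d_graph(10, 10)                 # N = 100 nodes, undirected
nodes = list(G.nodes())
N = len(nodes)

# A piece: a connected node set, here a ball of radius 2 around a center.
center = (4, 4)
piece = set(nx.single_source_shortest_path_length(G, center, cutoff=2))

# Indicator signal 1_C as in (1).
one_C = np.array([1.0 if v in piece else 0.0 for v in nodes])

# Noisy one-piece observation x = mu * 1_C + eps.
rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 0.3
x = mu * one_C + rng.normal(0.0, np.sqrt(sigma2), N)
```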


II. SIGNAL LOCALIZATION ON GRAPHS

In this section, we propose a solver for the localization problem and validate it by simulations. Consider localizing an activated piece $C \in \mathcal{C}$ (where $\mathcal{C}$ denotes the set of all pieces) in a noisy, piecewise-constant graph signal

$x = \mu 1_C + \epsilon$,  (2)

where $1_C$ is the indicator function defined in (1), $\mu$ is the signal strength and $\epsilon \sim \mathcal{N}(0, \sigma^2 I_N)$ is Gaussian noise. We consider two cases: $\mu = 1$ and $\mu$ arbitrary. Previous works formulate this as a detection problem via a scan statistic searching for the most probable anomalous set, which is similar to localizing an activated piece. However, such methods are either computationally inefficient or hindered by strong assumptions. For example, in [27], the authors analyze the theoretical performance of detecting paths, blobs and spatio-temporal sets by exhaustive search, which is clearly inefficient. In [29], [31], [32], the authors aim to detect a node set with a specific cut number, resulting in a computationally efficient algorithm, but limited by strong assumptions. Our goal is to develop an algorithm that efficiently localizes an arbitrarily-shaped activated piece. The maximum likelihood estimator of the one-piece-localization problem under Gaussian noise is

$\min_{\mu, C} \; \|x - \mu 1_C\|_2^2$, subject to $C \in \mathcal{C}$.  (3)

A. Methodology

1) Localization with unit magnitude: Consider first (2) with a unit-magnitude signal, that is, $\mu = 1$. Then (3) reduces to

$\min_{C} \; \|x - 1_C\|_2^2$, subject to $C \in \mathcal{C}$.  (4)

This optimization problem is at the core of this paper.

Hard-thresholding-based localization. The difficulty in solving (4) comes from the constraint, which forces the activated nodes to form a connected subgraph (piece). Since we do not restrict the size or shape of a piece, searching over all the pieces to find the global optimum is an NP-complete problem. We consider a simple algorithm that solves (4) in two steps: we first optimize the objective function and then project the solution onto the feasible set of (4). This simple algorithm is guaranteed to satisfy the constraint, but does not necessarily yield the optimal solution. The objective function can be formulated as

$\min_{C} \|x - 1_C\|_2^2 = \min_{t \in \{0,1\}^N} \|x - t\|_2^2 = \min_{t_i \in \{0,1\}} \sum_{i=1}^{N} (x_i - t_i)^2$,

where $t \in \mathbb{R}^N$ is a vector representation of a node set $C$. Since each $t_i$ is independent, we can optimize each individual element to obtain $t_i^* = 1$ when $x_i > 1/2$, and $0$ otherwise. In other words, the optimum of the objective function is

$\{v_i \in V \mid x_i > \tfrac{1}{2}\} = \arg\min_{C} \|x - 1_C\|_2^2$.

This step is nothing but global hard thresholding. We then project this solution onto the feasible set; that is, we solve

$C^*_{\mathrm{thr}} = P_{\mathcal{C}}\big( \arg\min_{C} \|x - 1_C\|_2^2 \big) = P_{\mathcal{C}}\big( \{v_i \in V \mid x_i > \tfrac{1}{2}\} \big)$,  (5)

where the projection operator $P_{\mathcal{C}}(\cdot)$ extracts the largest connected component (piece) of a node set. Thus, this solver simply performs hard thresholding and then finds a connected component among the nodes with nonzero elements.
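As a sketch of this two-step solver (5), continuing the earlier notation example (so `G`, `nodes` and `x` are assumed from that sketch):

```python
# A sketch of the hard-thresholding solver (5): threshold x at 1/2, then
# project onto the feasible set by keeping the largest connected component.
import networkx as nx

def localize_hard(G, nodes, x):
    active = {v for v, xi in zip(nodes, x) if xi > 0.5}   # global threshold
    if not active:
        return set()
    sub = G.subgraph(active)
    return max(nx.connected_components(sub), key=len)     # projection P_C

C_thr = localize_hard(G, nodes, x)
```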

Cut-based localization. The key idea in (5) is to partition all the nodes into two categories according to a given graph signal: activated nodes and inactivated nodes. This is similar to graph cuts: cutting a series of edges with minimum cost and obtaining two isolated components. The difference is that graph cuts optimize over edge weights, while (4) optimizes over signal coefficients. Inspired by [29], [32], we add the number of edges connecting activated and inactivated nodes to the objective function to induce a connected component. Let $\Delta \in \mathbb{R}^{M \times N}$ be the graph difference operator (the oriented incidence matrix of $G$ [41]). When $e_i$ is an edge connecting the $j$th node to the $k$th node ($j < k$), the elements of the $i$th row of $\Delta$ are

$\Delta_{i,\ell} = \begin{cases} 1, & \ell = j; \\ -1, & \ell = k; \\ 0, & \text{otherwise}. \end{cases}$

The signal $\Delta 1_C \in \mathbb{R}^M$ records the first-order differences within $1_C$. The number of edges connecting $C$ and $\bar{C}$ is

$\|\Delta 1_C\|_0 = \|A 1_C\|_1 - 1_C^T A 1_C$,

where $A$ is the adjacency matrix, $\|A 1_C\|_1$ is the sum of the degrees of the nodes in $C$ and $1_C^T A 1_C$ counts the internal connections among the nodes of $C$. When there are no edges connecting the activated nodes, $\|\Delta 1_C\|_0 = \|A 1_C\|_1$; when all the activated nodes are well connected and form a piece, $1_C^T A 1_C$ is often large and $\|\Delta 1_C\|_0 \ll \sum_{i \in C} d_i$, with $d_i$ the degree of the $i$th node. Thus, minimizing $\|\Delta 1_C\|_0$ induces $C$ to be a piece. We thus solve

$C_\lambda = P_{\mathcal{C}}\big( \arg\min_{C} \|x - 1_C\|_2^2 + \lambda \|\Delta 1_C\|_0 \big)$,  (6a)
$C^*_{\mathrm{cut}} = \arg\min_{C_\lambda} \|x - 1_{C_\lambda}\|_2^2$.  (6b)

Given a $\lambda$, (6a) solves the regularized optimization problem and extracts the largest connected component; in (6b), we optimize over $\lambda$ and obtain a solution. The previous hard-thresholding solution (5) is a special case of (6b), obtained by forcing $\lambda = 0$. Let us look at the regularized problem (6a) in more detail,

$\min_{C} \|x - 1_C\|_2^2 + \lambda \|\Delta 1_C\|_0 = \min_{t \in \{0,1\}^N} \|x - t\|_2^2 + \lambda \|\Delta t\|_0 = \min_{t_i \in \{0,1\}} \sum_{i} \Big( (x_i - t_i)^2 + \lambda \sum_{j \in \mathrm{Nei}(i)} |t_i - t_j| \Big)$,

where $\mathrm{Nei}(i)$ is the neighborhood of the $i$th node. That is, we have $t_i^* = 1$ when

$(x_i - 1)^2 + \lambda \sum_{j \in \mathrm{Nei}(i)} |x_j - 1| < x_i^2 + \lambda \sum_{j \in \mathrm{Nei}(i)} |x_j|$.

The solution is

$(t^*_{\mathrm{cut}})_i = \begin{cases} 1, & x_i > \frac{1}{2} + \frac{\lambda}{2}(\alpha_0 - \alpha_1); \\ 0, & \text{otherwise}, \end{cases}$  (7)

where $\alpha_1$ counts the neighbors of $x_i$ equal to $1$ and $\alpha_0$ counts the neighbors of $x_i$ equal to $0$. Theoretically, (7) is a local thresholding: the value of each element depends on the values of its neighbors. Computationally, we can solve the regularized optimization problem in (6a) with the Boykov–Kolmogorov algorithm [42], [43]. We call this cut-based localization.
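The binary energy in (6a) is submodular, so it can be minimized exactly as an s-t minimum cut. The following sketch uses the standard min-cut construction with networkx's max-flow routine rather than the Boykov-Kolmogorov implementation cited above, and the grid of $\lambda$ values is an arbitrary choice (assuming `G`, `nodes` and `x` from the earlier sketches):

```python
# A sketch of (6a)-(6b): per lambda, solve the regularized problem as an
# s-t min cut (s->i with capacity (x_i - 1)^2, the cost of label 1;
# i->t with capacity x_i^2, the cost of label 0; and capacity lambda on
# each graph edge), then keep the lambda with the smallest objective.
import networkx as nx
import numpy as np

def localize_cut(G, nodes, x, lams=(0.0, 0.1, 0.3, 1.0)):
    best, best_obj = set(), np.inf
    for lam in lams:
        F = nx.DiGraph()
        for v, xi in zip(nodes, x):
            F.add_edge('s', v, capacity=(xi - 1.0) ** 2)  # paid if t_i = 1
            F.add_edge(v, 't', capacity=xi ** 2)          # paid if t_i = 0
        for u, v in G.edges():
            F.add_edge(u, v, capacity=lam)
            F.add_edge(v, u, capacity=lam)
        _, (src_side, sink_side) = nx.minimum_cut(F, 's', 't')
        active = sink_side - {'t'}                        # nodes with t_i = 1
        if not active:
            continue
        C = max(nx.connected_components(G.subgraph(active)), key=len)
        ind = np.array([1.0 if v in C else 0.0 for v in nodes])
        obj = np.sum((x - ind) ** 2)                      # objective in (6b)
        if obj < best_obj:
            best, best_obj = C, obj
    return best
```

With `lams` containing 0.0, the hard-thresholding solution (5) is recovered as a special case of the sweep, matching the remark above.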

Path-based localization. Cut-based localization is an efficient solver for localizing activated nodes that can be easily separated from the inactivated ones. The difficulty of separating the two classes is captured by the cut number, which is good at promoting ball-shaped pieces but not at promoting elongated paths. To solve (4), we introduced $\|\Delta 1_C\|_0$ in (6a) to promote fewer connections between activated and inactivated nodes; this works because when $C$ is a piece, $1_C^T A 1_C$ is large. When the nodes in $C$ are fully connected, $C$ forms a ball-shaped piece (see Figures 4(a)–(b)) and $1_C^T A 1_C = |C|(|C|-1)/2$; however, when the nodes in $C$ are weakly connected, for example when $C$ forms an elongated path, $1_C^T A 1_C = |C| - 1$. In other words, (6a) seems better suited to capturing ball-shaped pieces than elongated paths. To address elongated paths, we restrict $\|A 1_C\|_\infty$, the maximum degree of the nodes in the activated set, to be at most $2$ and solve the following optimization problem,

$\min_{C} \; \|x - 1_C\|_2^2$, subject to $\|A 1_C\|_\infty \le 2$.

In other words, the constraint requires each activated node to connect to at most two activated nodes, which promotes elongated paths. To solve this combinatorial problem efficiently, we relax it to a convex problem,

$t^*_{\mathrm{path}} = \arg\min_{t \in \mathbb{R}^N} \; \|x - t\|_2^2$, subject to $\|A t\|_\infty \le 2$ and $0 \le t \le 1$.

We then set various thresholds for $t^*_{\mathrm{path}}$, extract the largest connected component and optimize over the threshold to obtain a solution,

$C_\lambda = P_{\mathcal{C}}\big( 1_{t^*_{\mathrm{path}} \ge \lambda} \big)$,
$C^*_{\mathrm{path}} = \arg\min_{C_\lambda} \|x - 1_{C_\lambda}\|_2^2$.  (8)

We call this path-based localization.

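A minimal sketch of this relaxation, using cvxpy as one possible QP solver (an assumption; any box-constrained quadratic programming solver works) and an arbitrary threshold grid:

```python
# A sketch of the path-based relaxation and the threshold sweep in (8).
import cvxpy as cp
import networkx as nx
import numpy as np

def localize_path(G, nodes, x, thresholds=np.linspace(0.1, 0.9, 9)):
    A = nx.to_numpy_array(G, nodelist=nodes)              # adjacency matrix
    t = cp.Variable(len(nodes))
    prob = cp.Problem(cp.Minimize(cp.sum_squares(x - t)),
                      [A @ t <= 2, t >= 0, t <= 1])       # degree + box
    prob.solve()
    best, best_obj = set(), np.inf
    for thr in thresholds:                                # sweep in (8)
        active = {v for v, ti in zip(nodes, t.value) if ti >= thr}
        if not active:
            continue
        C = max(nx.connected_components(G.subgraph(active)), key=len)
        ind = np.array([1.0 if v in C else 0.0 for v in nodes])
        obj = np.sum((x - ind) ** 2)
        if obj < best_obj:
            best, best_obj = C, obj
    return best
```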
Combined solvers. We can combine the two solvers by choosing the one with the smaller objective value. The final solution is

$C^* = \mathrm{Loc}_1(x) = \arg\min_{C \in \{C^*_{\mathrm{cut}}, C^*_{\mathrm{path}}\}} \|x - 1_C\|_2^2$,  (9)

where $\mathrm{Loc}_1(\cdot)$ is the operator that finds an activated piece with unit magnitude. This solution is not the global optimum of (4); it considers two typical subsets of the feasible set and combines a graph-cut solver with a quadratic program, both of which are efficient. In Section II-B, we validate it empirically.

2) Localization with unknown magnitude: We next consider the more general case (2) in which the magnitude of the activated piece is unknown, with the associated optimization problem (3).

We iteratively solve for $C$ and $\mu$ until convergence; that is, given $C$, we optimize over $\mu$, and then, given $\mu$, we optimize over $C$. In the $k$th iteration,

$\mu^{(k)} = \arg\min_{\mu} \|x - \mu 1_{C^{(k)}}\|_2^2 = \frac{x^T 1_{C^{(k)}}}{1_{C^{(k)}}^T 1_{C^{(k)}}}$,
$C^{(k+1)} = \mathrm{Loc}_1\Big( \frac{1}{\mu^{(k)}} x \Big)$.

We obtain a pair of local optima by alternately minimizing over these two variables. We denote this solver by

$\mu^*, C^* = \mathrm{Loc}_{\mathrm{unknown}}(x)$.  (10)
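A sketch of this alternation, assuming a unit-magnitude localizer `loc1` with the signature of the earlier sketches (e.g., `localize_cut` or `localize_path`); the iteration count and stopping rule are our choices:

```python
# A sketch of the alternating solver (10): closed-form mu given C,
# then a rescaled unit-magnitude localization given mu.
import numpy as np

def localize_unknown(G, nodes, x, loc1, n_iter=10):
    C = loc1(G, nodes, x)                                  # initialize
    mu = 1.0
    for _ in range(n_iter):
        ind = np.array([1.0 if v in C else 0.0 for v in nodes])
        if ind.sum() == 0:
            break
        mu = float(x @ ind) / float(ind @ ind)             # closed-form mu
        C_new = loc1(G, nodes, x / mu)                     # unit-magnitude step
        if C_new == C:
            break                                          # converged
        C = C_new
    return mu, C
```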

B. Experimental Validation

We test our localization solver on two graphs: the Minnesota road network and the Manhattan street network. On each graph, we consider localizing two classes of simulated graph signals with different activated sizes under various noise levels.

1) Minnesota road network: We model the Minnesota road network as a graph with intersections as nodes and road segments between intersections as edges. The graph includes 2,642 nodes and 3,342 undirected edges. We simulate two classes of one-piece graph signals: a ball-shaped class and an elongated path class.

Ball-shaped class. To generate a ball-shaped piece, we randomly choose one node as a center and assign all the nodes within $k$ steps of the center to an activated node set, where the radius $k$ is either $5$ or $10$. Figures 4(a)–(b) show examples of ball-shaped pieces with radii $k = 5$ and $10$ (left and right columns, respectively); the nodes in green (lighter) are activated and the nodes in black (darker) are inactivated. We aim to localize the activated pieces from noisy one-piece graph signals. We vary the noise variance $\sigma^2$ from $0.1$ to $1$ with interval $0.1$ and, at each noise level, randomly generate 1,000 noisy one-piece graph signals.

We measure the quality of the localization with two measures: the Hamming distance and the F1 score. The Hamming distance is equivalent to the Manhattan distance between two binary graph signals; that is, it counts the total number of mismatches between the two,

$d_H(\hat{C}, C) = \| 1_{\hat{C}} - 1_C \|_1 = |C| + |\hat{C}| - 2|C \cap \hat{C}|$,

where $C$ is the ground truth for the activated piece and $\hat{C}$ is the localized piece. The lower the Hamming distance, the better the quality: the Hamming distance reaches its minimum value at $d_H(\hat{C}, C) = 0$.

The F1 score is the harmonic mean of precision and recall and measures the matching accuracy,

$F_1(\hat{C}, C) = \frac{2\,\mathrm{precision}(\hat{C}, C)\,\mathrm{recall}(\hat{C}, C)}{\mathrm{precision}(\hat{C}, C) + \mathrm{recall}(\hat{C}, C)}$,

where precision is the fraction of retrieved instances that are relevant (true positives over the sum of true positives and false positives) and recall is the fraction of relevant instances that are retrieved (true positives over the sum of true positives and false negatives) [44]. Thus, with true positives $= |C \cap \hat{C}|$, the sum of true positives and false positives $= |\hat{C}|$ and the sum of true positives and false negatives $= |C|$, we have

$F_1(\hat{C}, C) = \frac{2|C \cap \hat{C}|}{d_H(\hat{C}, C) + 2|C \cap \hat{C}|} = \frac{2|C \cap \hat{C}|}{|C| + |\hat{C}|}$.

The higher the F1 score, the better the quality: the F1 score reaches its maximum value at $F_1 = 1$ and its minimum value at $F_1 = 0$. The Hamming distance emphasizes the difference between two sets, while the F1 score emphasizes their agreement; the higher the F1 score, or the lower the Hamming distance, the better the localization quality.
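As a small sketch, these two measures translate directly into set operations (our helper code, with the ground truth and localized pieces represented as Python sets):

```python
# A sketch of the two evaluation measures for node sets C (ground truth)
# and C_hat (localized piece).
def hamming_distance(C, C_hat):
    return len(C) + len(C_hat) - 2 * len(C & C_hat)

def f1_score_sets(C, C_hat):
    inter = len(C & C_hat)
    return 2.0 * inter / (len(C) + len(C_hat)) if (C or C_hat) else 1.0
```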

[Fig. 4 panels: (a)–(b) ball-shaped pieces of radius 5 and 10; (c)–(d) F1 score; (e)–(f) Hamming distance, each vs. noise level for hard, cut number, path and LSPC.]

[Fig. 5 panels: (a)–(b) paths of length 15 and 91; (c)–(d) F1 score; (e)–(f) Hamming distance, each vs. noise level for hard, cut number, path and LSPC.]

Fig. 4: Localizing ball-shaped pieces in the Minnesota road network as a function of the noise level using hard thresholding (blue dashed line), cut-based localization (red solid line), path-based localization (yellow-circle line) and the local-set-based piecewise-constant dictionary (purple-square line). Cut-based localization provides the best performance.

Fig. 5: Localizing elongated paths in the Minnesota road network as a function of the noise level using hard thresholding (blue dashed line), cut-based localization (red solid line), path-based localization (yellow-circle line) and the local-set-based piecewise-constant dictionary (purple-square line). Path-based localization provides the best performance.

Figures 4(c)–(d) show the F1 scores and Figures 4(e)–(f) the Hamming distances of localizing ball-shaped pieces with radii $k = 5$ and $10$ (left and right columns, respectively) as functions of the noise variance. Each figure compares four methods: hard thresholding (5) (hard, blue dashed line), cut-based localization (6b) (cut number, red solid line), path-based localization (8) (path, yellow-circle line) and the local-set-based piecewise-constant dictionary (LSPC, purple-square line) [14]. LSPC is a predesigned graph dictionary; we use the matching pursuit algorithm to choose one atom to localize an activated piece. We observe that: (1) cut-based localization works significantly better than the other three approaches, as expected: hard thresholding is a special case of cut-based localization, path-based localization is designed to capture elongated paths and LSPC is a data-independent dictionary that cannot adapt the shape of its atoms to the given piece; (2) the noise level influences the localization performance; as it grows, the F1 score decreases and the Hamming distance increases; (3) the size of the activated piece influences the localization; localizing a piece with radius $10$ is easier than localizing a piece with radius $5$, as also observed in [32].

Elongated path class. To generate an elongated path, we randomly choose two nodes as the starting and ending nodes and compute the shortest path between them (the path length is measured by the geodesic distance between the two end nodes). We look at two categories of path lengths: shorter than 15 and longer than 80. Figures 5(a)–(b) show examples of paths with lengths 15 and 91 (left and right columns, respectively). We aim to localize the activated pieces from noisy path graph signals.


[Fig. 6 panels: (a) signal; (b) noisy signal; (c) activated piece (cut); (d) activated piece (path); (e) F1 score; (f) Hamming distance.]

Fig. 6: Localizing (a) Central Park in the Manhattan street network from (b) its noisy version; (e)–(f) show the F1 score and Hamming distance as functions of the noise level. (c) Activated piece obtained by cut-based localization, with F1 = 0.81 and dH = 220. (d) Activated piece obtained by path-based localization, with F1 = 0.16 and dH = 617. Cut-based localization outperforms path-based localization.

[Fig. 7 panels: (a) signal; (b) noisy signal; (c) activated piece (cut); (d) activated piece (path); (e) F1 score; (f) Hamming distance.]

Fig. 7: Localizing (a) 5th Avenue in the Manhattan street network from (b) its noisy version; (e)–(f) show the F1 score and Hamming distance as functions of the noise level. (c) Activated piece obtained by cut-based localization, with F1 = 0.32 and dH = 120. (d) Activated piece obtained by path-based localization, with F1 = 0.44 and dH = 231. Path-based localization outperforms cut-based localization.

We vary the noise variance $\sigma^2$ from $0.1$ to $1$ with interval $0.1$ and, at each noise level, randomly generate 1,000 noisy path graph signals. We measure the quality of the localization by the Hamming distance and the F1 score.

Figures 5(c)–(d) show the F1 scores and Figures 5(e)–(f) the Hamming distances of localizing elongated-path pieces with lengths 15 and 91 (left and right columns, respectively) as functions of the noise variance. We compare the same four methods as in Figure 4: hard thresholding (5) (hard, blue dashed line), cut-based localization (6b) (cut number, red solid line), path-based localization (8) (path, yellow-circle line) and the local-set-based piecewise-constant dictionary (LSPC, purple-square line). We observe that: (1) when the path is short, cut-based and path-based localization provide similar performance; when the path is long, path-based localization outperforms cut-based localization; (2) the noise level influences the localization performance; as it grows, the F1 score decreases and the Hamming distance increases; moreover, cut-based localization is more robust to noise than path-based localization; (3) the path length does not influence the localization performance, which differs from what we observed in the ball-shaped case; localization of elongated paths is more susceptible to noise and we can only localize multiple sections instead of a single, long path, and since we only extract the largest connected component, we usually obtain one section of a path, which is equivalent to localizing a short path; (4) localizing a path is harder than localizing a ball-shaped piece; the performance drop with added noise is much steeper in Figure 5 than in Figure 4.

2) Manhattan street network: We model the Manhattan street network as a graph with intersections as nodes and city streets as edges. The graph includes 13,679 nodes and 17,163 undirected edges. We generate two classes of one-piece graph signals: Central Park and 5th Avenue. Central Park is considered a ball-shaped piece and 5th Avenue a path. We then compare the results with those from the Minnesota road network to validate our conclusions.

Figures 6(a)–(d) show a piece activating the nodes in Central Park, a noisy version with noise variance $\sigma^2 = 1$, the activated piece provided by cut-based localization and the activated piece provided by path-based localization, respectively.


[Fig. 8 panels: (a) signal; (b) noisy signal; (c) decomposition (LSPC); (d) decomposition (PCD).]

Fig. 8: Localizing a piecewise-constant graph signal associated with ZIP codes 10128 and 10030. LSPC fails to localize ZIP code 10030, but PCD successfully localizes both ZIP codes. The F1 scores of LSPC and PCD are 0.41 and 0.57, respectively, and the Hamming distances of LSPC and PCD are 162 and 141, respectively.

[Fig. 9 panels: (a) F1 score; (b) Hamming distance, each vs. noise level for LSPC and PCD.]

Fig. 9: Localizing multiple ZIP codes as a function of the noise level. The proposed PCD significantly outperforms LSPC.

We see that, even when the graph signal is corrupted by a high level of noise, cut-based localization still provides accurate localization, while path-based localization fails to localize Central Park. Figures 6(e)–(f) show the F1 score and Hamming distance when localizing Central Park as functions of the noise level; the results are averaged over 1,000 runs. We see that, when the noise level is low, cut-based localization significantly outperforms the other methods, which is consistent with what we observed on the Minnesota road network. LSPC is less sensitive to noise and slightly outperforms the other methods when the noise level is high. Since LSPC is data-independent, it chooses the most relevant predesigned atom to fit a noisy signal; on the other hand, as a data-adaptive method, cut-based localization designs an atom from the noisy signal and thus easily fits the noise. Overall, the localization performance of LSPC depends strongly on the shape of its atoms: when LSPC has a predesigned atom matching the ground truth, it is robust to noise and localizes effectively; when it does not, it fails to localize well.

Figures 7(a)–(d) show a piece activating the nodes along 5th Avenue, a noisy version with noise variance $\sigma^2 = 0.5$, the activated piece provided by cut-based localization and the activated piece provided by path-based localization, respectively. When the graph signal is corrupted by a high level of noise, both cut-based and path-based localization fail to localize 5th Avenue. Figures 7(e)–(f) show the F1 score and Hamming distance when localizing 5th Avenue as functions of the noise level; the results are averaged over 1,000 runs. We see that path-based localization is sensitive to noise: when the noise level is low, it outperforms the other methods; when the noise level is high, cut-based localization performs similarly. This is similar to what we observed on the Minnesota road network. We also observe that localizing 5th Avenue is harder than localizing Central Park, especially when the noise level is high, which is again consistent with what we saw on the Minnesota road network.

III. SIGNAL DECOMPOSITION ON GRAPHS

We now extend the localization discussion by considering multiple activated pieces, making it a decomposition problem. We extend the solver of the localization problem in Section II to the decomposition problem and validate it through simulations. Consider localizing $K$ activated pieces $C_i$, $i = 1, 2, \ldots, K$, in a noisy, piecewise-constant graph signal

$x = \sum_{i=1}^{K} \mu_i 1_{C_i} + \epsilon$,

where $1_{C_i}$ is the indicator function defined in (1), $K \ll N$, the $C_i$ are connected and $\epsilon \sim \mathcal{N}(0, \sigma^2 I_N)$ is Gaussian noise. Our goal is to develop an algorithm that efficiently decomposes such a graph signal into several pieces. The corresponding optimization problem is

$\min_{\mu_i, C_i} \; \Big\| x - \sum_{i=1}^{K} \mu_i 1_{C_i} \Big\|_2^2$, subject to $C_i \in \mathcal{C}$.  (11)

A. Methodology

The goal now is not only to denoise or approximate a graph signal, which would only give $\sum_{i=1}^{K} \mu_i 1_{C_i}$; we aim to analyze the composition of a graph signal through decomposition. That is, we localize each piece and estimate its corresponding magnitude, yielding both $\mu_i$ and $C_i$. Since a graph signal is decomposed into several one-piece components, we call this piecewise-constant decomposition (PCD). To solve the optimization problem (11), we update one piece at a time by coordinate descent [45]. When we freeze the other variables, optimizing over $\mu_i$ and $C_i$ is equivalent to localization with unknown magnitude as in Section II-A2, because

$x - \sum_{j \ne i} \mu_j 1_{C_j} = \mu_i 1_{C_i} + \epsilon$

is a one-piece graph signal. Thus, for each piece, we solve the localization problem (3) using the localization solver (10),

$\mu_i^*, C_i^* = \mathrm{Loc}_{\mathrm{unknown}}\Big( x - \sum_{j \ne i} \mu_j 1_{C_j} \Big)$.

We iteratively update each piece until convergence and obtain a series of solutions providing a local optimum [45].
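A sketch of this coordinate descent, assuming `localize_unknown` and a unit-magnitude localizer `loc1` from the earlier sketches; the number of sweeps is our choice:

```python
# A sketch of the piecewise-constant decomposition (11): freeze all
# pieces but one, subtract their contribution and run the
# unknown-magnitude localizer on the residual.
import numpy as np

def pcd(G, nodes, x, K, loc1, n_sweeps=5):
    N = len(nodes)
    mus = np.zeros(K)
    inds = np.zeros((K, N))                    # rows are indicator vectors
    for _ in range(n_sweeps):
        for i in range(K):
            residual = x - (mus @ inds) + mus[i] * inds[i]
            mu_i, C_i = localize_unknown(G, nodes, residual, loc1)
            mus[i] = mu_i
            inds[i] = np.array([1.0 if v in C_i else 0.0 for v in nodes])
    return mus, inds
```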


[Fig. 10 panels: (a) signal; (b) approximation error (normalized MSE vs. number of coefficients for LSPC, PCD, graph Fourier (A) and graph Fourier (L)); (c) approximation by three pieces; (d) approximation by nine pieces.]

Fig. 10: Decomposing the restaurant distribution in Manhattan using the piecewise-constant decomposition.

[Fig. 11 panels: (a) signal; (b) approximation error; (c) approximation by three pieces; (d) approximation by nine pieces.]

Fig. 11: Decomposing the taxi-pickup activity in Manhattan on Friday, June 5th, at 11 pm, using the piecewise-constant decomposition.

B. Experimental Validation

We validate the piecewise-constant decomposition solver on the Manhattan street network through two tasks: (1) we simulate a series of piecewise-constant graph signals as linear combinations of selected ZIP codes, with the goal of localizing the selected ZIP codes; (2) we decompose two real graph signals in Manhattan, the restaurant distribution and the taxi-pickup activity, and analyze their composition.

1) Localizing ZIP codes: There are 43 ZIP codes in Manhattan. In each study, we randomly select two of them and generate a piecewise-constant graph signal,

$x = \mu_1 1_{\mathrm{ZIP}_1} + \mu_2 1_{\mathrm{ZIP}_2} + \epsilon$,

where the signal strength is uniformly distributed, $\mu_i \sim U(0.5, 1.5)$; the node set $\mathrm{ZIP}_i$ contains the nodes belonging to the selected ZIP code; and the noise is $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, with $\sigma^2$ varying from $0.1$ to $1$ with interval $0.1$. At each noise level, we generate 1,000 piecewise-constant graph signals.

Figure 8(a) shows a piecewise-constant signal generated by ZIP codes 10022 and 10030 and (b) shows a noisy graph signal with noise variance $\sigma^2 = 1$. We use LSPC [14] and the piecewise-constant decomposition (11) (PCD) to decompose this noisy graph signal into two activated pieces. Figures 8(c)–(d) show the results provided by LSPC and PCD, respectively. We see that LSPC fails to localize ZIP code 10030, but PCD successfully localizes both ZIP codes and provides acceptable localization performance.

To quantify the localization performance, we again use the F1 score and the Hamming distance. Figures 9(a)–(b) show the F1 score and the Hamming distance when localizing ZIP codes in Manhattan as functions of the noise level; the results at each noise level are averaged over 1,000 runs. Again, we see that PCD provides significantly better performance than LSPC; however, PCD is sensitive to noise, while LSPC is robust to noise.

2) Restaurants and taxi pickups: We now analyze two real graph signals, the restaurant density and the taxi-pickup activity, and aim to understand their components through decomposition. Figure 10(a) shows the restaurant density, built from the positions of 10,121 restaurants in Manhattan: we project each restaurant to its nearest intersection and count the number of restaurants at each intersection. We use the piecewise-constant decomposition to decompose this graph signal; Figures 10(c)–(d) show the decompositions using three and nine pieces, respectively. Through the decomposition, we get a rough idea about the distribution of restaurants; for example, the areas around East Village and Times Square have more restaurants. By using more pieces, we obtain a finer resolution and a better approximation.

Figure 10(b) shows the approximation error, where the x-axis is the number of expansion coefficients and the y-axis is the normalized mean square error. We compare four methods: nonlinear approximation using the graph Fourier basis based on the adjacency matrix [1] (graph Fourier (A), yellow-square line), nonlinear approximation using the graph Fourier basis based on the graph Laplacian matrix [3] (graph Fourier (L), purple-square line), sparse coding with the local-set-based piecewise-constant graph dictionary [14] (LSPC, blue dashed line) and the piecewise-constant decomposition (PCD, red solid line).


[Fig. 12 panels: (a) East Village taxi pickups; (b) East Village restaurants; (c) Times Square taxi pickups; (d) Times Square restaurants.]

Fig. 12: Both the restaurant distribution and the taxi-pickup activity on Friday nights activate the same areas around East Village and Times Square. The F1 score between the two Times Square pieces is 0.65 and the F1 score between the two East Village pieces is 0.59, indicating a high overlap.

We see that, using the same number of expansion coefficients (pieces), the piecewise-constant decomposition significantly outperforms the other methods and the graph Fourier basis based on the adjacency matrix [1] is better than the one based on the graph Laplacian matrix [3]. In other words, the piecewise-constant decomposition provides an effective data-adaptive and structure-related representation.

Figure 11(a) shows the taxi-pickup activity in Manhattan on Friday, June 5th, at 11 pm. We project each taxi pickup to its nearest intersection and count the number of pickups at each intersection. We use the piecewise-constant decomposition to decompose this graph signal; Figures 11(c)–(d) show the decompositions using three and nine pieces, respectively. Through this decomposition, we get a rough idea about the distribution of the taxi-pickup activity; for example, the areas around East Village and Times Square are busier. Figure 11(b) shows the approximation error. Similarly to the results for the restaurant distribution, we see that, using the same number of expansion coefficients (pieces), the piecewise-constant decomposition significantly outperforms the other methods and the graph Fourier basis based on the adjacency matrix [1] outperforms the one based on the graph Laplacian matrix [3].

An interesting (but not surprising) observation is that both the restaurant distribution and the taxi-pickup activity on Friday nights activate the areas around East Village and Times Square (see Figure 12). Each piece in Figure 12 comes from one of the three pieces in Figures 10(c) and 11(c), and we observe that the corresponding pieces highly overlap: the F1 score between the two Times Square pieces is 0.65 and the F1 score between the two East Village pieces is 0.59. This indicates that the taxi-pickup activity on Friday nights is highly correlated with the restaurant distribution, reflecting a well-understood pattern of urban lifestyle.

IV. DICTIONARY LEARNING ON GRAPHS

We now extend signal decomposition, where we find the activated pieces of a single graph signal, to dictionary learning, where the aim is to find activated pieces that are shared by multiple graph signals. In other words, we learn activated pieces as building blocks that are used to represent multiple graph signals. We extend the solver of the decomposition problem in Section III to the dictionary learning problem and apply it to mine traffic patterns in the Manhattan dataset. Consider a matrix of $L$ graph signals as columns,

$X = D Z + E \in \mathbb{R}^{N \times L}$,

with $D = [1_{C_1} \; 1_{C_2} \; \cdots \; 1_{C_K}]$ the graph dictionary with a predefined number $K$ of one-piece atoms as columns, $Z \in \mathbb{R}^{K \times L}$ a sparse coefficient matrix, the $C_i$ connected and $E_{i,j} \sim \mathcal{N}(0, \sigma^2)$ Gaussian noise. We store all the activated pieces in $D$ as building blocks; each graph signal (column of $X$) is thus approximated by a linear combination of several pieces from $D$. We learn the pieces $C_i$ from $X$ by solving the following optimization problem,

$\min_{Z, C_i} \; \| X - [1_{C_1} \; 1_{C_2} \; \cdots \; 1_{C_K}] Z \|_F^2$, subject to $C_i \in \mathcal{C}$ and $\|Z\|_{0,\infty} \le S$,  (12)

where the sparsity level $S$ helps avoid overfitting and $\|Z\|_{0,\infty} = \max_i \|z_i\|_0$, with $z_i$ the $i$th column of $Z$, is the maximum number of nonzero elements in any column.

A. Methodology

The goal here is to learn the pieces and the corresponding coefficients from a matrix of graph signals. It is also possible to use the decomposition techniques of the previous section to localize pieces in each graph signal and store all the activated pieces in the dictionary, but this approach cannot reveal the correlations among graph signals. We instead use a learning approach to find common pieces in the graph signals, as it allows the same piece to be repeatedly used as a basic building block to represent graph signals and reveals the correlations among them. To solve (12), we update the dictionary and the coefficient matrix successively; that is, given the dictionary, we optimize over the coefficient matrix and then, given the coefficient matrix, we optimize over the dictionary. When updating the coefficient matrix, we fix the $C_i$ and consider

$\min_{Z} \; \| X - [1_{C_1} \; 1_{C_2} \; \cdots \; 1_{C_K}] Z \|_F^2$, subject to $\|Z\|_{0,\infty} \le S$,  (13)

where $\|\cdot\|_F$ denotes the Frobenius norm. This is a common sparse coding problem, which can be efficiently solved by orthogonal matching pursuit [46].
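A sketch of this coefficient update, using scikit-learn's OMP as a stand-in for the solver cited in [46] (the column-by-column loop and `fit_intercept=False` are our choices):

```python
# A sketch of the coefficient update (13): with the dictionary D fixed,
# each column of X is sparse-coded independently by orthogonal matching
# pursuit with at most S nonzero coefficients.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def update_coefficients(X, D, S):
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=S, fit_intercept=False)
    Z = np.zeros((D.shape[1], X.shape[1]))
    for col in range(X.shape[1]):
        omp.fit(D, X[:, col])
        Z[:, col] = omp.coef_
    return Z
```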

When updating the graph dictionary, we fix $Z$ and consider

$\min_{C_i} \; \| X - [1_{C_1} \; 1_{C_2} \; \cdots \; 1_{C_K}] Z \|_F^2$, subject to $C_i \in \mathcal{C}$.  (14)


[Fig. 13 panels: (a)–(e) Pieces 1–5.]

Fig. 13: The top five most frequently-used pieces learned from the taxi-pickup activity during rush hours in 2015.

Similarly to the piecewise-constant decomposition, we iteratively update each piece. Let $z_j$ be the $j$th row of $Z$ and $R_j = X - \sum_{i \ne j} 1_{C_i} z_i^T \in \mathbb{R}^{N \times L}$. Then,

$\| X - [1_{C_1} \; 1_{C_2} \; \cdots \; 1_{C_K}] Z \|_F^2 \overset{(a)}{=} \| R_j - 1_{C_j} z_j^T \|_F^2 \overset{(b)}{=} \mathrm{Tr}\big( R_j^T R_j - R_j^T 1_{C_j} z_j^T - z_j 1_{C_j}^T R_j + z_j 1_{C_j}^T 1_{C_j} z_j^T \big) \overset{(c)}{=} \mathrm{Tr}\big( R_j^T R_j \big) + z_j^T z_j \, 1_{C_j}^T 1_{C_j} - 2 (R_j z_j)^T 1_{C_j}$,

where $1_{C_j}$ is the indicator function defined in (1), (a) follows from the definition of $R_j$, (b) from the definitions of the Frobenius norm and the trace operator, and (c) from the property $\mathrm{Tr}(X Y) = \mathrm{Tr}(Y X)$. Since the first term is a constant, to update the $j$th piece we freeze all the other pieces and solve

$\min_{C_j} \; z_j^T z_j \, 1_{C_j}^T 1_{C_j} - 2 (R_j z_j)^T 1_{C_j}$, subject to $C_j \in \mathcal{C}$,

where the objective function involves an inner product of two vectors. This is equivalent to solving (4) with $x = R_j z_j / (z_j^T z_j)$; indeed,

$\Big\| \frac{R_j z_j}{z_j^T z_j} - 1_{C_j} \Big\|_2^2 = \frac{1}{(z_j^T z_j)^2} \big\| R_j z_j - z_j^T z_j \, 1_{C_j} \big\|_2^2 = \frac{1}{(z_j^T z_j)^2} \Big( (R_j z_j)^T (R_j z_j) - 2 (R_j z_j)^T z_j^T z_j \, 1_{C_j} + (z_j^T z_j)^2 \, 1_{C_j}^T 1_{C_j} \Big) = \frac{z_j^T R_j^T R_j z_j}{(z_j^T z_j)^2} + \frac{1}{z_j^T z_j} \Big( z_j^T z_j \, 1_{C_j}^T 1_{C_j} - 2 (R_j z_j)^T 1_{C_j} \Big)$,

where $z_j^T z_j$ is a scalar and $1_{C_j}^T 1_{C_j} = 1^T 1_{C_j}$ because $1_{C_j}$ is a binary vector. In other words, we update each piece in the graph dictionary using the localization solver (9). All the pieces are updated iteratively by coordinate descent until convergence, yielding a local optimum [45].
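A sketch of this atom update, assuming a unit-magnitude localizer `loc1` from the earlier sketches and the dictionary/coefficient matrices as numpy arrays (the skip rule for unused atoms is our choice):

```python
# A sketch of the atom update (14): with Z fixed, refresh each piece C_j
# by running the unit-magnitude localizer on the rescaled residual
# R_j z_j / (z_j^T z_j), as derived above.
import numpy as np

def update_dictionary(G, nodes, X, D, Z, loc1):
    for j in range(D.shape[1]):
        z_j = Z[j, :]                          # j-th row of Z
        if z_j @ z_j == 0:
            continue                           # unused atom, leave as-is
        R_j = X - D @ Z + np.outer(D[:, j], z_j)
        target = (R_j @ z_j) / (z_j @ z_j)     # surrogate signal for (4)
        C_j = loc1(G, nodes, target)
        D[:, j] = np.array([1.0 if v in C_j else 0.0 for v in nodes])
    return D
```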

B. Experimental Validation

We use the proposed dictionary learning techniques (13), (14) to mine traffic patterns in Manhattan. We use a public dataset of taxi pickups in Manhattan (available at http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml), which is particularly interesting because taxis are valuable sensors of city life [47], [48], [49]. The information associated with taxi pickups provides insight into economic activity, human behavior and mobility patterns. We use the dataset for 2015 and focus on the rush-hour period (6–8 pm). We accumulate the taxi-pickup activity within each hour and obtain 1,095 (365 days × 3 hours, the hours being 6, 7 and 8 pm) graph signals. Each graph signal thus shows the taxi-pickup activity at each intersection in Manhattan at a specific hour. We next mine taxi-pickup patterns through two tasks: detecting events in Manhattan and checking whether weekdays and weekends exhibit different traffic patterns.

1) Event detection: We consider two types of events: common events and special events. When a place is frequently or periodically crowded, we consider it a common event, which reflects natural traffic behavior. When a place is rarely crowded, we consider it a special event, which reflects an anomaly in traffic behavior; traffic accidents or holiday celebrations usually lead to special events. We use the learned pieces and their corresponding coefficients to analyze the taxi-pickup activity during rush hours. We set the size of the dictionary to $K = 500$ and the sparsity to $S = 30$; that is, $K = 500$ activated pieces are learned from $L = 1{,}095$ graph signals of dimension $N = 13{,}679$ (the number of intersections in Manhattan) and each graph signal is approximated by at most 30 pieces. When a piece is used to represent a graph signal, the corresponding area is particularly crowded at that specific time and we need the piece to capture this traffic information. On the other hand, when a piece is not used, the corresponding area is simply not particularly crowded compared to other areas; it does not necessarily mean that there are no taxi pickups in that area, because those pickups may activate other, related pieces.

Common events. Common events show natural traffic behaviors, which are detected by the frequently-used pieces: when a row of the coefficient matrix has many nonzero entries, the corresponding piece is frequently used to represent graph signals. The top five most frequently-used pieces learned from the taxi-pickup activity during rush hours are shown in Figure 13. For example, Figure 13(b) shows that a small area around Penn Station is frequently crowded; we use Piece 2 to capture this information. We then check on which days this piece is used. Figure 14(b) shows a histogram of the usage, where we see that Piece 2 is more frequently used on Sundays (as well as Fridays and Saturdays) throughout 2015. This indicates


that, compared to other places in Manhattan, Penn Station is particularly crowded during weekends. This is intuitive due to the large number of commuters going through Penn Station and visiting the nearby Madison Square Garden on weekends.

[Fig. 14 panels: (a) Piece 2 (Penn Station); (b) histogram of its usage.]

Fig. 14: When is Penn Station particularly crowded compared to other places? (a) Precise location of Piece 2. (b) Histogram indicating that Penn Station is significantly more crowded than other places during weekends.

Special events. Special events show anomalous traffic behaviors, which are detected by the rarely-used pieces: when a row of the coefficient matrix has few nonzero entries, the corresponding piece is rarely used to represent graph signals. Figure 15(a) shows the 466th piece in the graph dictionary, activating the area around West 82nd Street, and (b) shows a zoom-in plot. This piece is used only three times during the entire year of 2015, between 6 and 8 pm on November 25th, the night before Thanksgiving Day, indicating that the area around West 82nd Street was much more crowded than other areas on that night only. Figures 15(c)–(d) show the taxi-pickup activity at 9 pm on November 25th and its approximation by the graph dictionary, respectively. We thus successfully detect the inflation of the balloons for the Macy's Thanksgiving parade, which happens on 77th and 81st Streets between Columbus Avenue and Central Park West, purely from the taxi-pickup activity.

[Fig. 15 panels: (a) piece around W 82nd Street; (b) zoom-in plot; (c) taxi-pickup activity at 9 pm, Nov. 25th; (d) approximation by 30 pieces from the dictionary.]

Fig. 15: Inflation of the balloons the day before Macy's Thanksgiving Parade is detected. 6–8 pm on Nov. 25th activates the 466th piece, indicating that locations around W 82nd Street are particularly crowded.

Figure 16(a) shows the 377th piece in the graph dictionary, activating the area around the Apple Store on 5th Avenue, and (b) shows a zoom-in plot. This piece is used only once during the entire year of 2015, at 8 pm on July 3rd, the night before Independence Day, indicating that the area around the Apple Store on 5th Avenue was much more crowded than other areas on that night only. Figures 16(c)–(d) show the taxi-pickup activity at 8 pm on July 3rd and its approximation by the graph dictionary, respectively.

[Fig. 16 panels: (a) piece around the Apple Store, 5th Avenue; (b) zoom-in plot; (c) taxi-pickup activity at 8 pm, July 3rd; (d) approximation by 30 pieces from the dictionary.]

Fig. 16: 8 pm on July 3rd activates the 377th piece, indicating that the 5th Avenue Apple Store area is particularly crowded.

2) What day is today in Manhattan?: Are traffic patterns on weekends different from traffic patterns on weekdays? We use the learned graph dictionary to answer this question. Graph dictionary learning not only provides traffic-correlated pieces, but also extracts traffic-correlated features through approximation. Similarly to principal component analysis, graph-dictionary-based sparse representation corresponds to unsupervised learning; it reduces the dimension of the representation, thereby extracting key information. However, as a general method, principal component analysis is unaware of the graph structure, while the graph-dictionary-based sparse representation extracts traffic-correlated features based on the graph structure. Since there are 500 pieces in the graph dictionary, we reduce the dimension of each graph signal from 13,679 to 500, and the corresponding 500 expansion coefficients are traffic-correlated


features.

[Fig. 17: classification accuracy (y-axis, %) vs. number of training months (x-axis) for PCA, gDictionary and Average.]

Fig. 17: Classifying to which day of the week a graph signal belongs. The graph-dictionary-based sparse representation significantly outperforms principal component analysis.

Now we consider a 7-class classification task: given a graph signal, we aim to identify to which day of the week it belongs by using the traffic-correlated features. Figure 17 shows the classification accuracy as a function of the training data, where the x-axis is the number of months used in training and the y-axis is the classification accuracy. For example, when the month is x = 1, we use the 93 (31 × 3) graph signals in January as training data and the remaining 1,002 graph signals as testing data. Since this task involves 7 classes, the classification accuracy of a random guess is 14.29%. We compare three methods: principal component analysis (PCA, blue dotted line), the average value of each graph signal (Average, yellow-circle line) and the graph-dictionary-based sparse representation (gDictionary, red solid line). We see that the graph-dictionary-based sparse representation significantly outperforms both principal component analysis and the average value, and that its classification accuracy increases as the amount of training data grows.

We now fix the January graph signals as training data and the graph signals of the remaining 11 months as testing data; Figure 18 shows the classification confusion matrix. (Given $c$ classes, element $M_{i,j}$ of the confusion matrix $M \in \mathbb{R}^{c \times c}$ counts the samples that belong to class $i$ but are classified as class $j$; perfect classification yields an identity confusion matrix.) We see that it is relatively easy to tell weekdays and weekend days apart, as the two groups are not confused with each other. There is also relatively little confusion between a Saturday and a Sunday. In contrast, weekdays are easily confused with each other, indicating similar traffic patterns. We further conduct a series of pairwise classifications to verify whether traffic patterns on weekdays differ from traffic patterns on weekends. We again fix the January graph signals as training data and the graph signals of the remaining 11 months as testing data, but we only distinguish one day of the week from another at a time. Figure 19 shows the resulting pairwise classification accuracies. We see that the accuracy of telling Thursdays apart from Sundays is around 90%, while the accuracy of telling Mondays apart from Tuesdays is merely above random. It is significantly easier to distinguish Sundays from weekdays.
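As a small sketch of this experiment's setup: the paper does not specify the classifier, so LinearSVC below is a generic stand-in, and the label arrays and month-based split are assumptions about the data layout:

```python
# A sketch of the day-of-week classification: columns of Z (sparse codes)
# are the traffic-correlated features; train on one month, test on the rest.
import numpy as np
from sklearn.svm import LinearSVC

def classify_days(Z, day_labels, month_labels, train_month=1):
    feats = Z.T                                # one row per graph signal
    train = month_labels == train_month
    clf = LinearSVC().fit(feats[train], day_labels[train])
    return clf.score(feats[~train], day_labels[~train])   # test accuracy
```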

Fig. 18: Classification confusion matrix using the January graph signals as training data, showing that it is relatively easy to tell weekdays and weekends apart.

Fig. 19: Pairwise classification accuracies, showing that it is significantly easier to distinguish Sundays from weekdays.

3) Discussion: We considered event detection and day-of-the-week classification mined from the taxi-pickup activity in Manhattan. In event detection, we successfully detected the locations and times of both common and special events by using the graph dictionary learned from a spatio-temporal taxi-pickup data volume. In day-of-the-week classification, we significantly outperformed principal component analysis and gained insight into the traffic patterns on weekdays and weekends by using the same graph dictionary. These results suggest that the proposed graph dictionary learning techniques are a promising tool for exploring mobility patterns in a large volume of spatio-temporal urban data, which may aid in urban planning and traffic forecasting.

V. CONCLUSIONS

We studied three critical problems that allow piecewise-constant graph signals to serve as a tool for mining large amounts of complex data: graph signal localization, graph signal decomposition and graph dictionary learning. We used piecewise-constant graph signals to model local information in the vertex domain and showed that decomposition and dictionary learning are natural extensions of localization. For each of the three problems, we proposed a specific graph signal model, an optimization problem and an efficient solver. We conducted extensive validation studies on diverse datasets. The results show that cut-based localization is good at localizing the ball-shaped class and path-based localization is good at localizing the elongated path class. We also used the proposed methods to analyze the taxi-pickup activity in Manhattan in 2015 and showed that, based on these, we can successfully detect


both common and special events, as well as tell apart weekdays from weekends. These findings validate the effectiveness of the proposed methods and suggest that graph signal processing tools may aid in urban planning and traffic forecasting.

REFERENCES

[1] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
[2] A. Sandryhaila and J. M. F. Moura, "Big data processing with signal processing on graphs," IEEE Signal Process. Mag., vol. 31, no. 5, pp. 80–90, Sept. 2014.
[3] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, pp. 83–98, May 2013.
[4] A. Anis, A. Gadde, and A. Ortega, "Efficient sampling set selection for bandlimited graph signals using graph spectral proxies," IEEE Trans. Signal Process., 2015, submitted.
[5] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, "Discrete signal processing on graphs: Sampling theory," IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6510–6523, Dec. 2015.
[6] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, "Sampling of graph signals with successive local aggregations," IEEE Trans. Signal Process., vol. 64, pp. 1832–1843, Dec. 2015.
[7] S. Chen, R. Varma, A. Singh, and J. Kovačević, "Signal recovery on graphs: Fundamental limits of sampling strategies," IEEE Trans. Signal Inform. Process. over Networks, 2016, submitted.
[8] S. Chen, A. Sandryhaila, J. M. F. Moura, and J. Kovačević, "Signal recovery on graphs: Variation minimization," IEEE Trans. Signal Process., vol. 63, no. 17, pp. 4609–4624, Sept. 2015.
[9] S. Chen, F. Cerda, P. Rizzo, J. Bielak, J. H. Garrett, and J. Kovačević, "Semi-supervised multiresolution classification using adaptive graph filtering with application to indirect bridge structural health monitoring," IEEE Trans. Signal Process., vol. 62, no. 11, pp. 2879–2893, June 2014.
[10] S. K. Narang, A. Gadde, and A. Ortega, "Signal processing techniques for interpolation in graph structured data," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Vancouver, May 2013, pp. 5445–5449.
[11] X. Zhu and M. Rabbat, "Approximating signals supported on graphs," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Kyoto, Mar. 2012, pp. 3921–3924.
[12] D. Thanou, D. I. Shuman, and P. Frossard, "Learning parametric dictionaries for signals on graphs," IEEE Trans. Signal Process., vol. 62, pp. 3849–3862, June 2014.
[13] N. Tremblay and P. Borgnat, "Subgraph-based filterbanks for graph signals," IEEE Trans. Signal Process., vol. 64, pp. 3827–3840, Mar. 2016.
[14] S. Chen, T. Ji, R. Varma, A. Singh, and J. Kovačević, "Signal representations on graphs: Tools and applications," IEEE Trans. Signal Process., Mar. 2016, submitted.
[15] A. Agaskar and Y. M. Lu, "A spectral graph uncertainty principle," IEEE Trans. Inf. Theory, vol. 59, no. 7, pp. 4338–4356, July 2013.
[16] M. Tsitsvero, S. Barbarossa, and P. D. Lorenzo, "Signals on graphs: Uncertainty principle and sampling," arXiv:1507.08822, 2015.
[17] S. K. Narang, G. Shen, and A. Ortega, "Unidirectional graph-based wavelet transforms for efficient data gathering in sensor networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Dallas, TX, Mar. 2010, pp. 2902–2905.
[18] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Appl. Comput. Harmon. Anal., vol. 30, pp. 129–150, Mar. 2011.
[19] D. I. Shuman, B. Ricaud, and P. Vandergheynst, "Vertex-frequency analysis on graphs," Appl. Comput. Harmon. Anal., Feb. 2015, to appear.
[20] D. I. Shuman, M. J. Faraji, and P. Vandergheynst, "A multiscale pyramid transform for graph signals," IEEE Trans. Signal Process., vol. 64, pp. 2119–2134, Apr. 2016.
[21] N. Tremblay and P. Borgnat, "Graph wavelets for multiscale community mining," IEEE Trans. Signal Process., vol. 62, pp. 5227–5239, Oct. 2014.
[22] X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov, "Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds," IEEE Trans. Signal Process., vol. 62, no. 4, pp. 905–918, Feb. 2014.
[23] P.-Y. Chen and A. O. Hero, "Local Fiedler vector centrality for detection of deep and overlapping communities in networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Florence, 2014, pp. 1120–1124.
[24] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, July 2009.
[25] C. Faloutsos, "Large graph mining: Patterns, cascades, fraud detection, and algorithms," in Proc. Int. World Wide Web Conf., 2014, pp. 1–2.
[26] A. Beutel, L. Akoglu, and C. Faloutsos, "Fraud detection through graph-based user behavior modeling," in Proc. ACM Conf. Comput. Commun. Security, 2015, pp. 1696–1697.
[27] E. Arias-Castro, E. J. Candès, and A. Durand, "Detection of an anomalous cluster in a network," Ann. Stat., vol. 39, no. 1, pp. 278–304, 2011.
[28] C. Hu, L. Cheng, J. Sepulcre, G. E. Fakhri, Y. M. Lu, and Q. Li, "Matched signal detection on graphs: Theory and application to brain network classification," in Proc. Int. Conf. Inf. Process. Med. Imaging, Pacific Grove, CA, 2013.
[29] J. Sharpnack, A. Rinaldo, and A. Singh, "Changepoint detection over graphs with the spectral scan statistic," in Proc. Artif. Intell. and Stat., 2013.
[30] J. Sharpnack, A. Krishnamurthy, and A. Singh, "Detecting activations over graphs using spanning tree wavelet bases," in Proc. Artif. Intell. and Stat., Scottsdale, AZ, Apr. 2013.
[31] J. Sharpnack, A. Krishnamurthy, and A. Singh, "Near-optimal anomaly detection in graphs using Lovász extended scan statistic," in Proc. Neural Information Process. Syst., 2013.
[32] S. Chen, Y. Yang, S. Zong, A. Singh, and J. Kovačević, "Detecting structure-correlated attributes on graphs," IEEE Trans. Signal Process., 2016, submitted.
[33] M. Vetterli, J. Kovačević, and V. K. Goyal, Foundations of Signal Processing, Cambridge University Press, Cambridge, 2014, http://foundationsofsignalprocessing.org.
[34] R. R. Coifman and M. Maggioni, "Diffusion wavelets," Appl. Comput. Harmon. Anal., pp. 53–94, July 2006.
[35] S. K. Narang and A. Ortega, "Perfect reconstruction two-channel wavelet filter banks for graph structured data," IEEE Trans. Signal Process., vol. 60, pp. 2786–2799, June 2012.
[36] M. Gavish, B. Nadler, and R. R. Coifman, "Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning," in Proc. Int. Conf. Mach. Learn., Haifa, Israel, June 2010, pp. 367–374.
[37] M. Crovella and E. Kolaczyk, "Graph wavelets for spatial traffic analysis," in Proc. IEEE INFOCOM, Mar. 2003, vol. 3, pp. 1848–1857.
[38] K. Engan, S. O. Aase, and J. H. Husøy, "Method of optimal directions for frame design," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Phoenix, AZ, Mar. 1999, pp. 2443–2446.
[39] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
[40] J. Mairal, F. R. Bach, J. Ponce, and G. Sapiro, "Online dictionary learning for sparse coding," in Proc. Int. Conf. Mach. Learn., Montreal, June 2009, pp. 689–696.
[41] M. Newman, Networks: An Introduction, Oxford University Press, 2010.
[42] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[43] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 2, pp. 147–159, Feb. 2004.
[44] R. Duda, P. Hart, and D. Stork, Pattern Classification, John Wiley & Sons, Englewood Cliffs, NJ, 2001.
[45] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, NY, 2004.
[46] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in Proc. Asilomar Conf. Signal, Syst. Comput., Pacific Grove, CA, Nov. 1993, vol. 1, pp. 40–44.
[47] N. Ferreira, J. Poco, H. T. Vo, J. Freire, and C. T. Silva, "Visual exploration of big spatio-temporal urban data: A study of New York City taxi trips," IEEE Trans. Vis. Comput. Graphics, vol. 19, no. 12, pp. 2149–2158, Dec. 2013.
[48] H. Doraiswamy, N. Ferreira, T. Damoulas, J. Freire, and C. T. Silva, "Using topological analysis to support event-guided exploration in urban data," IEEE Trans. Vis. Comput. Graphics, vol. 20, no. 12, pp. 2634–2643, 2014.
[49] J. A. Deri and J. M. F. Moura, "Taxi data in New York City: A network perspective," in Proc. Asilomar Conf. Signal, Syst. Comput., Pacific Grove, CA, Nov. 2015, pp. 1829–1833.
