Discovering the Fourier Transform: A Tutorial on Circulant Matrices

Discovering the Fourier Transform: A Tutorial on Circulant Matrices, Circular Convolution, and the DFT

arXiv:1805.05533v1 [eess.SP] 15 May 2018

Bassam Bamieh Mechanical Engineering ([email protected]) University of California, Santa Barbara

May 10, 2018

Abstract How could the Fourier transform be discovered if one didn’t know it? In the case of the Discrete Fourier Transform (DFT), we show how it arises naturally out of analysis of circulant matrices. In particular, the DFT can be derived as the change of basis that simultaneously diagonalizes all circulant matrices. Thus the DFT arises naturally from a linear algebra question. Rather than thinking of the DFT as a signal transform, it is more natural to think of it as a change of basis that renders a certain set of linear operations into a simple, diagonal form.

The Fourier transform in all its forms is ubiquitous. Its many useful properties are introduced early on in Mathematics, Science and Engineering curricula [1]. Typically, it is introduced as a transformation on functions and signals, and then its many useful and remarkable properties are derived. Those properties are then shown to be very effective in solving certain differential equations, or to analyze the action of time-invariant linear dynamical systems, amongst many other uses. To the student, the effectiveness of the Fourier transform in solving these problems may seem magical at first, before familiarity eventually suppresses that initial sense of wonder. In this tutorial, I’d like to step back to before one is shown the Fourier transform, and ask the following question: How would one naturally discover the Fourier transform rather than have it be postulated? The above question is interesting for several reasons. First, it is more intellectually satisfying to introduce a new mathematical object from familiar and well-known objects rather than having it postulated “out of thin air”. In this tutorial we demonstrate how the DFT arises naturally from the problem of simultaneous diagonalization of all circulant matrices, which share symmetry properties that enable this diagonalization. It should be noted that simultaneous diagonalization of any class of linear operators or matrices is the ultimate way to understand their actions, by reducing the entire class to the simplest form of linear operations (diagonal matrices) simultaneously. The same procedure can be applied to discover the other related transforms, namely the Fourier Transform, the z-Transform and Fourier Series. All can be arrived at by simultaneously diagonalizing a respective class of linear operators that obey their respective symmetry rules. 
To make the point above, and to have a concrete discussion, we consider in this tutorial only the case of circulant matrices, and later on circulant operators which are the natural generalization to higher dimensions. This case is also particularly useful because it yields the DFT, which is the computational workhorse for all Fourier-type analysis.


Given an ordered set of n numbers (an n-tuple, or an n-vector) $x := (x_0, \dots, x_{n-1})$, define the associated matrix $C_x$ whose first column is made up of these numbers, and each subsequent column is obtained by a circular shift of the previous column

$$
C_x := \begin{bmatrix}
x_0 & x_{n-1} & x_{n-2} & \cdots & x_1 \\
x_1 & x_0 & x_{n-1} & & x_2 \\
x_2 & x_1 & x_0 & & x_3 \\
\vdots & & \ddots & \ddots & \vdots \\
x_{n-1} & x_{n-2} & x_{n-3} & \cdots & x_0
\end{bmatrix}. \tag{1}
$$

Note that each row is also obtained from the previous row by a circular shift. Thus the entire matrix is completely determined by any one of its rows or columns. Such matrices are called circulant. They are a subclass of Toeplitz matrices, and have very special properties due to their intimate relation to the Discrete Fourier Transform (DFT) and circular convolution. Given an n-vector x as above, its DFT $\hat{x}$ is another n-vector defined by the expression

$$
\hat{x}_k := \sum_{l=0}^{n-1} x_l \, e^{-i \frac{2\pi}{n} k l}, \qquad k = 0, 1, \dots, n-1. \tag{2}
$$
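As a minimal sketch of definitions (1) and (2), the following pure-Python snippet builds $C_x$ from its defining vector and evaluates the DFT sum directly. The function names `circulant` and `dft` are illustrative choices that do not appear in the original text, and the direct sum is meant for checking small examples rather than for efficiency.

```python
import cmath

def circulant(x):
    """Return C_x as a list of rows; entry (i, j) is x[(i - j) mod n], matching (1)."""
    n = len(x)
    return [[x[(i - j) % n] for j in range(n)] for i in range(n)]

def dft(x):
    """Evaluate definition (2) directly: xhat_k = sum_l x_l exp(-i 2 pi k l / n)."""
    n = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]

C = circulant([1, 2, 3])
# The first column is x itself; each later column is a circular shift of the previous one.
print(C)                    # [[1, 3, 2], [2, 1, 3], [3, 2, 1]]
# For a unit impulse, every DFT entry equals 1.
print(dft([1, 0, 0, 0]))
```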

A remarkable fact is that given a circulant matrix $C_x$, its eigenvalues are precisely the set of complex numbers $\{\hat{x}_k\}$, i.e. the DFT of the vector x that defines the circulant matrix $C_x$. There are many ways to derive this conclusion and other beautiful properties of the DFT, and its connections with circulant matrices and circular convolutions. Most treatments start with the definition (2) of the DFT, from which many of its seemingly magical properties are easily derived. To restate the goal of this tutorial, the question we ask here is: what if we didn’t know what the DFT is? How can we arrive at it in a natural manner without having to guess its form?

There is a natural way to think about this problem. Given a class of matrices or operators, one asks if there is a transformation, a change of basis, in which their matrix representations all have the same structure such as diagonal, block diagonal, etc. The simplest such scenario is when a class of matrices can be simultaneously diagonalized with the same transformation. Since diagonalizing transformations are made up of eigenvectors of a matrix, a set of matrices is simultaneously diagonalizable iff they share a full set of eigenvectors. An equivalent condition is that they are each diagonalizable, and they all mutually commute. Therefore given a mutually commuting set of matrices, by finding their shared eigenvectors, one finds that special transformation that simultaneously diagonalizes all of them. Thus, finding the “right transform” for a particular class of operators amounts to identifying the correct eigenvalue problem (which turns out to be relatively easy), and then calculating the eigenvectors, which then yield the transform.

An alternative but complementary view of the above procedure involves describing the class of operators using some underlying common symmetry. For example, circulant matrices such as (1) have a shift invariance property with respect to circular shifts of vectors. This can also be described as having a shift-invariant action on vectors over $\mathbb{Z}_n$ (the integers modulo n), which is also equivalent to having a shift-invariant action on periodic functions (with period n). In more formal language, circulant matrices represent a class of mutually commuting operators that also commute with the action of the group $\mathbb{Z}_n$. A basic shift operator generates that group, and the eigenvalue problem for that shift operator yields the DFT. This approach has the advantage of being generalizable to more complex symmetries that can be encoded in the action of other, possibly non-commutative, groups. These techniques are part of the theory of group representations. However, we adopt here the approach described in the previous paragraph, which uses familiar Linear Algebra language and avoids the formalism of group representations. Nonetheless, the two approaches are intimately linked. Perhaps the present approach can be thought of as a “gateway” treatment on a slippery slope to group representations [2, 3] if the reader is so inclined.

Finally, it should be noted that the same linear algebra language used in the present approach generalizes to the other related transforms, namely the Fourier transform, Fourier series, and the z-transform. They can all be discovered by understanding the class of shift-invariant operators involved, and then identifying the basic eigenvalue problem whose solutions yield the correct transform.

1

Modular Arithmetic, Zn , and Circular Shifts

To understand the symmetry properties of circulant matrices, it is useful to first study and establish some simple properties of the set $\mathbb{Z}_n := \{0, 1, \dots, n-1\}$ of integers modulo n. The arithmetic in $\mathbb{Z}_n$ is modular arithmetic, that is, we say k equals l modulo n if k − l is an integer multiple of n. The following notation can be used to describe this formally

$$
k = l \ (\mathrm{mod}\ n) \quad \text{or} \quad k \equiv_n l
\qquad \Longleftrightarrow \qquad
\exists\, i \in \mathbb{Z} \ \ \text{s.t.} \ \ k - l = i\, n.
$$

Thus for example $n \equiv_n 0$, and $n + 1 \equiv_n 1$ and so on. There are two equivalent ways to define (and think about) $\mathbb{Z}_n$, one mathematically formal and the other graphical. The first is to consider the set of all integers $\mathbb{Z}$ and regard any two integers k and l such that k − l is a multiple of n as equivalent, or more precisely as members of the same equivalence class. The infinite set of integers $\mathbb{Z}$ becomes a finite set of equivalence classes with this equivalence relation. This is illustrated in Figure 1 where elements of $\mathbb{Z}_n$ are arranged in “vertical bins” which are the equivalence classes. Each equivalence class can be identified with any of its members. One choice is to identify the first one with the element 0, the second one with 1, and so on up to the n’th class identified with the integer n − 1. Figure 1 also shows how elements of $\mathbb{Z}_n$ can be arranged on a discrete circle so that the arithmetic in $\mathbb{Z}_n$ is identified with angle addition.

Figure 1: (a) Definition of $\mathbb{Z}_n$ as the decomposition of the integers $\mathbb{Z}$ into equivalence classes each indicated as a “vertical bin”. Two integers in $\mathbb{Z}$ belong to the same equivalence class (and represent the same element of $\mathbb{Z}_n$) if they differ by an integer multiple of n. (b) Another depiction of the decomposition where two integers that are vertically aligned in this figure belong to the same equivalence class. The arithmetic in $\mathbb{Z}_n$ is just angle addition in this diagram. For example, $(-1) + (n + 1) \equiv_n 0 \equiv_n n$. (c) Using the set $\{0, 1, \dots, n-1\}$ as $\mathbb{Z}_n$. A few equivalent members are shown, and the arithmetic of $\mathbb{Z}_n$ is just angle addition here. (d) The nth roots of unity $\rho_m := \exp\left(i \frac{2\pi}{n} m\right)$ lying on the unit circle in the complex plane. Identifying $\rho_m$ with $m \in \mathbb{Z}_n$ shows that complex multiplication on $\{\rho_m\}$ (which corresponds to angle addition) is equivalent to modular addition in $\mathbb{Z}_n$.

One more useful isomorphism is between $\mathbb{Z}_n$ and the nth roots of unity $\rho_m := e^{i \frac{2\pi}{n} m}$, $m = 0, \dots, n-1$. The complex numbers $\{\rho_m\}$ lie on the unit circle each at a corresponding angle of $\frac{2\pi}{n} m$ counter-clockwise from the real axis (Figure 1.d). Complex multiplication on $\{\rho_m\}$ corresponds to addition of their corresponding angles, and the mapping $\rho_m \to m$ is an isomorphism from complex multiplication on $\{\rho_m\}$ to modular arithmetic in $\mathbb{Z}_n$.

Using modular arithmetic, we can write down the definition of a circulant matrix (1) by specifying the ij’th entry (see footnote 1) of the matrix $C_x$ as

$$
(C_x)_{ij} := x_{i-j}, \qquad i, j \in \mathbb{Z}_n, \tag{3}
$$

where we use (mod n) arithmetic for computing i − j. It is clear that with this definition, the first column of $C_x$ is just the sequence $x_0, x_1, \dots, x_{n-1}$. The second column is given by the sequence $\{x_{i-1}\}$ and is thus $x_{-1}, x_0, \dots, x_{n-2}$, which is exactly the sequence $x_{n-1}, x_0, \dots, x_{n-2}$, i.e. a circular shift of the first column. Similarly each subsequent column is a circular shift of the column preceding it.

Footnote 1: Here, and in this entire appendix, matrix rows and columns are indexed from 0 to n − 1 rather than the more traditional 1 through n indexing. This alternative indexing significantly simplifies notation, and corresponds more directly to modular arithmetic.

Finally, it is useful to visualize an n-vector $x := (x_0, \dots, x_{n-1})$ as a set of numbers arranged at equidistant points along a circle, or equivalently as a function on the discrete circle. This is illustrated in Figure 2. Note the difference between this figure and Figure 1, which depicts the elements of $\mathbb{Z}_n$ and modular arithmetic. Figure 2 instead depicts vectors as a set of numbers arranged in a discrete circle, or as functions on $\mathbb{Z}_n$.

Figure 2: A vector $x := (x_0, \dots, x_{n-1})$ visualized as (a) a set of numbers arranged counter-clockwise on a discrete circle, or equivalently (b) as a function $x : \mathbb{Z}_n \to \mathbb{C}$ on the discrete circle $\mathbb{Z}_n$. (c) An n-periodic function on the integers $\mathbb{Z}$ can equivalently be viewed as a function on $\mathbb{Z}_n$ as in (b).

A function on $\mathbb{Z}_n$ can also be thought of as a periodic function (with period n) on the set of integers $\mathbb{Z}$ (Figure 2.c). In this case, periodicity of the function is expressed by the condition

$$
x_{t+n} = x_t \qquad \text{for all } t \in \mathbb{Z}. \tag{4}
$$

It is however more natural to view periodic functions on $\mathbb{Z}$ as just functions on $\mathbb{Z}_n$. In this case, periodicity of the function is simply “encoded” in the modular arithmetic of $\mathbb{Z}_n$, and condition (4) does not need to be explicitly stated.
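The identification of $\mathbb{Z}_n$ with the nth roots of unity, and the way modular indexing encodes periodicity, can be checked numerically. The sketch below is only illustrative; the helper names `rho` and `periodic` and the sample values are assumptions introduced here, not part of the original text.

```python
import cmath

n = 6

def rho(m):
    """The root of unity rho_m = exp(i 2 pi m / n)."""
    return cmath.exp(2j * cmath.pi * m / n)

# Complex multiplication of roots of unity matches addition modulo n.
for a in range(n):
    for b in range(n):
        assert abs(rho(a) * rho(b) - rho((a + b) % n)) < 1e-12

x = [10, 11, 12, 13, 14, 15]   # a function on Z_n, stored as a list

def periodic(t):
    """Evaluate x at any integer t by reducing the index modulo n."""
    return x[t % n]

# Condition (4) holds automatically because the index is reduced modulo n.
assert all(periodic(t + n) == periodic(t) for t in range(-12, 12))
print(periodic(-1), periodic(n), periodic(n + 1))   # 15 10 11
```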

1.1

Symmetry Properties of Circulant Matrices

Amongst all circulant matrices, there is a special one. Let S and its adjoint $S^*$ be the circular shift operators defined by the following action on vectors

$$
S \begin{bmatrix} x_0 & \cdots & x_{n-2} & x_{n-1} \end{bmatrix}^T = \begin{bmatrix} x_{n-1} & x_0 & \cdots & x_{n-2} \end{bmatrix}^T,
\qquad
S^* \begin{bmatrix} x_0 & x_1 & \cdots & x_{n-1} \end{bmatrix}^T = \begin{bmatrix} x_1 & \cdots & x_{n-1} & x_0 \end{bmatrix}^T.
$$

S is therefore called the circular right-shift operator while $S^*$ is the circular left-shift operator. It is clear that $S^*$ is the inverse of S, and it is easy to show that it is the adjoint of S. The latter fact also becomes clear upon examining the matrix representations of S and $S^*$

$$
S x = \begin{bmatrix} 0 & & & 1 \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix}
= \begin{bmatrix} x_{n-1} \\ x_0 \\ \vdots \\ x_{n-2} \end{bmatrix},
\qquad
S^* x = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ 1 & & & 0 \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix}
= \begin{bmatrix} x_1 \\ \vdots \\ x_{n-1} \\ x_0 \end{bmatrix},
$$

which shows that $S^*$ is indeed the transpose (and therefore the adjoint) of S. Note that both matrix representations are circulant matrices since $S = C_{(0,1,0,\dots,0)}$ and $S^* = C_{(0,\dots,0,1)}$ in the notation of (1). The actions of S and $S^*$ expressed in terms of vector indices are

$$
(Sx)_k := x_{k-1}, \qquad (S^* x)_k := x_{k+1}, \qquad k \in \mathbb{Z}_n, \tag{5}
$$

where modular arithmetic is used for computing vector indices. For example $(Sx)_0 = x_{0-1} = x_{n-1}$, since $-1 \equiv_n n-1$. An important property of S is that it commutes with any circulant matrix. To see this, note that the matrix representation of S implies its ij’th entry is given by $(S)_{ij} = \delta_{i-j-1}$, where $\delta_k$ equals 1 if $k \equiv_n 0$ and 0 otherwise. Now let $C_x$ be any circulant matrix, and observe that

$$
\begin{aligned}
(S C_x)_{ij} &= \sum_l S_{il} (C_x)_{lj} = \sum_l \delta_{i-l-1}\, x_{l-j} = \sum_l \delta_{(i-1)-l}\, x_{l-j} = x_{i-1-j}, \\
(C_x S)_{ij} &= \sum_l (C_x)_{il} S_{lj} = \sum_l x_{i-l}\, \delta_{l-j-1} = \sum_l x_{i-l}\, \delta_{l-(j+1)} = x_{i-j-1},
\end{aligned}
$$

where (3) is used for the entries of $C_x$. Thus S commutes with any circulant matrix. The converse is also true (see Exercise 1), and we state these conclusions in the next lemma.

Lemma 1 A matrix M is circulant iff it commutes with the circular shift operator S, i.e. SM = MS.

Note a simple corollary that a matrix is circulant iff it commutes with $S^*$ since

$$
SM = MS \quad \Longleftrightarrow \quad S^* S M S^* = S^* M S S^* \quad \Longleftrightarrow \quad M S^* = S^* M,
$$

which could be an alternative statement of the Lemma. The fact that a circulant matrix commutes with S could have been used as a definition of a circulant matrix, with the structure in (1) derived as a consequence. Commutation with S also expresses a shift invariance property. If we think of an n-vector x as a function on $\mathbb{Z}_n$ (Figure 2.b), then SMx = MSx means that the action of M on x is shift invariant. Geometrically, Sx is a counter-clockwise rotation of the function x in Figure 2.b. S(Mx) = M(Sx) means that rotating the result of the action of M on x is the same as rotating x first and then acting with M. This property is illustrated graphically in Figure 3.

Figure 3: Illustration of the circular shift-invariance property of the matrix-vector product y = Mx in a commutative diagram. Sx is the circular shift of a vector, depicted also as a clockwise rotation of the vector components arranged on the discrete circle. A matrix M has the shift invariance property if SM = MS. In this diagram, this means that the action of M on the rotated vector Sx is equal to first acting on x with M to obtain y = Mx, and then rotating the resulting vector to yield Sy. A matrix M has this shift-invariance property iff it is circulant.
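Lemma 1 can be spot-checked on small integer matrices. The following is a minimal sketch, assuming the helper names `circulant` and `matmul` (introduced here for illustration) and arbitrary sample entries; it checks one circulant and one non-circulant matrix and is not a proof.

```python
def circulant(x):
    """C_x with entry (i, j) equal to x[(i - j) mod n], as in (3)."""
    n = len(x)
    return [[x[(i - j) % n] for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

S = circulant([0, 1, 0, 0])       # right shift: S = C_(0,1,0,...,0)
S_adj = circulant([0, 0, 0, 1])   # left shift:  S* = C_(0,...,0,1)
I = circulant([1, 0, 0, 0])       # identity

assert matmul(S, S_adj) == I                  # S* is the inverse of S
C = circulant([5, 7, 11, 13])
assert matmul(S, C) == matmul(C, S)           # a circulant matrix commutes with S

# A matrix that is not circulant fails to commute with S.
M = [[1, 2, 3, 4], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
assert matmul(S, M) != matmul(M, S)
```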

2

Simultaneous Diagonalization of all Circulant Matrices Yields the DFT

In this section, we will derive the DFT as a byproduct of diagonalizing circulant matrices. Any two (diagonalizable) matrices that commute have the same eigenvectors, but possibly different eigenvalues. Since all circulant matrices mutually commute, they all have the same eigenvectors (see footnote 2), and they only differ in their eigenvalues. Therefore, if we find the eigenvectors of any one particular circulant matrix, then we have found the eigenvectors of all circulant matrices. Since a diagonalizing transformation is made up of the eigenvectors of a matrix, we can then simultaneously diagonalize all circulant matrices with one transformation made up of those eigenvectors.

Footnote 2: To invoke this, we need to know that circulant matrices are diagonalizable. Indeed they are since the complex conjugate transpose (the adjoint) of a circulant matrix is also circulant, so any circulant matrix commutes with its adjoint, i.e. it is normal. Therefore, any circulant matrix has a full set of mutually orthogonal eigenvectors.

The shift operator is in some sense the most fundamental circulant matrix, and is therefore a good candidate for an eigenvector/eigenvalue decomposition. The eigenvalue problem for S will turn out to be the simplest one. It is immediate that S is normal, meaning that it commutes with its adjoint (we in fact have the stronger statement $S^* S = S S^* = I$). Any normal matrix has a full set of mutually orthogonal eigenvectors. We will not need this fact here since we will explicitly construct all the eigenvectors. Once its eigenvectors are found, we will be able to diagonalize all circulant matrices using them. Note that we have two options: to find the eigenvectors of S, or alternatively those of $S^*$. We begin with $S^*$ since this will end up yielding the classically defined DFT.

2.1

Construction of Eigenvectors/Eigenvalues of $S^*$

Let w be an eigenvector (with eigenvalue λ) of the shift operator $S^*$. Note that it is also an eigenvector (with eigenvalue $\lambda^l$) of any power $(S^*)^l$ of $S^*$. Applying the definition (5) to the relation $S^* w = \lambda w$ will reveal that an eigenvector w has a very special structure

$$
\begin{aligned}
S^* w = \lambda w \quad &\Longleftrightarrow \quad w_{k+1} = \lambda\, w_k, && k \in \mathbb{Z}_n, \\
(S^*)^l w = \lambda^l w \quad &\Longleftrightarrow \quad w_{k+l} = \lambda^l w_k, && k \in \mathbb{Z}_n,\ l \in \mathbb{Z}.
\end{aligned} \tag{6}
$$

These relations can be used to compute all eigenvectors/eigenvalues of $S^*$. First, observe that although (6) is valid for all $l \in \mathbb{Z}$, this relation “repeats” for l ≥ n. In particular, for l = n we have for each index k

$$
w_{k+n} = \lambda^n w_k \quad \Longleftrightarrow \quad w_k = \lambda^n w_k \tag{7}
$$

since $k + n \equiv_n k$. Now since the vector $w \neq 0$, then for at least one index k, $w_k \neq 0$, and the last equality implies that $\lambda^n = 1$, i.e. any eigenvalue of $S^*$ must be an nth root of unity

$$
\lambda^n = 1 \quad \Longleftrightarrow \quad \lambda = \rho_m := e^{i \frac{2\pi}{n} m}, \quad m \in \mathbb{Z}_n.
$$

Thus we have discovered that the n eigenvalues of $S^*$ are precisely the n distinct nth roots of unity $\{\rho_m\}$. To compute the corresponding eigenvectors, apply the last relation in (6), $w_{k+l} = \lambda^l w_k$, with k = 0 to express the entries of an eigenvector w in terms of the first entry

$$
w_l = \lambda^l w_0. \tag{8}
$$

Applying this with the discovered eigenvalues, the roots of unity, gives

$$
w_l = \rho_m^l\, w_0 \quad \Longleftrightarrow \quad w = w_0 \begin{bmatrix} 1 & \rho_m & \rho_m^2 & \cdots & \rho_m^{n-1} \end{bmatrix}^T. \tag{9}
$$

Note that $w_0$ is a scalar, and since eigenvectors are only unique up to multiplication by a scalar, we can set $w_0 = 1$ for a more compact expression for the eigenvector. We summarize the previous derivations in the following statement.

Lemma 2 The circular left-shift operator $S^*$ on $\mathbb{C}^n$ has exactly n eigenvalues, they are the nth roots of unity $\rho_m := e^{i \frac{2\pi}{n} m} = \rho_1^m =: \rho^m$. The corresponding (mutually orthogonal) eigenvectors are

$$
w^{(m)} = \begin{bmatrix} 1 & \rho^m & \rho^{2m} & \cdots & \rho^{m(n-1)} \end{bmatrix}^T, \qquad m = 0, \dots, n-1. \tag{10}
$$

Note that the eigenvectors $w^{(m)}$ are indexed with the same index as the eigenvalues $\{\rho_m\}$. Once the form of the vector (10) is given, it is immediate to verify that it is an eigenvector

$$
\begin{aligned}
S^* \begin{bmatrix} 1 & \rho^m & \rho^{2m} & \cdots & \rho^{m(n-1)} \end{bmatrix}^T
&= \begin{bmatrix} \rho^m & \rho^{2m} & \cdots & \rho^{m(n-1)} & 1 \end{bmatrix}^T \\
&= \rho^m \begin{bmatrix} 1 & \rho^m & \cdots & \rho^{m(n-2)} & \rho^{-m} \end{bmatrix}^T \\
&= \rho^m \begin{bmatrix} 1 & \rho^m & \rho^{2m} & \cdots & \rho^{m(n-2)} & \rho^{m(n-1)} \end{bmatrix}^T,
\end{aligned} \tag{11}
$$

where the last step uses $\rho^{-m} = \rho^{m(n-1)}$, which holds because $\rho^{mn} = 1$.

While this last calculation provides an easy and immediate verification of Lemma 2, it is not a satisfactory starting point for the development of the subject, as the form of the eigenvalues/eigenvectors would have to be “guessed”. Instead, as shown above, they can be constructed systematically from the basic definitions that result in (8), and no guessing is required.
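Lemma 2 can also be confirmed numerically for a small n. The sketch below is an illustration under assumed names (`left_shift`, `eigvec`, `inner` are introduced here); it applies the left shift from (5) to each vector in (10) and compares against multiplication by $\rho^m$, up to floating-point tolerance.

```python
import cmath

n = 5
rho = cmath.exp(2j * cmath.pi / n)   # rho = rho_1, a primitive nth root of unity

def left_shift(w):
    """Apply S*: (S* w)_k = w_{k+1}, with indices taken modulo n."""
    return [w[(k + 1) % len(w)] for k in range(len(w))]

def eigvec(m):
    """w^(m) = (1, rho^m, rho^(2m), ..., rho^(m(n-1))) as in (10)."""
    return [rho ** (m * k) for k in range(n)]

def inner(u, v):
    """Standard complex inner product sum_k u_k conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

for m in range(n):
    w = eigvec(m)
    lhs = left_shift(w)
    rhs = [rho ** m * wk for wk in w]
    assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))   # S* w = rho^m w

# Distinct eigenvectors are orthogonal, and each has squared norm n.
assert abs(inner(eigvec(1), eigvec(3))) < 1e-12
assert abs(inner(eigvec(2), eigvec(2)) - n) < 1e-12
```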

2.2

Eigenvalues Calculation of a Circulant Matrix Yields the DFT

Now that we have calculated all the eigenvectors of the shift operator in Lemma 2, we can use them to find the eigenvalues of any circulant matrix $C_x$ from the relation

$$
C_x w^{(m)} = \lambda_m w^{(m)}
\quad \Longleftrightarrow \quad
\begin{bmatrix}
x_0 & x_{n-1} & \cdots & x_1 \\
x_1 & x_0 & & x_2 \\
\vdots & & \ddots & \vdots \\
x_{n-1} & x_{n-2} & \cdots & x_0
\end{bmatrix}
\begin{bmatrix} 1 \\ \rho_m \\ \vdots \\ \rho_m^{n-1} \end{bmatrix}
= \lambda_m \begin{bmatrix} 1 \\ \rho_m \\ \vdots \\ \rho_m^{n-1} \end{bmatrix}. \tag{12}
$$

Each row of the above equation represents essentially the same equation (but multiplied by a power of $\rho_m$). The first row is the easiest equation to write down and gives

$$
\begin{aligned}
\lambda_m &= x_0 + x_{n-1}\, \rho_m + \cdots + x_1\, \rho_m^{n-1}
= x_0 + x_1\, \rho_m^{-1} + \cdots + x_{n-1}\, \rho_m^{-(n-1)} \\
&= \sum_{l=0}^{n-1} x_l\, \rho_m^{-l} = \sum_{l=0}^{n-1} x_l\, \rho^{-ml} = \sum_{l=0}^{n-1} x_l\, e^{-i \frac{2\pi}{n} m l} =: \hat{x}_m,
\end{aligned} \tag{13}
$$

which is precisely the classically-defined DFT (2) of the vector x. Another way to state the previous conclusion is to define a complex function, the “z-transform” of x, which is a polynomial in the variable $z^{-1}$ as follows

$$
\hat{x}(z) := x_0 + x_1 z^{-1} + \cdots + x_{n-1} z^{-(n-1)}.
$$

Then our conclusion can be restated as the eigenvalues of $C_x$ being the values of the function $\hat{x}(z)$ evaluated at $z = \rho_m$, the nth roots of unity. One might ask what the conclusion would have been if the eigenvectors of S had been used instead of those of $S^*$. A repetition of the previous steps but now for the case of S would yield that the eigenvalues of a circulant matrix $C_x$ are given by

$$
\mu_k = \sum_{l=0}^{n-1} x_l\, e^{i \frac{2\pi}{n} k l}, \qquad k = 0, 1, \dots, n-1. \tag{14}
$$

While the expressions (13) and (14) may at first appear different, the sets of numbers $\{\lambda_m\}$ and $\{\mu_k\}$ are actually equal. So in fact, the expression (14) gives the same set of eigenvalues as (13) but arranged in a different order since $\mu_k = \lambda_{-k}$:

$$
\lambda_{-k} = \sum_{l=0}^{n-1} x_l\, e^{-i \frac{2\pi}{n} (-k) l} = \sum_{l=0}^{n-1} x_l\, e^{i \frac{2\pi}{n} k l} = \mu_k.
$$

Along with the two choices of S and $S^*$, there are also other possibilities. Let p be any integer that is coprime with n. It is easy to show (Exercise 2) that an n × n matrix is circulant iff it commutes with $S^p$. Therefore the eigenvectors of $S^p$ (rather than those of S) can be used to simultaneously diagonalize all circulant matrices. This would yield yet another transform distinct from the two transforms (13) or (14). However, the set of numbers produced from that transform will still be the same as those computed from the previous two transforms, but arranged in a different ordering.
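The conclusions of this section can be tested on a small example. The sketch below checks, for an arbitrarily chosen vector x, that each $w^{(m)}$ is an eigenvector of $C_x$ with eigenvalue $\hat{x}_m$ as in (13), that the z-transform evaluated at $\rho_m$ gives the same number, and that (14) is a reordering of (13). The helper names (`circulant`, `dft`, `ztransform`) and the `sign` parameter are assumptions made for this illustration.

```python
import cmath

def circulant(x):
    n = len(x)
    return [[x[(i - j) % n] for j in range(n)] for i in range(n)]

def dft(x, sign=-1):
    """sign=-1 gives (13); sign=+1 gives the alternative transform (14)."""
    n = len(x)
    return [sum(x[l] * cmath.exp(sign * 2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]

def ztransform(x, z):
    """xhat(z) = x_0 + x_1 z^-1 + ... + x_{n-1} z^-(n-1)."""
    return sum(xl * z ** (-l) for l, xl in enumerate(x))

x = [2.0, -1.0, 0.5, 3.0]
n = len(x)
C = circulant(x)
lam = dft(x)             # eigenvalues from the eigenvectors of S*
mu = dft(x, sign=+1)     # eigenvalues from the eigenvectors of S
rho = cmath.exp(2j * cmath.pi / n)

for m in range(n):
    w = [rho ** (m * k) for k in range(n)]
    Cw = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]
    assert all(abs(Cw[i] - lam[m] * w[i]) < 1e-9 for i in range(n))  # C_x w = xhat_m w
    assert abs(ztransform(x, rho ** m) - lam[m]) < 1e-9              # xhat(rho_m) = xhat_m

for k in range(n):
    assert abs(mu[k] - lam[(-k) % n]) < 1e-9                         # mu_k = lambda_{-k}
```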


3

Circular Convolution

We will start with examining the matrix-vector product when the matrix is circulant. By analyzing this product, we will obtain the circular convolution of two vectors. Let $C_x$ be some circulant matrix, and examine the action of such a matrix on any vector $y = \begin{bmatrix} y_0 & y_1 & \cdots & y_{n-1} \end{bmatrix}^T$. The matrix-vector multiplication $z = C_x y$ in detail reads

$$
z = \begin{bmatrix} z_0 \\ z_1 \\ \vdots \\ z_{n-1} \end{bmatrix}
= \begin{bmatrix}
x_0 & x_{n-1} & \cdots & x_1 \\
x_1 & x_0 & & x_2 \\
\vdots & & \ddots & \vdots \\
x_{n-1} & x_{n-2} & \cdots & x_0
\end{bmatrix}
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{bmatrix}
= C_x y. \tag{15}
$$

Using $(C_x)_{ij} = x_{i-j}$, the above matrix-vector multiplication can be rewritten as

$$
z_i = \sum_{j=0}^{n-1} (C_x)_{ij}\, y_j = \sum_{j=0}^{n-1} x_{i-j}\, y_j. \tag{16}
$$

Note that this can be viewed as an operation on the two vectors x and y to yield the vector z, and allows us to reinterpret the matrix-vector product of a circulant matrix as follows.

Definition 1 Given two n-vectors x and y, their circular convolution $z = x \star y$ is another n-vector defined by

$$
z = x \star y \quad \Longleftrightarrow \quad z_k = \sum_{l=0}^{n-1} x_l\, y_{k-l}, \tag{17}
$$

where the indices in the sum are evaluated modulo n.

It follows immediately from (16), after the change of summation index l = i − j, that multiplication by a circulant matrix $C_x$ is equivalent to convolving with its defining vector x, i.e.

$$
z = C_x y = x \star y. \tag{18}
$$

The sum in (17) defining circular convolution has a nice circular visualization due to modular arithmetic on $\mathbb{Z}_n$. This is illustrated in Figure 4. The elements of x are arranged in a discrete circle counter-clockwise, while the elements of y are arranged in a circle clockwise (the reverse orientation is because elements of x are indexed like $x_l$ while those of y are indexed like $y_{k-l}$ in the definition (17)). For each k, the array y is rotated counter-clockwise by k steps (Figure 4 shows cases for three different values of k). The number $z_k$ in (17) is then obtained by multiplying the corresponding elements of the x and rotated y arrays, and then summing. This generates the n numbers $z_0, \dots, z_{n-1}$. From the definition, it is easy to show (see Exercise 3) that circular convolution is associative and commutative.

• Associativity: for any three n-vectors x, y and z we have $x \star (y \star z) = (x \star y) \star z$
• Commutativity: for any two n-vectors x and y, $x \star y = y \star x$

Figure 4: Graphical illustration of circular convolution $z_k = \sum_{l=0}^{n-1} x_l\, y_{k-l}$ for k = 0, 1, −1 respectively (panels labeled k = 0, k = 1, and k = −1 ≡ n−1). The y vector is arranged in reverse orientation, and then each $z_k$ is calculated from the dot product of x and the reverse-oriented y vector rotated by k steps counter-clockwise.

The above two facts have several interesting implications. First, since convolution is commutative, the above matrix-vector product can be written in two equivalent ways

$$
x \star y = C_x y = C_y x.
$$

Applying this fact in succession to two circulant matrices

$$
C_x C_y z = C_x (y \star z) = x \star (y \star z) = (x \star y) \star z = C_{x \star y}\, z.
$$

This means that the product of any two circulant matrices $C_x$ and $C_y$ is another circulant matrix $C_{x \star y}$ whose defining vector is $x \star y$, the circular convolution of the defining vectors for $C_x$ and $C_y$. We summarize this conclusion and an important corollary of it next.

Fact 1
1. Circular convolution of any two vectors can be written as a matrix-vector product with a circulant matrix: $x \star y = C_x y = C_y x$.
2. The product of any two circulant matrices is another circulant matrix: $C_x C_y = C_{x \star y}$.
3. All circulant matrices mutually commute since for any two $C_x$ and $C_y$: $C_x C_y = C_{x \star y} = C_{y \star x} = C_y C_x$.

The set of all n-vectors forms a commutative algebra under the operation of circular convolution. The above shows that the set of n × n circulant matrices is also a commutative algebra isomorphic to n-vectors with circular convolution.
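Fact 1 lends itself to a direct check with integer vectors, where all comparisons are exact. The following sketch uses illustrative helper names (`circ_conv`, `circulant`, `matvec`, `matmul`) and sample vectors chosen here; it implements the sum (17) literally and compares it with the circulant matrix products.

```python
def circ_conv(x, y):
    """Circular convolution (17): z_k = sum_l x_l y_{(k - l) mod n}."""
    n = len(x)
    return [sum(x[l] * y[(k - l) % n] for l in range(n)) for k in range(n)]

def circulant(x):
    n = len(x)
    return [[x[(i - j) % n] for j in range(n)] for i in range(n)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

x = [1, 2, 3, 4]
y = [0, 1, 0, 2]
z = circ_conv(x, y)
print(z)   # [8, 7, 10, 5]

assert z == matvec(circulant(x), y) == matvec(circulant(y), x)   # item 1
assert matmul(circulant(x), circulant(y)) == circulant(z)        # item 2
assert matmul(circulant(x), circulant(y)) == matmul(circulant(y), circulant(x))  # item 3
```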

4

The Big Picture

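Let $C_x$ be a circulant matrix made from a vector x as in (1). If we use the eigenvectors (10) of $S^*$ as columns of a matrix W, the n eigenvalue/eigenvector relationships (12) $C_x w^{(m)} = \lambda_m w^{(m)}$ can be written as a single matrix equation as follows

$$
C_x \begin{bmatrix} w^{(0)} & \cdots & w^{(n-1)} \end{bmatrix}
= \begin{bmatrix} w^{(0)} & \cdots & w^{(n-1)} \end{bmatrix}
\begin{bmatrix} \hat{x}_0 & & \\ & \ddots & \\ & & \hat{x}_{n-1} \end{bmatrix}
\quad \Longleftrightarrow \quad
C_x W = W \operatorname{diag}(\hat{x}), \tag{19}
$$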

where we have used the fact (13) that the eigenvalues $\{\lambda_m\}$ of $C_x$ are precisely $\{\hat{x}_m\}$, the elements of the DFT of the vector x. The columns of W are mutually orthogonal, and thus W is a unitary matrix (up to a rescaling) $W^* W = W W^* = nI$, or equivalently $W^{-1} = \frac{1}{n} W^*$. Since the matrix W is made up of the eigenvectors of $S^*$, which in turn are made up of various powers of the roots of unity (10), it has some special structure which is worth examining

$$
W := \begin{bmatrix} w^{(0)} & \cdots & w^{(n-1)} \end{bmatrix}
= \begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \rho & \cdots & \rho^{n-1} \\
\vdots & \vdots & & \vdots \\
1 & \rho^{n-1} & \cdots & \rho^{(n-1)(n-1)}
\end{bmatrix}.
$$

Since W is symmetric, the matrix $W^*$ is thus the matrix W with each entry replaced by its complex conjugate. Furthermore, since for each root of unity $\overline{\rho^k} = \rho^{-k}$, we can therefore write

$$
W^* = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \rho^{-1} & \cdots & \rho^{-(n-1)} \\
\vdots & \vdots & & \vdots \\
1 & \rho^{-(n-1)} & \cdots & \rho^{-(n-1)(n-1)}
\end{bmatrix}.
$$

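Also observe that multiplying a vector by $W^*$ is exactly taking its DFT. Indeed the m’th row of $W^* y$ is

$$
\hat{y}_m = \begin{bmatrix} 1 & \rho^{-m} & \cdots & \rho^{-m(n-1)} \end{bmatrix}
\begin{bmatrix} y_0 \\ \vdots \\ y_{n-1} \end{bmatrix},
$$

which is exactly the definition (2) of the DFT. Similarly, multiplication by $\frac{1}{n} W$ is taking the inverse DFT

$$
y_l = \frac{1}{n} \sum_{k=0}^{n-1} \hat{y}_k\, \rho^{kl} = \frac{1}{n} \sum_{k=0}^{n-1} \hat{y}_k\, e^{i \frac{2\pi}{n} k l}.
$$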
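The properties of W stated above can be checked numerically for a small n. The sketch below is an illustration with assumed names (`W`, `W_adj`, `matvec`) and an arbitrary test vector; it verifies $W^* W = nI$ and that applying $W^*$ followed by $\frac{1}{n} W$ returns the original vector, up to floating-point tolerance.

```python
import cmath

n = 4
rho = cmath.exp(2j * cmath.pi / n)

# Entry (k, m) of W is rho^(k m), so column m is the eigenvector w^(m) from (10).
W = [[rho ** (k * m) for m in range(n)] for k in range(n)]
# W* is the conjugate transpose of W.
W_adj = [[W[m][k].conjugate() for m in range(n)] for k in range(n)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# W* W = n I (W is unitary up to a rescaling).
for i in range(n):
    for j in range(n):
        entry = sum(W_adj[i][l] * W[l][j] for l in range(n))
        assert abs(entry - (n if i == j else 0)) < 1e-12

y = [1.0, 2.0, 0.0, -1.0]
y_hat = matvec(W_adj, y)                      # DFT of y
y_back = [v / n for v in matvec(W, y_hat)]    # inverse DFT: multiply by (1/n) W
assert all(abs(a - b) < 1e-12 for a, b in zip(y, y_back))
```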

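Multiplying both sides of (19) from the right by $W^{-1}$ gives the diagonalization of $C_x$ which can be written in several equivalent forms

$$
\begin{aligned}
C_x = W \operatorname{diag}(\hat{x})\, W^{-1} = W \operatorname{diag}(\hat{x}) \left(\tfrac{1}{n} W^*\right) &= \left(\tfrac{1}{n} W\right) \operatorname{diag}(\hat{x})\, W^* && (20) \\
&= \left(\tfrac{1}{\sqrt{n}} W\right) \operatorname{diag}(\hat{x}) \left(\tfrac{1}{\sqrt{n}} W^*\right). && (21)
\end{aligned}
$$

The diagonalization (20) can be interpreted as follows in terms of the action of a circulant matrix $C_x$ on any vector y

$$
C_x y = \underbrace{\tfrac{1}{n} W\, \underbrace{\operatorname{diag}(\hat{x})\, \underbrace{W^* y}_{\text{DFT of } y}}_{\text{multiply by } \hat{x} \text{ entrywise}}}_{\text{inverse DFT}}
$$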

Thus the action of $C_x$ on y, or equivalently the circular convolution of y with x, can be performed by first taking the DFT of y, then multiplying the resulting vector component-wise by $\hat{x}$ (the DFT of the vector x defining the matrix $C_x$), and then taking an inverse DFT. In other words, the diagonalization of a circulant matrix is equivalent to converting circular convolution to component-wise vector multiplication through the DFT. This is illustrated in Figure 5.

Figure 5: Illustration of the relationships between circulant matrices, circular convolution and the DFT. The matrix-vector multiplication $z = C_x y$ with the circulant matrix $C_x$ is equivalent to the circular convolution $z = x \star y$. The DFT is a linear transformation $W^*$ on vectors with inverse $\frac{1}{n} W$. It converts multiplication by the circulant matrix $C_x$ into multiplication by the diagonal matrix $\operatorname{diag}(\hat{x}_0, \dots, \hat{x}_{n-1})$ whose entries are the DFT of the vector x defining the matrix $C_x$.
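The three-step procedure (DFT, component-wise multiplication, inverse DFT) can be compared against the direct convolution sum. The sketch below is only a small illustration: the function names are assumptions introduced here, and the transforms are evaluated by direct O(n²) sums, whereas a practical implementation would typically use a fast Fourier transform.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]

def idft(x_hat):
    n = len(x_hat)
    return [sum(x_hat[k] * cmath.exp(2j * cmath.pi * k * l / n) for k in range(n)) / n
            for l in range(n)]

def circ_conv(x, y):
    """Direct evaluation of the circular convolution sum."""
    n = len(x)
    return [sum(x[l] * y[(k - l) % n] for l in range(n)) for k in range(n)]

def conv_via_dft(x, y):
    """DFT of both vectors, entrywise product, then inverse DFT."""
    return idft([a * b for a, b in zip(dft(x), dft(y))])

x = [1, 2, 3, 4]
y = [0, 1, 0, 2]
direct = circ_conv(x, y)
via_dft = conv_via_dft(x, y)
# The two routes agree up to floating-point rounding.
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, via_dft))
```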

Note that in the literature there is an alternative form for the DFT and its inverse

$$
\hat{y}_k = \frac{1}{\sqrt{n}} \sum_{l=0}^{n-1} y_l\, e^{-i \frac{2\pi}{n} k l},
\qquad
y_l = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} \hat{y}_k\, e^{i \frac{2\pi}{n} k l},
$$

which is sometimes preferred due to its symmetry (and is also truly unitary since with this definition kyk2 = kˆ y k2 ). This “unitary” DFT corresponds to the diagonalization given in (21) instead of (20). We do not adopt this unitary DFT definition here since it complicates3 the statement that the eigenvalues of Cx are precisely entries of x ˆ. Theorem 3

1. The following sets are isomorphic commutative algebras

(a) The set of n-vectors is closed under circular convolutions and is thus an algebra with the operations of addition and convolution. (b) The set of n × n circulant matrices is an algebra under the operations of addition and matrix multiplication. (c) The set of n-vectors is an algebra under the operations of additions and componentwise multiplication. 2. For any n-vector x = (x0 , . . . , xn−1 ), denote its associated circulant matrix (1) by Cx and its DFT (2) by (ˆ x0 , . . . , x ˆn−1 ). The set of n-vectors, their associated circulant matrices and their DFTs are isomorphic as algebras under the respective operations described above. Specifically, x[ ?y = x ˆ ◦ yˆ,

Cx?y = Cx Cy ,

where ◦ denotes the element-by-element product of vectors. 3

If the unitary DFT is adopted, the equivalent statement would be that the eigenvalues of Cx are the √ elements of the entries of n x ˆ.

12

i2

i1

0

(a)

(b)

Figure 6: (a) A depiction of the two dimensional torus Z7 × Z5 . The dots represent an element (k1 , k2 ), with k1 ∈ Z7 and k2 ∈ Z5 . Each horizontal arrow represents the addition operation (k1 , k2 ) + (1, 0), and each vertical arrow represents the operation (k1 , k2 ) + (0, 1). The dashed arrows represent the same additions but emphasize modular arithmetic, e.g. the first vertical dashed arrow represents (0, 4) + (0, 1) = (0, 0 (mod 5)). The set Z7 × Z5 with modular arithmetic can then be thought of as a lattice with periodic boundaries. (b) A torus Zn × Zm can be visualized as embedded in a higher dimensional space. The index k1 is a Toroidal coordinate (blue), and k2 is the Poloidal coordinate (red). Image (b) CC BY 4.0.

3. The eigenvalues of a circulant matrix $C_x$ are given by the values of the z-transform

$$\hat{x}(z) := x_0 + x_1 z^{-1} + \cdots + x_{n-1} z^{-(n-1)}$$

evaluated at the roots of unity $\rho_k := e^{i\frac{2\pi}{n}k}$, i.e.

$$\hat{x}(\rho_k) = \sum_{l=0}^{n-1} x_l\, \rho_k^{-l} = \sum_{l=0}^{n-1} x_l\, \rho^{-kl} = \sum_{l=0}^{n-1} x_l\, e^{-i\frac{2\pi}{n}kl} =: \hat{x}_k,$$

which are precisely the entries of $\hat{x}$, the DFT of x.
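As an informal numerical check of Theorem 3, the following pure-Python sketch verifies, for one pair of test vectors, that the DFT turns circular convolution into entrywise multiplication, that $C_x y$ equals the circular convolution of x and y, and that each $\hat{x}_k$ is an eigenvalue of $C_x$. The helper names (`dft`, `circ_conv`, `circulant`, `matvec`, `close`) and the test vectors are illustrative assumptions, not part of the text. The eigenvector used for $\hat{x}_k$ is assumed to have entries $e^{i\frac{2\pi}{n}kl}$.

```python
import cmath

def dft(x):
    """Non-unitary DFT: x_hat[k] = sum_l x[l] * exp(-2j*pi*k*l/n)."""
    n = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]

def circ_conv(x, y):
    """Circular convolution: (x * y)[k] = sum_l x[l] * y[(k - l) mod n]."""
    n = len(x)
    return [sum(x[l] * y[(k - l) % n] for l in range(n)) for k in range(n)]

def circulant(x):
    """Circulant matrix with first column x: C[i][j] = x[(i - j) mod n]."""
    n = len(x)
    return [[x[(i - j) % n] for j in range(n)] for i in range(n)]

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def close(u, v, tol=1e-9):
    """Entrywise comparison up to a small floating-point tolerance."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

# Arbitrary test vectors (assumed values, chosen only for illustration).
x = [1.0, 2.0, 0.0, -1.0, 3.0]
y = [0.5, -2.0, 4.0, 1.0, 0.0]
n = len(x)
Cx = circulant(x)
x_hat = dft(x)

# DFT of a circular convolution equals the entrywise product of the DFTs.
conv_theorem_ok = close(dft(circ_conv(x, y)),
                        [a * b for a, b in zip(x_hat, dft(y))])

# Multiplying by the circulant matrix C_x is circular convolution with x.
matrix_is_conv_ok = close(matvec(Cx, y), circ_conv(x, y))

# Each x_hat[k] is an eigenvalue of C_x, with eigenvector w[l] = exp(2j*pi*k*l/n).
eigen_ok = True
for k in range(n):
    w = [cmath.exp(2j * cmath.pi * k * l / n) for l in range(n)]
    eigen_ok = eigen_ok and close(matvec(Cx, w), [x_hat[k] * wl for wl in w])
```

A check on one pair of vectors is of course not a proof. It only illustrates the statements of the theorem on a concrete instance.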

5 The Multidimensional Case

In the one dimensional case covered earlier, one considers an n-vector as a real (or complex) valued function on the index set $\mathbb{Z}_n$, which is viewed as a discrete circle. In the multidimensional case, this set is replaced by $\mathbb{Z}_n^d$, the d-dimensional torus. Figure 6 illustrates its structure. It can be thought of either as a lattice with periodic boundaries, or equivalently as a torus embedded in a higher dimensional space. It is convenient to use multi-index notation for an element $k \in \mathbb{Z}_n^d$, where $k = (k_1, \dots, k_d)$ with each $k_q \in \mathbb{Z}_n$ for $q = 1, \dots, d$.

The proper generalization of circulant matrices is to operators that act on functions on $\mathbb{Z}_n^d$. Instead of n-vectors, we now have d-dimensional n-arrays, which we equivalently consider as functions $x : \mathbb{Z}_n^d \to \mathbb{R}$. We simply refer to such objects as arrays when their dimensions are clear from context. Multi-index notation is also used for elements of such arrays, i.e. $x_k$ stands for $x_{(k_1,\dots,k_d)}$ for any array $x : \mathbb{Z}_n^d \to \mathbb{R}$.

For functions defined on $\mathbb{Z}_n^d$, a general shift operator has the form

$$(S_l x)_{(k_1,\dots,k_d)} := x_{(k_1 - l_1,\dots,k_d - l_d)},$$

where the operator is indexed by the multi-index $l = (l_1, \dots, l_d)$. There are as many shift operators as the cardinality of $\mathbb{Z}_n^d$, but amongst all of them there are d basic ones, which we refer to as the coordinate-shift operators

$$(S_q x)_{(k_1,\dots,k_d)} := x_{(k_1,\dots,k_q - 1,\dots,k_d)},$$

i.e. $S_q$ is a right circular-shift in only the q-th coordinate. There are d (the dimension of the torus) such operators, and an important property they possess is that they generate all other shift operators by

$$S_{(l_1,\dots,l_d)} = S_1^{l_1} \cdots S_d^{l_d}. \tag{22}$$

Observe that since all shift operators mutually commute, the order in which the coordinate shifts are performed does not matter. The shift operators encode the basic symmetries of $\mathbb{Z}_n^d$, and we therefore use them to define circulant operators.

Definition 2 A linear operator $M : \mathbb{R}^{\mathbb{Z}_n^d} \to \mathbb{R}^{\mathbb{Z}_n^d}$ is called circulant if it commutes with all the coordinate-shift operators

$$S_q M = M S_q, \qquad q = 1, \dots, d.$$
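To make relation (22) concrete, here is a small pure-Python sketch that stores an array on $\mathbb{Z}_n^d$ as a dictionary keyed by multi-indices and checks that a general shift is reproduced by repeated coordinate shifts, in either order. The function names, the 0-based coordinate index `q`, and the sizes n = 4, d = 2 are assumptions made for illustration.

```python
import itertools

def indices(n, d):
    """All multi-indices of Z_n^d."""
    return list(itertools.product(range(n), repeat=d))

def shift(x, l, n):
    """General shift: (S_l x)_k = x_{k - l}, with indices taken mod n."""
    return {k: x[tuple((ki - li) % n for ki, li in zip(k, l))] for k in x}

def coord_shift(x, q, n):
    """Coordinate shift S_q (q is 0-based here): shift by one in coordinate q only."""
    d = len(next(iter(x)))
    e_q = tuple(1 if i == q else 0 for i in range(d))
    return shift(x, e_q, n)

n, d = 4, 2
x = {k: 10 * k[0] + k[1] for k in indices(n, d)}  # arbitrary array with distinct entries

l = (3, 2)

# Apply S_1 l_1 times, then S_2 l_2 times.
y = x
for q in range(d):
    for _ in range(l[q]):
        y = coord_shift(y, q, n)
generated_ok = (y == shift(x, l, n))

# Apply the coordinate shifts in the opposite order; the result is the same.
y_rev = x
for q in reversed(range(d)):
    for _ in range(l[q]):
        y_rev = coord_shift(y_rev, q, n)
order_ok = (y_rev == y)
```

The dictionary representation is chosen only because it mirrors the multi-index notation directly; a nested list or a numerical array library would work equally well.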

Since coordinate shifts generate all other shifts by (22), it then follows that a circulant operator also commutes with all possible shifts $S_l$, $l \in \mathbb{Z}_n^d$. An easy calculation will show that an operator is circulant iff it can be represented as circular convolution with an array. First define the multidimensional Kronecker delta $\delta$ and its shifted version $\delta^{(l)}$

$$\delta_k := \begin{cases} 1 & \text{if } k = (0,\dots,0), \\ 0 & \text{if } k \neq (0,\dots,0), \end{cases} \qquad \delta^{(l)}_k := \begin{cases} 1 & \text{if } k = (l_1,\dots,l_d), \\ 0 & \text{if } k \neq (l_1,\dots,l_d), \end{cases}$$

i.e. $\delta^{(l)}$ is the array that is 1 at index l and zero otherwise. Note that any shifted delta can be obtained from $\delta$ by acting with a shift operator, $\delta^{(l)} = S_l \delta$, i.e. $\delta^{(l)}_k = \delta_{k-l}$. We also note that any array y can be written as a weighted sum of shifted deltas

$$y = \sum_{l \in \mathbb{Z}_n^d} y_l\, \delta^{(l)} = \sum_{l \in \mathbb{Z}_n^d} y_l\, S_l \delta. \tag{23}$$

Let M be a circulant operator in the sense of Definition 2 and define the array x as the result of the action of M on the Kronecker delta, $x := M\delta$, i.e. x is the “impulse response” of M. Now using the representation (23) and the commutativity of M with shifts, we can write the action of M on any array y with a formula that involves only the impulse response x and the array y

$$z = My = M\Big(\sum_{l \in \mathbb{Z}_n^d} y_l\, S_l \delta\Big) = \sum_{l \in \mathbb{Z}_n^d} y_l\, S_l M \delta = \sum_{l \in \mathbb{Z}_n^d} y_l\, S_l x \quad\iff\quad z_k = \sum_{l \in \mathbb{Z}_n^d} y_l\, x_{k-l},$$

where we have used the linearity of M and the commutativity $M S_l = S_l M$ in the third equality, and the fact that $M\delta$ is just the impulse response x in the fourth. We have therefore uncovered multidimensional circular convolution as the representation of circulant operators.


Lemma 4 A linear operator $M : y \mapsto z$ on d-dimensional n-arrays is circulant iff it can be written as a circular convolution

$$z_k = \sum_{l \in \mathbb{Z}_n^d} x_{k-l}\, y_l \quad\iff\quad z_{(k_1,\dots,k_d)} = \sum_{l_1,\dots,l_d \in \mathbb{Z}_n} x_{(k_1 - l_1,\dots,k_d - l_d)}\, y_{(l_1,\dots,l_d)},$$

where x is the “impulse response” $M\delta$ of M. In this case, we denote $M = C_x$, the circulant operator defined by the array x.

Thus a circulant operator $C_x$ is completely specified by the array x (its “impulse response”) in the same way a circulant matrix is completely specified by its first column. In contrast to the one dimensional case, matrix-vector notation does not yield a simple visualization of multidimensional circulant operators. In other words, matrix representations of circulant operators do not have an easily discernible matrix structure in higher dimensions.
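The statements leading to Lemma 4 can be illustrated with a short pure-Python sketch on a small torus. It checks the decomposition (23) of an array into shifted deltas, that applying $C_x$ to the Kronecker delta returns the impulse response x, and that $C_x$ commutes with every shift $S_l$. All function names, the sizes n = 3, d = 2, and the integer test arrays are assumptions chosen for illustration.

```python
import itertools

n, d = 3, 2                                   # assumed small sizes for the example
idx = list(itertools.product(range(n), repeat=d))

def sub(k, l):
    """Componentwise difference k - l in Z_n^d."""
    return tuple((a - b) % n for a, b in zip(k, l))

def shift(y, l):
    """(S_l y)_k = y_{k - l}."""
    return {k: y[sub(k, l)] for k in idx}

def circ_conv(x, y):
    """Multidimensional circular convolution: z_k = sum_l x_{k - l} y_l."""
    return {k: sum(x[sub(k, l)] * y[l] for l in idx) for k in idx}

# Arbitrary integer arrays, so that all comparisons below are exact.
x = {k: (k[0] + 1) * (2 * k[1] - 1) for k in idx}
y = {k: k[0] ** 2 - 3 * k[1] for k in idx}

# Kronecker delta: 1 at the origin, 0 elsewhere.
delta = {k: 1 if k == (0,) * d else 0 for k in idx}

# Equation (23): y is the weighted sum of shifted deltas, sum_l y_l S_l delta.
shifted_deltas = {l: shift(delta, l) for l in idx}
recon = {k: sum(y[l] * shifted_deltas[l][k] for l in idx) for k in idx}
decomposition_ok = (recon == y)

# The impulse response of C_x is x itself.
impulse_ok = (circ_conv(x, delta) == x)

# C_x commutes with every shift: C_x S_l y == S_l C_x y.
commute_ok = all(circ_conv(x, shift(y, l)) == shift(circ_conv(x, y), l)
                 for l in idx)
```

This only checks the "if" direction of the lemma on one example (a convolution operator commutes with shifts); the converse is the content of the derivation in the text.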

5.1 The Multidimensional DFT

In a manner similar to the one dimensional case, we would like to simultaneously diagonalize all circulant operators and discover the multidimensional DFT in the process. In the one dimensional case, there was a single shift operator, commutation with which characterizes all circulant matrices. In the d dimensional case, there are d such basic operators that all mutually commute. It turns out that for each of those operators $S_q$ individually, the eigenvalues come with multiplicities, and there is no unique way to select the eigenvectors. However, they do share one set of unique (up to scalar multiples) eigenvectors. To find these shared eigenvectors, consider the following eigenvalue problem

$$S_q^* w = \lambda_q w, \qquad q = 1, \dots, d,$$

i.e. look for an array w that is an eigenvector of all coordinate-shift operators $S_q^*$ simultaneously, but possibly with different eigenvalues $\lambda_q$. This condition can be written out explicitly as

$$w_{(k_1,\dots,k_q + 1,\dots,k_d)} = \lambda_q\, w_{(k_1,\dots,k_q,\dots,k_d)}, \qquad q = 1, \dots, d.$$

We also have that $(S_q^*)^l w = \lambda_q^l w$, which explicitly is

$$w_{(k_1,\dots,k_q + l,\dots,k_d)} = \lambda_q^l\, w_{(k_1,\dots,k_q,\dots,k_d)}, \qquad q = 1, \dots, d. \tag{24}$$

Since for each q we have $k_q + n \equiv_n k_q$, we conclude that for each q

$$1 = \lambda_q^n,$$

i.e. each $\lambda_q$ is an n-th root of unity, which means that $\lambda_q = \rho^{m_q} = e^{i\frac{2\pi}{n} m_q}$ for some $m_q \in \mathbb{Z}_n$. Now, since eigenvectors are defined only up to a scalar multiple, we set $w_{(0,\dots,0)} = 1$, and use (24) in succession to generate all components of w from $w_{(0,\dots,0)} = 1$ by

$$\begin{aligned}
w_{(l_1,0,\dots,0)} &= \rho^{l_1 m_1}\, w_{(0,\dots,0)} = \rho^{l_1 m_1} \\
w_{(l_1,l_2,0,\dots,0)} &= \rho^{l_2 m_2}\, w_{(l_1,0,\dots,0)} = \rho^{l_2 m_2} \rho^{l_1 m_1} \\
&\;\;\vdots \\
w_{(l_1,\dots,l_d)} &= \rho^{l_d m_d} \cdots \rho^{l_1 m_1} = \rho^{l_1 m_1 + \cdots + l_d m_d}
\end{aligned}$$

We will use the notation $m \cdot l := l_1 m_1 + \cdots + l_d m_d$ (a dot product of the indices). Note that in the above formula, the eigenvector w is indexed by $m \in \mathbb{Z}_n^d$. There are as many of them as the cardinality of $\mathbb{Z}_n^d$. We summarize this conclusion next.

Lemma 5 The coordinate-shift operators $S_1^*, \dots, S_d^*$ have a total of $n^d$ common eigenvectors $w^{(m)}$, $m \in \mathbb{Z}_n^d$. The entries of the arrays $w^{(m)}$ are given by

$$w^{(m)}_l = w^{(m_1,\dots,m_d)}_{(l_1,\dots,l_d)} = \rho^{l_1 m_1 + \cdots + l_d m_d} =: \rho^{m \cdot l}, \qquad l, m \in \mathbb{Z}_n^d, \tag{25}$$

where $\rho := e^{i\frac{2\pi}{n}}$, the n-th root of unity.

Now that we have discovered the common eigenvectors of the coordinate-shift operators, we can diagonalize any circulant operator. Indeed, let $C_x$ be a circulant operator generated by an array x. Since it is circulant, it commutes with all coordinate-shifts. It therefore shares their common eigenvectors. To find the eigenvalues of $C_x$, we simply apply it to the already computed eigenvectors (25)

$$\big(C_x w^{(m)}\big)_k = \sum_{l \in \mathbb{Z}_n^d} x_{k-l}\, w^{(m)}_l = \sum_{l \in \mathbb{Z}_n^d} x_{k-l}\, \rho^{m \cdot l},$$

where we have used Lemma 4 for the first equality, and (25) for the second equality. A reindexing of the sum (with $j := k - l$) uncovers the eigenvalues

$$\big(C_x w^{(m)}\big)_k = \sum_{j \in \mathbb{Z}_n^d} x_j\, \rho^{m \cdot (k - j)} = \rho^{m \cdot k} \sum_{j \in \mathbb{Z}_n^d} x_j\, \rho^{-(m \cdot j)} = w^{(m)}_k \Big( \sum_{j \in \mathbb{Z}_n^d} x_j\, \rho^{-(m \cdot j)} \Big).$$

Therefore, the eigenvalues of $C_x$ are the set of numbers

$$\sum_{j \in \mathbb{Z}_n^d} x_j\, \rho^{-(m \cdot j)} =: \hat{x}_m,$$

which we will define to be the DFT of x. We summarize this in the next statement.

Theorem 6 Let $C_x$ be a circulant operator defined by the multidimensional array x over $\mathbb{Z}_n^d$. The $n^d$ eigenvalues of $C_x$ are given by the multidimensional DFT of the array x

$$\hat{x}_m = \hat{x}_{(m_1,\dots,m_d)} = \sum_{l \in \mathbb{Z}_n^d} x_{(l_1,\dots,l_d)}\, e^{-i\frac{2\pi}{n}(m_1 l_1 + \cdots + m_d l_d)} = \sum_{l \in \mathbb{Z}_n^d} x_l\, \rho^{-(m \cdot l)},$$

where $\rho := e^{i\frac{2\pi}{n}}$ is the n-th root of unity. Equivalently, the eigenvalues are given by evaluations of the multidimensional z-transform

$$\hat{x}(z) := \sum_{l \in \mathbb{Z}_n^d} x_{(l_1,\dots,l_d)}\, z_1^{-l_1} \cdots z_d^{-l_d}$$

with each coordinate evaluated at all possible roots of unity $z_q = \rho^{m_q}$, $q = 1, \dots, d$, $m_q \in \mathbb{Z}_n$.
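The following pure-Python sketch gives an informal numerical check of this result on a small torus. For every multi-index m it builds the array with entries $\rho^{m \cdot l}$, checks that it is an eigenvector of each $S_q^*$ with eigenvalue $\rho^{m_q}$, and checks that applying $C_x$ to it multiplies it by the DFT value $\hat{x}_m$. The function names, the sizes n = 3, d = 2, the test array, and the tolerance are assumptions made for illustration.

```python
import cmath
import itertools

n, d = 3, 2                                   # assumed small sizes for the example
idx = list(itertools.product(range(n), repeat=d))
rho = cmath.exp(2j * cmath.pi / n)            # rho = exp(i*2*pi/n)

def dot(m, l):
    """Index dot product m . l = m_1 l_1 + ... + m_d l_d."""
    return sum(a * b for a, b in zip(m, l))

def sub(k, l):
    """Componentwise difference k - l in Z_n^d."""
    return tuple((a - b) % n for a, b in zip(k, l))

def apply_circulant(x, y):
    """(C_x y)_k = sum_l x_{k - l} y_l."""
    return {k: sum(x[sub(k, l)] * y[l] for l in idx) for k in idx}

def dft(x):
    """Multidimensional DFT: x_hat_m = sum_l x_l rho^{-(m . l)}."""
    return {m: sum(x[l] * rho ** (-dot(m, l)) for l in idx) for m in idx}

def eigenvector(m):
    """Array with entries w_l = rho^{m . l}."""
    return {l: rho ** dot(m, l) for l in idx}

x = {k: float((k[0] + 1) * (k[1] - 2) + k[0] ** 2) for k in idx}  # arbitrary array
x_hat = dft(x)

eigen_ok = True
shift_ok = True
for m in idx:
    w = eigenvector(m)
    Cw = apply_circulant(x, w)
    # C_x w^(m) should equal x_hat_m * w^(m).
    eigen_ok = eigen_ok and all(abs(Cw[k] - x_hat[m] * w[k]) < 1e-9 for k in idx)
    for q in range(d):
        e_q = tuple(1 if i == q else 0 for i in range(d))
        # (S_q^* w)_k = w_{k + e_q} should equal rho^{m_q} * w_k.
        shift_ok = shift_ok and all(
            abs(w[tuple((a + b) % n for a, b in zip(k, e_q))] - rho ** m[q] * w[k]) < 1e-9
            for k in idx)

num_eigenvalues = len(x_hat)                  # one DFT value per multi-index m
```

Here `num_eigenvalues` equals the cardinality of $\mathbb{Z}_n^d$, that is $n^d = 9$ for the assumed sizes. The sums are evaluated directly from the definition for clarity; this is not an efficient way to compute a DFT.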

References

[1] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, “Signals and systems, vol. 2,” Prentice-Hall Englewood Cliffs, NJ, vol. 6, no. 7, p. 10, 1983.
[2] R. Plymen, “Noncommutative Fourier analysis,” 2010.
[3] M. E. Taylor and J. Carmona, Noncommutative harmonic analysis. American Mathematical Soc., 1986, no. 22.

A Exercises

Exercise ch.1 Show that any matrix M that commutes with the shift operator S must be a circulant matrix, i.e. must have the structure shown in (1), or equivalently (3).

Solution ch.1 Starting from the relation $SM = MS$, and using the definition $(S)_{ij} = \delta_{i-j-1}$, compute

$$(SM)_{ij} = \sum_l S_{il} (M)_{lj} = \sum_l \delta_{i-l-1} (M)_{lj} = (M)_{i-1,j},$$

$$(MS)_{ij} = \sum_l (M)_{il} S_{lj} = \sum_l (M)_{il}\, \delta_{l-j-1} = (M)_{i,j+1}.$$

Note that since the indices $i - j - 1$ of the Kronecker delta are to be interpreted using modular arithmetic, the indices $i - 1$ and $j + 1$ of M above should also be interpreted with modular arithmetic. The statements

$$(M)_{i-1,j} = (M)_{i,j+1} \quad\iff\quad (M)_{i-1,j-1} = (M)_{i,j} \quad\iff\quad (M)_{i,j} = (M)_{i+1,j+1}$$

then mean that each column of M is obtained from the previous column by a circular shift. Alternatively, the last statement above implies that for any k, $(M)_{i,j} = (M)_{i+k,j+k}$, i.e. that entries of M are constant along “diagonals”. Now take the first column of M as $m_i := (M)_{i,0}$, then

$$(M)_{ij} = (M)_{i-j,j-j} = (M)_{i-j,0} = m_{i-j}.$$

Thus all entries of M are obtained from the first column by circular shifts as in (3).

Exercise ch.2 Show that an n × n matrix M is circulant iff it commutes with $S^p$ where (p, n) are coprime.

Solution ch.2 If M is circulant, then it commutes with S and also commutes with any of its powers $S^p$. The other direction is more interesting. The basic underlying fact for this conclusion has to do with modular arithmetic in $\mathbb{Z}_n$. If (p, n) are coprime, then there are integers a, b that satisfy the Bézout identity $ap + bn = 1$, which also implies that ap is equivalent to 1 mod n since $ap = 1 - bn$, i.e. it is equal to a multiple of n plus 1. Therefore, using $S^n = I$, there exists a power of $S^p$, namely $S^{ap}$, such that

$$S^{ap} = S. \tag{26}$$
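The modular-arithmetic fact behind (26) can be checked directly. The sketch below computes an exponent a with $ap \equiv 1 \pmod n$ using Python's built-in modular inverse (available from Python 3.8), and lists which elements of $\mathbb{Z}_n$ are reached from 0 by repeating a rotation by p steps. The function names and the values n = 12, p = 5 and p = 8 are assumptions chosen for illustration.

```python
from math import gcd

def unit_rotation_exponent(p, n):
    """Return a in {0, ..., n-1} with a * p = 1 (mod n); needs gcd(p, n) == 1."""
    if gcd(p, n) != 1:
        raise ValueError("p and n must be coprime")
    return pow(p, -1, n)  # modular inverse of p mod n (Python 3.8+)

def reachable(p, n):
    """Elements of Z_n reached from 0 by repeating a rotation by p steps."""
    return {(t * p) % n for t in range(n)}

n = 12

# Coprime case: p = 5, and 5 * 5 = 25 = 2 * 12 + 1.
p = 5
a = unit_rotation_exponent(p, n)
coprime_reach = reachable(p, n)       # every element of Z_12

# Non-coprime case: p = 8 shares the factor 4 with n = 12.
non_coprime_reach = reachable(8, n)   # only the multiples of 4
```

With these values, `a` is 5, `coprime_reach` is all of $\mathbb{Z}_{12}$, and `non_coprime_reach` is {0, 4, 8}.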

Thus if M commutes with $S^p$, then it commutes with all of its powers, and in particular with $S^{ap} = S$, which is the condition for M being circulant.

Equation (26) has a nice geometric interpretation. $S^p$ is a rotation of the circle in Figure 3 by p steps. If p and n were not coprime, then regardless of how many times the rotation $S^p$ is repeated, there would be some elements of the discrete circle that are not reachable from the 0 element by these rotations. The condition that p and n are coprime ensures that there is some repetition of the rotation $S^p$, namely $(S^p)^a$, which gives the basic rotation S. Repetitions of S then of course generate all possible rotations on the discrete circle. In other words, p and n coprime ensures that by repeating the rotation $S^p$, all elements of the discrete circle are eventually reachable from 0.

Exercise ch.3 Show that circular convolution (17) is commutative and associative.

Solution ch.3 Commutativity: Follows from

$$(a \star b)_k = \sum_l a_l\, b_{k-l} = \sum_j a_{k-j}\, b_j = (b \star a)_k$$

where we used the substitution $j = k - l$ (and consequently $l = k - j$).

Associativity: First note that $(b \star c)_i = \sum_j b_j\, c_{i-j}$, and compare

$$(a \star (b \star c))_k = \sum_l a_l\, (b \star c)_{k-l} = \sum_l a_l \Big( \sum_j b_j\, c_{k-l-j} \Big) = \sum_{l,j} a_l\, b_j\, c_{k-l-j},$$

$$((a \star b) \star c)_k = \sum_j (a \star b)_j\, c_{k-j} = \sum_j \Big( \sum_l a_l\, b_{j-l} \Big) c_{k-j} = \sum_{l,j} a_l\, b_{j-l}\, c_{k-j}.$$

Relabeling $j - l =: i$ (and therefore $j = l + i$) in the second sum makes it

$$\sum_{l,i} a_l\, b_i\, c_{k-(l+i)} = \sum_{l,i} a_l\, b_i\, c_{k-l-i},$$

which is exactly the first sum, but with a different labeling of the indices.
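As a quick sanity check of the two properties (not a substitute for the argument above), the following pure-Python sketch evaluates both sides on integer test vectors, so the comparisons are exact. The function name `circ_conv` and the vectors are assumptions chosen for illustration.

```python
def circ_conv(a, b):
    """Circular convolution: (a * b)[k] = sum_l a[l] * b[(k - l) mod n]."""
    n = len(a)
    return [sum(a[l] * b[(k - l) % n] for l in range(n)) for k in range(n)]

# Arbitrary integer vectors of equal length.
a = [1, -2, 3, 0, 5]
b = [4, 0, -1, 2, 2]
c = [0, 7, 1, -3, 1]

# Commutativity: a * b == b * a.
commutative_ok = (circ_conv(a, b) == circ_conv(b, a))

# Associativity: a * (b * c) == (a * b) * c.
associative_ok = (circ_conv(a, circ_conv(b, c)) == circ_conv(circ_conv(a, b), c))
```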
