IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 13, NO. 3, JUNE 2011
A Touch Interface Exploiting Time-Frequency Classification Using Zak Transform for Source Localization on Solids
Kattukandy Rajan Arun, XueXin Yap, Student Member, IEEE, and Andy W. H. Khong, Member, IEEE
Abstract—We propose a new approach to the development of a touch interface using surface-mounted sensors, which allows one to convert a hard surface into a touch pad. This is achieved by using location template matching (LTM), a source localization algorithm that is robust to dispersion and multipath. In this interdisciplinary research, we employ mechanical vibration theories that model the wave propagation of the flexural modes of vibration generated by an impact on the surface. We then verify that the amplitude variance across time for each propagating mode frequency is unique to each location on a surface. We show that the Zak transform allows us to faithfully track these amplitude variations, and we exploit the uniqueness of this variance as a time-frequency classifier, which in turn allows us to localize a finger tap in the context of a human-computer interface. The performance of the proposed algorithm is compared with that of existing LTM approaches on real surfaces.
Index Terms—Human-computer interface, location template matching, time-frequency classification, touch interface.
I. INTRODUCTION
Current approaches to human-computer interface (HCI) technology rely on input devices such as the alphanumeric keyboard and the optical pointing device (mouse). These input devices have since become an important feature of personal as well as commercial computing. As new interactive digital media applications continue to evolve, a main drawback of such input devices is that they impede software operations or data manipulations that require complex user input. As a result, the keyboard and mouse limit, to a large extent, the scope, functionality, and ease of use of the personal computer (PC). To address these issues, HCI research in recent years has focused on, for example, vision-based cameras for eye and hand motion detection [1], gyro-based sensors to track head rotation [2], and the localization of fingers on the surface of a waveguide medium via frustrated total internal reflection of light from light-emitting diodes [3]. The common goal of these techniques is to make control of the PC more natural for humans.

Manuscript received October 29, 2010; revised February 25, 2011; accepted February 25, 2011. Date of publication March 07, 2011; date of current version May 18, 2011. This work was supported by the Singapore National Research Foundation Interactive Digital Media R&D Program under research grant NRF2008IDM-IDM004-010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Zhengyou Zhang. The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798. Digital Object Identifier 10.1109/TMM.2011.2123084

We note
that although vision-based cameras can achieve high tracking capability for motion detection, the problem of occlusion needs to be addressed. In addition, such technology often requires a depth sensor, which increases the cost, albeit modestly. One of the most commonly used HCI technologies is the touch screen, which has been employed in portable devices such as smartphones and laptops. These touch-sensitive screens often employ an embedded layer of capacitive sensors that detect changes in the electrostatic field caused by a finger tap. The finger tap is therefore localized by detecting and locating such changes on the touch surface. Although this proven technology has gained much popularity, its high implementation cost often limits the size of the screens. In addition, most such devices require dedicated fabrication, which limits their flexibility in terms of the material type and size of the touch surface. As a result, capacitive touch screens are often limited to small portable devices.
To complement the above technologies, one solution that has recently drawn increasing attention in the research community is an HCI platform that converts an ordinary flat surface, such as a table-top or a window pane, into a touch interface. This technology relies on surface-mounted sensors together with algorithms for localizing finger taps. In particular, the touch interfaces considered in [4] and [5] localize impacts using well-known techniques such as time-differences-of-arrival (TDOA) and location template matching (LTM). The effectiveness of both methods has been verified experimentally for isotropic [6] and anisotropic [7] materials.
Although progress has been made in localizing finger taps on hard surfaces such as wood and glass, localization using signals received by sensors mounted on such surfaces remains a challenge. These challenges include temperature variation, different types of wave propagation, multipath propagation of these waves through the material, and wave distortion due to the dispersive channel between the finger tap and the sensor. Because these challenges affect the accuracy of source localization to a large extent, results presented thus far have aimed at improving the accuracy of source localization algorithms. The TDOA algorithm uses an array of sensors and estimates the differences between the arrival times of the signal at these sensors. The TDOA between two sensors defines a hyperbola with the two sensors at its foci. Given several TDOA estimates, the intersection of these hyperbolae provides an estimate of the source location. In practice, this intersection can be found by solving a set of
nonlinear equations relating the assumed source position to the known sensor positions. Although the TDOA approach has been investigated for localizing finger taps on a surface [4], it assumes knowledge of the wave propagation speed. Determining this speed is often a challenge, particularly for solid surfaces, because of dispersion, in which different frequencies travel at different speeds [8]. In addition, as with acoustic propagation in air within an enclosed space, multipath within the material renders TDOA estimation a challenging task.
To overcome the shortcomings of TDOA-based approaches, LTM approaches have been proposed. These LTM-based algorithms use only one sensor and localize finger taps by comparing the received signal with a set of prerecorded signals from known locations. To perform this comparison, LTM computes the similarity either with the cross-correlation measure (CC-LTM) [9] or by comparing the closeness of the autoregressive coefficients of an all-pole model (AP-LTM) [10], [11]. Unlike TDOA approaches, LTM-based approaches exploit the dissimilarity between the source-sensor paths of different source locations. This dissimilarity arises, to a large extent, from multipath and dispersion. LTM-based algorithms are therefore expected to achieve higher localization performance than TDOA-based approaches for anisotropic materials.
In this interdisciplinary research, we develop an HCI that localizes a tap on a solid via the LTM approach. Unlike CC-LTM and AP-LTM, we use mechanical vibration theories to gain insight into the modes of wave propagation generated by an impact on the surface. We show that these propagating modes are unique at each location and that the resultant signal arising from these flexural modes of vibration can be faithfully represented using time-frequency analysis.
Exploiting these properties, we verify that one time-frequency analysis tool, the Zak transform, faithfully represents a tap in the time-frequency domain. More importantly, we show that the amplitude variance across time for each frequency bin is unique to each location. This allows us to use this variance as a unique time-frequency classifier for localizing a finger tap in the context of HCI. Since the mode frequencies generated by mechanical wave propagation are now taken into account, our proposed Z-LTM source localization algorithm is expected to outperform CC-LTM and AP-LTM on various materials, including aluminum, glass, and plywood.
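To make concrete why TDOA localization depends on the propagation speed, the hyperbola-intersection step described above can be sketched as a grid search. The search minimizes the squared mismatch between measured and predicted TDOAs. This is a minimal illustration under assumed conditions: four sensors at the corners of a hypothetical 0.6 m square plate, a single nondispersive speed `c`, and noise-free TDOAs. The function names are hypothetical, and this is not the method of [4].

```python
import numpy as np


def tdoa_cost(candidates, sensors, tdoas, c):
    """Sum of squared TDOA residuals for each candidate source position.

    candidates: (G, 2) positions; sensors: (S, 2) positions;
    tdoas: (S-1,) arrival-time differences of sensors 1..S-1 relative to sensor 0;
    c: assumed (single, nondispersive) propagation speed in m/s.
    """
    # Distance from every candidate to every sensor, shape (G, S).
    d = np.linalg.norm(candidates[:, None, :] - sensors[None, :, :], axis=2)
    # Range differences relative to sensor 0; each column defines one hyperbola.
    predicted = (d[:, 1:] - d[:, :1]) / c
    return np.sum((predicted - tdoas) ** 2, axis=1)


def locate_tdoa(sensors, tdoas, c, extent=0.6, step=0.005):
    """Grid point closest, in the least-squares sense, to the hyperbolae intersection."""
    axis = np.arange(0.0, extent + step / 2, step)
    gx, gy = np.meshgrid(axis, axis)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    return grid[np.argmin(tdoa_cost(grid, sensors, tdoas, c))]


if __name__ == "__main__":
    sensors = np.array([[0.0, 0.0], [0.6, 0.0], [0.0, 0.6], [0.6, 0.6]])
    source = np.array([0.2, 0.35])
    c_true = 1000.0  # hypothetical speed, m/s
    dist = np.linalg.norm(sensors - source, axis=1)
    tdoas = (dist[1:] - dist[0]) / c_true
    print(locate_tdoa(sensors, tdoas, c_true))        # recovers the true grid point
    print(locate_tdoa(sensors, tdoas, 0.8 * c_true))  # a wrong speed biases the estimate
```

The second call shows the dependence noted in the text. When the assumed speed differs from the true one, the estimate generally moves away from the source even with perfect TDOAs. With a dispersive plate there is no single correct `c` at all.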
Fig. 1. Diagram of a touch interface, with each marker representing a predefined point.
II. REVIEW OF EXISTING LTM TECHNIQUES
We review LTM-based source localization algorithms by considering a single-input, single-output system as shown in Fig. 1. We assume that a vibration caused by an impact generates a received signal $\mathbf{x} = [x(0), x(1), \ldots, x(N-1)]^T$, where $N$ is the length of $\mathbf{x}$. The aim of LTM is therefore to estimate the source location using $\mathbf{x}$. To achieve this, LTM requires prior knowledge of the source signature at known locations. This signature can be obtained during the calibration phase, where we define
$$\mathbf{y}_k = [y_k(0), y_k(1), \ldots, y_k(N-1)]^T \tag{1}$$
as the prerecorded signal corresponding to a known source location at position index $k$. We assume that the length of $\mathbf{y}_k$ is $N$ and that there is one such prerecorded signal for each of the $K$ predefined locations. LTM-based algorithms compute a similarity measure between $\mathbf{x}$ and $\mathbf{y}_k$ for $k = 1, \ldots, K$. The estimated location is then the location corresponding to the prerecorded signal that is closest to the received signal.
A. Cross-Correlation-Based LTM (CC-LTM)
To compute the similarity, the use of the cross-correlation between $\mathbf{x}$ and $\mathbf{y}_k$, for $k = 1, \ldots, K$, has been proposed [9], [12]. This approach exploits the fact that the received signal is unique to each source location, because of variations in the scattering and reflections of the wave along the source-sensor path. It has been shown that, provided the sensor is placed in an off-symmetry position, every point on the plate has a unique "fingerprint" [9]. We define $\mathbf{r} = [r_1, r_2, \ldots, r_K]^T$ as a $K \times 1$ vector containing the correlation coefficients between $\mathbf{x}$ and $\mathbf{y}_k$, such that each element is given by
$$r_k = \frac{\sum_{n=0}^{N-1}\left[x(n) - \mu_x\right]\left[y_k(n) - \mu_{y_k}\right]}{N\,\sigma_x\,\sigma_{y_k}} \tag{2}$$
where $\mu_x$ and $\mu_{y_k}$ denote the means of $\mathbf{x}$ and $\mathbf{y}_k$, while $\sigma_x$ and $\sigma_{y_k}$ denote their standard deviations, respectively. CC-LTM estimates the location of the impact as the one corresponding to the maximum of $r_k$ over all position indices $k$, i.e.,
$$\hat{k} = \arg\max_{k}\, r_k. \tag{3}$$
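As a minimal sketch of (2) and (3), the NumPy code below assumes that the received and prerecorded signals have already been time-aligned, as is done for the experiments. Under that assumption only the zero-lag correlation coefficient is needed. The function names are hypothetical, and indices are 0-based rather than $1, \ldots, K$.

```python
import numpy as np


def correlation_coefficient(x, y):
    """Zero-lag correlation coefficient between two equal-length signals, cf. (2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # np.std uses 1/N normalization, matching the N in the denominator of (2).
    return np.sum((x - x.mean()) * (y - y.mean())) / (n * x.std() * y.std())


def cc_ltm(x, templates):
    """Return (k_hat, r): index of the best-matching template, cf. (3), and all r_k."""
    r = np.array([correlation_coefficient(x, y_k) for y_k in templates])
    return int(np.argmax(r)), r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    templates = rng.standard_normal((5, 256))           # stand-ins for y_1..y_K
    x = templates[3] + 0.1 * rng.standard_normal(256)   # noisy repeat of location index 3
    k_hat, r = cc_ltm(x, templates)
    print(k_hat)
```

A full implementation would search over lags if the signals were not aligned. The alignment step used in the experiments is what makes the single zero-lag value sufficient here.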
The localization accuracy achieved by CC-LTM is reported in [9].
B. All-Pole Filter Model Approach to LTM (AP-LTM)
To increase the localization accuracy, the all-pole model approach to LTM (AP-LTM) was proposed [11]. This algorithm exploits the dissimilarity between the spectral peaks of the received signal at each tap location. The spectral peaks of $\mathbf{x}$ and $\mathbf{y}_k$ can be compared through the locations of these dominant frequencies in the unit circle using an
all-pole filter. Given a received signal $\mathbf{x}$, the error between the actual and predicted signal is given by
$$e(n) = x(n) - \sum_{i=1}^{P} a_i\, x(n-i) \tag{4}$$
where $P$ is the order of prediction and $a_i$ is the $i$th prediction filter coefficient. Defining the coefficients $a_{k,i}$ in the same way as (4) for the prerecorded signal $\mathbf{y}_k$ at location index $k$, the sum of the squared differences of the prediction coefficients is given by
$$d_k = \sum_{i=1}^{P}\left(a_i - a_{k,i}\right)^2. \tag{5}$$
Unlike CC-LTM, this approach estimates the location of the source as the one corresponding to the minimum distance between the prediction coefficients, i.e.,
$$\hat{k} = \arg\min_{k}\, d_k. \tag{6}$$
As shown in [11], AP-LTM improves localization accuracy at a lower computational cost than CC-LTM. However, to achieve high accuracy, $\mathbf{x}$ and $\mathbf{y}_k$ must be preprocessed so that their bandwidths match that of the sensor as well as the frequency response of the material.
III. VIBRATIONAL MODEL-BASED TIME-FREQUENCY ANALYSIS AND CLASSIFICATION
Unlike CC-LTM and AP-LTM, we formulate the LTM-based localization problem on a surface using mechanical vibration theories. We consider vibrating waves on the surface of a thin rectangular plate, a common structure covering a wide range of objects including tables, walls, and glass windows. Using our mathematical model, we gain insight into the mode frequencies and show that the signal can be represented in the time-frequency domain using the Zak transform. This allows us to localize the finger tap using a time-frequency classifier that is unique to the position of the finger tap.
A. Mathematical Model for Flexural Vibration
With reference to Fig. 1, we define the coordinate of a tap location as $(x_k, y_k)$, where $k$ is the position index and $K$ denotes the total number of possible tap locations. We employ the well-known Kirchhoff hypotheses for the linear, elastic, small-deflection theory of thin plates. Let $p(x, y, t)$ be the pressure at location $(x, y)$ at time $t$ due to a finger impact made at location $(x_k, y_k)$. The vertical displacement $w(x, y, t)$ of the plate at a particular location must then satisfy the equation of motion [13]
$$D\,\nabla^4 w(x,y,t) + \gamma\,\frac{\partial w(x,y,t)}{\partial t} + \rho h\,\frac{\partial^2 w(x,y,t)}{\partial t^2} = p(x,y,t) \tag{7}$$
where $\gamma$ is the absorption coefficient, $\rho$ the density, $h$ the thickness of the plate, $D = Eh^3/[12(1-\nu^2)]$ the stiffness of the plate, $E$ the Young's modulus, $\nu$ Poisson's ratio, and $\nabla^4 = \partial^4/\partial x^4 + 2\,\partial^4/(\partial x^2\partial y^2) + \partial^4/\partial y^4$ the biharmonic operator. To solve (7), we express $p(x,y,t)$ due to an impact at $(x_k, y_k)$ as
$$p(x,y,t) = \delta(x - x_k)\,\delta(y - y_k)\,\delta(t) \tag{8}$$
where $\delta(x - x_k)$, $\delta(y - y_k)$, and $\delta(t)$ are Dirac delta functions with respect to the variables $x$, $y$, and $t$, respectively. We substitute (8) into (7) and apply the boundary condition of a simply supported plate of dimensions $L_x \times L_y$. The solution of (7) for zero initial conditions is then given by the transfer function of the plate [11]
$$W(x,y,s) = \sum_{l=1}^{\infty}\sum_{b=1}^{\infty}\frac{\Omega_{lb}}{s^2 + \gamma' s + \omega_{lb}^2} \tag{9}$$
where $s$ is the Laplace variable and
$$\Omega_{lb} = \frac{4}{\rho h L_x L_y}\,\sin\frac{l\pi x_k}{L_x}\,\sin\frac{b\pi y_k}{L_y}\,\sin\frac{l\pi x}{L_x}\,\sin\frac{b\pi y}{L_y} \tag{10}$$
such that $\omega_{lb} = \sqrt{D/(\rho h)}\,\left[(l\pi/L_x)^2 + (b\pi/L_y)^2\right]$ is the mode frequency, $\gamma' = \gamma/(\rho h)$ is the reduced coefficient of absorption, and $l, b = 1, 2, \ldots$ are the modes of wave propagation. Transforming (9) back to the time domain gives an important expression for the vertical displacement at $(x, y)$ due to a tap made at $(x_k, y_k)$:
$$w(x,y,t) = \sum_{l=1}^{\infty}\sum_{b=1}^{\infty}\Omega_{lb}\,e^{-\gamma' t/2}\,\frac{\sin(\tilde{\omega}_{lb} t)}{\tilde{\omega}_{lb}} \tag{11}$$
where $\tilde{\omega}_{lb} = \sqrt{\omega_{lb}^2 - \gamma'^2/4}$. As shown in (11), $w(x,y,t)$ is the sum of the mode responses corresponding to each mode frequency $\tilde{\omega}_{lb}$. For a given tap location and material, each mode response is the product of the shaping function $\Omega_{lb}$ and the time-dependent function $e^{-\gamma' t/2}\sin(\tilde{\omega}_{lb} t)/\tilde{\omega}_{lb}$. The time-dependent function is inversely proportional to $\tilde{\omega}_{lb}$. This implies that, as $\tilde{\omega}_{lb}$ increases with $l, b$, the contribution of that mode to $w(x,y,t)$ decreases. In addition, as shown in Fig. 2 for a single tap, the shaping function depends on the tap location. More importantly, as shown in (10), $\Omega_{lb}$ is a function of $(x_k, y_k)$. The time-dependent function is therefore weighted by a location-dependent shaping function. As a result, the vertical displacement $w(x,y,t)$ varies with tap position, because the amplitudes of its mode frequencies vary with time in a location-dependent way. To illustrate this, Fig. 3(a) and (b) show the mode response $\Omega_{lb}\,e^{-\gamma' t/2}\sin(\tilde{\omega}_{lb} t)/\tilde{\omega}_{lb}$ for taps made at $(x_k, y_k) = (0.01, 0.01)$ m and $(0.1, 0.1)$ m, respectively. For each of these locations, we illustrate the mode responses of two different modes. We note that the amplitudes of these mode responses vary with the tap location $(x_k, y_k)$.
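The modal model (9)–(11) can be evaluated numerically to see how the shaping function, and hence the displacement at the sensor, changes with tap location. The sketch below is a minimal illustration with the series truncated to a finite number of modes. The material constants are typical textbook values for aluminum (E = 70 GPa, ν = 0.33, ρ = 2700 kg/m³). The plate size, thickness, sensor position, and reduced absorption coefficient are hypothetical choices, not the experimental values, and the function names are also hypothetical.

```python
import numpy as np


def mode_frequencies(E, nu, rho, h, Lx, Ly, gamma_r, n_modes):
    """Undamped (omega_lb) and damped (omega_tilde_lb) mode frequencies, cf. (10)-(11)."""
    D = E * h**3 / (12.0 * (1.0 - nu**2))              # flexural stiffness
    l = np.arange(1, n_modes + 1)[:, None]             # mode index l (rows)
    b = np.arange(1, n_modes + 1)[None, :]             # mode index b (columns)
    omega = np.sqrt(D / (rho * h)) * ((l * np.pi / Lx) ** 2 + (b * np.pi / Ly) ** 2)
    omega_d = np.sqrt(omega**2 - gamma_r**2 / 4.0)     # assumes light damping
    return l, b, omega, omega_d


def shaping_function(l, b, rho, h, Lx, Ly, tap, sensor):
    """Omega_lb in (10) for a tap at `tap` = (x_k, y_k) observed at `sensor` = (x, y)."""
    (xk, yk), (xs, ys) = tap, sensor
    return (4.0 / (rho * h * Lx * Ly)
            * np.sin(l * np.pi * xk / Lx) * np.sin(b * np.pi * yk / Ly)
            * np.sin(l * np.pi * xs / Lx) * np.sin(b * np.pi * ys / Ly))


def displacement(t, shaping, omega_d, gamma_r):
    """Truncated version of (11): sum over (l, b) of the mode responses at the sensor."""
    t = np.asarray(t, dtype=float)[:, None, None]
    modes = shaping * np.exp(-gamma_r * t / 2.0) * np.sin(omega_d * t) / omega_d
    return modes.sum(axis=(1, 2))


if __name__ == "__main__":
    E, nu, rho = 70e9, 0.33, 2700.0                 # typical aluminum constants
    h, Lx, Ly, gamma_r = 0.003, 0.6, 0.6, 50.0      # hypothetical geometry and damping
    sensor = (0.45, 0.2)                            # hypothetical sensor position
    l, b, omega, omega_d = mode_frequencies(E, nu, rho, h, Lx, Ly, gamma_r, 10)
    t = np.arange(0, 0.05, 1 / 96000)               # 50 ms at 96 kHz
    for tap in [(0.01, 0.01), (0.1, 0.1)]:          # the two positions used in Fig. 3
        s = shaping_function(l, b, rho, h, Lx, Ly, tap, sensor)
        print(tap, np.max(np.abs(displacement(t, s, omega_d, gamma_r))))
```

The two taps share the same `omega_d`, because the mode frequencies depend only on the plate. Only the shaping weights differ between them, and that difference is the location dependence the classifier later exploits.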
Fig. 2. Shaping function $\Omega_{lb}$ for mode $l = 5$, $b = 5$.
Fig. 3. Variation of mode response, $\Omega_{lb}\,e^{-\gamma' t/2}\sin(\tilde{\omega}_{lb} t)/\tilde{\omega}_{lb}$, for tap positions (a) $(x_k, y_k) = (0.01, 0.01)$ m and (b) $(x_k, y_k) = (0.1, 0.1)$ m.
In addition, as Fig. 3 shows, the amplitudes of the mode responses decrease with increasing $l, b$, as expected. We therefore exploit these amplitude variations for source localization through time-frequency analysis. The sensor's transfer function has not been taken into account in developing LTM-based approaches, but it does not affect the performance of such algorithms. This is because the transfer function of the sensor has the same effect on every $\mathbf{y}_k$. The amplitude variations of the mode frequencies are therefore affected consistently across all positions on the surface, and the uniqueness of $\mathbf{y}_k$ is preserved for each tap location.
B. Time-Frequency Analysis
For classification, it is important to represent signals in a domain that is unique for each tap location. As the mechanical model and Fig. 3 show, the amplitudes of the mode frequencies vary with the tap location. We therefore propose to employ a time-frequency representation (TFR) to analyze these mode frequencies in $\mathbf{x}$ and $\mathbf{y}_k$. A TFR used for LTM classification need not be an ideal representation of the signal, but we propose to use one that faithfully represents the mode frequencies. Unlike existing ad hoc LTM approaches, this accurate representation allows us to formulate the LTM problem in the context of wave propagation. In turn, it allows the algorithm to track any amplitude variation of the mode frequencies with time for localization. It is therefore important to track the instantaneous change in time and frequency content. Summing the energy distribution of the TFR across all frequencies at a particular time instant should give the instantaneous energy, while summing the TFR across time at a particular frequency should give the energy density spectrum. These desirable properties of a TFR are referred to as the time- and frequency-marginal properties [14] and are expressed, respectively, as
$$\int_{-\infty}^{\infty} P(t,\omega)\,d\omega = |w(t)|^2 \tag{12}$$
$$\int_{-\infty}^{\infty} P(t,\omega)\,dt = |\hat{w}(\omega)|^2 \tag{13}$$
where, for brevity, $w(t)$ denotes $w(x,y,t)$ at the sensor position, $\hat{w}(\omega)$ is its Fourier transform, and $P(t,\omega)$ is the TFR of $w(t)$ with time variable $t$ and frequency variable $\omega$. There are different types of TFR, and the choice of TFR determines how accurately the amplitude variation of the mode frequencies can be tracked. It is therefore important to verify whether these TFRs satisfy the marginal properties. We use the properties of a subclass of TFRs belonging to Cohen's class of bilinear time-frequency representations, which can be expressed as [14]
$$P(t,\omega) = \frac{1}{4\pi^2}\iiint w^*\!\left(u - \tfrac{\tau}{2}\right) w\!\left(u + \tfrac{\tau}{2}\right)\phi(\theta,\tau)\,e^{-j\theta t - j\tau\omega + j\theta u}\,du\,d\tau\,d\theta \tag{14}$$
where $w^*$ denotes the complex conjugate of $w$ and $\phi(\theta,\tau)$ is the kernel function, which depends on the TFR one wishes to represent. Here $\theta$ and $\tau$ denote the frequency-shift and time-lag variables, respectively. Within Cohen's class, the time- and frequency-marginal conditions are satisfied when $\phi(\theta, 0) = 1$ and $\phi(0, \tau) = 1$, respectively.
1) Short-Time Fourier Transform (STFT): The STFT divides a time-domain signal into overlapping or non-overlapping frames such that
$$S_w(t,\omega) = \int_{-\infty}^{\infty} w(u)\,h(u - t)\,e^{-j\omega u}\,du \tag{15}$$
where $h(t)$ is the analysis window. The most widely used form of the STFT is its energy density spectrum, the spectrogram $|S_w(t,\omega)|^2$. A Gaussian-shaped window is often used for $h(t)$ because it satisfies the Heisenberg uncertainty relation with equality [15], giving one of the best joint time-frequency representations of the signal. This Gaussian window is described by
$$h(t) = \left(\frac{\alpha}{\pi}\right)^{1/4} e^{-\alpha t^2/2} \tag{16}$$
where $\alpha$ controls the spread of the window in the time and frequency domains. We check whether the STFT represents $w(t)$ well under our wave propagation model by substituting (11) and (16) into (15), giving
$$S_w(t,\omega) = \left(\frac{\alpha}{\pi}\right)^{1/4}\sum_{l=1}^{\infty}\sum_{b=1}^{\infty}\frac{\Omega_{lb}}{\tilde{\omega}_{lb}}\int_{0}^{\infty} e^{-\gamma' u/2}\sin(\tilde{\omega}_{lb} u)\,e^{-\alpha(u - t)^2/2}\,e^{-j\omega u}\,du. \tag{17}$$
Unlike $w(t)$, the STFT contains not only terms that are functions of $\tilde{\omega}_{lb}$ but also undesirable terms that come from the window $h(t)$, through $\alpha$, rather than from $w(t)$ alone. The STFT therefore does not give a true representation of $w(t)$. Consequently, it is not suitable for our algorithm to track the amplitude variations of the mode frequencies with time for localization. To verify that the spectrogram $|S_w(t,\omega)|^2$, defined after (15), does not satisfy the marginal conditions, we note that the spectrogram is a member of Cohen's class with kernel
$$\phi(\theta,\tau) = \int_{-\infty}^{\infty} h^*\!\left(u - \tfrac{\tau}{2}\right) h\!\left(u + \tfrac{\tau}{2}\right) e^{-j\theta u}\,du = e^{-\theta^2/(4\alpha) - \alpha\tau^2/4} \tag{18}$$
where we have used the definition of $h(t)$ in (16). Since $\phi(\theta, 0) \neq 1$ and $\phi(0, \tau) \neq 1$, (18) implies that our model under the STFT does not satisfy the time- and frequency-marginal conditions.
2) Wigner Distribution (WD): The Wigner distribution (WD) is another TFR, computed by correlating $w(t)$ with a time- and frequency-translated version of itself [14], i.e.,
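$$\mathrm{WD}_w(t,\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} w^*\!\left(t - \tfrac{\tau}{2}\right) w\!\left(t + \tfrac{\tau}{2}\right) e^{-j\tau\omega}\,d\tau. \tag{19}$$
As shown in [14], the WD is also a member of Cohen's class, with kernel function $\phi(\theta,\tau) = 1$. The WD therefore satisfies the time- and frequency-marginal conditions for any input signal. Its main drawback is the presence of cross-terms, similar to those of the STFT. This can be verified by substituting (11) into (19), giving
$$\mathrm{WD}_w(t,\omega) = \sum_{l,b}\mathrm{WD}_{lb,lb}(t,\omega) + \sum_{l,b}\ \sum_{(l',b') \neq (l,b)}\mathrm{WD}_{lb,l'b'}(t,\omega) \tag{20}$$
where
$$\mathrm{WD}_{lb,l'b'}(t,\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} w_{lb}^*\!\left(t - \tfrac{\tau}{2}\right) w_{l'b'}\!\left(t + \tfrac{\tau}{2}\right) e^{-j\tau\omega}\,d\tau$$
and $w_{lb}(t) = \Omega_{lb}\,e^{-\gamma' t/2}\sin(\tilde{\omega}_{lb} t)/\tilde{\omega}_{lb}$ is the mode response in (11). The term at $\omega = \tilde{\omega}_{lb}$ is our desired term containing the mode frequencies. All other cross-terms produce nonzero values in the TFR even at frequencies that are not present in $w(t)$.
3) Zak Transform: We verify the validity of representing our signal using the Zak transform [16], [17]. Like the STFT and WD, the Zak transform maps a signal into the time and frequency domains. It is defined by
$$Z_w(t,\omega) = \sum_{m=-\infty}^{\infty} w(t + mT)\,e^{-jm\omega T} \tag{21}$$
where $T > 0$ is the period of the transform. To analyze the model using the Zak transform, we substitute (11) into (21) and obtain
$$Z_w(t,\omega) = \sum_{l=1}^{\infty}\sum_{b=1}^{\infty}\frac{\Omega_{lb}}{\tilde{\omega}_{lb}}\sum_{m:\,t + mT \geq 0} e^{-\gamma'(t + mT)/2}\sin\!\left(\tilde{\omega}_{lb}(t + mT)\right)e^{-jm\omega T}. \tag{22}$$
As shown in [18], the Zak transform satisfies the time- and frequency-marginal conditions. We can therefore use the Zak transform to track any instantaneous change in time and frequency on which the mode frequencies depend. In addition, as (22) shows, $Z_w(t,\omega)$ consists only of a linear combination of functions of $\tilde{\omega}_{lb}$. Unlike the STFT and WD, the Zak transform has no window-dependent terms and no cross-terms, so it provides a reliable estimate of the mode frequencies. Combined with the marginal conditions, this makes it suitable for tracking the amplitude variations of the mode frequencies faithfully.
For numerical implementation, we use the discrete Zak transform (DZT). It operates on a signal $\mathbf{x}$ of length $N = QR$ and yields a two-dimensional time-frequency representation. Let $r$ and $q$ be the DZT time and frequency indices, respectively, with $0 \leq r \leq R-1$ and $0 \leq q \leq Q-1$. The DZT is then defined as [17]
$$Z(q, r) = \sum_{m=0}^{Q-1} x(r + mR)\,e^{-j2\pi qm/Q}. \tag{23}$$
We can compute $Z(q, r)$ by first defining
$$\mathbf{X} = \begin{bmatrix} x(0) & x(1) & \cdots & x(R-1) \\ x(R) & x(R+1) & \cdots & x(2R-1) \\ \vdots & \vdots & \ddots & \vdots \\ x((Q-1)R) & x((Q-1)R+1) & \cdots & x(QR-1) \end{bmatrix} \tag{24}$$
such that each row is constructed from $R$ consecutive samples of $\mathbf{x}$. The DZT of $\mathbf{x}$ is then obtained by taking the column-wise discrete Fourier transform (DFT) of $\mathbf{X}$, i.e.,
$$\mathbf{Z} = \mathbf{F}\mathbf{X} = \begin{bmatrix} Z(0,0) & Z(0,1) & \cdots & Z(0,R-1) \\ Z(1,0) & Z(1,1) & \cdots & Z(1,R-1) \\ \vdots & \vdots & \ddots & \vdots \\ Z(Q-1,0) & Z(Q-1,1) & \cdots & Z(Q-1,R-1) \end{bmatrix} \tag{25}$$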
where $\mathbf{F}$ denotes the $Q \times Q$ DFT matrix. Fig. 4(a) shows an illustrative plot of $\mathbf{x}$ for a tap made on an aluminum surface at location (0.1, 0.1) m, and Fig. 4(b) shows the corresponding plot of $\mathbf{Z}$ with $Q = 256$ and $R = 16$. Fig. 4(b) shows that $\mathbf{Z}$ is sparse, since its energy is concentrated within the lower frequencies. This is consistent with (11) and Fig. 3, where the amplitudes of the mode frequencies decrease with increasing mode indices $l, b$. More importantly, as Fig. 4(b) also shows, the amplitudes of these dominant frequencies vary with time. We therefore propose to use the time variation of these amplitudes as a classifier for localization.
Fig. 4. (a) Time-domain and (b) Zak transform representation of a tap on an aluminum plate at location (0.1, 0.1) m with $Q = 256$ and $R = 16$.
TABLE I
COMPUTATIONAL COMPLEXITY
C. Time-Frequency Classification
To determine the location of the finger tap using LTM, we compare the similarity between $\mathbf{x}$ and $\mathbf{y}_k$, as in CC-LTM and AP-LTM. The choice of classifier is therefore important, because the classifier must depend on the position of the tap. As Fig. 3 shows, the amplitude of each mode frequency is unique to each position. One therefore expects the difference between the variances across time of two signals, at each frequency, to be smallest when the two signals are generated from the same location. To verify that this variance is indeed dependent on the location, we define
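$$w_{lb}(t) = \Omega_{lb}\,g_{lb}(t), \qquad g_{lb}(t) = e^{-\gamma' t/2}\,\frac{\sin(\tilde{\omega}_{lb} t)}{\tilde{\omega}_{lb}} \tag{26}$$
as the response of mode $l, b$ at the sensor. Taking the variance across time for each mode frequency $l, b$, we obtain
$$\operatorname{var}\{w_{lb}(t)\} = \Omega_{lb}^2\left(\mathrm{E}\{g_{lb}^2(t)\} - \mathrm{E}^2\{g_{lb}(t)\}\right) \tag{27}$$
where $\mathrm{E}\{\cdot\}$ is the expectation operator, taken here across time. According to the mechanical model, the variance for each mode frequency $l, b$ therefore depends on $\Omega_{lb}$, and hence on the tap location through (10).
Based on this discussion, we propose to represent a received signal accurately in the time-frequency domain using the Zak transform given by (25). Since the variance across time for each mode frequency depends on the tap location, we compute the variance across each row of $\mathbf{Z}$, i.e.,
$$\sigma_q^2 = \frac{1}{R}\sum_{r=0}^{R-1}\left(|Z(q,r)| - \mu_q\right)^2, \qquad \mu_q = \frac{1}{R}\sum_{r=0}^{R-1}|Z(q,r)|. \tag{28}$$
To estimate the location of the tap, we define a $(Q/2+1) \times 1$ vector
$$\boldsymbol{\sigma} = \left[\sigma_0^2, \sigma_1^2, \ldots, \sigma_{Q/2}^2\right]^T \tag{29}$$
where we compute $\sigma_q^2$ only up to frequency index $q = Q/2$, because the FFT of a real signal is symmetric. As Fig. 4(b) shows, the amplitudes of the frequency components change significantly with time. To track these amplitude changes for each mode frequency, we null out the $Q/2 + 1 - M$ smallest elements of $\boldsymbol{\sigma}$, so that the resulting sparse feature vector $\tilde{\boldsymbol{\sigma}}$ has only $M$ nonzero elements. Defining $\tilde{\boldsymbol{\sigma}}_k$ in the same way for $\mathbf{y}_k$, the similarity between $\mathbf{x}$ and $\mathbf{y}_k$ for our proposed Z-LTM algorithm is given by
$$\epsilon_k = \left\|\tilde{\boldsymbol{\sigma}} - \tilde{\boldsymbol{\sigma}}_k\right\|_2^2. \tag{30}$$
The estimated location is then given by
$$\hat{k} = \arg\min_{k}\,\epsilon_k. \tag{31}$$
D. Computational and Memory Requirement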
We evaluate the computational complexity of the algorithms by counting the real multiplications each requires. Because the proposed Z-LTM algorithm is intended for a real-time HCI, only the complexity of the detection phase (as opposed to the prerecording phase) is of concern. Table I lists the computational complexity for an example case, specified by the number of tap positions $K$, the number of samples $N$, the prediction order $P$ for AP-LTM, and the Zak transform parameters for Z-LTM. CC-LTM, AP-LTM, and Z-LTM require approximately 307 450, 12 375, and 99 136 real multiplications, respectively. Thus AP-LTM requires about 4%, and Z-LTM about 32.24%, of the real multiplications of CC-LTM. Although Z-LTM is more complex than AP-LTM, we show in Section IV that Z-LTM outperforms both CC-LTM and AP-LTM. Z-LTM requires a series of fast Fourier transform (FFT) operations, as (25) shows, but these FFTs can be computed efficiently using, for example, the algorithms proposed in [19] and [20]. In addition, for a fixed sensor setup, the proposed algorithm only needs to compute the variance of the selected bins, which depend on the modes of vibration, as shown in (30).
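The detection-phase steps of Z-LTM, (23)–(31), can be sketched in a few lines of NumPy:

1. Reshape the signal as in (24) and apply a column-wise FFT to obtain the DZT.
2. Take the row-wise variance of $|Z(q,r)|$ up to $q = Q/2$ as the feature vector.
3. Null all but the $M$ largest entries.
4. Pick the template with the smallest distance.

This is a minimal sketch, not the authors' implementation. Taking the variance of the magnitudes and using a squared Euclidean distance between the sparse feature vectors are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np


def dzt(x, Q, R):
    """Discrete Zak transform, cf. (23)-(25): Z[q, r] = sum_m x[r + m R] exp(-j 2 pi q m / Q)."""
    x = np.asarray(x, dtype=float)
    if x.size != Q * R:
        raise ValueError("signal length must equal Q * R")
    X = x.reshape(Q, R)            # row m holds the R consecutive samples x[mR], ..., x[mR + R - 1]
    return np.fft.fft(X, axis=0)   # column-wise Q-point DFT, i.e., Z = F X


def zltm_feature(x, Q, R, M):
    """Sparse feature: variance across time of |Z| for q = 0..Q/2, keeping the M largest."""
    Z = dzt(x, Q, R)
    sigma = np.var(np.abs(Z[: Q // 2 + 1, :]), axis=1)    # one variance per frequency row
    sparse = np.zeros_like(sigma)
    keep = np.argsort(sigma)[-M:]                          # indices of the M largest variances
    sparse[keep] = sigma[keep]
    return sparse


def zltm_locate(x, templates, Q, R, M):
    """Return (k_hat, eps): template index with the smallest squared distance, and all distances."""
    f = zltm_feature(x, Q, R, M)
    eps = np.array([np.sum((f - zltm_feature(y, Q, R, M)) ** 2) for y in templates])
    return int(np.argmin(eps)), eps


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Q, R, M = 256, 16, 32
    templates = rng.standard_normal((5, Q * R))            # stand-ins for y_1..y_K
    x = templates[2] + 0.01 * rng.standard_normal(Q * R)   # noisy repeat of location index 2
    print(zltm_locate(x, templates, Q, R, M)[0])
```

In practice the template features $\tilde{\boldsymbol{\sigma}}_k$ would be computed once during the prerecording phase and stored. That leaves one DZT, one variance pass, and $K$ distance evaluations per tap, which are the detection-phase costs counted above.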
IV. EXPERIMENTAL RESULTS
We present experimental results to verify the performance of our proposed algorithm in the context of a touch interface. The Murata PKS1-4A1 sensor picks up predominantly vibration signals and, to a lesser extent, acoustic signals. Although the proposed Z-LTM algorithm is based on the mechanical vibration model, the presented results show that acoustic waves picked up by the sensor do not cause significant localization errors. LTM-based approaches compare time-domain signals, or features extracted from them. Time alignment is therefore necessary before computing the cross-correlation coefficients for CC-LTM, extracting the autoregressive coefficients for AP-LTM, or constructing the feature vector for the proposed Z-LTM algorithm. We time-align $\mathbf{x}$ and $\mathbf{y}_k$ so that the maximum of each signal falls at sample index 512, i.e., the maximum amplitudes of $\mathbf{x}$ and $\mathbf{y}_k$ correspond to $x(512)$ and $y_k(512)$, respectively. In addition, a window length of $N$ samples is used with a sampling rate of 96 kHz. To quantify the performance of the algorithms, we adopt the notion of classification accuracy, which measures the ability to correctly identify the observations belonging to tap location $k$. Let $N_c$ be the number of correct classifications and $N_t$ the total number of observations under test. The classification accuracy is then defined as
$$C = \frac{N_c}{N_t} \times 100\%. \tag{32}$$
To further verify the robustness of the algorithms, we use the notion of a dissimilarity matrix
$$\mathbf{D} = \left[d_{ij}\right] \tag{33}$$
where the $(i,j)$th element $d_{ij}$ denotes the distance between two instances of feature vectors, $\mathbf{f}_i$ and $\mathbf{f}_j$, for taps made at position indices $i$ and $j$. For CC-LTM, $\mathbf{f}_i$ is the time-domain signal $\mathbf{x}$. For AP-LTM, it is the vector of prediction coefficients corresponding to the $i$th position index. Similarly, for Z-LTM, $\mathbf{f}_i = \tilde{\boldsymbol{\sigma}}_i$ corresponding to the $i$th position index. Fig. 5(a) and (b) plot $\mathbf{D}$ for CC-LTM and Z-LTM, respectively, where five taps are made at each of the 25 tap locations. In this illustrative example, the first set of five taps corresponds to the first location, the second set of five taps corresponds to the next location, and so on. These plots show that, for Z-LTM, $\mathbf{D}$ exhibits a greater disparity between the off-diagonal and diagonal elements than for CC-LTM. This large disparity implies that different instances of a received signal from the same tap location are closer to each other when $\tilde{\boldsymbol{\sigma}}$ is used as the feature vector than when $\mathbf{x}$ is used. Z-LTM is therefore expected to achieve higher source localization accuracy than CC-LTM.
Fig. 5. Dissimilarity matrix $\mathbf{D}$ for (a) CC-LTM and (b) Z-LTM for an experimental data set of aluminum.
To quantify the difference in disparity between the diagonal and off-diagonal elements of $\mathbf{D}$, we define the feature identity ratio $\zeta$ as (34).
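The evaluation measures can be sketched as follows:

- `classification_accuracy` implements (32).
- `dissimilarity_matrix` builds $\mathbf{D}$ in (33) from Euclidean distances between feature vectors, one row per tap.
- `identity_ratio` is a hypothetical stand-in for the feature identity ratio. It is computed as the mean distance between taps at the same location divided by the mean distance between taps at different locations.

The exact form of (34) and the distance measure used for each algorithm are assumptions of this sketch.

```python
import numpy as np


def classification_accuracy(predicted, true_labels):
    """C = N_c / N_t x 100%, cf. (32)."""
    predicted = np.asarray(predicted)
    true_labels = np.asarray(true_labels)
    return 100.0 * np.mean(predicted == true_labels)


def dissimilarity_matrix(features):
    """D[i, j] = Euclidean distance between the feature vectors of taps i and j, cf. (33)."""
    f = np.asarray(features, dtype=float)
    return np.linalg.norm(f[:, None, :] - f[None, :, :], axis=2)


def identity_ratio(D, labels):
    """Hypothetical ratio: mean same-location distance over mean different-location distance."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)   # exclude the zero self-distances
    return D[same & not_self].mean() / D[~same].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(25), 5)                   # 25 locations x 5 taps
    centers = rng.standard_normal((25, 8))                 # hypothetical feature centers
    feats = centers[labels] + 0.05 * rng.standard_normal((125, 8))
    D = dissimilarity_matrix(feats)
    print(identity_ratio(D, labels))                       # small: taps at one location agree
    print(classification_accuracy([0, 1, 1], [0, 1, 2]))   # 2 of 3 classifications correct
```

Under this stand-in definition, a small ratio means the diagonal blocks of $\mathbf{D}$ are dark relative to the off-diagonal ones. That matches the qualitative reading of Fig. 5.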
The feature identity ratio $\zeta$ therefore quantifies, for the same tap location, how robust an algorithm is to inconsistencies in the feature vectors caused by external factors such as background noise. A low value of $\zeta$ denotes lower dissimilarity between features from the same tap location, which in turn implies better localization performance than a higher $\zeta$.
A. Experimental Results Using Real Materials
To verify the performance of the proposed algorithm, an experiment was conducted on three materials: aluminum, glass, and plywood. The glass and plywood plates have dimensions (0.6, 0.6, 0.004) m and (0.75, 0.6, 0.005) m, respectively. A Murata PKS1-4A1 shock sensor measures the flexural vibrations caused by the taps. Twenty-five tap locations are arranged in a 5 × 5 grid as shown in Fig. 1. The second, third, fourth, and fifth rows of tap positions lie 1, 3, 6, and 10 cm from the first row, respectively, and the columns are spaced in the same manner. The grid is positioned relative to the sensor as shown in Fig. 1, in which the sensor and the tap locations are denoted by distinct markers. The tap locations are tapped sequentially, and the received signals are digitized using a PreSonus FireStudio at 96 kHz with 16-bit resolution. Each position is tapped five times, both to evaluate the feature identity ratio $\zeta$ and to test the ability of the classifier to identify taps at the same tap location. A human finger tap would not change the results significantly, but we use a mechanized tapper for our 125 taps. This setup is shown in Fig. 6.
Fig. 6. Experimental setup for collection of data for aluminum.
Results of classification accuracy and feature identity ratio from these experiments are presented in Figs. 7–9 for aluminum, glass, and plywood, respectively. For each of these plots, we study the performance of AP-LTM by varying the prediction order $P$ and that of Z-LTM by varying the number $M$ of nonzero elements in $\tilde{\boldsymbol{\sigma}}$. We also study the performance of Z-LTM for various values of $Q$ and $R$. Since $N = QR$, increasing $Q$ increases the spectral resolution at the expense of time resolution. The performance of Z-LTM is therefore expected to depend on the ratio $Q/R$. For example, with $Q/R = 16$, there are 16 times as many frequency bins as time samples.
Fig. 7(a) shows the classification accuracy $C$ for the aluminum plate with various numbers $M$ of nonzero feature elements and prediction orders $P$. The accuracy of CC-LTM is independent of $P$ and $M$, while the accuracy of AP-LTM varies with the prediction order and peaks at a particular $P$. Z-LTM achieves nearly 100% accuracy for the various values of $Q/R$. For each $Q/R$, the accuracy increases with the number of nonzero features $M$, since a larger $M$ means that more features are used in classification, and Z-LTM reaches the highest accuracy of 100%. Fig. 7(b) shows the robustness of the algorithms in terms of the feature identity ratio $\zeta$. The proposed Z-LTM algorithm achieves a lower $\zeta$ than AP-LTM for all cases $Q/R = 8$, 16, and 32.
This implies a greater disparity between the diagonal and off-diagonal elements of the matrix from which the ratio is computed, which translates to Z-LTM being more robust to inconsistencies among feature vectors obtained for the same tap location. As before, AP-LTM achieves good localization performance at a suitable prediction order.
The classification accuracy of the algorithms for a glass plate is shown in Fig. 8(a). As can be seen, AP-LTM achieves its highest accuracy at a particular prediction order, similar to the case of aluminum. For clarity of presentation, the performance of Z-LTM
Fig. 7. (a) Classification accuracy, C, and (b) feature identity ratio for rectangular aluminum.
is shown for only three Zak transform settings, and, for each setting, C increases with the number of nonzero features. Interestingly, the performance of Z-LTM decreases as the frequency resolution increases, which implies that good time resolution is desirable for glass. For a suitable choice of parameters, the performance of Z-LTM matches that of CC-LTM; however, as shown in Table I, Z-LTM achieves this comparable performance at a lower computational complexity than CC-LTM. Fig. 8(b) illustrates the variation of the feature identity ratio with the number of nonzero features and the prediction order for the glass plate. As before, AP-LTM is robust to inconsistencies in the feature vectors for the same tap location at a particular prediction order. As seen from its lower feature identity ratio, the proposed Z-LTM algorithm achieves higher robustness for the two smaller of the three settings.
The same experiment was conducted on a piece of plywood, an anisotropic material. The performance in terms of C and the feature identity ratio is illustrated in Fig. 9(a) and (b), respectively. As can be seen in Fig. 9(a), the accuracy of Z-LTM is consistently higher than those of CC-LTM and AP-LTM for the two larger settings. For the smallest setting, however, the accuracy of Z-LTM reduces to 67%, implying that a high frequency resolution is required for plywood. Fig. 9(b) shows the variation of the feature identity ratio with the number of nonzero features for Z-LTM and with the prediction order for AP-LTM. Consistent with the earlier results, AP-LTM achieves its highest robustness to inconsistencies in the feature vectors for the same tap location at a particular prediction order.
ARUN et al.: TOUCH INTERFACE EXPLOITING TIME-FREQUENCY CLASSIFICATION
Fig. 8. (a) Classification accuracy, C, and (b) feature identity ratio for rectangular glass.
Similar to the cases of glass and aluminum, the proposed Z-LTM achieves the highest robustness for all three settings.
We show additional results illustrating the performance of the algorithms for a trapezoidal acrylic plate. The parallel sides of the trapezoid measure 46 cm and 57 cm, and its height is 27 cm. Fig. 10 shows the classification accuracy obtained for this nonrectangular surface. Comparing this result with Fig. 8(a) for the rectangular glass plate, the performance of the Z-LTM algorithm is similar. This implies that, although the performance of the algorithms is not exactly the same for different object shapes, Z-LTM can be expected to achieve a performance at least close to that of CC-LTM, but with lower computational complexity.
We illustrate the performance of the algorithms for different tap resolutions using the rectangular glass plate. In this experiment, we evaluated the algorithms using uniform tap spacings of 0.5, 1, 1.5, 2, 3, and 5 cm. For each spacing, we defined a grid of 25 tap positions, and five taps are made at each position. We then adjusted the prediction order of AP-LTM and the Zak transform parameters of Z-LTM to achieve the best performance of each algorithm, and a fixed number of nonzero features is used for Z-LTM at all resolutions. Fig. 11 shows the variation of C with tap spacing. Although the classification accuracy of the proposed Z-LTM algorithm reduces modestly when the tap spacing falls below 1 cm, Z-LTM still achieves higher classification accuracy than AP-LTM and CC-LTM.
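The comparisons above repeatedly set Z-LTM against CC-LTM. CC-LTM is not defined in this section, so the following is only a minimal sketch, assuming the classic cross-correlation form of location template matching (in the spirit of [12]): a received tap is assigned to the stored template whose normalized cross-correlation peak with it is largest. The functions `peak_normalized_xcorr` and `cc_ltm_classify` are hypothetical and are not the authors' implementation.

```python
import numpy as np

def peak_normalized_xcorr(x, y):
    """Peak magnitude of the normalized linear cross-correlation of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) + len(y) - 1
    nfft = 1 << (n - 1).bit_length()  # zero-pad so circular correlation equals linear
    xcorr = np.fft.irfft(np.fft.rfft(x, nfft) * np.conj(np.fft.rfft(y, nfft)), nfft)
    # Cauchy-Schwarz bounds this ratio by 1; 1 means a (delayed, scaled) exact match
    return float(np.max(np.abs(xcorr)) / (np.linalg.norm(x) * np.linalg.norm(y)))

def cc_ltm_classify(signal, templates):
    """Index of the stored template that best matches the received signal."""
    scores = [peak_normalized_xcorr(signal, template) for template in templates]
    return int(np.argmax(scores))
```

Each classification correlates the full received signal with every stored template, so the cost grows with both the signal length and the number of templates. Comparing short feature vectors, as Z-LTM does, is one plausible way to reduce this cost; Table I reports the actual comparison.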
Fig. 9. (a) Classification accuracy, C, and (b) feature identity ratio for rectangular plywood.
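Panels (b) of Figs. 7–9 report the feature identity ratio, whose formal definition is not restated in this section. The sketch below is therefore only an illustrative proxy for the diagonal versus off-diagonal disparity discussed above. It assumes a matrix whose (i, j) entry is the average cosine similarity between feature vectors of taps at locations i and j, and a ratio of the mean off-diagonal entry to the mean diagonal entry, so that a lower value means repeated taps at one location resemble each other much more than they resemble taps elsewhere. The names `cosine_similarity`, `location_similarity_matrix`, and `identity_ratio_proxy` are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Normalized correlation (cosine similarity) between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def location_similarity_matrix(features):
    """features[i][r] is the feature vector of tap r at location i.

    Entry (i, j) is the mean similarity between taps at locations i and j.
    On the diagonal a tap is never compared with itself, so every location
    needs at least two taps.
    """
    n_loc = len(features)
    S = np.zeros((n_loc, n_loc))
    for i in range(n_loc):
        for j in range(n_loc):
            sims = [cosine_similarity(a, b)
                    for r, a in enumerate(features[i])
                    for s, b in enumerate(features[j])
                    if not (i == j and r == s)]
            S[i, j] = np.mean(sims)
    return S

def identity_ratio_proxy(S):
    """Mean off-diagonal entry over mean diagonal entry (lower = more distinct)."""
    off_diagonal = S[~np.eye(S.shape[0], dtype=bool)]
    return float(off_diagonal.mean() / np.diag(S).mean())
```

With five taps at each of 25 locations, `features` would hold 25 lists of five vectors each. Identical features everywhere give a proxy of exactly 1, while well-separated locations push it toward 0.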
Fig. 10. Classification accuracy, C, for trapezoidal acrylic.
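The Z-LTM features evaluated in Figs. 7–10 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: it assumes a discrete Zak transform of the form Z[k, n] = Σ_l x[n + l·T] e^(−j2πkl/F) for a frame of length N = F·T (F frequency bins, T time samples), takes the variance of |Z| across time in each frequency bin as the feature, and keeps only the K largest variances as the nonzero elements. The names `zak_transform`, `zak_features`, `n_freq`, `n_time`, and `k_nonzero` are hypothetical, and the 4096-sample frame and the decaying 2.5 kHz test tone in the usage lines are illustrative choices only.

```python
import numpy as np

def zak_transform(x, n_freq, n_time):
    """Assumed discrete Zak transform of one frame of length n_freq * n_time.

    Returns Z with Z[k, n] = sum_l x[n + l * n_time] * exp(-2j*pi*k*l / n_freq),
    so rows are frequency bins and columns are time samples.
    """
    x = np.asarray(x, dtype=float)
    if x.size != n_freq * n_time:
        raise ValueError("frame length must equal n_freq * n_time")
    polyphase = x.reshape(n_freq, n_time)   # polyphase[l, n] = x[n + l * n_time]
    return np.fft.fft(polyphase, axis=0)     # DFT over l for every time index n

def zak_features(x, n_freq, n_time, k_nonzero):
    """Variance of |Z| over time in each bin; only the k_nonzero largest are kept."""
    variance = np.abs(zak_transform(x, n_freq, n_time)).var(axis=1)
    if not 1 <= k_nonzero <= variance.size:
        raise ValueError("k_nonzero must lie between 1 and n_freq")
    features = np.zeros_like(variance)
    keep = np.argsort(variance)[-k_nonzero:]  # indices of the largest variances
    features[keep] = variance[keep]
    return features

# Usage: one 4096-sample frame split three ways between frequency and time.
fs = 96_000                                   # sampling rate used in the experiments
t = np.arange(4096) / fs
tap = np.exp(-300.0 * t) * np.sin(2 * np.pi * 2_500 * t)  # synthetic decaying tone
for n_freq in (64, 256, 1024):
    n_time = 4096 // n_freq                   # more frequency bins -> fewer time samples
    f = zak_features(tap, n_freq, n_time, k_nonzero=16)
    print(n_freq, n_time, f.shape)
```

The loop makes the trade-off discussed in Section IV-A concrete: the product of frequency bins and time samples is fixed by the frame length, so every gain in frequency resolution is paid for in time resolution.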
B. Variation of Accuracy With M and Q for Z-LTM
We illustrate, for Z-LTM, the variation of C with the Zak transform parameters. Fig. 12(a)–(c) shows the results for aluminum, glass, and plywood, respectively. As described in Section IV-A, a larger number of frequency bins implies higher frequency resolution. Comparing these figures, we note that for aluminum and glass, for each value of the second parameter, the accuracy initially increases with frequency resolution and thereafter decreases. This reduction in C beyond a certain frequency resolution implies that high frequency resolution alone can no longer ensure the uniqueness of the frequency bins, since time resolution is lost at high frequency resolutions. For the case of plywood shown in Fig. 12(c), the
Fig. 11. Classification accuracy, C, for various tap spacings on a rectangular glass plate.
Fig. 12. Variation of C with M and Q for (a) aluminum, (b) glass, and (c) plywood.
modest reduction in C with increasing frequency resolution implies that high frequency resolution is important for achieving high localization accuracy. For all the cases shown, a larger number of features yields high localization accuracy, since more features are used in the feature vector. We gain further insight into the performance of Z-LTM for these materials by plotting the Zak transform-based time-frequency representation |Z| for aluminum, glass, and plywood in Fig. 13(a)–(c), respectively; the Zak transform parameters used for these illustrative examples are given in the caption of Fig. 13. For all these cases, we observe a peak at approximately 2.5 kHz, corresponding to the resonance introduced by the transfer function of the Murata sensor. Comparing Fig. 13(a) with Fig. 13(b) and (c), we note that the TFR for aluminum decays less with time than those of glass and plywood, while the TFR for plywood decays the most. This implies that a higher frequency resolution is desirable for plywood, which is consistent with the behavior shown in Fig. 12(c). Although we do not claim that the performance of Z-LTM is independent of the sensor position with respect to the touch location, we note that LTM-based algorithms are generally less dependent on the sensor positions than, for example, time-difference-of-arrival techniques, which assume that the sensor positions are known. Results presented in Figs. 7–10,
Fig. 13. Zak transform-based time-frequency representation, |Z|, of the received signal, computed with Zak transform parameter values of 256 and 16, for (a) aluminum, (b) glass, and (c) plywood when tapped at point (0.14, 0.1).
Fig. 14. Positions estimated by CC-LTM, AP-LTM, and Z-LTM, each marked with a distinct symbol, together with the true source location.
in which the proposed Z-LTM algorithm achieves nearly 100% accuracy for tap locations that are not confined to the neighborhood of the sensor, suggest that the algorithm is, to some extent, robust to the relative source-sensor positions.
C. Illustration of Touch Interface
To further illustrate the localization performance of the algorithms, we chose the tap location on a piece of plywood at which Z-LTM performs worst among all test locations. As before, five taps were made at each of the 25 locations, and the five taps made at the true source location marked in Fig. 14 are used for localization. The tap locations estimated by CC-LTM, AP-LTM, and Z-LTM are marked with distinct symbols in Fig. 14. As can be seen from Fig. 14, none of the positions estimated by CC-LTM or AP-LTM corresponds to the true position; the closest positions estimated by CC-LTM and AP-LTM are 1 cm (two instances) and 3 cm (one instance) away. Z-LTM achieves good localization accuracy, with three of the five taps localized correctly. For this illustrative example, CC-LTM and AP-LTM therefore achieve 0% classification accuracy, whereas Z-LTM achieves 60% classification accuracy.
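The scoring behind this illustration, exact hits plus the distances of misses, can be sketched as follows. It assumes that the 25 tap positions form a square 5 × 5 grid (this section states only that 25 positions are used) and that a tap counts as correctly classified only when its estimated grid point coincides with the true one; `tap_grid` and `localization_summary` are hypothetical names.

```python
import numpy as np

def tap_grid(spacing_cm, n_side=5):
    """(x, y) in cm of an n_side x n_side grid of tap positions (assumed square)."""
    coords = np.arange(n_side) * spacing_cm
    xx, yy = np.meshgrid(coords, coords)
    return np.column_stack([xx.ravel(), yy.ravel()])

def localization_summary(true_xy, estimated_xy, tol_cm=1e-6):
    """Per-tap error distances (cm) and classification accuracy (%).

    A tap is counted as correct only if its estimate lies on the true position.
    """
    estimated_xy = np.atleast_2d(np.asarray(estimated_xy, dtype=float))
    errors = np.linalg.norm(estimated_xy - np.asarray(true_xy, dtype=float), axis=1)
    accuracy = 100.0 * float(np.mean(errors <= tol_cm))
    return errors, accuracy
```

With three of five taps estimated at the true grid point, as for Z-LTM above, the accuracy evaluates to 3/5 × 100 = 60%, while the returned distances show how far the two misses fall from the true location.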
V. CONCLUSION
A time-frequency classification algorithm is proposed for source localization on solid surfaces using a single sensor. In this interdisciplinary research, we analyzed the wave propagation of flexural vibration due to an impact on a surface. A time-frequency analysis is exploited to track the amplitude variations of mode frequencies with respect to time. We showed that the Zak transform allows one to achieve a faithful representation of these variations, and we proposed to use the variance in each frequency bin with respect to time as a unique time-frequency classifier for each location. With reduced computational complexity compared to CC-LTM, the proposed Z-LTM algorithm achieves improvements of 2.4% and 20.4% in localization accuracy for aluminum and plywood, respectively.
REFERENCES
[1] G. Shin and J. Chun, “Vision-based multimodal human computer interface based on parallel tracking of eye and hand motion,” in Proc. IEEE Int. Conf. Convergence Information Technology, 2007.
[2] Y. W. Kim, “A novel development of head-set type computer mouse using gyro sensors for the handicapped,” in Proc. IEEE 2nd Annu. Int. EMB Special Topic Conf. Microtechnol. in Med. and Biol., 2002.
[3] J. Kim, J. Park, H. Kim, and C. Lee, “HCI (human computer interaction) using multi-touch tabletop display,” in Proc. IEEE Pacific Rim Conf. Commun., Comput., and Signal Process., 2007, pp. 391–394.
[4] W. Rolshofen, D. T. Pham, M. Yang, Z. B. Wang, Z. Ji, and M. Al-Kutubi, “New approaches in computer-human interaction with tangible acoustic interfaces,” in Proc. Virtual Int. Conf., May 2005.
[5] D. T. Pham, Z. Wang, Z. Ji, M. Yang, M. Al-Kutubi, and S. Catheline, “Acoustic pattern registration for a new type of human-computer interface,” in Proc. Virtual Conf., May 2005.
[6] C. Bornand, A. Camurri, G. Castellano, S. Catheline, A. Crevoisier, E. B. Roesch, K. R. Scherer, and G. Volpe, “Usability evaluation and comparison of prototypes of tangible acoustic interfaces,” in Proc. 2nd Int. Conf. Enactive Interfaces, Genoa, Italy, Nov. 2005.
[7] T. Kundu, S. Das, S. A. Martin, and K. V. Jata, “Locating point of impact in anisotropic fiber reinforced composite plates,” Ultrasonics, vol. 48, pp. 193–201, 2008.
[8] A. Sulaiman, K. Poletkin, and A. W. H. Khong, “Source localization in the presence of dispersion for next generation touch interface,” in Proc. IEEE Cyberworlds, 2010, pp. 2490–2493.
[9] D. T. Pham, M. Al-Kutubi, Z. Ji, M. Yang, Z. Wang, and S. Catheline, “Tangible acoustic interface approaches,” in Proc. Virtual Conf., May 2005.
[10] K. Poletkin, X. Yap, and A. W. H. Khong, “A touch interface exploiting the use of vibration theories and infinite impulse response filter modeling based localization algorithm,” in Proc. IEEE Int. Conf. Multimedia and Expo, 2010, pp. 286–291.
[11] X. Yap, A. W. H. Khong, and W.-S. Gan, “Localization of acoustic source on solids: A linear predictive coding based algorithm for location template matching,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2010.
[12] G. Ribay, S. Catheline, D. Clorennec, R. K. Ing, N. Quieffin, and M. Fink, “Acoustic impact localization in plates: Properties and stability to temperature variation,” IEEE Trans. Ultrason., Ferroelect., Freq. Control, vol. 54, no. 2, pp. 378–385, Feb. 2007.
[13] E. Ventsel and T. Krauthammer, Thin Plates and Shells: Theory, Analysis, and Applications. New York: Marcel Dekker, 2001.
[14] L. Cohen, “Time-frequency distributions-a review,” Proc. IEEE, vol. 77, no. 7, pp. 941–981, Jul. 1989.
[15] F. J. Harris, “On the use of windows for harmonic analysis with the discrete Fourier transform,” Proc. IEEE, vol. 66, no. 1, pp. 51–83, Jan. 1978.
[16] P. Angelidis and G. Sergiadis, “Time-frequency representation of damped sinusoids using the Zak transform,” J. Magn. Resonance, vol. 103, pp. 191–195, 1993.
[17] J. R. O’Hair and B. W. Suter, “The Zak transform and decimated time-frequency distributions,” IEEE Trans. Signal Process., vol. 44, no. 5, pp. 1099–1110, May 1996.
[18] H. Bölcskei and F. Hlawatsch, “Discrete Zak transforms, polyphase transforms, and applications,” IEEE Trans. Signal Process., vol. 45, no. 4, pp. 851–866, Apr. 1997.
[19] D. Sundararajan, M. Ahmad, and M. Swamy, “Fast computation of the discrete Fourier transform of real data,” IEEE Trans. Signal Process., vol. 45, no. 8, pp. 2010–2022, Aug. 1997.
[20] C. Cheng and K. Parhi, “Low-cost fast VLSI algorithm for discrete Fourier transform,” IEEE Trans. Circuits Syst., vol. 54, no. 4, pp. 791–806, Apr. 2007.
Kattukandy Rajan Arun received the B.Tech. degree in electronics and telecommunication engineering from the University of Calicut, Kerala, India, in 2003 and the M.Sc. degree in signal processing from Nanyang Technological University (NTU), Singapore, in 2008. He is currently pursuing the Ph.D. degree in signal processing at NTU. He worked for five years in the digital signal processing division of the product development team in various telecommunication and automotive industries. His research interests include video and acoustic signal processing.
XueXin Yap (S’10) was born in Singapore in 1982. He received the diploma in electronics and computer engineering from Ngee Ann Polytechnic, Singapore, in 2002 and the B.Eng. degree from Nanyang Technological University (NTU), Singapore, in 2008. He is currently pursuing the M.Eng. degree at NTU, conducting research on source localization techniques on solids by means of wave propagation in solids for human-computer interfaces. His other research interests include time-reversal signal processing, the application of electronics in audio and musical equipment, and psychoacoustics-related fields.
Andy W. H. Khong (M’06) received the B.Eng. degree from Nanyang Technological University (NTU), Singapore, in 2002 and the Ph.D. degree from the Department of Electrical and Electronic Engineering, Imperial College London, London, U.K., in 2005. He served as a Research Associate in the Department of Electrical and Electronic Engineering, Imperial College London, from 2005 to 2008. He is currently an Assistant Professor in the School of Electrical and Electronic Engineering, NTU. His postdoctoral research involved the development of signal processing algorithms for vehicle destination inference as well as the design and implementation of acoustic array and seismic fusion algorithms for perimeter security systems. His Ph.D. research was mainly on partial-update and selective-tap adaptive algorithms with applications to mono- and multichannel acoustic echo cancellation for hands-free telephony. He has also published works on acoustic blind channel identification and equalization for speech dereverberation. His other research interests include human-computer interfaces, source localization, speech enhancement, and blind deconvolution.