Wavelet-based Indexing of Audio Data in Audio/Multimedia Databases

S.R. Subramanya and Abdou Youssef
Department of Electrical Engineering and Computer Science,
The George Washington University, Washington, DC 20052.
fsubra,[email protected]

This research was supported in part by a Grant-In-Aid of Research from Sigma-Xi.

Abstract

The phenomenal increase in the amounts of audio (and multimedia) data being generated, processed, and used in computer applications has necessitated the development of audio (and multimedia) database systems with newer features, such as content-based queries and similarity searches, to manage and use such data. Example applications of audio databases include digital libraries, the entertainment industry, forensic laboratories, and virtual reality, among others. Fast and accurate retrievals for content-based queries are crucial for such systems to be useful. Efficient content-based indexing and similarity searching schemes are key to providing fast and relevant data retrievals. This paper presents a scheme for indexing audio data based on wavelets. The performance of this scheme has been experimentally evaluated; it is seen to be more resilient to noise than indexing schemes using signal-level statistics, and to give better retrieval precision than DCT-based indexing.

1 Introduction There has been an explosive growth in the generation and use of multimedia data such as video, audio, and images in computer applications, and this is expected to continue at an accelerated pace in the near future. Considerable attention has been devoted to the problem of content-based image indexing ([1, 2, 3, 4, 5]) and video indexing ([6, 7, 8, 9, 10]), but very little to content-based audio indexing. Although a tremendous amount of work has been done in the analysis, synthesis, recognition, coding, and transmission of speech data, and in the analysis and synthesis of music, the on-line storage of digital audio data in

computers and their organization as databases with query and retrieval functions are just beginning and are expected to grow tremendously. Audio could serve as an independent data type or as part of multimedia data. Audio/multimedia databases with newer features are being designed and developed to ease the management and use of such data. Several inherent characteristics of audio data lead to demands for huge storage spaces, large bandwidth and real-time requirements for transmission, content-based queries, similarity-based search and retrievals, and synchronization of retrieval results. Of interest to the user are easy-to-use queries with fast and correct retrievals from the audio/multimedia database. To this end, (1) derivation of good features from the data to be used as indices during search, (2) organization of these indices in a suitable multi-dimensional data structure with efficient search, and (3) a good measure of similarity (distance measure) are the important factors. An audio database supporting content-based retrievals should have its indices structured with respect to the audio features extracted from the data. Given a query, the same kinds of features are derived and compared with the data features for similarity during retrieval. Since content-based retrieval schemes usually search the entire collection of data, indices that characterize the data features well while still being of low dimensionality are desirable, since they facilitate correct and fast retrievals. There are two ways in which the indexing paradigm based on audio features could be approached: (1) the model-based approach and (2) the general-purpose approach. The former assumes a priori knowledge of how the audio data is structured, which is the model. This requires more computation to derive the features and is expected to perform better in terms of

relevance of retrievals. The latter approach requires a more general model describing which features should be examined and how different instances of those features should be compared for similarity. In this paper, we focus on the derivation of indices from the raw audio data supporting the general-purpose approach described above, and propose the use of wavelets to derive the indices. Although wavelets are beginning to be used in signal processing and data compression, their use in audio/multimedia database indexing has not been extensively explored. Section 2 describes a few representative research efforts in the indexing of audio data for content-based retrievals. Section 3 presents the proposed scheme of using wavelets for deriving the indices. Experimental results are given in Section 4, followed by conclusions.

2 Schemes for Indexing Audio Data The characteristic feature of a data item which is used in the search process to locate and retrieve the data item is referred to as the index. Retrievals in conventional databases and text documents use keys and keywords as indices. For audio/multimedia databases, keyword-based indexing is very limiting. This is due to several reasons, including (1) the inexact nature and the richness of expression of audio/multimedia data, (2) the subjective interpretations of such data, and (3) the difficulty in formulating exact keyword-based queries describing the `qualities' of the required objects. For effective use of data in such applications, the user needs content-based queries or queries-by-example, which are unrestricted and unanticipated. The indices supporting the above types of queries for audio (multimedia) data are multidimensional. These indices need to be organized in a suitable multidimensional index structure which supports efficient insertion, deletion, and search operations. In this paper, we focus only on the derivation of indices from the raw audio data using wavelets. The reader is referred to [17, 18, 19, 20, 21, 22] and the references therein for details on typical multidimensional index structures. In [13], the authors describe a system for content-based classification, search, and retrieval of audio. The sound is first analyzed to derive a set of acoustical attributes such as pitch, loudness, brightness, bandwidth, and harmonicity, to which statistical techniques are applied for classification and retrieval. In [14], a system for search and retrieval with query-by-example for music tunes is described, which uses

melodic contour, a sequence of relative differences in pitch between successive notes, to discriminate between melodies. Based on the melodic contour, a string on the alphabet {U, D, S} is derived, where a symbol denotes whether the (current) pitch is above, below, or the same as the previous pitch, respectively. The search then uses known approximate string pattern matching algorithms. In [15], an indexing scheme for audio data based on transforms was proposed and was shown to be quite effective and resilient to noise. The dependence of search times and search accuracy on query sizes was presented. This technique does not involve extensive analysis and computations, and supports only query-by-example, which is quite sufficient for many purposes. Also, it is more general purpose and can handle any kind of audio data: speech, music, and other sounds.

We can broadly classify the commonly used schemes for indexing audio data as follows:

1. Indexing based on signal statistics. In this scheme, signal statistics such as (1) mean, (2) variance, (3) zero-crossings, (4) autocorrelation, and (5) histograms of samples/differences of samples, computed either on the whole data or on blocks of data, are used.

2. Indexing based on acoustical attributes. In this scheme, using the measurable quantities of the audio signal, acoustical attributes such as pitch, loudness, brightness, and harmonicity are derived. Subsequently, statistical analysis (mean, variance, etc.) is applied to these attributes to derive feature vectors which are used in indexing and searching.

3. Transform-based indexing. In this scheme, a transform (such as the DCT) is applied to (blocks of) the data, and suitable coefficients are selected and used as indices for search and retrieval. Issues such as the appropriate transform to be used, the block size, and the number of coefficients to be retained need to be addressed. (A small sketch of this scheme is given at the end of this section.)

The above methods have drawbacks: the first method does not give very accurate results for highly varying signals. The second method is better than the first but requires extensive computations. Also, the first two methods are not robust to noise. The DCT-based method is more resilient to noise. In general, most audio signals are non-stationary. A transform applied to the whole signal gives the frequency content in terms of the different components and their strengths (amplitudes), but gives no time information, such as where the different frequencies occur in the signal. Since the indexing is based on the frequency content of the signal,

it would be better to have more accurate information about which frequency bands exist in which time intervals. This would result in better indexing and searching. Hence, the DCT-based scheme had to consider suitably sized blocks of data, such that the signal within a block could be considered stationary, and a short-time transform was applied to these blocks. This works reasonably well in many situations. However, the drawback of the short-time transform is that narrow windows give good time resolution but poor frequency resolution, while wide windows give good frequency resolution but poor time resolution. Further, wide windows may violate the condition of stationarity. The wavelet transform solves this resolution dilemma to a certain extent.
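As a concrete illustration of the transform-based scheme (item 3 above), the following MATLAB sketch computes a block-wise DCT index. It assumes the Signal Processing Toolbox function dct; the block size, the number of retained coefficients, and the stand-in signal are illustrative choices, not the exact parameters of [15].

B = 256;                          % block size (samples); illustrative
k = 8;                            % DCT coefficients kept per block; illustrative
x = randn(8000*15, 1);            % stand-in for 15 s of 8 kHz audio; substitute real samples
nb = floor(length(x) / B);        % number of whole blocks
index = zeros(nb, k);
for i = 1:nb
    blk = x((i-1)*B + 1 : i*B);   % i-th block of the signal
    c = dct(blk);                 % short-time transform of the block
    index(i, :) = c(1:k)';        % keep the first k (low-frequency) coefficients
end
% 'index' (nb x k) is the feature matrix matched against the query's features.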

3 Wavelet-based Indexing

3.1 Preliminaries of wavelets Wavelets are mathematical functions which extract the different frequency components from given data and study each component with a resolution matched to its scale. The term `scale' in the context of wavelet analysis corresponds to the inverse of frequency, or to the time resolution. There are broadly two schemes: (1) the continuous wavelet transform (CWT) and (2) the discrete wavelet transform (DWT). In the CWT, a mother wavelet (or analyzing wavelet) is chosen, the signal is multiplied by the wavelet function (window), and the transform is computed separately for different segments of the time-domain signal. The window width is changed for every spectral component before the transform is computed, which is the most significant characteristic of the wavelet transform (different from the short-time Fourier transform or DCT). The DWT analyzes the signal at different frequency bands with different resolutions by decomposing the signal into (coarse) approximation coefficients and detail coefficients. In this sense, the wavelet decomposition of a signal is similar to subband coding, where a signal is passed through several band-pass filters to obtain signal components in different bands, for subsequent analysis and processing. The DWT employs two sets of functions, called scaling functions and wavelet functions, which are associated with low-pass and high-pass filters, respectively. The decomposition of the signal into different frequency bands is obtained by successive low-pass and high-pass filtering of the time-domain signal and down-sampling the signal after each filtering. In the discrete domain, a filter is represented by a set of numbers, called filter taps, which are the impulse response of the filter. The filtering operation corresponds to the convolution of the discrete signal with the filter. Figure 1 shows the DWT/subband coding process for 3 levels, where h[n] and g[n] denote the low-pass and high-pass filters used by the DWT.


Figure 1: The subband decomposition process.

After the signal of length, say k, is passed through the filters of any stage l, the outputs (the low- and high-frequency subbands) are signals of the same length k but with only half the frequency band of the input. Thus half of the samples in each of the filter outputs can be removed without loss of information, according to the Nyquist criterion. This is done by down-sampling, where every other sample is discarded. The high-frequency subband at stage (level) l is termed the level-l detail coefficients. The process is repeated a given number of times on the low-frequency subband at each level. The low-frequency subband of the last stage is called the approximation coefficients. The combination of the approximation coefficients and the detail coefficients of all levels is termed the DWT coefficients. Since the half-band low-pass filter removes half of the frequencies present in the input, the frequency resolution of its output is twice that of its input. After the down-sampling, the time resolution is halved, since the output contains only half the number of points of the input. Thus the combination of filtering and down-sampling increases the frequency resolution but decreases the time resolution. Note that only the low-frequency subbands are successively decomposed into two (finer) subbands, while the high-frequency subbands are left untouched.
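A minimal MATLAB sketch of one analysis stage of Figure 1, with a random vector standing in for an audio signal; the 4-tap Daubechies filter pair is hard-coded here, and sign conventions for the high-pass filter vary between texts, so this is illustrative rather than a reference implementation.

h = [1+sqrt(3), 3+sqrt(3), 3-sqrt(3), 1-sqrt(3)] / (4*sqrt(2));  % low-pass (scaling) filter taps
g = fliplr(h); g(2:2:end) = -g(2:2:end);                         % high-pass (wavelet) taps, QMF of h
x = randn(1, 1024);              % stand-in for one audio signal
lo = conv(x, h);                 % low-pass filtering
hi = conv(x, g);                 % high-pass filtering
approx = lo(1:2:end);            % downsample by 2: approximation subband (0 to w/2)
detail = hi(1:2:end);            % downsample by 2: level-1 detail subband (w/2 to w)
% The next level repeats the same two filtering/down-sampling steps on 'approx'.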


Figure 2: Wavelet decomposition of two different audio signals.

Thus the approximation coefficients (low frequencies) have a high frequency resolution but poor time resolution, while the level-1 detail coefficients (high frequencies) have a poor frequency resolution but good time resolution. (The frequency and time resolutions of the detail coefficients change progressively over the successive levels.) Thus, in the DWT (unlike the DFT or DCT), the time localization of the frequencies is not lost; however, the time localization has a resolution that depends on the level at which the coefficients appear. Most signals require these kinds of resolutions, which makes this scheme attractive for their analysis. The wavelet decomposition, up to 6 levels, of two different audio signals using the Daubechies-4 wavelet is shown in Figure 2. Each of the two sub-plots (left and right) has 8 windows: the top-left window shows the original signal, and the top-right window shows the approximation coefficients. The windows in rows 2 to 4, left to right in each row, show in order the detail coefficients at levels 6, 5, ..., 1. The frequencies that are most prominent in the original signal appear as high amplitudes in the region of the DWT-transformed signal (the bands) which includes those particular frequencies, while the frequencies that are not prominent in the original signal have low amplitudes in the parts of the DWT-transformed bands which contain them.

Figure 3 shows (a) the original signal, (b) the DWT coefficients, and (c) the first 15% of the DWT coefficients, for two different audio data items. Note that the first 10-15% of the DWT coefficients carry the relevant information. This is the reason for using the approximation coefficients (and a few detail coefficients) as indices. (Also, the general shape of the initial portion of the DWT coefficients corresponds to the general shape (envelope) of the whole signal.)
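This observation can be checked on a given signal by comparing the energy in the leading 15% of the DWT coefficient vector with the total energy; the MATLAB sketch below assumes the Wavelet Toolbox function wavedec and a 6-level decomposition as in Figure 2, and uses a random vector only as a stand-in for a real audio signal.

x = randn(1, 1024);                   % stand-in; substitute a real audio signal here
[C, L] = wavedec(x, 6, 'db4');        % C = [approx | details], L = bookkeeping vector
m = round(0.15 * numel(C));           % number of leading coefficients (first 15%)
frac = sum(C(1:m).^2) / sum(C.^2);    % fraction of the total energy they carry
fprintf('First 15%% of the DWT coefficients carry %.1f%% of the energy\n', 100*frac);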

3.2 Derivation of indices In our wavelet-based indexing scheme, we use the built-in MATLAB function WAVEDEC to analyze the audio data. WAVEDEC(x, N, wname) performs a multi-level 1-D wavelet analysis of the signal x using the specific wavelet given by wname and returns the wavelet decomposition of x at level N (the DWT coefficients). The output decomposition structure contains the wavelet decomposition vector C and the bookkeeping vector L. The decomposition vector C is the set of DWT coefficients shown at the bottom of Figure 1, which consists of the approximation and detail coefficients. The structure is organized as follows: C = [app.coef.(N) | det.coef.(N) | ... | det.coef.(1)], L(1) = length of app.coef.(N), L(i) = length of det.coef.(N - i + 2) for i = 2, ..., N+1, and L(N+2) = length(x). Appropriate DWT coefficients are selected to serve as indices.
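The sketch below illustrates this layout, assuming the Wavelet Toolbox; appcoef and detcoef use the bookkeeping vector L to recover the individual pieces from the single coefficient vector C (the signal here is again only a stand-in).

N = 3;
x = randn(1, 1024);                 % stand-in for an audio signal
[C, L] = wavedec(x, N, 'db4');      % C = [app(N) | det(N) | ... | det(1)]
cA  = appcoef(C, L, 'db4', N);      % approximation coefficients, length L(1)
cDN = detcoef(C, L, N);             % level-N detail coefficients, length L(2)
cD1 = detcoef(C, L, 1);             % level-1 detail coefficients, length L(N+1)
assert(numel(cA)  == L(1));
assert(numel(cDN) == L(2));
assert(numel(cD1) == L(N+1));
assert(L(N+2) == numel(x));         % last entry of L is the signal length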


Figure 3: Original signal and DWT coefficients of two different audio data items.

Two broad schemes to derive indices from the DWT coefficients are shown in Figure 4. In both schemes, all of the approximation coefficients are selected. In scheme 1, the first few of the detail coefficients of the different levels are selected. For example, this could be the first 20% of the detail coefficients of levels N, ..., n_1, the first 10% of the detail coefficients of levels n_1, ..., n_2, and none of the detail coefficients of levels n_2, ..., 1. In scheme 2, the 20%, 10%, etc. of the highest-valued coefficients (instead of the first 20%, 10%, etc.) are selected. C' is derived by appropriate selection of coefficients from C, and L' is derived from L to reflect the number and positions of the selected coefficients. C' is used as the index and L' for auxiliary information. These are explained in the following examples. Consider scheme 1 and N = 3 (a 3-level decomposition). Let n_a and n_i denote the number of selected approximation and level-i detail coefficients, respectively. Then C' = [C(1 : n_a) | C(L(1)+1 : L(1)+n_3) | C(L(1)+L(2)+1 : L(1)+L(2)+n_2) | C(L(1)+L(2)+L(3)+1 : L(1)+L(2)+L(3)+n_1)] and L' = [⟨L(1), n_a⟩, ⟨L(2), n_3⟩, ⟨L(3), n_2⟩, ⟨L(4), n_1⟩]. In scheme 2, the detail coefficients of C' are basically the collection of sets of a certain number of the highest-valued coefficients of the different levels of detail coefficients of C, as explained in the previous paragraph. Thus, all selected detail coefficients of a level may not be consecutive. This requires additional location information to be stored in L'. Let l_1, ..., l_{n_i} denote the locations of the selected level-i detail coefficients. Then L' = [⟨L(1), n_a⟩, ⟨L(2), n_3, l_1, ..., l_{n_3}⟩, ⟨L(3), n_2, l_1, ..., l_{n_2}⟩, ⟨L(4), n_1, l_1, ..., l_{n_1}⟩].

Figure 4: Index derivation from DWT coefficients.
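A minimal MATLAB sketch of scheme 1 under the wavedec layout described in Section 3.2: all approximation coefficients are kept, together with the first fraction of each level's detail coefficients. The fractions, the names Cp and Lp (for C' and L'), and the stand-in signal are illustrative assumptions.

N = 3;
x = randn(1, 2048);                        % stand-in for an audio signal
[C, L] = wavedec(x, N, 'db4');             % C = [app(N) | det(N) | ... | det(1)]
C = C(:).';                                % force row orientation for concatenation
p = [0.20, 0.10, 0.0];                     % fraction kept at levels N, N-1, ..., 1
Cp = C(1:L(1));                            % C': start with all approximation coeffs.
Lp = [L(1), L(1)];                         % L': pairs of (segment length, number selected)
offset = L(1);
for j = 1:N                                % the j-th detail segment holds level N-j+1
    nsel = round(p(j) * L(j+1));           % keep the first nsel coefficients of the segment
    Cp = [Cp, C(offset+1 : offset+nsel)];
    Lp = [Lp, L(j+1), nsel];
    offset = offset + L(j+1);
end
% [Cp, Lp] is the index: Cp holds the selected coefficients and Lp records
% how many were taken from each segment (and the segment lengths).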

3.3 Pseudocode for indexing and searching Based on the average length S of a data item and a suitable number of approximation coefficients ka, the number of decomposition levels N is determined as N = ⌊log2(S) - log2(ka)⌋. The

wavelet decomposition is applied to the data N times using a suitable wavelet filter. Then appropriate coefficients are chosen to be used as indices for search and retrieval. The pseudocode for index generation and search is given below.

Algorithm 3.1 SelectCoeffs(x, N: in; indices: out)
{x: input signal; N: no. of decomposition levels}
{indices: combination of approx. & detail coeffs.}
1. begin
2.   [C, L] <- WAVEDEC(x, N, wname).  {Apply the wavelet decomposition to the signal x using the wavelet specified by wname to obtain the approx. and detail coeffs.}
3.   Retain all of the ka approximation coeffs.
4.   Select the most significant kd detail coeffs., note their frequency bands and positions within the bands, and obtain [C', L'].  {[C', L'] is the index.}
5.   {Suitably organize the indices in an appropriate index structure.}
6.   Repeat for all data.
7. end

The following algorithm describes the search based on the indices obtained by the wavelet transform. Note that this is a generic search and does not consider operations specific to the data structure in which the indices are organized, since the main objective here is to establish the viability of using the wavelet transform coefficients as indices for audio data.

Algorithm 3.2 Search(Q, : in; Rlist: out)
{Q: example query, : no. of best matches; Rlist: retrieval result list}
1. begin
2.   [Cq, Lq] <- WAVEDEC(Q, N, wname).
3.   Select the ka approximation coeffs. and kd detail coefficients to obtain [C'q, L'q].
4.   Match the corresponding approx. and detail coefficients of the query and each of the data items and determine the `distance'.  {Example distance measure: Euclidean distance.}
5.   Select all the matches above a threshold (determined empirically) as the matches.
6.   Rlist <- (best matches).
7. end
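A sketch of the matching step of Algorithm 3.2 in MATLAB, assuming the indices of all data items have already been derived as above and stored as rows of equal length in a matrix; the variable names, sizes, and threshold value are illustrative.

queryIdx = randn(1, 64);                        % stand-in for the query index C'q
dataIdx  = randn(200, 64);                      % stand-in for the 200 stored data indices
dist = sqrt(sum((dataIdx - queryIdx).^2, 2));   % Euclidean distance to each data item
threshold = 1.0;                                % empirically determined cut-off
matches = find(dist <= threshold);              % candidate matches
[~, order] = sort(dist(matches));               % rank the matches by distance
Rlist = matches(order);                         % retrieval result list, best first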

Complexity: The time complexities of (a) deriving the indices from the data files and (b) searching for a given query, for the three schemes (1) histogram-based, (2) DCT-based, and (3) wavelet-based, are derived in detail in [16] and summarized below. Let F denote the number of files, S the length of a data item (number of samples), B the block size, b = ⌈S/B⌉ the number of blocks in a data item, and N the number of levels of decomposition for wavelet indexing. Let h be the number of bins used in the histogram-based scheme, and let k denote the number of DCT coefficients selected per block in the DCT-based scheme. The time complexities of deriving the indices in the three schemes are, respectively, O(FbB), O(FbB log B), and O(2S log S). The complexities of search for a given query are, respectively, O(Fhb), O(Fkb), and O(F(S/2^N)). The time for index derivation is lowest for the histogram-based scheme, followed by the DCT-based and wavelet-based schemes. The search time, however, is not significantly different in the three schemes, as the index sizes are approximately the same.

4 Experimental Results The proposed scheme has been implemented on a Sun SPARCstation, using MATLAB and C functions. The experiments used over 200 audio data files and several queries-by-example. The files and the queries were approximately of the same size (15 seconds). The files were in the Sun/NeXT format (.au files). An .au file consists of a 32-byte header, which contains information such as the file name, the sampling rate, the duration, etc., followed by the actual data. The data consists of audio samples, sampled at 8 kHz, with each sample converted to mu-law encoded format and represented using 8 bits. The data files consisted of speech, music, and several other kinds of data, such as sounds produced by animals and birds, sounds of bells and whistles, special-effect sounds, etc. The Daubechies-4 wavelet with 12 levels was used to derive the DWT coefficients. In the experiments, random noise was added to the queries (and some data). The performance of the indexing scheme based on wavelets is compared with the schemes based on (1) signal statistics [16] and (2) the discrete cosine transform [15]. For signal-statistics-based indexing, the audio data was divided into fixed-size blocks (of 256 samples) and the mean, variance, and a 7-bin histogram of the samples were used. The Euclidean distance was used as the distance measure for matching in all schemes.

            10 Blks              20 Blks              40 Blks              80 Blks
Qry   SigSt   DCT    DWT    SigSt   DCT    DWT    SigSt   DCT    DWT    SigSt   DCT    DWT
 1    2/35    3/28   3/23   1/15    1/9    1/7    1/9     1/2    1/2    1/6     1/1    1/1
 2    2/26    2/9    2/7    2/12    2/3    2/3    1/7     1/2    1/2    1/6     1/1    1/1
 3    1/29    2/5    2/4    1/20    2/3    2/3    1/12    1/1    1/1    1/9     1/1    1/1
 4    2/32    2/24   2/18   1/18    1/5    1/3    1/7     1/2    1/1    1/6     1/1    1/1
 5    2/23    3/14   3/10   1/14    2/3    2/3    1/8     1/2    1/2    1/7     1/2    1/1
 6    2/14    2/3    2/3    1/9     2/3    2/3    1/8     1/2    1/2    1/5     1/2    1/2
 7    2/19    3/5    3/7    1/12    2/5    2/5    1/10    2/4    2/5    1/9     1/2    1/2
 8    1/16    2/5    2/5    1/12    2/4    2/3    1/9     1/3    1/2    1/7     1/1    1/1
 9    3/21    4/11   3/9    3/17    3/6    3/7    1/13    2/6    2/5    2/10    2/3    2/3
10    1/12    2/4    2/5    2/11    2/3    2/3    1/12    1/2    1/2    1/6     1/2    1/2

(Each entry is the precision, i.e., the ratio of correct matches to total reported matches.)

Table 1: Comparison of retrieval precision in the three schemes.

The performance metric was precision, which is defined as:

Precision = (Number of Relevant Objects Returned) / (Total Number of Objects Returned)

The retrieval precision for ten representative queries using the three schemes is given in Table 1, for different block sizes of the queries. A plot of the retrieval precision averaged over 30 queries is shown in Figure 5. The following observations can be made: (1) the DCT- and wavelet-based indexing schemes show more resilience to noise; (2) the wavelet-based scheme gives the highest precision, followed by the DCT-based scheme and the scheme based on signal statistics; (3) query sizes of more than 40 blocks do not significantly increase the precision of retrievals.


Figure 5: Retrieval precision of different schemes.

5 Conclusions and future directions Derivation of good indices for audio/multimedia data is critical to fast and accurate retrievals for content-based queries in audio/multimedia databases. The indices need to be structured with respect to the features of the audio data. This paper proposed a scheme using wavelets for indexing audio data for retrievals in audio/multimedia databases with queries-by-example. The proposed scheme was implemented and was demonstrated by experiments to give better hit ratios than the DCT-based scheme and to be more resilient to noise than schemes using acoustical attributes, thus establishing the effectiveness of wavelet coefficients as indices for audio files. Some future directions include (1) the study of different wavelet filters yielding indices that provide more accurate retrievals, (2) handling data of widely varying sizes and subsequence matches for short queries, (3) determination of a weight vector which assigns appropriate weights to the approximation coefficients and the detail coefficients of the different levels, for a better similarity measure, and (4) a combination of the wavelet transform and a subsequent DCT to further reduce the index dimensionality.

References
[1] M. D'Allegrand. Handbook of Image Storage and Retrieval Systems. Van Nostrand Reinhold, New York, 1992.
[2] W. Grosky and R. Mehrotra, eds. Special issue on image database management. IEEE Computer, Vol. 22, No. 12, Dec. 1989.
[3] V. Gudivada and V. Raghavan. Special issue on content-based image retrieval systems. IEEE Computer, Vol. 28, No. 9, Sept. 1995.
[4] A.D. Narasimhalu, ed. Special issue on content-based retrieval. ACM Multimedia Systems, Vol. 3, No. 1, Feb. 1995.
[5] R. Jain, et al. Similarity measures for image databases. SPIE, Vol. 2420, pp. 58-61.
[6] M. La Casia and E. Ardizzone. `Jacob: Just A Content-based Query System for Video Databases', IEEE Conf., 1996, pp. 1216-1219.
[7] Y-L. Chang et al. `Integrated Image and Speech Analysis for Content-Based Video Indexing', Proc. 3rd IEEE Int'l Conf. on Multimedia Computing and Systems, 1996, pp. 306-313.
[8] A. Hampapur et al. `Digital Video Segmentation', ACM Multimedia 94 Conf. Proc., pp. 357-364.
[9] A. Hauptmann and M.A. Smith. `Text, Speech and Vision for Video Segmentation: The Informedia Project'.
[10] S.W. Smoliar and H.J. Zhang. `Content Based Video Indexing and Retrieval', IEEE Multimedia, Summer 1994, pp. 62-72.
[11] A. Graps. `An Introduction to Wavelets', IEEE Computational Science and Engineering, Vol. 2, No. 2, Summer 1995.

[12] R. Polikar. `The Wavelet Tutorial', www.public.iastate.edu/~rpolikar/WAVELETS/WTtutorial.html.
[13] E. Wold et al. Content-based classification, search and retrieval of audio data. IEEE Multimedia Magazine, 1996.
[14] A. Ghias et al. Query by humming. Proc. ACM Multimedia Conf., 1995.
[15] S.R. Subramanya et al. `Transform-Based Indexing of Audio Data for Multimedia Databases', IEEE Int'l Conference on Multimedia Systems, Ottawa, June 1997.
[16] S.R. Subramanya. `Experiments in Indexing Audio Data', Tech. Report, GWU-IIST, January 1998.
[17] A. Guttman. `R-trees: A Dynamic Index Structure for Spatial Searching', Proc. ACM SIGMOD, June 1984, pp. 47-57.
[18] T. Sellis et al. `R+-tree: A Dynamic Index Structure for Multidimensional Objects', Proc. 13th Int'l Conf. on Very Large Databases, Sep. 1987, pp. 507-518.
[19] N. Beckmann et al. `The R*-tree: An Efficient and Robust Access Method for Points and Rectangles', ACM SIGMOD, May 1990, pp. 322-331.
[20] D.B. Lomet and B. Salzberg. `The hB-tree: A Multiattribute Indexing Method with Good Guaranteed Performance', ACM TODS, 15(4), Dec. 1990, pp. 625-658.
[21] J.T. Robinson. `The K-D-B-tree: A Search Structure for Large Multidimensional Dynamic Indexes', Proc. ACM SIGMOD, 1981, pp. 10-18.
[22] J. Nievergelt et al. `The Grid File: An Adaptable, Symmetric Multi-key File Structure', ACM Trans. on Database Systems, Vol. 9, No. 1, 1984.