Fast Polynomial Regression Transform for Video Database

Keesook J. Han and Ahmed H. Tewfik
Department of Electrical and Computer Engineering, University of Minnesota
200 Union St. S.E., Minneapolis, MN 55455, USA
[email protected] and [email protected]

Abstract

Important issues in multimedia information systems are the development of efficient storage layout models and effective retrieval systems for managing multimedia databases. In multimedia information systems, the capability for interaction with the video media type is still extremely limited because of the huge size of video databases. In this paper we present a Fast Polynomial Regression Transform (FPRT) based metadata scheme for manipulating large video databases. The new metadata scheme simplifies the query mechanism and improves communication in multimedia systems. The main advantage of the proposed approach is the reduction of data storage space while preserving rich video content information. Other advantages are computational simplicity for video information coding and accuracy for video indexing. Visual tools for video representation and browsing are also presented in this paper.

1 Introduction

There has been a growing need for new multimedia information management technology because of the dramatically increasing amount of multimedia data [6] [9] [10]. A multimedia information system requires effective and efficient methods to store, search, and retrieve various types of media such as text, image, video, and audio [3]. A huge video database must be properly structured for interaction with the multimedia system. In current multimedia database systems, the metadata approach is widely employed to simplify the query mechanism and increase interactivity for the various media types. The metadata approach decouples the data delivery process from the multimedia database by providing concise information about the location and characteristics of the data to be retrieved.

The main convenience of a metadata based retrieval system is that it displays information about the data before the actual data is retrieved, so the user can easily select the proper data to be delivered. The metadata mechanism is especially useful for retrieving large video data, which can be costly to deliver and can consume significant portions of a data server's I/O bandwidth [4]. In this paper, we introduce new semantics-preserving video information compression and video segmentation techniques to build optimistic metadata for the video information retrieval system. Semantics-preserving video information is the key to implementing optimistic video metadata. Compact representations that preserve essential image similarities can be achieved by the FPRT. Commonly, the color, texture, shape, and location of image objects and regions are used to describe video contents [7] [8]. The selection of the set of features used to describe each video frame is important for similarity retrieval based on these features. In our approach, two feature vectors are selected to capture the horizontal and vertical information of each frame. These features are compressed and reconstructed by the FPRT pair. The compressed video spatial information is stored in the digital library, and the spatial information recovered from this scheme can be used for similarity retrieval. A video segmentation technique is generally applied to provide an effective metadata system. The proposed video indexing algorithms are capable of extracting key frames, classifying the relevant clips, and building the scene transition graph, which allows easy access to the video image-based information. This metadata type is ideal for interfaces to multimedia information systems. Moreover, new types of micons are developed to provide visual indices for video browsing. Previous micons show the top and right edges of each frame in the video [11] [1]. This micon type cannot show the global motion effects of the video stream.

The basic objective in the development of the new micons is to provide global information about motion in two directions in the video streams. This mechanism is adequate for scanning the entire set of frames to identify locations of interest. Query results may be handled effectively with these visual indices.

2 Polynomial Regression Analysis

Regression is the process of estimating a real-valued function from a finite set of noisy data. Polynomial regression is a special case of multiple linear regression, so a polynomial model may use multiple regression techniques. The general polynomial regression model can be written in matrix notation as

$y = X\beta + \varepsilon.$   (1)

Suppose that $n > k$ observations are available; the model then describes a $k$-dimensional space of regressor variables. In general, $y = [y_1, y_2, \ldots, y_n]^T$ is the observation vector. The parameter $\beta_j$, $j = 1, 2, \ldots, k$, is often called a partial regression coefficient because it represents a slope, and $\beta_0$ is the intercept of the hyperplane. $X$ is the regressor variable matrix with elements $X(i,j) = x(i)^{j-1}$, where $1 \le i \le n$ and $1 \le j \le (k+1)$. $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n]^T$ is the random error vector. The estimated regression coefficient vector is $\hat{\beta} = (X^T X)^{-1} X^T y$, and the predicted observation vector is $\hat{y} = X\hat{\beta}$. A polynomial of sufficiently high degree can approximate any regression curve. However, the degree of the polynomial should be kept as low as possible so that the regression coefficients can be estimated accurately. As the order of the polynomial increases, the $X^T X$ matrix becomes ill-conditioned, and inverting $X^T X$ introduces serious errors into the parameter estimates. Orthogonal polynomials and centering methods are used to eliminate the non-essential ill-conditioning [5]. For large $n$, orthogonal polynomial models are more complicated to set up than centering, so the simple centering method is used to implement the polynomial regression model for the video database.
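To make the centering method concrete, the following minimal NumPy sketch (our illustration, not code from the paper; the function name and test signal are hypothetical) fits a $k$th order polynomial by solving the centered normal equations:

import numpy as np

def fit_centered_polynomial(y, k):
    # Fit y ~ beta0 + beta1*x + ... + betak*x^k with x(i) = i by solving the
    # centered normal equations (Xc^T Xc) beta = Xc^T yc, which avoids carrying
    # an explicit intercept column and reduces the non-essential ill-conditioning.
    n = len(y)
    x = np.arange(1, n + 1, dtype=float)                 # equally spaced regressor x(i) = i
    X = np.column_stack([x ** j for j in range(1, k + 1)])
    X_bar = X.mean(axis=0)
    y_bar = y.mean()
    Xc, yc = X - X_bar, y - y_bar                        # centered regressors and observations
    beta = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)         # normal equations, Eq. 6
    beta0 = y_bar - X_bar @ beta                         # intercept recovered from the means
    return beta0, beta

# Example: fit a low-order polynomial to a noisy cubic trend.
rng = np.random.default_rng(0)
n = 100
x = np.arange(1, n + 1, dtype=float)
y = 0.5 + 2.0 * x - 0.03 * x ** 2 + 1e-4 * x ** 3 + rng.normal(0.0, 5.0, size=n)
beta0, beta = fit_centered_polynomial(y, k=3)
y_hat = beta0 + np.column_stack([x ** j for j in range(1, 4)]) @ beta

With centering, only the $k \times k$ matrix of the centered regressors has to be inverted, and the intercept is recovered afterwards from the means.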

3 FPRT for Video Database

In this section, the fast polynomial regression model for the video database is derived. FPRT is a lossy transform that leads to significant simplifications in the video database transformation. The descriptions of video images are not required to be exact linear least squares solutions; hence, the estimates can be

computed in online processing, which is an important consideration for real-time multimedia applications. The general polynomial regression model leads to the general least squares normal equations: $k+1$ linear equations in the $k+1$ unknown regression coefficients. Since the reconstructed data will be normalized, the intercept term may be treated as redundant, and it is removed by the following procedure. Eq. 1 may be decomposed to have $k$ unknown regression coefficients by reducing the dimensions of $X$ and $\beta$ and adding $\beta_0$. The reduced regression coefficient vector is $\beta = [\beta_1, \beta_2, \ldots, \beta_k]^T$, and the elements of the regressor variable matrix are $X(i,j) = x(i)^j$, where $1 \le i \le n$ and $1 \le j \le k$. Thus, the general polynomial regression model may be expressed as

$y = \beta_0 + X\beta + \varepsilon.$   (2)

Let $\beta_0 = \bar{y} - \bar{X}\beta$, $\tilde{y} = (y - \bar{y})$, and $\tilde{X} = (X - \bar{X})$, where $\bar{X}(i,j) = \frac{1}{n}\sum_{l=1}^{n} x(l)^j$. Eq. 2 is rearranged to provide $k$ linear equations for the least squares normal equations:

$\tilde{y} = \tilde{X}\beta + \varepsilon.$   (3)

The method of least squares is used to estimate the regression coefficients in Eq. 3. The least squares function, $S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2$, is given by

$S(\beta) = (\tilde{y} - \tilde{X}\beta)^T (\tilde{y} - \tilde{X}\beta).$   (4)

The least squares function is minimized with respect to $\beta$, so the least squares estimators must satisfy

$\left.\frac{\partial S}{\partial \beta}\right|_{\hat{\beta}} = -2\tilde{X}^T\tilde{y} + 2\tilde{X}^T\tilde{X}\hat{\beta} = 0.$   (5)

The following equation, derived from Eq. 5, is called the least squares normal equation:

$\tilde{X}^T\tilde{X}\hat{\beta} = \tilde{X}^T\tilde{y}.$   (6)

If the regressors are linearly independent, then $(\tilde{X}^T\tilde{X})^{-1}$ always exists and the $k$ unknown regression coefficients can be obtained. The least squares estimator of $\beta$ is

$\hat{\beta} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\tilde{y} = \Gamma(y - \bar{y}),$   (7)

where $\Gamma = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T$. Since $\hat{\tilde{y}} = \tilde{X}\hat{\beta}$ is the vector of fitted values corresponding to the observed values $\tilde{y}$, the predicted observation

vector $\hat{y}$ can be expressed as follows:

$\hat{y} = \bar{y} + \tilde{X}\hat{\beta} = \bar{y} - \bar{X}\hat{\beta} + X\hat{\beta} = \hat{\beta}_0 + X\hat{\beta} = \hat{\beta}_0 - X\Gamma\bar{y} + X\Gamma y = \tilde{\beta}_0 + X\Gamma y.$   (8)

For the video database transformation, $\tilde{\beta}_0$ can be removed to reduce the computational complexity while preserving the important video contents.

4 Video Database Coding Algorithm

For video image data, the entries of the regressor are $x(i) = i$, where $1 \le i \le n$. Since $x(i)$ is equally spaced and $n$ can be fixed, $\Gamma$ is computed once and stored in the library for fast video database compression. Unlike lossless transforms, which require a matrix inverse for perfect reconstruction, the FPRT pair uses simple matrix multiplications for the video database transformations. The Fast Polynomial Regression Transform pair is defined by

$\Theta = \Gamma Y$ (encoding / compression)   (9)

and

$\hat{Y} = X\Theta$ (decoding / reconstruction),   (10)

where $\Gamma = ((X - \bar{X})^T (X - \bar{X}))^{-1} (X - \bar{X})^T$.
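A minimal sketch of the transform pair, assuming NumPy; the helper name, the rescaling of $x$ to $[0,1]$, and the use of an SVD-based pseudoinverse for $\Gamma$ (equal to the formula above when the centered regressors have full column rank) are our own numerical choices, not taken from the paper:

import numpy as np

def fprt_matrices(n, k):
    # Build the regressor matrix X(i, j) = x(i)^j and the encoding matrix
    # Gamma = ((X - Xbar)^T (X - Xbar))^{-1} (X - Xbar)^T.  Rescaling x to [0, 1]
    # and computing Gamma with the pseudoinverse keep the high-order monomial
    # basis numerically manageable (our implementation choices).
    x = np.arange(1, n + 1, dtype=float) / n
    X = np.column_stack([x ** j for j in range(1, k + 1)])
    Xc = X - X.mean(axis=0)
    Gamma = np.linalg.pinv(Xc)                 # k-by-n encoding matrix
    return X, Gamma

n, k = 160, 13                                 # feature length and polynomial order
X, Gamma = fprt_matrices(n, k)

Y = np.random.rand(n, 8)                       # e.g. 8 frames of a length-n feature vector
Theta = Gamma @ Y                              # Eq. 9: encoding, k coefficients per column
Y_hat = X @ Theta                              # Eq. 10: decoding, intercept term dropped

Because $\Gamma$ depends only on $n$ and $k$, it is computed offline once, and encoding each frame reduces to a single $k \times n$ matrix-vector product.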

Data Selection

The horizontal and vertical spatial information matrices are obtained from the subsampled image $I_s$ ($N_x = 160$, $N_y = 120$, and $q = 2$ are selected) [2]. The elements of these matrices are $Y_h(x,j) = \sum_{y=1}^{N_y} I_s^j(x,y)$, $1 \le x \le N_x$, and $Y_v(y,j) = \sum_{x=1}^{N_x} I_s^j(x,y)$, $1 \le y \le N_y$, where $1 \le j \le M$ and $M$ is an arbitrary window size; the subscripts $h$ and $v$ denote the horizontal and vertical directions. These feature matrices should be normalized before encoding the data.
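A sketch of the data selection step for one frame, with helper names and the particular normalization chosen by us (the paper only states that the feature matrices are normalized):

import numpy as np

def directional_features(frame):
    # frame: 2-D array of shape (Ny, Nx), e.g. a 120-by-160 subsampled image.
    # Y_h sums each column (a profile over horizontal position x);
    # Y_v sums each row (a profile over vertical position y).
    Y_h = frame.sum(axis=0).astype(float)
    Y_v = frame.sum(axis=1).astype(float)
    # One plausible normalization before encoding.
    return Y_h / np.linalg.norm(Y_h), Y_v / np.linalg.norm(Y_v)

frame = np.random.rand(120, 160)               # stand-in for one subsampled frame I_s
Y_h_col, Y_v_col = directional_features(frame) # one column each of Y_h and Y_v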

Model Selection

The model parameter $k$ is selected by analytical model selection criteria. The optimal polynomial order, $k = 13$, is chosen to compute both $\Gamma_h$ ($k$-by-$N_x$) and $\Gamma_v$ ($k$-by-$N_y$).

Data Compression

Eq. 9 is the simple video database encoding scheme used to compress the feature matrices. The elements of $\Theta_h = \Gamma_h Y_h$ and $\Theta_v = \Gamma_v Y_v$ are the regression coefficients of the $k$th order polynomials, and the 26 coefficients per frame are stored to represent the frame content.

Data Reconstruction

Eq. 10 is used to obtain the decompressed video spatial and temporal information. $\hat{Y}_h = X_h\Theta_h$ and $\hat{Y}_v = X_v\Theta_v$ should be normalized again to perform similarity retrieval or to display the visual information of the video frames.

5 Video Segmentation and Indexing

The effective organization and retrieval of video information is still limited because video is a very difficult medium in which to represent content. Indexed video can be manipulated effectively by a multimedia information system. Video segmentation algorithms have problems with special effects. To provide an effective solution to this problem, we proposed eigen-image based video segmentation and indexing [2]. Here we present a new approach that further enhances the video partitioning system. The similarity between the eigen-image approach and the polynomial regression approach is that the compressed video information is stored in the digital library. The difference between the two approaches is the feature selection: unlike the eigen-image approach, the reconstructed video signal is selected as the feature. The reconstructed signal is the de-noised version of the noisy data, in which the noise from camera movement and object motion is suppressed. This spatially noise-free video information is used to determine the temporal relationships between video frames.

Key Frame Extraction

To segment video streams into cuts, two directional spatial difference measurement vectors are computed:

$\delta_{h,j+1} = \sqrt{\frac{N_y}{N_x}\sum_{i=1}^{N_x}\left(\hat{Y}_h(i,j+1) - \hat{Y}_h(i,j)\right)^2}$   (11)

and

$\delta_{v,j+1} = \sqrt{\sum_{i=1}^{N_y}\left(\hat{Y}_v(i,j+1) - \hat{Y}_v(i,j)\right)^2}.$   (12)

Usually, the variation of the difference measurement vector $\delta_h$ is larger than that of $\delta_v$ because objects and the camera frequently move in the horizontal direction in video streams. Therefore, the vector $\delta_v$ is selected to detect the shot boundaries. $\Delta_j = \delta_{v,j+1} - \delta_{v,j}$ is first computed and then thresholded by the dynamic range thresholding technique:

$\delta_{v,j} = \delta_{v,j+1} = 0 \quad \text{if } |\Delta_j| < t_\delta.$   (13)

The two directional measures are combined into a new vector $\hat{\delta}$ to reduce shot boundary errors. $\hat{\delta}$ is defined by

$\hat{\delta}_j = \max\{\delta_{v,j},\, \delta_{h,j}\}$ if $\delta_{v,j} > t_\delta$, and $\hat{\delta}_j = 0$ otherwise.   (14)

The discriminant function for the shot boundaries is

$\hat{D}_s(j) = 1$ if $\hat{\delta}_j > t_{\hat{\delta}}$, and $\hat{D}_s(j) = 0$ otherwise.   (15)

In FPRT based video segmentation, the $j$th key frame is the mean vector of the $j$th shot in $Y$. The system collects the sequence of key frames automatically.
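The following sketch strings Eqs. 11-15 together for a window of reconstructed features; the function name, array layout, and threshold arguments (stand-ins for the paper's $t_\delta$ and $t_{\hat{\delta}}$) are our assumptions:

import numpy as np

def shot_boundaries(Yh_hat, Yv_hat, t_delta, t_hat):
    # Yh_hat: (Nx, M) reconstructed horizontal features; Yv_hat: (Ny, M) vertical features.
    # Returns a 0/1 vector of length M-1; a 1 at position j marks a boundary
    # between frames j and j+1 (0-based).
    Nx, M = Yh_hat.shape
    Ny = Yv_hat.shape[0]
    # Eqs. 11 and 12: directional frame-to-frame difference measures
    delta_h = np.sqrt((Ny / Nx) * np.sum(np.diff(Yh_hat, axis=1) ** 2, axis=0))
    delta_v = np.sqrt(np.sum(np.diff(Yv_hat, axis=1) ** 2, axis=0))
    # Eq. 13: dynamic range thresholding on the selected (vertical) vector
    change = np.diff(delta_v)
    for j, c in enumerate(change):
        if abs(c) < t_delta:
            delta_v[j] = 0.0
            delta_v[j + 1] = 0.0
    # Eq. 14: combine both directions where the vertical evidence is strong
    delta_comb = np.where(delta_v > t_delta, np.maximum(delta_v, delta_h), 0.0)
    # Eq. 15: discriminant function for shot boundaries
    return (delta_comb > t_hat).astype(int)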

Key Frame Clustering

We use a second discriminant function to group similar key frames and produce Rframe sets, so that the system does not have to search the complete set of key frames; each group can then be treated as a unit. The second discriminant function $D_g(i,j)$ takes the value 1 if key frames $i$ and $j$ belong to the same group. It is defined by

$D_g(i,j) = 1$ if $\rho_{ij} < t_\rho$ (same group), and $D_g(i,j) = 0$ otherwise,   (16)

where

$\rho_{ij} = \sqrt{\sum_{x=1}^{N_x}\left(K(x,i) - K(x,j)\right)^2}$   (17)

and $K(\cdot,i)$ denotes the $i$th key frame vector. Once clustering is completed using this function, the system produces the scene change graph [2]. To obtain a similarity ranking of the key frames, $\rho$ is simply sorted.
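A small sketch of the grouping step, assuming the key frame vectors are stored as columns of a matrix K (our notation) and using a simple single-pass grouping based on the distance of Eq. 17:

import numpy as np

def group_key_frames(K, t_rho):
    # K: (Nx, num_key_frames) matrix whose columns are key frame feature vectors.
    # Returns a list of groups (the Rframe sets), each a list of key frame indices.
    num = K.shape[1]
    groups = []
    for i in range(num):
        placed = False
        for g in groups:
            rho = np.sqrt(np.sum((K[:, i] - K[:, g[0]]) ** 2))  # Eq. 17 against the group seed
            if rho < t_rho:                                     # Dg(i, j) = 1: same group
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])
    return groups

Each resulting group then acts as a single unit when the scene transition graph is built.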

6 Video Abstraction

The video metadata should be well indexed for ease and speed of retrieval. A combination of manual and automatic video abstraction processes has been chosen to provide a sophisticated indexing product, which is normally considered essential in an effective information retrieval system. Automatic video abstraction is the process of segmenting a video into clips, and manual video abstraction is the process of inserting annotations to enrich the description of the video content.

[Figure 1: Video segmentation. Frame-by-frame difference measures for the Histogram, Likelihood Ratio, Eigen-Image, and FPRT methods, plotted over the frame index.]

Video indexing involves describing the contents of video clips in a video database by means of a series of descriptors, indexing terms, or key words, which act as secondary keys for retrieving those clips in response to subsequent queries. Manual indexing places considerable demands on the indexing personnel. Text descriptors are commonly biased because no set of text descriptions can sufficiently or consistently characterize the content of a video. However, the manual indexing process is needed to record key words such as people, objects, and events to make video information retrieval feasible.

7 Hybrid Searching Engine

The new video indexing technology has been developed to produce a hybrid searching facility that combines Boolean searching with similarity-match searching. In the hybrid searching engine, users may query with either a video frame or key words to retrieve similar video data. To query with a video frame, the vector space model is employed to match the filtered feature data. The vector space model and the metadata allow users to retrieve video data efficiently. The hybrid searching engine thus gives users a convenient way of searching and retrieving the video database.
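As an illustration only (the paper does not spell out the matching formula), a conventional vector space similarity match over stored frame descriptors could look like the following; using the 26 FPRT coefficients as the descriptor and a cosine score are our assumptions:

import numpy as np

def rank_by_similarity(query_theta, library_thetas):
    # query_theta: length-26 coefficient vector of the query frame (assumed descriptor).
    # library_thetas: (num_frames, 26) matrix of stored coefficients.
    # Returns frame indices sorted from most to least similar (cosine score).
    q = query_theta / np.linalg.norm(query_theta)
    L = library_thetas / np.linalg.norm(library_thetas, axis=1, keepdims=True)
    scores = L @ q
    return np.argsort(-scores)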

8 Video Browsing Tools

There are two main issues when dealing with a video database: the video contents should be well represented, and browsing should be supported. Representing a video by keywords does not provide much information about the spatial and temporal content of the video; this is the problem of imposing textual indices on a visual and temporal medium. To solve this problem, the

micon (video icon) is commonly used as a visual representation of video sequences. We propose new types of icon cover faces to show the spatial and temporal motion of the video frames. The top surface is covered with the reconstructed data $\hat{Y}_h$ and the right surface with $\hat{Y}_v$; the subsampled image is used for the icon face. As shown in Fig. 2, sub-icons can be displayed by placing a cursor alongside the right surface or the top surface.
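A rough sketch of how the three visible micon faces could be assembled from the stored data; the array layout and the choice of the first frame for the icon face are our assumptions:

import numpy as np

def micon_faces(frames, Yh_hat, Yv_hat):
    # frames:  (M, Ny, Nx) subsampled frames; frames[0] supplies the front icon face
    #          (our assumption about which frame is shown).
    # Yh_hat:  (Nx, M) reconstructed horizontal features -> top surface.
    # Yv_hat:  (Ny, M) reconstructed vertical features   -> right surface.
    front = frames[0]
    top = Yh_hat.T        # rows advance in time, columns span horizontal position
    right = Yv_hat        # rows span vertical position, columns advance in time
    return front, top, right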

[Figure 2: Visual tools. Micons for four video sequences (DAVE, FIRM, BROADCAST, STING), each shown with original and reconstructed surfaces; the annotations mark flashing lights and similar histograms of two frames, motion of small and large objects, static segments, panning and zooming, and motion of large objects.]

9 Experimental Results

We have developed the FPRT for compressing and de-noising the video database. The experimental results indicate that the compression rate of the video image contents is very high: the overall compression rate is 2954:1 for a 240-by-320 video image, i.e., 76,800 pixels are represented by 26 coefficients. Fig. 1 shows that the FPRT and eigen-image based video segmentation results are better than the histogram and likelihood-ratio approaches (shot boundaries occur at the circled data points). Moreover, the FPRT based video segmentation is the fastest and most reliable video indexing method for the large video database. The FPRT based video indexing and scene change graph are provided to build effective visual tools for a content-based digital library. In particular, the motion based micon is a very efficient and effective visual tool for the video database.

10 Conclusion

The FPRT based video information storage and retrieval methodologies have been developed to provide reliable metadata for communication in multimedia information systems. The motion based micons enable users to detect macroscopic changes. This micon scheme is useful for browsing and analyzing video contents.

References

[1] P. Aigrain, P. Joly, and V. Longueville, "Medium Knowledge-Based Macro-Segmentation of Video into Sequences", in Intelligent Multimedia Information Retrieval, edited by M. Maybury, AAAI Press/The MIT Press, 1997.
[2] K. J. Han and A. H. Tewfik, "Eigen-Image Based Video Segmentation and Indexing", International Conference on Image Processing, 1997.
[3] R. Hjelsvold, R. Midtstraum, and O. Sandsta, "Searching and Browsing a Shared Video Database", in Multimedia Database Systems, Kluwer Academic Publishers, 1996.
[4] T. Little, G. Ahanger, H. Chen, R. Folz, J. Gibbon, A. Krishnamurthy, P. Lumba, M. Ramanathan, and D. Venkatesh, "Selection and Dissemination of Digital Video via the Virtual Video Browser", Multimedia Tools and Applications, Kluwer Academic Publishers, 1996.
[5] D. C. Montgomery and E. A. Peck, Introduction to Linear Regression Analysis, John Wiley & Sons, 1982.
[6] J. G. Neal and S. C. Shapiro, "Knowledge-Based Multimedia Systems", in Multimedia Systems, edited by J. F. Buford, Addison-Wesley Publishing Company, 1994.
[7] S. Ravela, "Image Retrieval by Appearance", Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 1997.
[8] E. Remias, G. Sheikholeslami, A. Zhang, and T. Syeda-Mahmood, "Supporting Content-Based Retrieval in Large Image Database Systems", in Multimedia Database Management Systems, edited by B. Thuraisingham, K. Nwosu, and P. Berra, Kluwer Academic Publishers, 1997.
[9] P. Schäuble, "Content-Based Information Retrieval from Large Text and Audio Databases", Multimedia Information Retrieval, Kluwer Academic Publishers, 1997.
[10] V. Subrahmanian, Principles of Multimedia Database Systems, Morgan Kaufmann Publishers, Inc., 1998.
[11] H. Zhang, S. Smoliar, and Y. Tan, "Towards Automatic Content-Based Video Indexing and Retrieval", Multimedia Modeling, 1993.