Fuzzy Logic Recursive Change Detection for Tracking and Denoising of Video Sequences

Vladimir Zlokolica (a), Matthias De Geyter (c), Stefan Schulte (b), Aleksandra Pizurica (a), Wilfried Philips (a) and Etienne Kerre (b)

(a) Ghent University, Dept. of Telecommunications and Information Processing (TELIN), IPI, Sint-Pietersnieuwstraat 41, 9000 Gent, Belgium
(b) Ghent University, Department of Applied Mathematics and Computer Science, 9000 Gent, Belgium
(c) VRT, Brussels, Belgium

Further author information: Send correspondence to Vladimir Zlokolica, E-mail: [email protected], Telephone: +32 9 264 8909, Fax: +32 9 264 4295.

ABSTRACT
In this paper we propose a fuzzy logic recursive scheme for motion detection and temporal filtering that can deal with Gaussian noise and unsteady illumination conditions in both the temporal and the spatial direction. Our focus is on applications concerning tracking and denoising of image sequences. We process an input noisy sequence with fuzzy logic motion detection in order to determine the degree of motion confidence. The proposed motion detector combines the membership degrees appropriately using defined fuzzy rules, where the membership degree of motion for each pixel in a 2D sliding window is determined by the proposed membership function. Both the fuzzy membership function and the fuzzy rules are defined in such a way that the performance of the motion detector is optimized in terms of its robustness to noise and to unsteady lighting conditions. We perform tracking and recursive adaptive temporal filtering simultaneously, where the amount of filtering is inversely proportional to the confidence in the existence of motion. Finally, temporally filtered frames are further processed by the proposed spatial filter in order to obtain the denoised image sequence. The main contribution of this paper is the novel, robust fuzzy recursive scheme for motion detection and temporal filtering. We evaluate the proposed motion detection algorithm using two criteria: robustness to noise and to changing illumination conditions, and the amount of motion blur introduced in temporal recursive denoising. Additionally, we make comparisons in terms of noise reduction with other state-of-the-art video denoising techniques.
1. INTRODUCTION
In digital video, the general motion detection problem is quite complex, because it is not always easy to distinguish illumination changes from real motion and because of the aperture problem. As such, "close to perfect" solutions tend to be very time-consuming. However, in many applications it is sufficient to detect changes in the scene rather than actual motion, and even to detect only some of the changes. In this paper we consider motion detection in terms of change detection that is robust against noise and slowly varying illumination conditions. This means that our method will not be able to completely distinguish changes due to motion from other changes due to rapidly varying illumination, scene cuts, fast camera zoom and panning, etc. Despite these restrictions there are numerous applications for this kind of motion detection: it can be used for surveillance purposes, e.g. to monitor a room in which there is not supposed to be any motion, or the detection results can be a useful input for more advanced, higher-level video processing techniques, like the tracking of objects through time (e.g. [1]). Other applications are noise removal and deinterlacing, where one applies temporal filtering depending on the outcome of the motion detector. Given the outlined restrictions, the main problem to be tackled is to distinguish image noise, caused by various sources, from real changes in picture intensity. Additionally, we try to adapt the method to changing illumination conditions in both the spatial and the temporal direction. A trivial and very fast (but not very good) solution for pixel-by-pixel change detection is to simply subtract the grey levels of successive frames, and to conclude that the pixel has changed when the outcome exceeds a preset threshold.
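For reference, the trivial per-pixel differencing just described can be sketched in a few lines; this is a minimal illustration of the baseline, not of the proposed method, assuming grey-scale frames stored as NumPy arrays (the function name and the fixed threshold value are ours):

```python
import numpy as np

def naive_change_mask(frame_t, frame_prev, threshold=25.0):
    # Baseline change detection: threshold the absolute grey-level
    # difference of two successive frames, pixel by pixel.
    diff = np.abs(frame_t.astype(np.float64) - frame_prev.astype(np.float64))
    return diff > threshold
```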
Figure 1. The proposed algorithm framework: the noisy input current frame and the previously temporally filtered frame feed the fuzzy motion detection block; the resulting motion confidence is thresholded for motion tracking and used to update α, β and σ for recursive temporal filtering, which is followed by spatial filtering to produce the spatio-temporally denoised sequence.
Because only one pixel is considered at a time, the computational cost is quite low. While this technique works reasonably well at low noise levels, the performance degrades rapidly with increasing noise level. Also, in practice the threshold has to be tuned to the noise level, which must be estimated on a global or local basis. To tackle the noise problem, more sophisticated techniques are needed. One could examine regions with more than one pixel, and define and calculate some characteristic functions (e.g., the spatial average in a window of pixels) as in [2, 3], at the expense of an increased complexity. In this paper, we propose an alternative approach and restrict ourselves to a recursive algorithm requiring a small number of computations. The idea is to use adaptive thresholding, where the threshold is adapted to the local grey scale statistics and the spatial context of the pixel. The resulting method is insensitive to noise and to slow illumination changes. Moreover, it is locally adaptive, i.e., it adapts to spatially varying noise levels. In [4], some more complex techniques to obtain a threshold based on the noise or signal properties are proposed, and applied to simple frame differencing. Our method is different, since it uses data coming from a longer period of time, and the threshold is adapted to both temporal and spatial information.

It is quite obvious that different applications require different approaches. In the case of motion detection for tracking, it is important that more or less only "real" motion is detected: the goal is to detect moving objects. This means, on the one hand, that if the input sequence is noisy, we do not want the noise to be labeled as motion, while on the other hand, it is not that important if not every single changed pixel of an object is detected. However, in the case of motion detection for denoising, where the detection result is used for temporal filtering, undetected changes in an object can lead to (unwanted) motion blur, while it is allowed that some (preferably as little as possible) of the noise is labeled as motion. By using fuzzy logic we aim at defining a confidence measure with respect to the existence of motion, hereafter called the (degree of) motion confidence. Based on the defined motion confidence we optimize temporal denoising and motion detection. We show the video denoising performance in terms of PSNR and from a visual point of view, and the motion detection performance in terms of its robustness to changing noise and illumination conditions.

In section 2 we present the framework of the algorithm for simultaneous motion detection and video denoising, and show the main blocks of the proposed algorithm. Subsequently, we explain the proposed fuzzy logic recursive motion detector in section 3. In section 4 the adaptive temporal and spatial filtering scheme is introduced. Finally, experimental results are given in section 5 and conclusions are drawn in section 6.
2. ALGORITHM DESCRIPTION
The proposed algorithm is depicted in Fig. 1. We first process a noisy input sequence with fuzzy logic motion detection, using the current frame and the previously processed temporal frame, in order to determine the degree of motion confidence. The degree of motion confidence is expressed as a real number between the two extremes: zero (no motion for sure) and one (motion for sure), in a fuzzy logic manner. The determined motion confidence is further used for motion tracking and for video denoising. For tracking we perform motion confidence thresholding, where the threshold used is application dependent. By thresholding we determine the two states of the motion tracker, motion - no motion, and subsequently use the estimated information for tracking movements and certain objects in the noisy video sequence (in the latter case the motion information is combined with image segmentation). The noise in the corrupted input video is assumed to be white Gaussian noise. On the other hand, the motion confidence is used to update the parameters (α, β and σ) for temporal recursive filtering and for spatio-temporal adaptation of the algorithm to changes of the noise level and of the illumination in the input corrupted sequence. Specifically, the parameter α is the weighting factor of the recursive temporal filter, which controls the amount of filtering based on the amount of noise present (input noise variance) and on the motion confidence determined by the fuzzy motion detector. We estimate the standard deviation of the Gaussian noise σ(x, y, t) from the input video sequence [5] only for the first frame (t = 0) and subsequently adapt it to the illumination and noise changes of the input video by spatio-temporal adaptation of the noise standard deviation σ, where the parameter β controls the rate at which noise and illumination changes are incorporated. Next, we perform temporal recursive filtering using the current input noisy frame and the previously temporally filtered frame. Subsequently, the proposed spatial filter is applied to the temporally filtered frame. Since the temporally filtered frames contain non-stationary noise, the proposed spatial filter adapts locally (in space) to the estimated signal variance. As can be seen in Fig. 1, the temporal filter and the fuzzy logic motion detector work in a closed loop, that is, in a recursive manner where at each step the performance of each module is improved by the other. This is achieved through recursive adaptation to the local spatio-temporal noise variance, illumination changes and motion present in the input noisy video.
3. FUZZY LOGIC RECURSIVE MOTION DETECTOR
In previous work [5] we introduced a binary motion detector, which can only distinguish between motion and no motion. This motion detector was used to control the performance of the spatio-temporal filter, according to two states only: temporal information included or not. A disadvantage was that in case of higher noise levels, due to motion ambiguities, the filter introduced motion blurring and artifacts, resulting in reduced denoising performance. Recently, in [6] a recursive scheme for motion detection was proposed for motion tracking in the presence of low noise levels, where only information from the current and the previously processed (temporally accumulated) frame was used to detect motion, aiming at reducing memory requirements. However, the motion detector in that approach was still binary, and consequently not well suited for motion detection in the presence of higher noise levels or for use in video denoising. In this paper we propose a fuzzy logic recursive motion detector for both video denoising and tracking. Instead of determining motion by binary logic, that is motion - no motion, we determine the degree of motion confidence, which is expressed as a real number between zero (no motion for sure) and one (motion for sure). We use a membership function to determine the membership degree for the absolute difference between the pixel values at the same spatial position in two consecutive frames. Additionally, we consider not only the current pixel position but also its spatial neighbors in a 2D window (3 × 3). Finally, we combine the membership degrees appropriately using the proposed fuzzy rules in order to get the resulting fuzzy logic motion detection output. In order to get a binary output for tracking, we impose simple thresholding on the fuzzy logic motion detection output.
3.1. Fuzzy logic based information processing
Fuzzy logic based information processing helps to model imperfect knowledge and uncertainty about threshold values [7]. The extension of classical set theory to fuzzy sets is the essential part of fuzzy logic [8]. A fuzzy set F in a universe U is characterized by a membership function $\eta_F$ which takes values in the interval [0, 1]:

$$F : U \to [0,1],\quad u \mapsto \eta_F(u) \qquad (1)$$

The classical intersection and union of sets also have their fuzzy counterparts. The fuzzy intersection of two fuzzy sets A and B in U is given by:

$$A \cap B : U \to [0,1],\quad u \mapsto \eta_A(u) \wedge \eta_B(u) \qquad (2)$$

where the ∧-operator represents a triangular norm [9, 10]. Additionally, the fuzzy union of A and B is given by:

$$A \cup B : U \to [0,1],\quad u \mapsto \eta_A(u) \vee \eta_B(u) \qquad (3)$$

where the ∨-operator represents a triangular co-norm [9, 10].

In general, a conditional rule such as "IF antecedent THEN consequent" formulates the necessary conditions in the antecedent so that the consequent is activated. The fuzzy control rule "IF x is F THEN y is G" is represented by a fuzzy relation R using a fuzzy implication function I:

$$R : U \times V \to [0,1],\quad (u,v) \mapsto I_R(u,v) = I(\eta_F(u), \eta_G(v)) \qquad (4)$$

Several choices for the implication function are available [7]. Rules are activated by means of an inference mechanism. In general, compound antecedents can be used in a rule set. For example, "x1 is F1 AND x2 is F2" is represented by a triangular norm:

$$\eta_{F_1 \wedge F_2}(u_1, u_2) = \eta_{F_1}(u_1) \wedge \eta_{F_2}(u_2) \qquad (5)$$
For a more detailed description of fuzzy logic controllers we refer to [11, 12]. In general, a fuzzy controller is composed of three important steps: fuzzification, the application of a rule set, and defuzzification. The rule set of a MISO (Multiple-Input-Single-Output) system can be characterized as follows:

R1 : IF x1 is A1,1 AND ... AND xn is A1,n THEN z is C1
R2 : IF x1 is A2,1 AND ... AND xn is A2,n THEN z is C2
...
Rm : IF x1 is Am,1 AND ... AND xn is Am,n THEN z is Cm        (6)

Such a canonical form contains only antecedents that use the conjunction operator. Every rule evaluates to a certain truth value, which is called a plausibility in fuzzy logic (as opposed to probabilities, plausibilities are not normalized). If multiple rules infer the same consequent, those results need to be combined by a triangular co-norm. Finally, defuzzification is the inverse process of fuzzification: the plausibilities of the different consequents need to be combined to obtain a meaningful result (e.g., a single scalar output) [13].
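To make the fuzzification, rule-evaluation and combination steps concrete, the following generic sketch uses the product as triangular norm and the probabilistic sum as co-norm, i.e. the operators adopted later in section 3.2; the membership parameters, rule set and function names here are arbitrary examples of ours, not the rules of the proposed detector:

```python
import numpy as np

def mu_small(d, a=5.0, b=20.0):
    # membership degree of "SMALL" for a difference value d (fuzzification)
    return float(np.clip(1.0 - (d - a) / (b - a), 0.0, 1.0))

def mu_big(d, a=5.0, b=20.0):
    # membership degree of "BIG" for a difference value d (fuzzification)
    return float(np.clip((d - a) / (b - a), 0.0, 1.0))

def fuzzy_and(p, q):   # triangular norm (product)
    return p * q

def fuzzy_or(p, q):    # triangular co-norm (probabilistic sum)
    return p + q - p * q

# Two toy rules sharing the same consequent "change":
#   R1: IF d1 is BIG AND d2 is BIG   THEN change
#   R2: IF d1 is BIG AND d2 is SMALL THEN change
d1, d2 = 18.0, 7.0
plaus_r1 = fuzzy_and(mu_big(d1), mu_big(d2))
plaus_r2 = fuzzy_and(mu_big(d1), mu_small(d2))
plausibility_change = fuzzy_or(plaus_r1, plaus_r2)   # same consequent: combine with co-norm
print(plausibility_change)
```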
3.2. Motion detection
The proposed fuzzy logic based motion detector combines several values of the difference signal ∆(x, y, t) in a fuzzy manner in order to determine the motion confidence for a certain spatio-temporal position (x, y, t) (horizontal, vertical and temporal coordinate, respectively), where the difference signal ∆(x, y, t) is defined as follows:

$$\Delta(x,y,t) = |I_n(x,y,t) - I_p(x,y,t-1)| \qquad (7)$$

Namely, the difference signal ∆(x, y, t) is the absolute difference between the pixel value of the input noisy frame currently being processed, In(x, y, t), and the pixel value at the same spatial position in the previously processed frame, Ip(x, y, t − 1). Using the difference signal ∆(x, y, t), we define the membership functions SMALL and BIG as follows:

$$\gamma_{SMALL}(\Delta(x,y,t)) = \begin{cases} 1, & \Delta(x,y,t) < a \\ 1 - \frac{\Delta(x,y,t)-a}{b-a}, & a \le \Delta(x,y,t) \le b \\ 0, & \Delta(x,y,t) > b \end{cases} \qquad (8)$$

$$\gamma_{BIG}(\Delta(x,y,t)) = \begin{cases} 0, & \Delta(x,y,t) < a \\ \frac{\Delta(x,y,t)-a}{b-a}, & a \le \Delta(x,y,t) \le b \\ 1, & \Delta(x,y,t) > b \end{cases} \qquad (9)$$
with the following parameter values: a = k σmean(t) and b = 4q(σ(x, y, t)), where σ(x, y, t) and σmean(t) correspond to the standard deviation of the noise of the processed video signal at position (x, y, t) and to the mean value of σ(x, y, t) over the whole frame t, respectively. The parameter k depends on the resolution of the input video and usually lies in the range k ∈ [0, 0.25]. Additionally, we define q(σ(x, y, t)) in the following way:

$$q(\sigma(x,y,t)) = 2\sigma_{mean}(t-1) - \left(\sigma(x,y,t-1) - k_{b\mu}\,\frac{\sigma_{mean}(t-1)}{\mu(x,y,t)} + k_{f\mu}\,\frac{\mu(x,y,t)}{\sigma_{mean}(t-1)}\right) \qquad (10)$$
where µ(x, y, t) represents a rough joint estimate of the noise and the motion, defined as follows:

$$\mu(x,y,t) = \frac{1}{9}\sum_{i,j=-1}^{1} |\Delta(x+i,y+j,t)| \qquad (11)$$

and kbµ and kfµ are adaptable constants for the no-motion (kbµ = 1.5) and motion (kfµ = 0.75) areas, respectively. Specifically, the absolute value of the difference signal ∆ from (7) is averaged in a 2D (3 × 3) window in order to obtain the spatially smoothed value µ, which is used to correct the slope of the membership functions γBIG(∆(x, y, t)) and γSMALL(∆(x, y, t)). The larger the value of µ(x, y, t) is (relative to σmean(t − 1)), the steeper the membership functions will be and consequently the more sensitive the motion detection. On the other hand, if µ(x, y, t) is relatively small (a region without motion, containing only noise in the background), the slope of the membership functions will be smaller, reducing the sensitivity of the motion detector. In this way, we aim at improving at the same time the sensitivity of the motion detection to movements and its robustness against noise. We consider the membership degrees γBIG(∆(x, y, t)) and γSMALL(∆(x, y, t)) not only at the current pixel position (x, y, t) but also for the spatial neighbors in a 2D window (K × K), where K = 3 in our implementation. These two membership functions (SMALL and BIG) are visualized in Fig. 2. Finally, we combine the membership degrees (8) and (9) of all difference signals ∆ in the 2D sliding window using our fuzzy rule set (Table 1), which represents the core of the fuzzy controller. As a result, every rule decides whether there is motion (M(x, y, t) = true) or not (M(x, y, t) = false). The first rule states that there is motion (M(x, y, t) = true) if the signal difference ∆(x, y, t) in the window of Fig. 3(a) can be considered big in a fuzzy way and at least three neighboring signals ∆(x + i, y + j, t) (i, j = −1, 0, 1) can be considered big too. It introduces Nr1 rules:
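The membership-degree computation of eqs. (7)-(11) can be sketched as follows; this is an illustrative implementation under our own assumptions: frames are float arrays of equal size, boundary pixels are handled by edge replication, k is set inside the range given above, and a small eps guards the divisions in eq. (10):

```python
import numpy as np

def membership_degrees(frame_now, frame_prev_filtered, sigma_map, sigma_mean_prev,
                       k=0.125, k_bmu=1.5, k_fmu=0.75):
    # sigma_map ~ sigma(x, y, t-1), sigma_mean_prev ~ sigma_mean(t-1)
    delta = np.abs(frame_now - frame_prev_filtered)                    # eq. (7)
    padded = np.pad(delta, 1, mode="edge")
    # mu: 3x3 average of |delta|, eq. (11)
    mu = sum(padded[1 + di:padded.shape[0] - 1 + di,
                    1 + dj:padded.shape[1] - 1 + dj]
             for di in (-1, 0, 1) for dj in (-1, 0, 1)) / 9.0
    eps = 1e-6                                                          # our numerical safeguard
    q = (2.0 * sigma_mean_prev
         - (sigma_map - k_bmu * sigma_mean_prev / (mu + eps)
            + k_fmu * mu / (sigma_mean_prev + eps)))                    # eq. (10)
    a = k * sigma_mean_prev
    b = 4.0 * q
    slope = np.clip((delta - a) / np.maximum(b - a, eps), 0.0, 1.0)
    gamma_big = slope                                                   # eq. (9)
    gamma_small = 1.0 - slope                                           # eq. (8)
    return delta, mu, gamma_small, gamma_big
```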
Figure 2. Membership functions: (a) SMALL and (b) BIG, plotted as membership degree versus ∆, piecewise linear between the parameters a and b.
Table 1. Fuzzy rule set - theoretical approach (∃ - there exists; ∀ - for all)

rule 1: IF ∆(x, y, t) is BIG AND there exist three distinct neighbors (i1, j1), (i2, j2), (i3, j3) ≠ (0, 0) such that ∆(x + i1, y + j1, t), ∆(x + i2, y + j2, t) and ∆(x + i3, y + j3, t) are BIG, THEN M(x, y, t) = true
rule 2: IF ∆(x, y, t) is SMALL, THEN M(x, y, t) = false
rule 3: IF ∆(x, y, t) is BIG AND all neighbors ∆(x + i, y + j, t), (i, j) ≠ (0, 0), are SMALL, THEN M(x, y, t) = false
rule 4: IF ∆(x, y, t) is BIG AND exactly one neighbor ∆(x + i, y + j, t), (i, j) ≠ (0, 0), is BIG while all other neighbors are SMALL, THEN M(x, y, t) = false
rule 5: IF ∆(x, y, t) is BIG AND exactly two neighbors ∆(x + i1, y + j1, t) and ∆(x + i2, y + j2, t), (i1, j1), (i2, j2) ≠ (0, 0), are BIG while all other neighbors are SMALL, THEN M(x, y, t) = false
$$N_{r1} = C_{N-1}^{3} = \binom{N-1}{3} = \frac{(N-1)!}{(N-4)!\,3!} = \frac{(N-1)(N-2)(N-3)}{3!} \qquad (12)$$

where in our case, for K = 3, N = 9 and consequently Nr1 = 56.
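For completeness, the binomial count above can be checked directly (N = 9 for the 3 × 3 window):

```python
from math import comb

N = 9                  # pixels in the 3x3 window (centre plus 8 neighbours)
print(comb(N - 1, 3))  # number of rule-1 sub-rules: 56
```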
The second, third, fourth and fifth rule determine the cases in which we detect no motion, M(x, y, t) = false. The second and the third rule state that there is no motion if the central signal difference ∆(x, y, t) in the window of Fig. 3(b) can be considered small in a fuzzy way, or if the central signal difference ∆(x, y, t) in the window of Fig. 3(c) can be considered big in a fuzzy way while all neighbors can be considered small, respectively. Each of these two rules introduces one rule. The fourth rule states that there is no motion if the central signal ∆(x, y, t) and one neighbor can be considered big but all other neighbors can be considered small in a fuzzy way, Fig. 3(d). It introduces N − 1 rules. Finally, the fifth rule states that there is no motion if the central signal difference ∆(x, y, t) and two neighbors can be considered big but all other neighbors can be considered small in a fuzzy way, Fig. 3(e), introducing additionally Nr5 = C_{N-1}^{2} = 28 rules. As a result, in total we get Ntotal = Nr1 + Nr5 + N + 1 = 94 rules at each position. For the triangular norm and co-norm [10] we have used the product and the probabilistic sum, respectively:
Figure 3. 2D neighborhood sliding window for fuzzy motion detection: (a) rule 1, (b) rule 2, (c) rule 3, (d) rule 4, (e) rule 5.
Table 2. Fuzzy rule set - practical implementation (∃ - there exists; ∀ - for all)

rule R1: IF ∆(x, y, t) is BIG AND there exist three distinct neighbors (i1, j1), (i2, j2), (i3, j3) ≠ (0, 0) such that ∆(x + i1, y + j1, t), ∆(x + i2, y + j2, t) and ∆(x + i3, y + j3, t) are BIG, THEN M(x, y, t) = true
rule R2: IF for all choices of three distinct neighbors (i1, j1), (i2, j2), (i3, j3) ≠ (0, 0) the condition (SF1 OR SF2 OR SF3 OR SF4) holds, THEN M(x, y, t) = false

sub-fact   fact
SF1        ∆(x, y, t) is SMALL
SF2        ∆(x1, y1, t) is SMALL
SF3        ∆(x2, y2, t) is SMALL
SF4        ∆(x3, y3, t) is SMALL

where (xk, yk) = (x + ik, y + jk), k = 1, 2, 3.
$$A \text{ AND } B:\; A \cdot B, \qquad\qquad A \text{ OR } B:\; A + B - A \cdot B \qquad (13)$$

where A and B correspond to membership degrees of the membership functions γSMALL and/or γBIG (8, 9). Finally, we define the motion confidence as follows:

$$\theta = \frac{M_{true}}{M_{true} + M_{false}} \qquad (14)$$
where Mtrue corresponds to the "true" motion calculated according to the first rule and Mfalse corresponds to the "false" motion calculated according to the second, third, fourth and fifth rule (Table 1). However, for the practical implementation, in order to reduce the number of calculations, we have tested the performance of the proposed motion detector with θ = Mtrue and noticed a negligible decrease of performance in comparison to (14). Nevertheless, it should be noted that in the latter case the fuzzy rules concerning "false" motion are defined differently, i.e. as in Table 2. Finally, for motion tracking we perform motion confidence thresholding on the estimated motion confidence θ(x, y, t) as follows:

$$MD(x,y,t) = \begin{cases} YES, & \theta > THR \\ NO, & \text{otherwise} \end{cases} \qquad (15)$$

where the threshold THR is implementation dependent and video resolution dependent.
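A per-pixel sketch of the practical variant described above (θ approximated by Mtrue, evaluated with rule R1 only, using the product as AND and the probabilistic sum as OR from eq. (13)) is given below; the function name is ours, and gamma_big_3x3 is assumed to be the 3 × 3 array of BIG membership degrees around the current pixel, computed as in eq. (9):

```python
import numpy as np
from itertools import combinations

def motion_confidence_r1(gamma_big_3x3, thr=0.75):
    centre = gamma_big_3x3[1, 1]
    neighbours = np.delete(gamma_big_3x3.ravel(), 4)   # the 8 surrounding degrees
    m_true = 0.0
    for n1, n2, n3 in combinations(neighbours, 3):     # the 56 sub-rules of rule R1
        plaus = centre * n1 * n2 * n3                  # fuzzy AND (product)
        m_true = m_true + plaus - m_true * plaus       # fuzzy OR (probabilistic sum)
    theta = m_true                                     # practical simplification: theta = M_true
    return theta, theta > thr                          # eq. (15): motion yes/no

# example: strong differences at the centre and three neighbours
example = np.array([[0.1, 0.9, 0.1],
                    [0.8, 0.95, 0.1],
                    [0.1, 0.9, 0.1]])
print(motion_confidence_r1(example))
```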
4. SIGNAL FILTERING AND ADAPTATION
In this section we introduce the temporal recursive filter and propose a scheme for adapting the temporal filtering and motion detection algorithm to unsteady lighting conditions and to sudden changes of the noise level (section 4.1). For the latter, we estimate the standard deviation of the input Gaussian noise for the first frame only and then adapt it throughout the sequence. Additionally, in subsection 4.2 we propose a simple and efficient spatial filter for removing the non-stationary noise introduced by the temporal filter that precedes it.
4.1. Temporal filtering and adaptation
Using the degree of motion confidence θ, we perform recursive temporal filtering as follows: when the degree of motion confidence is higher we filter less, while in the opposite case we filter more. Namely, we control the amount of filtering through a parameter α(x, y, t), which is directly proportional to the estimated motion confidence and takes values α ∈ [0, 1]. The proposed recursive temporal filtering is defined in the following way:

$$I_p(x,y,t) = \alpha I_n(x,y,t) + (1-\alpha) I_p(x,y,t-1) \qquad (16)$$

where In and Ip correspond to the current input noisy frame and the processed (temporally filtered) output frame, respectively, as shown in Fig. 1. For one extreme case, α(x, y, t) = 1, there is no averaging, while for the other extreme case, α(x, y, t) = 0, only the processed (filtered) pixel from the previous temporal position t − 1 is substituted. The smaller the α value, the stronger the filtering, but also the more information from the previous frame is taken into account. Hence, if α(x, y, t − 1) was large (α ≈ 1) in the previous frame, noise is retained in the previous output frame; if the current α(x, y, t) is then determined from the motion confidence θ(x, y, t) of the current frame alone, this remaining noise is carried over and propagates through the processed sequence. In order to make our method robust against such noise propagation, we determine α(x, y, t) not only according to the motion confidence in the current frame, θ(x, y, t), but also according to the amount of temporal filtering in the previous frame, α(x, y, t − 1), in the following way:

$$\alpha(x,y,t) = \frac{\alpha(x,y,t-1)\,\alpha_0(x,y,t) + \alpha(x,y,t-1) + (1-\alpha(x,y,t-1))\,\alpha_0(x,y,t)}{2} \qquad (17)$$

where $\alpha_0(x,y,t) = \sqrt{\theta(x,y,t)}$ is an initial estimate based on the motion confidence only, and the updated value (17) uses information about the amount of filtering from the past. As a result, we obtain a smoother transition of the amount of temporal filtering from one frame to the next. Note that this also prevents noise propagation in time (as explained above) and thus avoids a typical problem of time-recursive schemes.
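The recursive temporal step of eqs. (16)-(17) can be sketched as follows; this is a minimal illustration under our own assumptions (float arrays of equal shape, with theta the per-pixel motion confidence and alpha_prev the weighting map used for the previous frame):

```python
import numpy as np

def temporal_step(frame_now, prev_filtered, theta, alpha_prev):
    alpha0 = np.sqrt(theta)                                    # initial estimate from theta
    alpha = (alpha_prev * alpha0                               # product (fuzzy AND) of past and present
             + alpha_prev + (1.0 - alpha_prev) * alpha0) / 2.0 # averaged with their probabilistic sum, eq. (17)
    filtered = alpha * frame_now + (1.0 - alpha) * prev_filtered   # eq. (16)
    return filtered, alpha
```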
In order to adapt our algorithm to unsteady lighting conditions and sudden changes of the noise level, we estimate the standard deviation of the input Gaussian noise using the method of [5] for the first frame only, and then adapt it throughout the sequence for each position (x, y, t) separately. We perform this adaptation of the noise standard deviation σ(x, y, t) by recursive averaging. The higher the degree of motion confidence, the closer the current value σ(x, y, t) stays to the previous value σ(x, y, t − 1):

$$\sigma(x,y,t) = (1-\beta(x,y,t))\,\mu + \beta(x,y,t)\,\sigma(x,y,t-1) \qquad (18)$$

where µ is defined in (11) and the weighting parameter β(x, y, t), which controls the updating of σ(x, y, t), is defined as:

$$\beta(x,y,t) = \begin{cases} K_\beta, & \theta < K_\beta \\ \theta, & \text{otherwise} \end{cases} \qquad (19)$$

where Kβ is a constant. We found experimentally that for different video sequences the range Kβ ∈ [0.25, 0.5] yields the best motion detection performance. Furthermore, in our experiments we found that more stable and better results are obtained by locally averaging the estimated σ value within a 2D sliding window (containing half of the values from the previous and half from the current frame) centered at (x, y, t). In the first frame σ(x, y, t) is initialized to a constant for each spatial position and then further adapted. We use σ(x, y, t) to improve the robustness of the motion detector against noise, noise level changes and illumination changes in the following way: the estimated σ(x, y, t) defines the slope of the fuzzy SMALL and BIG membership functions (8, 9) by modifying the parameter b through the function q(σ(x, y, t)) defined in (10). The larger the estimated noise variance is, the smaller the slope of the corresponding membership function will be and thus the stronger the noise reduction of the motion confidence. Additionally, in case of illumination changes and/or a change of the noise level, σ(x, y, t) should adapt to the changes in the video within one or two frames.
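The noise-level adaptation of eqs. (18)-(19) amounts to a confidence-weighted recursive average; a minimal sketch under our own assumptions (float arrays, K_beta chosen inside the experimentally suggested range [0.25, 0.5]) is:

```python
import numpy as np

def update_sigma(mu, theta, sigma_prev, k_beta=0.35):
    # mu: local mean absolute difference from eq. (11); theta: motion confidence;
    # sigma_prev: noise standard-deviation map of the previous frame.
    beta = np.where(theta < k_beta, k_beta, theta)   # eq. (19)
    return (1.0 - beta) * mu + beta * sigma_prev     # eq. (18)
```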
4.2. Spatial filter
After temporal filtering, we apply spatial weighted averaging as follows:

$$I_s(x,y,t) = \sum_{i,j=-1}^{1} w(i,j,t)\, I_p(x+i,y+j,t) \qquad (20)$$

where the weighting function w(i, j, t) is a Gaussian function of the absolute difference between the central pixel value at (x, y, t) and the neighboring pixel value in a 2D window (3 × 3). Namely, the latter function is defined as:

$$w(i,j,t) = e^{-\frac{(I_p(x+i,y+j,t) - I_p(x,y,t))^2}{2\sigma^2(x,y,t)}} \qquad (21)$$

The width of the Gaussian function used for the weighted spatial averaging has been set to the estimated σ(x, y, t), i.e., the noise standard deviation estimated for the central pixel position, which was experimentally shown to perform optimally in terms of the resulting PSNR of the processed sequences.
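For a single pixel, the spatial step of eqs. (20)-(21) can be sketched as follows; note that we normalize the weights so that the output is a proper weighted average, which is our own assumption since the normalization is not written out explicitly in eq. (20):

```python
import numpy as np

def spatial_pixel(patch_3x3, sigma_centre):
    # patch_3x3: 3x3 neighbourhood of the temporally filtered frame (centre at [1, 1]);
    # sigma_centre: estimated noise standard deviation at the central position.
    centre = patch_3x3[1, 1]
    w = np.exp(-((patch_3x3 - centre) ** 2) / (2.0 * sigma_centre ** 2 + 1e-12))  # eq. (21)
    return float(np.sum(w * patch_3x3) / np.sum(w))                               # eq. (20), normalized

# example: a noisy flat patch with sigma estimate 5
patch = np.array([[100., 104.,  98.],
                  [ 97., 101., 103.],
                  [ 99., 102., 100.]])
print(spatial_pixel(patch, 5.0))
```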
5. EXPERIMENTAL RESULTS
In this section, we demonstrate the performance of the proposed algorithm for video denoising and motion tracking. The tested video sequences were distorted by various levels of Gaussian noise (σi = 10, 15, 20, 25, 30). Additionally, video sequences with unsteady lighting conditions and with noise levels varying in time throughout the sequence were considered. The camera used for obtaining the sequences was fixed. Hence, we consider sequences with a steady background and moving objects, as used in video surveillance (motion tracking and video quality improvement). We first tested several video sequences corrupted with Gaussian noise that is constant throughout the input sequence and with steady lighting conditions, such as "Salesman", "Trevor", "Miss America", "Bicycle" and "Chair". We corrupted them with various noise levels and compared the denoising results with several state-of-the-art techniques. In Table 3 we compare the performance of the proposed filter, in terms of the PSNR averaged over all frames of the sequence, for sequences corrupted with artificial, uniformly spread Gaussian noise with σi = 10. Additionally, we show the PSNR per frame for the "Salesman" and "Trevor" sequences with additive Gaussian noise (σi = 15) in Fig. 4. The proposed fuzzy motion recursive spatio-temporal filtering technique (FMRSTF) was compared with several state-of-the-art techniques for video denoising: the Rational filter [14] (Rational), the Motion-Detail Adaptive KNN filter [5] (MDAKNN), the spatio-temporal sequential filter of [15] (SEQWT), the multiclass wavelet based spatio-temporal filter [16] (MCWF), the 3D wavelet filter [17] (3DWF) and the Alpha-Trimmed filter [18]. Both in terms of PSNR and visually, our algorithm outperforms single-resolution techniques [5, 14] of similar or higher complexity. Moreover, the proposed algorithm yields a PSNR and visual performance comparable to that of much more complex wavelet based video denoising techniques [15-17]. The processed sequences (with added Gaussian noise of σ = 10, 15, 20, 25) can be viewed at http://telin.ugent.be/~vzlokoli/SPIE05/UNSP.
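The PSNR figures reported below are the standard measure for 8-bit video; a minimal helper of the kind used to produce Table 3 and Fig. 4 could look as follows (the per-sequence value is the mean over frames, as described above; the function names are ours):

```python
import numpy as np

def psnr(reference, processed, peak=255.0):
    # PSNR of one frame against its noise-free reference, assuming 8-bit data.
    mse = np.mean((reference.astype(np.float64) - processed.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

def sequence_psnr(reference_frames, processed_frames):
    # average PSNR over all frames of a sequence
    return float(np.mean([psnr(r, p) for r, p in zip(reference_frames, processed_frames)]))
```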
Table 3. PSNR [dB] for processed sequences corrupted with Gaussian noise σi = 10 (X: not available)

input sequence   Fuzzy MRSTF   adaptive K-NN   α-trimmed   rational   MCWF    SEQWT
Salesman         34.1          32.5            29.4        30.6       32.96   34.14
Deadline         34.8          32.2            24.3        27.3       33.17   X
MissAm.          36.6          35.3            35.1        35.2       35.9    37.57
Trevor           34.1          34.1            34.1        34.3       35.37   X
Tennis           31.2          30.5            22.5        26.5       31.33   32.27
Figure 4. PSNR per frame for input noise level σ = 15: (a) "Salesman" (curves: SEQWT, MCWF, 3DWF, AKNN, FSTF, Rational) and (b) "Trevor" (curves: MDAKNN, Rational, FMRSTF, MCWF).
The video sequences with motion detection (tracking) can be viewed there as well, where the detected motion was determined by the motion confidence thresholding shown as a separate block in Fig. 1 and described by (15), with threshold THR = 0.75. In order to show the robustness of the motion detector to noise and to motion blur, we show the results of the motion detector for different threshold values of (15) applied to the estimated motion confidence θ. The results of the processed "Tennis" and "Salesman" sequences for noise levels σ = 5, 10, 15, 20, 25, 30 and for different threshold values THR can be viewed on the website: http://telin.ugent.be/~vzlokoli/SPIE05/UNSP. Additionally, in Fig. 5 we show the comparison of the proposed motion detector for the "Tennis" sequence (t = 131) with added noise of σi = 15, against the Noise Robust Change Detection (NRCD) method of [6] and against the result of the proposed method on the same sequence without added noise. From Fig. 5 it can be observed that the proposed FMRD method is consistently better, i.e., more robust to noise, and detects more motion than the NRCD method, when compared with the ground-truth motion information of Fig. 5(d). Moreover, by changing the threshold value (THR) we can adapt the algorithm to the desired amount of detection, whereas in the other methods [4, 6] the output is fixed. As the noise increases, the number of detected pixels decreases slowly, which is the price paid for suppressing noise among the detected motion pixels. However, even for a relatively small threshold (THR) and relatively high noise levels (σ ≈ 20), as shown in Fig. 5, it gives good and reliable results. We have also tested our video denoising algorithm on video sequences with additive Gaussian noise that changes throughout the sequence in the temporal domain. As an experiment, we added noise with σi = 10 to the "Salesman" and "Bicycle" sequences for the first 25 and 15 frames, respectively, and then changed the noise level to (a) σi = 15 and (b) σi = 20. Similarly, we set up another experiment with σi = 20 for the first 25 and 15 frames of the "Salesman" and "Bicycle" sequences, respectively, and then changed it to (c) σi = 15 and
Figure 5. Motion detection comparison: (a) NRCD, σi = 20; (b) FMRD, σi = 20, THR = 0.5; (c) FMRD, σi = 20, THR = 0.75; (d) FMRD on the noise-free sequence, THR = 0.75.
Figure 6. σ adaptation: estimated σ(t) per frame for (a) "Salesman" and (b) "Bicycle", for the noise-level transitions 10-15, 10-20, 20-10 and 20-15.
(d) σi = 20. In Fig. 6 we show how the estimated average σmean(t) changes from frame to frame and how large the slope is, i.e., the speed of the adaptation. The corresponding processed sequences can be seen on the web: http://telin.ugent.be/~vzlokoli/SPIE05/NUNSP/CN, where both the denoised sequences and the motion detection sequences are shown. The value for the motion confidence thresholding of Fig. 1 and (15) was THR = 0.75. It can be observed from the sequences shown how fast the motion detection adapts to the noise level change and corrects its performance, within a few frames (usually 1-4).
Finally, we have tested our algorithm for motion detection and video denoising on video sequences with illumination changes in both the spatial and the temporal domain. For this purpose we set up an experiment with a fixed camera and changed the illumination conditions in time and space. Additionally, we added Gaussian noise with σi = 15 to the sequences and processed them. The results demonstrate good robustness against slow illumination changes; for sudden changes more time is needed for the algorithm to adapt. The results are shown on the website: http://telin.ugent.be/~vzlokoli/SPIE05/NUNSP/CI.
6. CONCLUSION
In this paper a new adaptive recursive scheme for fuzzy logic based motion detection, robust to noise and to slowly varying illumination, is presented. The proposed scheme works in a closed loop with the proposed temporal recursive filter. In future work we aim at investigating the use of color and spatial image features for more efficient motion detection that could cope with fast illumination changes and fast zooming and panning.
REFERENCES
1. J. Denzler and D. W. R. Paulus, "Active motion detection and object tracking," in ICIP, pp. 635-639, 1994.
2. R. Jain and K. Skifstad, "Illumination independent change detection from real world image sequences," Computer Vision, Graphics, and Image Processing 46, pp. 387-399, 1989.
3. Y. Z. Hsu, H. H. Nagel, and G. Rekers, "New likelihood test methods for change detection in image sequences," Computer Vision, Graphics, and Image Processing 26, pp. 73-106, 1984.
4. P. L. Rosin, "Thresholding for change detection," in ICCV, pp. 274-279, (Bombay, India), January 1998.
5. V. Zlokolica and W. Philips, "Motion and detail adaptive denoising of video," in IS&T/SPIE Symposium on Electronic Imaging, San Jose, California, USA, Jan. 2004.
6. M. De Geyter and W. Philips, "A noise robust method for change detection," in ICIP, Barcelona, Sept. 2003.
7. D. Van De Ville, PhD thesis, Ghent University, 2001.
8. E. Kerre, "Introduction to the basic principles of fuzzy set theory and some of its applications," Communication and Cognition, 1991.
9. E. Kerre, Fuzzy Sets and Approximate Reasoning, Xian Jiaotong University Press, 1998.
10. E. P. Klement, R. Mesiar, and E. Pap, Triangular Norms, Kluwer Academic Pub., 2000.
11. E. Cox, "Fuzzy fundamentals," IEEE Spectrum, Oct. 1992.
12. C. Lee, "Fuzzy logic in control systems: Fuzzy logic controller," IEEE Trans. Syst., Man, and Cybernetics 20, pp. 319-323, Mar. 1990.
13. W. Van Leekwijck and E. Kerre, "Defuzzification: criteria and classification," Fuzzy Sets and Systems 108, pp. 159-178, Dec. 1999.
14. F. Cocchia, S. Carrato, and G. Ramponi, "Design and real-time implementation of a 3-D rational filter for edge preserving smoothing," IEEE Transactions on Consumer Electronics 43, pp. 1291-1300, Nov. 1997.
15. A. Pizurica, V. Zlokolica, and W. Philips, "Noise reduction in video sequences using wavelet-domain and temporal filtering," in SPIE Conference on Wavelet Applications in Industrial Processing, (Providence, Rhode Island, USA), Oct. 2003.
16. V. Zlokolica, A. Pizurica, and W. Philips, "Video denoising using multiple class averaging with multiresolution," in International Workshop VLBV03, Lecture Notes in Computer Science, Springer Verlag, (Madrid, Spain), Sept. 2003.
17. I. W. Selesnick and K. Li, "Video denoising using 2D and 3D dual-tree complex wavelet transforms," in Wavelet Applications in Signal and Image Processing (Proc. SPIE 5207), San Diego, Aug. 2003.
18. J. B. Bednar and T. L. Watt, "Alpha-trimmed means and their relationship to median filters," IEEE Transactions on Acoustics, Speech, and Signal Processing 32, pp. 145-153, Feb. 1984.