arXiv 1610.00386
Rain Removal via Shrinkage-Based Sparse Coding and Learned Rain Dictionary

Chang-Hwan Son and Xiao-Ping Zhang

Abstract—This paper introduces a new rain removal model based on the shrinkage of the sparse codes for a single image. Recently, dictionary learning and sparse coding have been widely used for image restoration problems. These methods can also be applied to rain removal by learning two types of rain and non-rain dictionaries and forcing the sparse codes of the rain dictionary to be zero vectors. However, this approach can generate unwanted edge artifacts and detail loss in the non-rain regions. Based on this observation, a new approach for shrinking the sparse codes is presented in this paper. To effectively shrink the sparse codes in the rain and non-rain regions, an error map between the input rain image and the reconstructed rain image is generated by using the learned rain dictionary. Based on this error map, both the sparse codes of the rain and non-rain dictionaries are used jointly to represent the image structures of objects and avoid the edge artifacts in the non-rain regions. In the rain regions, the correlation matrix between the rain and non-rain dictionaries is calculated. Then, the sparse codes corresponding to the highly correlated signal-atoms in the rain and non-rain dictionaries are shrunk jointly to improve the removal of the rain structures. The experimental results show that the proposed shrinkage-based sparse coding can preserve image structures and avoid the edge artifacts in the non-rain regions, and it can remove the rain structures in the rain regions. Also, visual quality evaluation confirms that the proposed method outperforms the conventional texture and rain removal methods.

Index Terms—Rain removal, texture removal, sparse coding, dictionary learning, correlation, deep learning, classifier
I. INTRODUCTION

Rain forms structures on captured images. This means that rain structures can prevent computer vision algorithms (e.g., face/car/sign detection, visual saliency, scene parsing, etc.) from working effectively [1]. Most computer vision algorithms depend on feature descriptors such as the scale-invariant feature transform (SIFT) [2] and histograms of oriented gradients (HOG) [3]. These descriptors are designed based on the gradient's magnitude and orientation, and thus rain structures can have negative effects on the feature extractor. For this reason, rain removal is a necessary tool. Rain removal can preserve the details of objects and suppress the rain structures, and thus it can be used to detect visual saliency [4] on rain images. Moreover, self-driving [5] has recently become a hot problem in the vehicle industry. To realize it, bad weather conditions [6] including rain, snow, and haze should be considered. Therefore, low-level computer vision algorithms, such as rain and haze removal, are essential in smart cars. The application of rain removal is not limited to computer vision problems in bad weather conditions. In general, rain removal first detects certain types of image structures, and then removes the detected structures from input images. Therefore, rain removal approaches can be applied to similar problems that appear in computer graphics and image processing. For example, rain removal approaches can be used for the removal of textures [7], stripes [8], waterfall effects [9], or other types of artifacts [10].

(This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant RGPIN239031. C.-H. Son is with the Department of Electrical and Computer Engineering, Ryerson University, ON M5B 2K3 Canada (e-mail: [email protected]). X.-P. Zhang is with the Department of Electrical and Computer Engineering, Ryerson University, ON M5B 2K3 Canada (e-mail: [email protected]).)

A. Related Works

In most cases, rain structures can be described by vertical and diagonal edges. However, in some cases, rain structures appear with other types of patterns. Given an input rain image, several approaches can be considered to remove the rain structures. The first approach is to directly use conventional texture removal algorithms. Morphological component analysis (MCA) [11] and relative total variation (RTV) [7] are sophisticated methods to remove textures. The MCA algorithm [11] uses parametric dictionaries, which indicate the basis vectors of the discrete cosine transform (DCT) and the curvelet wavelet transform (CWT). Different types of DCT and CWT dictionaries can discriminate between the textured and non-textured parts. In particular, the CWT can detect anisotropic structures and smooth curves/edges, while the DCT can represent periodic patterns. However, it has not been proven that these two parametric dictionaries are still effective in separating rain textures from input rain images. The RTV-based texture removal [7] can also be used for rain removal if the rain structures are fine textures. In [7], the RTV is defined as the absolute value of the sum of the spatial gradients calculated at every local region, and it is shown that the RTV is useful to distinguish rain structures from main structures (e.g., large edges/lines). However, for images with heavy rain patterns, the RTV model may fail to discriminate between the rain and the original main textures, thereby removing both at the same time.

The second approach is to describe features for the rain structures, and then remove the rain structures from the input rain-patterned images [1]. To detect the rain structures, handcrafted rain features can be designed based on elliptical
shape [12] or high visibility and low saturation [13]. After detecting the rain regions via a handcrafted feature descriptor, rain structures can be removed via nonlocal filtering or image inpainting. However, the recent trend is to adopt a representation learning approach [14] (e.g., dictionary learning [15] or deep learning [16]) rather than designing handcrafted features. Representation learning can automatically extract useful features from raw data, and it has shown powerful performance on image restoration and image classification problems. For this reason, representation learning has replaced traditional handcrafted design. Following this trend, online dictionary learning has been used to represent rain and non-rain image structures in an input image [1]. In this method, to separate the rain dictionary part from the whole dictionary, the handcrafted HOG feature descriptor was used under the assumption that rain structures have vertical and diagonal edges with high variations. However, rain structures are not restricted to vertical and diagonal edges with high variations. Therefore, the HOG descriptor has limited ability to classify the rain dictionary part from the learned whole dictionary. Even though this method can increase rain removal performance when the rain structures are removed, it may remove object details as well. To overcome this drawback, the depth of field (DOF) can be considered to roughly represent the non-rain regions [17]. However, DOF is also a handcrafted feature. Moreover, the use of DOF is not the main algorithm for sparse-coding-based rain removal. Rather, it is regarded as a pre-processing step that can be applied to any rain removal algorithm. Actually, other visual saliency algorithms [4] can be used instead of DOF as well.

Recently, a more elegant rain removal method was presented based on discriminative sparse coding [18]. The key idea of that paper is to make the sparse codes of the rain and non-rain dictionaries mutually exclusive. However, as discussed in [18], perfect mutual exclusivity cannot be guaranteed for some rain images with similar structures between rain and objects, which leads to unsatisfactory results. In [18], the initial rain dictionary is designed by using a motion kernel with the dominant gradient orientation of the input rain image. Thus, this initialization can affect the final rain removal performance. Rain structures still exist in the rain regions even though object details are preserved with the learned non-rain dictionary. We will show this in our experimental results.

There is another related work [19] that removes dirt and water droplets from images captured through a window. In that paper, a convolutional neural network (CNN) was used to map corrupted image patches onto clean patches in a supervised manner. However, this method cannot be directly applied to rain removal. In the case of rain images, it is not easy to collect the corrupted and clean patch pairs that a supervised method requires. Also, rain structures have a variety of forms in size and shape, and thus a direct mapping from the corrupted patches to the clean ones may not work properly because the rain detection and rain removal processes are incorporated into the CNN model. However, the rain features learned via the CNN can be used for rain detection in an unsupervised manner, which requires an additional classifier (e.g., logistic regression or support vector machine); a rain removal process, such as image inpainting or nonlocal
filtering, can then be conducted to remove the rain structures. Meanwhile, there are video de-raining methods that are based on temporal and chromatic priors [20], a Gaussian mixture model [21], analysis of rain layers in a transformed domain [22], etc. However, this paper focuses on rain removal for a single image, and thus more details will not be discussed here; the reader may refer to the related literature [20-23].

B. Motivation

Online dictionary learning requires the handcrafted HOG descriptor [1] to separate the rain dictionary part from the whole dictionary. However, the HOG descriptor cannot model various types of rain structures, and thus in this paper, offline dictionary learning is adopted. That is, the rain dictionary is learned from rain training images, not from the input rain image. However, even the rain training images can include non-rain regions. Therefore, in this paper, masked images are generated manually, and then used during the offline dictionary learning to indicate the rain regions. By doing this, the learned rain dictionary can represent various types of rain structures. Also, the non-rain dictionary is separately learned from natural training images. Assuming that the rain and non-rain dictionaries are given, rain removal can be achieved by forcing the sparse codes of the rain dictionary to be zero vectors, as follows:
$$\mathbf{R}_i\mathbf{x} \approx \mathbf{D}\boldsymbol{\alpha}_i = [\mathbf{D}_n\ \mathbf{D}_r]\begin{bmatrix}\boldsymbol{\alpha}_i^n\\ \boldsymbol{\alpha}_i^r\end{bmatrix}, \qquad \boldsymbol{\alpha}_i^r = \mathbf{0} \tag{1}$$
where $\mathbf{R}_i$ indicates the operator [24,25] that extracts a patch from the input rain image $\mathbf{x}$ at the $i$th pixel position. If $\mathbf{x}$ is the image vector of size $M \times 1$ and the patch size is $m \times m$, the matrix $\mathbf{R}_i$ will be $m^2 \times M$ in size. In this paper, boldface lowercase indicates vectors, whereas boldface uppercase indicates matrices. $\mathbf{D} \in \mathbb{R}^{m^2 \times 2K}$ is the dictionary set in which each column vector corresponds to a basis vector, also called a signal-atom. The dictionary set is composed of $\mathbf{D}_n \in \mathbb{R}^{m^2 \times K}$ and $\mathbf{D}_r \in \mathbb{R}^{m^2 \times K}$. These are the learned non-rain and rain dictionaries, respectively. In (1), the left term shows that the extracted input rain patch $\mathbf{R}_i\mathbf{x} \in \mathbb{R}^{m^2 \times 1}$ can be approximated by the linear combination of the dictionary set $\mathbf{D}$ and the corresponding sparse code $\boldsymbol{\alpha}_i \in \mathbb{R}^{2K \times 1}$. Any kind of sparse coding algorithm [15] can be used to estimate $\boldsymbol{\alpha}_i$ from the rain patch $\mathbf{R}_i\mathbf{x}$. The estimated sparse code $\boldsymbol{\alpha}_i$ can be divided into $\boldsymbol{\alpha}_i^n$ and $\boldsymbol{\alpha}_i^r$, corresponding to $\mathbf{D}_n$ and $\mathbf{D}_r$, respectively. Here, the sparse code $\boldsymbol{\alpha}_i^r$ is used only to represent the rain structures. Therefore, the rain structures can be removed by assigning the zero vector to the sparse code $\boldsymbol{\alpha}_i^r$, thereby producing the rain-removed patch, which is represented by $\mathbf{D}_n\boldsymbol{\alpha}_i^n$. Ideally, the rain dictionary $\mathbf{D}_r$ should be used only to reconstruct the rain structures. However, the rain dictionary $\mathbf{D}_r$ can also reconstruct non-rain structures.
Since the signal-atoms of the rain and non-rain dictionaries can have some correlations, similar signal-atoms in the rain and non-rain dictionaries can be selected via a matching pursuit algorithm. Therefore, for non-rain patches, the rain removal approach that forces the sparse code $\boldsymbol{\alpha}_i^r$ to be a zero vector can lead to edge artifacts and detail loss of objects. Fig. 1 shows an example of the edge artifacts and contrast loss that appear around the man's shoulder and face regions, including the eye, lip, and hair. This observation reveals that shrinkage of the sparse code $\boldsymbol{\alpha}_i^r$ in the non-rain regions is needed. Also, it is expected that this shrinkage approach should be applied to the sparse code $\boldsymbol{\alpha}_i^n$ in the rain regions. Therefore, the main contribution of this paper is to show a new method of shrinking the sparse codes $\boldsymbol{\alpha}_i^r$ and $\boldsymbol{\alpha}_i^n$ for rain removal. We call it the shrinkage-based sparse coding method hereafter.
Fig. 1. Input rain image (left) and rain-removed image with edge artifacts and contrast loss (right).
C. Proposed Shrinkage-Based Sparse Coding vs. Discriminative Sparse Coding for Rain Removal

Before moving to the next section, we first discuss more details about the proposed shrinkage-based sparse coding and the conventional discriminative sparse coding [18]. As briefly mentioned in the Introduction, the conventional discriminative sparse coding [18] focuses on generating highly discriminative sparse codes in an iterative manner, whereas our shrinkage-based sparse coding focuses on how to shrink the sparse codes of the fixed rain and non-rain dictionaries to preserve details of objects and avoid edge artifacts. As already pointed out in [18], perfect mutual exclusivity between rain and object structures cannot be ensured. This reveals that the sparse codes of the rain and non-rain dictionaries can be used together to represent object details and rain structures. Even though discriminative sparse coding is used, a similar problem, as shown in Fig. 1, can occur. In other words, rain structures still remain in rain regions or detail may be lost in object regions. Actually, the rain and non-rain dictionaries used in this paper have a small correlation of about 0.1, meaning that the learned dictionaries have discrimination power to some extent. This is possible because masked rain images are used to extract rain structures from the rain images to learn the rain dictionary. This indicates that even discriminative sparse coding can have the same problem. Different from the initial purpose of the discriminative sparse coding [26] used for image classification, rain removal methods require both detection and removal. Therefore, after detecting rain regions via discriminative sparse coding, it should be considered how to remove the detected rain structures from the rain images, especially for regions with similar image structures between rain and image layers. This leads to the necessity of the proposed shrinkage-based sparse coding.

There are major differences between the proposed shrinkage-based sparse coding and the conventional sparse coding methods [1,18]. First, in the proposed method, the rain dictionary is learned from masked rain images that have information regarding which pixel positions contain rain structures. Therefore, the learned rain dictionary can represent various types of real-life rain structures, and it can detect rain regions from input rain images directly. In contrast, the conventional sparse coding methods [1,18] require prior knowledge about rain structures. Therefore, the performance of the conventional sparse coding methods depends on that prior knowledge. For example, in the sparse coding method [1], it is assumed that rain structures can be represented by vertical and diagonal edges with high variations. Based on this prior knowledge, the handcrafted HOG descriptor is used to separate the rain dictionary from the whole dictionary. For long and thick rain structures, the HOG descriptor can separate the rain dictionary part from the whole dictionary, as already demonstrated in [1]. However, for different types of rain structures, it is not certain whether the handcrafted HOG descriptor can classify the rain dictionary from the whole dictionary. To check this point, rain images with various types of rain structures in size and shape were selected and then tested. Our experiments show that the sparse coding method [1] removes rain structures as well as object structures. This indicates that the used prior knowledge is not always right, and thus the handcrafted HOG descriptor can fail to separate the rain dictionary from the whole dictionary.

Similarly, in the discriminative sparse coding method [18], the initial rain dictionary is created by using a motion kernel with the dominant gradient orientation of the input rain image. In this method, it is assumed that rain structures are close to straight lines. However, there are various types of rain patterns in size and shape in the real world. As pointed out in [18], this discriminative sparse coding method is not applicable to rain images with magnified rain drops. In our experiments, similar results are obtained. Even though object details can be preserved with the learned non-rain dictionary, rain structures still remain in the rain regions due to the rain dictionary being initialized with improper prior knowledge. Therefore, the conventional sparse coding methods [1,18] need prior models that are more robust to various types of rain patterns, whereas in the proposed method, various types of real-life rain structures can be modeled by using the representation learning approach [15,14] with a masked rain image database.

Second, in the proposed method, a new shrinkage-based approach is adopted to remedy edge artifacts and detail loss in non-rain regions and to remove rain structures in rain regions, which are the main issues of this paper, as already discussed with Fig. 1. The proposed shrinkage-based sparse coding determines how much the sparse codes of the rain and non-rain dictionaries are attenuated in rain and non-rain regions. The learned rain dictionary can reconstruct the rain patches, and thus the amounts of attenuation for the sparse codes of the rain
and non-rain dictionaries can be determined based on representation errors between input rain patches and reconstructed rain patches. Also, the correlation strength between the signal-atoms in the rain and non-rain dictionaries can be used to determine which sparse codes of the signal-atoms in the non-rain dictionary should be shrunk in the rain regions. Even though an iterative approach [18] can be used to modify the sparse codes of the rain and non-rain dictionaries, a challenging non-convex optimization problem should be solved, as pointed out by the authors. We found in the experiments that pixel saturation and clipping artifacts appear on the rain-removed images with the greedy pursuit algorithm used in [18]. Even though a multi-block alternating optimization technique is used instead of the greedy pursuit algorithm, in this case, the discriminative sparse coding becomes too slow to converge to true solutions, as already mentioned in [18]. It is thus hard to find a desirable solution with fast convergence for the non-convex optimization problem. Moreover, as indicated in the previous paragraph, the iterative approach in [18] should initialize the rain dictionary. However, this initialization needs to be more accurate, with improved prior modeling for various types of rain patterns.

Third, more comparisons between the proposed method and the conventional sparse coding methods [1,18] are provided in the experimental result section to show the clear advantages of the proposed method. Our experiments show that the proposed method is much stronger at representing textures and image structures (e.g., face, shirt lines) on the rain-removed images in comparison with the conventional methods [1,7]. Also, the proposed method is more effective in removing rain structures from similar image structures than the conventional methods [1,7].

This paper is an updated version of our conference paper [27]. Compared to the previous work [27], quantitative evaluation is newly added in this paper and more test images are compared. In addition, more discussion about the proposed method and the conventional methods [1,7,18] is provided.

II. OUR RAIN REMOVAL MODEL

As mentioned in the Introduction, our main goal is to shrink the sparse codes of the rain and non-rain dictionaries. To achieve this, a shrinkage map normalized to [0-1] will be designed, and then multiplied with the sparse codes of the input rain patches. Our rain removal model is expressed as follows:

$$\mathbf{R}_i\mathbf{x} \approx \mathbf{D}f(\boldsymbol{\alpha}_i) = [\mathbf{D}_n\ \mathbf{D}_r]\,f([\boldsymbol{\alpha}_i^n;\ \boldsymbol{\alpha}_i^r]) = [\mathbf{D}_n\ \mathbf{D}_r]\,\boldsymbol{\alpha}_i^s = [\mathbf{D}_n\ \mathbf{D}_r]\,s_i[\boldsymbol{\alpha}_i^n;\ \boldsymbol{\alpha}_i^r] \tag{2}$$
where $f$ indicates the shrinkage function and the semicolon ($;$) is used to create a new row in a vector or matrix. Our model uses a simple linear shrinkage function $f(\boldsymbol{\alpha}_i) = s_i\boldsymbol{\alpha}_i$, where $s_i$ is a scalar value to shrink the sparse codes. The above equation shows that the input rain patch $\mathbf{R}_i\mathbf{x}$ will be replaced by the linear combination of the dictionary set $\mathbf{D}$ and the shrunk sparse code $\boldsymbol{\alpha}_i^s$ to obtain the rain-removed patch.
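A minimal sketch of the model in (2), assuming the sparse code has already been estimated; the single scalar $s_i$ is later refined per element in Sections III.E and III.F:

```python
import numpy as np

def shrunk_patch(alpha, D_n, D_r, s_i):
    """Proposed model (2): linearly shrink the stacked code
    [alpha_n; alpha_r] by the scalar s_i, then reconstruct."""
    D = np.hstack([D_n, D_r])     # [D_n  D_r]
    alpha_s = s_i * alpha         # f(alpha_i) = s_i * alpha_i
    return D @ alpha_s            # approximates R_i x with rain attenuated
```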
Fig. 2. An example of the rain image and its masked image.
Fig. 3. Rain dictionary (left) vs. non-rain dictionary (right).
III. PROPOSED RAIN REMOVAL METHOD

A. Our Shrinkage Strategy

The purpose of using the shrinkage function in (2) is to remove the rain structures in the rain regions and to preserve the details of objects without any edge artifacts in the non-rain regions. To achieve this goal, for the non-rain regions, higher values should be assigned to $s_i$. This makes the sparse codes $\boldsymbol{\alpha}_i^r$ and $\boldsymbol{\alpha}_i^n$ change little in the non-rain regions, thereby avoiding the edge artifacts and preserving the details of objects. For the rain regions, lower values should be assigned to $s_i$, which can attenuate the magnitude of the sparse code $\boldsymbol{\alpha}_i^r$. As a result, the rain structures in the rain regions can be removed. However, there is a question about how to shrink $\boldsymbol{\alpha}_i^n$ in the rain regions. Even in the rain regions, the highly correlated signal-atoms of the rain and non-rain dictionaries can be used jointly to represent the rain structures. In this case, the highly correlated signal-atoms should be removed in the rain regions. This can be done by shrinking some parts of the sparse code $\boldsymbol{\alpha}_i^n$ with a small value of $s_i$. In other words, some vector elements of $\boldsymbol{\alpha}_i^n$, not the whole vector $\boldsymbol{\alpha}_i^n$, will be shrunk based on the correlation matrix between the two dictionaries $\mathbf{D}_n$ and $\mathbf{D}_r$ in the rain regions.

B. Rain Image Database Construction

To learn the rain dictionary, a collection of rain images is needed. Moreover, as mentioned in the Introduction, masked images are needed to indicate the rain regions in the rain images. In this paper, ninety rain images were downloaded from websites, and then the masked images were generated manually. Fig. 2 shows an example of a rain image and the corresponding masked image. The red colors in the masked image indicate the rain regions. By using the masked images, rain patches are extracted via random sampling. The total number of extracted rain patches is 15,000, which are used
for the rain dictionary learning. It is thus expected that the learned rain dictionary can represent various types of rain structures, which makes it possible to detect rain regions. Note that the non-rain dictionary is trained on another image database that contains three hundred natural images without rain structures, which were also collected from websites. One of the original papers that deals with dictionary learning [15] shows that satisfactory patch sparsity and representation accuracy can be obtained with 11,000 natural patches. Based on this guideline, 15,000 rain patches are used in this paper for rain dictionary learning. As more rain images with different rain structures are added to our rain database, the representation accuracy of the learned rain dictionary can be strengthened.

C. Offline Dictionary Learning

To learn the two types of rain and non-rain dictionaries ($\mathbf{D}_n \in \mathbb{R}^{m^2 \times K}$ and $\mathbf{D}_r \in \mathbb{R}^{m^2 \times K}$), the well-known K-SVD dictionary learning algorithm [15] was used. The dictionary size is $m^2 \times K = 256 \times 1024$. That is, the patch size used during the dictionary learning is $16 \times 16$ and the number of signal-atoms (i.e., the column vectors in each dictionary) is 1024. For each of the rain and non-rain training databases, dictionary learning was conducted separately.

Fig. 3 shows an example of the learned rain and non-rain dictionaries. In the images, each small square indicates a signal-atom in the dictionary. For visualization, the signal-atoms are reshaped as blocks. It can be observed that the shape of the rain dictionary is not limited to vertical and diagonal lines, and thus it can represent various types of rain patterns. Actually, the used dictionary learning is conducted based on the patch unit, and therefore even the input rain patches can include the structures of background and objects. In contrast, the non-rain dictionary has a greater variety of patterns in order to represent all natural patches.
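The offline learning stage can be sketched as follows; scikit-learn's mini-batch dictionary learner stands in for the K-SVD solver of [15], and the mask-filtering rule (keep patches whose mask coverage exceeds one half) is our assumption:

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_dictionary(images, masks=None, patch=16, K=1024, n_patches=15000):
    """Sample (optionally mask-selected) patches and learn K signal-atoms."""
    samples = []
    for t, img in enumerate(images):
        # the same random_state yields co-located image and mask patches
        pats = extract_patches_2d(img, (patch, patch),
                                  max_patches=500, random_state=0)
        if masks is not None:     # keep patches that are mostly rain-marked
            mpat = extract_patches_2d(masks[t], (patch, patch),
                                      max_patches=500, random_state=0)
            pats = pats[mpat.reshape(len(mpat), -1).mean(axis=1) > 0.5]
        samples.append(pats.reshape(len(pats), -1))
    X = np.concatenate(samples)[:n_patches]    # up to 15,000 training patches
    learner = MiniBatchDictionaryLearning(n_components=K).fit(X)
    return learner.components_.T               # m^2 x K, atoms as columns
```

Calling this once on the masked rain database and once on the natural-image database (with masks=None) would yield $\mathbf{D}_r$ and $\mathbf{D}_n$, respectively.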
Fig. 4. Input rain images (left column) and their shrinkage maps (right column).
Fig. 5. Correlation matrix between the signal-atoms of the rain and non-rain dictionaries.
D. Shrinkage Map Design

To shrink the sparse code $\boldsymbol{\alpha}_i^r$ of the rain dictionary, a shrinkage map filled with the values of $s_i$ is first designed. This map is normalized to [0-1] and has the same size as the input rain image. The shrinkage map is generated according to the following steps (a code sketch of the whole procedure is given after the discussion of Fig. 4):

Step 1: Conduct the sparse coding using only the rain dictionary for all overlapping input rain patches extracted from the input rain image. The orthogonal matching pursuit algorithm [12,13] is used for the sparse coding.

$$\min_{\boldsymbol{\alpha}_i^r} \left\|\mathbf{R}_i\mathbf{x} - \mathbf{D}_r\boldsymbol{\alpha}_i^r\right\|_2^2 \quad \text{subject to} \quad \left\|\boldsymbol{\alpha}_i^r\right\|_0 \le L \tag{3}$$

where $\|\cdot\|_p$ indicates the $p$-norm and $L$ is a scalar value to control the sparsity. Note that the sparse code $\boldsymbol{\alpha}_i^r$ estimated with the rain dictionary is not the same as the one in (2), which is estimated using both the non-rain and rain dictionaries. During the sparse coding, $L$ is set to 3.

Step 2: Complete the whole image by averaging all the reconstructed patches ($\mathbf{D}_r\boldsymbol{\alpha}_i^r$), and then generate the error map that includes the square of the difference between the whole image and the input rain image at each pixel position.

$$\mathbf{x}^* = \left(\sum_{i=1}^{N}\mathbf{R}_i^T\mathbf{R}_i\right)^{-1}\sum_{i=1}^{N}\mathbf{R}_i^T\mathbf{D}_r\boldsymbol{\alpha}_i^r \tag{4}$$

$$e_j = (x_j^* - x_j)^2 \tag{5}$$

The patch averaging of the reconstructed overlapping patches is expressed by (4), which is derived from $\min_{\mathbf{x}}\sum_i \left\|\mathbf{R}_i\mathbf{x} - \mathbf{D}_r\boldsymbol{\alpha}_i^r\right\|_2^2$. In (4), $T$ is the transpose operator and $N$ is the total number of overlapping patches. Please refer to [24,25] for more details of the patch averaging. As shown in (4), the rain dictionary is used to represent the input rain image, and thus it is expected that the reconstructed image $\mathbf{x}^*$ can represent the rain regions; however, it cannot represent the non-rain regions. For this reason, the error map $\mathbf{e}$ in (5) will have large values in the non-rain regions, whereas it will have small values in the rain regions. In (5), $j$ indicates the
pixel index in an image vector, which is not the same as the patch index $i$ in (4). In Steps 1 and 2, the learned non-rain dictionary could be used to increase the accuracy of the error map. However, this would increase the computational cost. As shown in the experimental results, using only the learned rain dictionary can provide satisfactory results. Thus, the non-rain dictionary was excluded in designing the error map.
Step 3: Apply the k-means clustering algorithm [28] to the error map, and then generate the shrinkage map based on the distance ratios between the clustered centers and the pixel values in the error map.

$$s_j = \frac{d_j^{c}}{d_j^{c_1} + d_j^{c_2}}, \qquad c = \min(c_1, c_2) \tag{6}$$

where $\mathbf{s}$ is the shrinkage map; $c_1$ and $c_2$ are the cluster centers with scalar values; and $d_j^{c_1}$ and $d_j^{c_2}$ are the distances between the $j$th pixel value in the error map and the cluster centers $c_1$ and $c_2$, respectively. In (6), $c = \min(c_1, c_2)$ indicates the cluster center for the rain regions. Therefore, the shrinkage map will have small values in the rain regions because the pixel values of the error map in the rain regions will be close to the cluster center $c$. On the other hand, for the non-rain regions, the shrinkage map will have higher values.

Step 4: Find the pixels with horizontal lines in the input rain image, and then assign higher values (e.g., '1') to the same pixels in the shrinkage map. It can be assumed that rain structures rarely have horizontal edges. Finally, a gray dilation [15] is conducted on the shrinkage map to expand the non-rain regions. A visual saliency map [4] or DOF [17] could be considered for addition to the final shrinkage map. Also, rain regions detected by a trained classifier (e.g., support vector machine or deep learning) can be added to the final shrinkage map. More details about how to train classifiers are provided in the supplementary material. In this paper, the Prewitt edge operator [29] was used to find the horizontal lines.

Fig. 4 shows an example of the shrinkage map generated according to the four steps mentioned above. In this map, it can be observed that the non-rain regions have higher intensity values, whereas the rain regions have smaller intensity values. Therefore, this shrinkage map satisfies our shrinkage strategy, as described in subsection III.A. In Fig. 4, the shrinkage map was scaled to [0-255] for visualization.
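Steps 1-4 can be summarized in code as follows (a sketch; the horizontal-edge threshold and the dilation window are illustrative values, and scikit-learn's OMP replaces the pursuit solver of [12,13]):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import orthogonal_mp
from sklearn.feature_extraction.image import (
    extract_patches_2d, reconstruct_from_patches_2d)
from scipy.ndimage import grey_dilation, prewitt

def shrinkage_map(x, D_r, patch=16, L=3):
    """Steps 1-4: rain-only reconstruction error (3)-(5), 2-means
    distance-ratio map (6), horizontal-edge override, gray dilation."""
    # Step 1: code every overlapping patch with the rain dictionary only
    pats = extract_patches_2d(x, (patch, patch))
    codes = orthogonal_mp(D_r, pats.reshape(len(pats), -1).T,
                          n_nonzero_coefs=L)
    # Step 2: patch averaging (4) and squared-error map (5)
    recon = (D_r @ codes).T.reshape(pats.shape)
    x_star = reconstruct_from_patches_2d(recon, x.shape)
    e = (x_star - x) ** 2
    # Step 3: cluster the errors; the smaller center c1 is the rain cluster
    km = KMeans(n_clusters=2, n_init=10).fit(e.reshape(-1, 1))
    c1, c2 = np.sort(km.cluster_centers_.ravel())
    s = np.abs(e - c1) / (np.abs(e - c1) + np.abs(e - c2))   # eq. (6)
    # Step 4: horizontal edges are assumed non-rain, then expand non-rain
    s[np.abs(prewitt(x, axis=0)) > 0.5] = 1.0    # threshold is illustrative
    return grey_dilation(s, size=(3, 3))
```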
E. Shrinkage of Rain Sparse Codes

Given the shrinkage map, the sparse code $\boldsymbol{\alpha}_i^r$ of the rain dictionary will be shrunk with the proposed rain removal model, as shown in (2). In other words, the sparse code $\boldsymbol{\alpha}_i^r$ is multiplied by the corresponding shrinkage value $s_i$. However, the proposed rain removal method is conducted based on the patch unit. In other words, the index $i$ indicates the patch extracted from the input rain image at the $i$th pixel location. Actually, each sparse code $\boldsymbol{\alpha}_i^r$ corresponds to an extracted input rain patch, and thus the shrinkage value $s_i$ should be calculated from the shrinkage patch extracted at the same pixel location. For the shrinkage of the rain sparse codes, first, the sparse coding is conducted using the dictionary set, as follows:

$$\min_{\boldsymbol{\alpha}_i} \left\|\boldsymbol{\alpha}_i\right\|_0 \quad \text{subject to} \quad \left\|\mathbf{R}_i\mathbf{x} - \mathbf{D}\boldsymbol{\alpha}_i\right\|_2^2 \le m^2\varepsilon^2 \tag{7}$$

Note that the dictionary set $\mathbf{D}$ and the sparse code $\boldsymbol{\alpha}_i$, not $\mathbf{D}_r$ and $\boldsymbol{\alpha}_i^r$, are used in (7), according to the proposed rain removal model defined in (2). To minimize (7), orthogonal matching pursuit was used [15]. In (7), $m^2 = 256$ is the dimension of the extracted patch $\mathbf{R}_i\mathbf{x}$ and $\varepsilon$ is the bounded representation error [15,25]. A discussion about how to set the bounded representation error is provided in the supplementary material. Next, the shrinkage value $s_i$ is calculated as follows:

$$s_i = f_{avg}(\mathbf{R}_i\mathbf{s}) \tag{8}$$

where $\mathbf{s}$ is the shrinkage map and $f_{avg}$ is the average function. Thus, the value of $s_i$ is the mean of the extracted shrinkage patch $\mathbf{R}_i\mathbf{s}$. Given $s_i$ and $\boldsymbol{\alpha}_i = [\boldsymbol{\alpha}_i^n;\ \boldsymbol{\alpha}_i^r]$, the sparse code $\boldsymbol{\alpha}_i^r$ corresponding to the rain dictionary is extracted from $\boldsymbol{\alpha}_i$, and then shrunk as:

$$\boldsymbol{\alpha}_i^{rs} = s_i\,\boldsymbol{\alpha}_i^r \tag{9}$$

Given the shrinkage map, as shown in Fig. 4, the magnitude of $\boldsymbol{\alpha}_i^r$ in the rain regions can be reduced via (9) because the shrinkage map has small values in the rain regions. Also, the value of $\boldsymbol{\alpha}_i^r$ can be preserved in the non-rain regions because the shrinkage map has higher values in those regions. Therefore, it is expected that the rain structures can be removed and the edge artifacts can be avoided. Moreover, the image structures of objects (e.g., the face or lines on the shirt in Fig. 4) can be preserved.
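A sketch of Section III.E; the tolerance passed to OMP is the bounded error $m^2\varepsilon^2$ of (7), and the value of $\varepsilon$ here is a placeholder, not the paper's setting:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def shrink_rain_code(patch, s_patch, D, K, eps=0.05):
    """(7): error-bounded OMP over the full dictionary set;
    (8): s_i is the mean of the co-located shrinkage patch;
    (9): linear shrink of the rain half of the code."""
    alpha = orthogonal_mp(D, patch, tol=patch.size * eps ** 2)
    s_i = float(s_patch.mean())
    alpha[K:] *= s_i              # alpha_i^rs = s_i * alpha_i^r
    return alpha, s_i
```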
F. Shrinkage of Non-Rain Sparse Codes

Now, we move on to the shrinkage of the non-rain sparse code $\boldsymbol{\alpha}_i^n$. In rain regions, the sparse codes of the rain and non-rain dictionaries can both be used to reconstruct rain structures. Therefore, the signal-atoms of the non-rain dictionary that are highly correlated with the signal-atoms of the rain dictionary should be removed in the rain regions. However, if the sparse codes of the non-rain dictionary are forced to be zero vectors in the rain regions, an over-smoothing effect can occur there. This means that fine textures (e.g., tree leaves) in the rain regions will be removed along with the rain structures. Thus, in this paper, only the signal-atoms of the non-rain dictionary that are highly correlated with the signal-atoms of the rain dictionary are removed. To achieve this, the correlation matrix is needed to know how much the signal-atoms of the non-rain and rain dictionaries are correlated. As mentioned in the shrinkage strategy in subsection III.A, some elements of $\boldsymbol{\alpha}_i^n$, not the whole vector $\boldsymbol{\alpha}_i^n$, will be shrunk based on the correlation matrix between the two dictionaries $\mathbf{D}_n$ and $\mathbf{D}_r$. The correlation matrix [30] is calculated as follows:

$$C(k,l) = \frac{\mathbf{D}_n(:,k)^T\mathbf{D}_r(:,l)}{\sqrt{\mathbf{D}_n(:,k)^T\mathbf{D}_n(:,k)}\,\sqrt{\mathbf{D}_r(:,l)^T\mathbf{D}_r(:,l)}} \tag{10}$$

where $\mathbf{C}$ is the correlation matrix and $(k,l)$ is the index indicating the matrix elements. $\mathbf{D}_n(:,k)$ and $\mathbf{D}_r(:,l)$ are the $k$th and $l$th column vectors of $\mathbf{D}_n$ and $\mathbf{D}_r$, respectively. The correlation matrix can measure how similar the two signal-atoms $\mathbf{D}_n(:,k)$ and $\mathbf{D}_r(:,l)$ are. Fig. 5 shows the correlation matrix calculated from the signal-atoms of the two learned dictionaries shown in Fig. 3. The size of the matrix $\mathbf{C}$ is $1024 \times 1024$ because the total number of signal-atoms in each dictionary is 1024. In the correlation matrix, the total number of signal-atoms in the dictionary $\mathbf{D}_n$ that have correlation values higher than a threshold $TH_c = 0.8$ is 42. In other words, 42 out of the 1024 signal-atoms in the non-rain dictionary are similar to ones in the rain dictionary $\mathbf{D}_r$. This means that some signal-atoms of the non-rain dictionary can be used to represent the rain regions together with the corresponding highly correlated signal-atoms in the rain dictionary. Therefore, in the rain regions, the highly correlated signal-atoms of the rain and non-rain dictionaries should be removed at the same time. This can be done by shrinking the sparse codes $\boldsymbol{\alpha}_i^n$ and $\boldsymbol{\alpha}_i^r$. In the rain regions, as already mentioned in the previous section, the sparse code $\boldsymbol{\alpha}_i^r$ can be shrunk according to (7)-(9). The remaining sparse code $\boldsymbol{\alpha}_i^n$ will be shrunk as follows:

$$\boldsymbol{\alpha}_i^{ns}(k) = s_i\,\boldsymbol{\alpha}_i^n(k) \quad \text{if} \quad s_i < TH_s,\ \boldsymbol{\alpha}_i^r(l) \ne 0,\ \text{and}\ C(k,l) > TH_c \tag{11}$$

where $\boldsymbol{\alpha}_i^n(k)$ indicates the $k$th element of the vector $\boldsymbol{\alpha}_i^n$. The above equation shows that some elements of $\boldsymbol{\alpha}_i^n$ are shrunk with the value of $s_i$. However, there are some constraints: $s_i < TH_s$, $\boldsymbol{\alpha}_i^r(l) \ne 0$, and $C(k,l) > TH_c$. In (11), the first constraint $s_i < TH_s$ indicates that the input rain patch should belong to the rain regions. The second and third constraints indicate that the signal-atom $\mathbf{D}_n(:,k)$ should be highly correlated with the signal-atom $\mathbf{D}_r(:,l)$, and that this signal-atom $\mathbf{D}_r(:,l)$ should have a non-zero sparse coefficient, i.e., $\boldsymbol{\alpha}_i^r(l) \ne 0$. Consequently, by applying (9) and (11), the influences of the highly correlated signal-atoms of the rain and non-rain dictionaries can be removed at the same time in the rain regions. This can lead to improvement in the rain removal, especially for the rain regions. In (11), the parameters are heuristically set to $TH_c = 0.8$ and $TH_s = 0.25$.
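The correlation matrix (10) and the selective rule (11) can be sketched as follows, with the thresholds from the paper ($TH_s = 0.25$, $TH_c = 0.8$); the helper names are ours:

```python
import numpy as np

def correlation_matrix(D_n, D_r):
    """Eq. (10): normalized inner products between all atom pairs."""
    U_n = D_n / np.linalg.norm(D_n, axis=0, keepdims=True)
    U_r = D_r / np.linalg.norm(D_r, axis=0, keepdims=True)
    return U_n.T @ U_r                        # K x K matrix C

def shrink_nonrain_code(alpha, s_i, C, K, th_s=0.25, th_c=0.8):
    """Eq. (11): in rain regions (s_i < TH_s), shrink only the non-rain
    coefficients whose atoms correlate strongly with an active rain atom."""
    if s_i >= th_s:
        return alpha                          # non-rain patch: leave intact
    active = np.flatnonzero(alpha[K:])        # atoms with alpha_i^r(l) != 0
    if active.size:
        hit = (C[:, active] > th_c).any(axis=1)   # exists l: C(k,l) > TH_c
        alpha[:K][hit] *= s_i                 # shrink selected alpha_i^n(k)
    return alpha
```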
Algorithm I: Proposed rain removal

Input: rain image $\mathbf{x}$ and learned dictionary set $\mathbf{D} = [\mathbf{D}_n\ \mathbf{D}_r]$
Output: rain-removed image $\mathbf{y}$

Initialization:
  Generate the shrinkage map $\mathbf{s}$ according to Steps 1-4
  Find the correlation matrix $\mathbf{C}$ using (10)
  Initialize the parameters $\varepsilon$, $L$, $TH_s$, and $TH_c$ used in (3), (7), and (11)
  Initialize the vector $\mathbf{p}$ and the matrix $\mathbf{Q}$ with all zeros

Proposed rain removal:
  for $i = 1, 2, \ldots, N$
    Conduct the orthogonal matching pursuit via (7) for the input patch $\mathbf{R}_i\mathbf{x}$
    Calculate the shrinkage value $s_i$ using (8)
    Obtain the shrunk sparse code $\boldsymbol{\alpha}_i^{rs}$ according to (9)
    Obtain the shrunk sparse code $\boldsymbol{\alpha}_i^{ns}$ according to (11)
    Make the rain-removed patch via $\mathbf{D}\boldsymbol{\alpha}_i^s = [\mathbf{D}_n\ \mathbf{D}_r][\boldsymbol{\alpha}_i^{ns};\ \boldsymbol{\alpha}_i^{rs}]$
    Update $\mathbf{p} \leftarrow \mathbf{p} + \mathbf{R}_i^T\mathbf{D}\boldsymbol{\alpha}_i^s$ and $\mathbf{Q} \leftarrow \mathbf{Q} + \mathbf{R}_i^T\mathbf{R}_i$
  end
  Conduct the patch averaging to obtain the rain-removed image via $\mathbf{y} = \mathbf{p}/\mathrm{diag}(\mathbf{Q})$, where $\mathrm{diag}$ extracts the diagonal elements of a matrix into a column vector
  Return $\mathbf{y}$
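Algorithm I maps to code as follows; this sketch reuses shrink_rain_code and shrink_nonrain_code from the previous subsections and accumulates only the diagonal of $\mathbf{Q}$, which is all the patch averaging needs:

```python
import numpy as np

def remove_rain(x, D_n, D_r, s_map, C, patch=16, eps=0.05):
    """Algorithm I: shrink each patch code, accumulate p and diag(Q),
    then average the overlaps via y = p / diag(Q)."""
    D = np.hstack([D_n, D_r])
    K = D_n.shape[1]
    H, W = x.shape
    p = np.zeros(x.size)                   # p = sum_i R_i^T D alpha_i^s
    q = np.zeros(x.size)                   # q = diag(sum_i R_i^T R_i)
    idx = np.arange(x.size).reshape(H, W)  # pixel indices realizing R_i^T
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            rows = idx[i:i + patch, j:j + patch].ravel()
            alpha, s_i = shrink_rain_code(
                x[i:i + patch, j:j + patch].ravel(),
                s_map[i:i + patch, j:j + patch], D, K, eps)
            alpha = shrink_nonrain_code(alpha, s_i, C, K)
            p[rows] += D @ alpha
            q[rows] += 1.0
    return (p / q).reshape(H, W)           # rain-removed image y
```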
G. Proposed Rain Removal Algorithm

In Algorithm I, the proposed rain removal algorithm is summarized. After initializing the shrinkage map $\mathbf{s}$, the correlation matrix $\mathbf{C}$, and the parameters ($\varepsilon$, $L = 3$, $TH_s = 0.25$, $TH_c = 0.8$), orthogonal matching pursuit [15] is conducted for every overlapping patch $\mathbf{R}_i\mathbf{x}$ extracted from the input rain image $\mathbf{x}$. Then, the shrinkage value $s_i$ is calculated from the shrinkage map using (8), and the sparse codes $\boldsymbol{\alpha}_i^r$ and $\boldsymbol{\alpha}_i^n$ are shrunk using (9) and (11), respectively. Next, the rain-removed patch is generated by the linear combination of the dictionary set $\mathbf{D}$ and the shrunk sparse code $\boldsymbol{\alpha}_i^s$. Then, the vector $\mathbf{p}$ and the matrix $\mathbf{Q}$ are updated to accumulate the reconstructed patches and the number of overlapping patches. After finishing the sparse coding for all patches, patch averaging is conducted to obtain the rain-removed image $\mathbf{y}$ via $\mathbf{p}/\mathrm{diag}(\mathbf{Q})$, where $\mathrm{diag}$ extracts the diagonal elements of a matrix into a column vector.

IV. EXPERIMENTAL RESULTS

Three experiments will be conducted in this section. First, it will be shown how the proposed shrinkage approach for each of $\boldsymbol{\alpha}_i^r$ and $\boldsymbol{\alpha}_i^n$ affects the visual results; then a visual comparison between the proposed method and the conventional methods [1,7,18] will be given. Finally, image quality evaluation will be conducted. The rain image database and Matlab source code will be uploaded at: https://sites.google.com/site/changhwan76son/
A. Visual Effects According to the Proposed Shrinkage Model

Fig. 6(a) shows the visual effect of the proposed shrinkage approach for the sparse code $\boldsymbol{\alpha}_i^r$. In Fig. 6(a), the left image shows the rain-removed image with the conventional shrinkage approach ($\boldsymbol{\alpha}_i^r = \mathbf{0}$) [1], whereas the right image shows the rain-removed image with the proposed shrinkage model ($s_i\boldsymbol{\alpha}_i^r$). In the left image, as already discussed in the Introduction, edge artifacts are generated around the man's shoulder. Also, contrast and detail loss occurred in the face regions. However, the use of the proposed shrinkage approach modeled with $s_i\boldsymbol{\alpha}_i^r$ can reduce the edge artifacts and improve the contrast and details of the face, as shown in the right image. As shown in Fig. 4, the shrinkage map has higher values around the man's shoulder. Therefore, in those regions, the sparse codes $\boldsymbol{\alpha}_i^r$ can be preserved, i.e., $s_i\boldsymbol{\alpha}_i^r \approx \boldsymbol{\alpha}_i^r$. This means that the sparse code $\boldsymbol{\alpha}_i^r$ of the rain dictionary can be used to represent the structures of the man's shoulder together with the sparse code $\boldsymbol{\alpha}_i^n$ of the non-rain dictionary. This can avoid the edge artifacts and preserve the image structures at the same time.
Fig. 6. Visual effects according to the proposed shrinkage: (a) rain-removed image with $\boldsymbol{\alpha}_i^r = \mathbf{0}$ (left) and rain-removed image with the proposed shrinkage model $s_i\boldsymbol{\alpha}_i^r$ (right); (b) rain-removed image without any shrinkage of $\boldsymbol{\alpha}_i^n$ (left) and rain-removed image after applying the proposed shrinkage to $\boldsymbol{\alpha}_i^n$ (right).
Fig. 6(b) shows the visual effect of the proposed shrinkage approach for the sparse code $\boldsymbol{\alpha}_i^n$. As shown in the red boxes, it can be observed that the rain structures are suppressed more with the proposed shrinkage approach, as defined in (11). This reveals that a part of the signal-atoms in the non-rain dictionary can be employed to represent the rain structures (or rain regions) together with the corresponding highly correlated signal-atoms in the rain dictionary. Therefore, in the rain regions, the highly correlated signal-atoms of the rain and non-rain dictionaries should be removed at the same time. The proposed shrinkage approach for $\boldsymbol{\alpha}_i^n$ based on the correlation matrix can remove the signal-atoms of the non-rain dictionary that are highly correlated with the rain dictionary, thereby improving the rain removal, especially in the rain regions.

B. Visual Quality Comparison

Figs. 7-12 show the rain-removed images produced by the conventional methods [1,7] and the proposed method. Here, the test images shown in Figs. 7(a), 8(a), and 9(a) were used as training images for rain dictionary learning, whereas the other test images, shown in Figs. 10-12, were not used as training images. As shown in Figs. 7(b) and 11(b), the texture removal method using RTV [7] can be a good solution if the rain structures are fine. However, if the rain structures are thick, the RTV cannot discriminate between the rain and the image structures, and thus the texture removal method can remove the image structures and rain structures simultaneously, as shown in Figs. 9(b) and 10(b). The conventional sparse coding method [1] can remove the rain structures well. In particular, the HOG descriptor is strong at representing thick and long rain streaks, and thus the sparse coding method can remove the long rain streaks, as shown in the red box of Fig. 8(c). For the long rain streaks, the rain removal performance of the sparse coding method is better than that of the proposed method, as shown in Figs. 8(c) and 8(d). However, the HOG descriptor can have trouble separating the rain dictionary from the whole dictionary for different types of rain patterns. As a result, the details of the tree leaves and face are almost removed, as shown in Figs. 7(c), 8(c), and 11(c). In contrast, the proposed rain removal method collects the rain structures using the masked images, as shown in Fig. 2, and then learns the rain structures via offline dictionary learning. As mentioned in the Introduction, the recent trend is to adopt representation learning. It can be a better choice to use the learned rain dictionary rather than the handcrafted HOG features to represent various types of rain patterns with different sizes and shapes. Taking advantage of the learned rain dictionary, which can represent the rain structures accurately but cannot represent the image structures of objects, an error map can be generated. Based on the shrinkage map induced from the error map, the sparse codes of the rain and non-rain dictionaries can be used to represent the image structures in non-rain regions. As a result, the image structures (e.g., face and tree leaves) in the non-rain regions can be described more accurately with the proposed method than with the conventional methods [1,7]. In particular, the proposed method can distinguish the rain structures from the raindrops. This can be checked in the red boxes of Fig. 11(d) and the third row of Fig. 12, respectively, where the rain structures are removed while the raindrops falling on the ground or face are preserved. In addition, rain structures are more strongly suppressed with the proposed method, even though a few rain structures still remain in the rain regions.
Fig. 7. Resulting 'face' images: (a) original image, (b) rain-removed image using texture removal [7], (c) rain-removed image using sparse coding [1], (d) rain-removed image using the proposed method.

Fig. 8. Resulting 'tree' images: (a) original image, (b) rain-removed image using texture removal [7], (c) rain-removed image using sparse coding [1], (d) rain-removed image using the proposed method.

Fig. 9. Resulting 'roof tile' images: (a) original image, (b) rain-removed image using texture removal [7], (c) rain-removed image using sparse coding [1], (d) rain-removed image using the proposed method.
The use of the correlation matrix enables the highly correlated signal-atoms of the rain and non-rain dictionaries to be removed in the rain regions, and thus the rain structures can be more strongly suppressed. In addition, from the resulting images of Figs. 10-11, it can be said that the learned rain dictionary can detect other types of rain patterns that are not included in the training rain images.

In this paper, supplementary material is additionally provided to check the performance of rain removal via discriminative sparse coding [18]. As shown in the resulting images, object details can be preserved, thanks to the use of the learned non-rain dictionary, similarly to the proposed method. However, rain structures cannot be removed. As mentioned in the Introduction, the discriminative sparse coding method [18] needs to initialize the rain dictionary with prior knowledge about rain structures. In this method, it is assumed that rain structures are close to straight lines. Based on this prior knowledge, the rain dictionary is initialized by using the motion kernel with the dominant gradient orientation of the input rain image. However, there are various types of rain patterns in size and shape in the real world. Therefore, this initialization can fail to remove various types of rain structures.

Moreover, the discriminative sparse coding method [18] needs to solve a challenging non-convex optimization, as pointed out by the authors. Our experiment found that pixel saturation and clipping artifacts appear on the rain-removed images (see the supplementary material). Thus, the used greedy pursuit algorithm needs to be more stable and robust, irrespective of the input rain images. Even though a multi-block alternating optimization technique is used instead of the greedy pursuit algorithm, the discriminative sparse coding method becomes too slow to converge to true solutions, as already indicated in [18]. It is hard to find a solution with fast convergence for a non-convex optimization problem. For this reason, visual quality comparison and quantitative evaluation of [18] are excluded in this paper.

C. Limitations and Future Work of Proposed Method

The proposed method has some drawbacks. First, in this paper, to avoid detail loss and edge artifacts, the shrinkage map was adopted. According to the shrinkage strategy, the shrinkage map should have large values around object boundaries.
Fig. 10. Resulting 'man' images: (a) original image, (b) rain-removed image using texture removal [7], (c) rain-removed image using sparse coding [1], (d) rain-removed image using the proposed method.

Fig. 11. Resulting 'people' images: (a) original image, (b) rain-removed image using texture removal [7], (c) rain-removed image using sparse coding [1], (d) rain-removed image using the proposed method.
In other words, object boundaries are classified as non-rain regions. However, the proposed method is conducted based on the patch unit, and thus the rain structures around object boundaries can be preserved. This can be detected on the boundaries of the man and the umbrella in Fig. 10(d) and Fig. 11(d), respectively. It is not easy to solve this problem because accurate boundary detection from input rain images would be required, after which the rain structures around object boundaries could be removed. Second, the representation accuracy of the learned rain dictionary depends on the used training rain patches. If input rain images contain rain structures that are not included in the training images, the proposed method may fail to remove the rain structures.

There is still room for improvement in the proposed method. As shown in the red box of the last row of Fig. 12, it is a challenging task to remove the rain structures falling on the face. One solution is to learn color dictionaries from color rain images [31]. As shown in the red box, the rain structures falling on a face or human body tend to have white colors, and thus color dictionaries can help to remove those rain structures. Moreover, rain structures can vary in size. In this paper, the patch size is fixed, and thus the proposed method can fail to classify similar rain and image structures. To solve this problem, a multi-resolution approach [32] can be considered to discriminate between image and rain structures at different scales. However, these issues will be handled in future work.

D. Quality Evaluation

To evaluate the performance of the proposed and conventional methods, blind image quality evaluation (BIQE) [33] and reference-based quality evaluations are considered for natural and synthetic rain images, respectively. First, to evaluate natural rain images, as shown in Figs. 7-12, an opinion-unaware method [33] that does not require any human subjective scores for training is used. This method models the natural statistics of the local structures, contrast, and multiscale decomposition, and then measures the deviation of the distorted images from the reference statistics. In the rain-removed images, remaining rain structures can be considered as noise. Also, image structures can be removed after applying the rain removal. Therefore, the BIQE method can be used to measure how well the image structures are preserved and how well the noisy rain structures are removed, based on the natural image statistics. Table I shows the BIQE scores for the three methods introduced in the previous section. In Table I, BIQE scores become smaller when the natural statistics of the rain-removed image approach the reference natural statistics, which are determined using training images. As shown in Table I, the proposed method has the lowest average BIQE score. This means that the natural statistics of the rain-removed images using the proposed method are closer to the reference natural statistics than those of the conventional methods [1,7]. Thus, it can be deduced that the visual quality of the rain-removed images with the proposed method is better than that of the conventional methods. Also, this BIQE result confirms that the proposed method is effective at removing rain structures with small and moderate sizes and also at preserving object details.

Second, to evaluate synthetic rain images, SSIM (structural similarity) [34] and PSNR (peak signal-to-noise ratio) are used. In this paper, to create synthetic rain images, rain patches are extracted from natural rain images, as shown in Figs. 7 and 12, and then added to original clean images. Where necessary, the rain patches are rotated before being added to the original clean images. Fig. 13 shows the created synthetic images.
Fig. 12. Resulting images: original images (first column), rain-removed images using texture removal [7] (second column), rain-removed images using sparse coding [1] (third column), and rain-removed images using the proposed method (last column).
As shown in the synthetic 'brick' and 'face sketch' images, the rain patches are chosen to have edge directions similar to the image structures of the horizontal and diagonal lines. For these synthetic images, we can check how effectively the proposed shrinkage-based sparse coding can remove rain structures and preserve image structures, compared to the conventional methods [1,7]. For the 'straw' image, we can check whether the proposed method using the learned rain dictionary can discriminate the rain structures from similar texture patterns with different edge directions. The synthetic images, as shown in the second column, are also rotated 90 degrees and then tested to check the performance of the rain removal methods for different rain directions.

As shown in the shrinkage maps (last column), the proposed method can classify rain structures from image structures with similar edge directions. In the shrinkage map of the 'brick' image, brick textures are classified as non-rain regions, which are marked with white colors, whereas rain structures are classified as rain regions, which are marked with black colors. Similar results can be found in the other shrinkage maps of the 'straw' and 'face sketch' images. As a result, in the resulting images, the proposed method can preserve fine surface textures and horizontal lines while removing the rain structures. Also, diagonal lines on the 'face sketch' image are preserved, whereas the rain structures with similar edge directions are removed.
Fig. 13. Experimental results for synthetic images: original clean images of the 'brick', 'straw', and 'face sketch' (first column), the created synthetic images (second column) where real-life rain patches are inserted into the red boxes, rain-removed images with texture removal [7] (third column), rain-removed images with sparse coding [1] (fourth column), rain-removed images with the proposed method (fifth column), and the estimated shrinkage maps with the proposed method (last column).
TABLE I. BLIND IMAGE QUALITY EVALUATION

Method               Fig. 7   Fig. 8   Fig. 9   Fig. 10  Fig. 11  Fig. 12(1st)  Fig. 12(2nd)  Fig. 12(3rd)  Fig. 12(4th)  Fig. 12(5th)  AVG.
Texture removal [7]  55.404   44.860   45.159   40.002   51.882   49.643        52.539        44.645        49.343        48.792        48.226
Sparse coding [1]    65.582   44.516   59.812   39.815   34.835   63.987        51.499        53.055        41.487        59.328        51.391
Proposed method      44.472   30.691   55.703   40.694   29.282   46.410        39.192        37.310        37.894        56.862        41.851

TABLE II. PSNR EVALUATION

Method               Brick    Brick (90)  Face sketch  Face sketch (90)  Straw    Straw (90)  AVG.
Texture removal [7]  22.8612  22.6284     23.2503      22.6543           24.3136  24.8790     23.431
Sparse coding [1]    22.1515  22.2031     23.2932      23.3050           23.4263  22.1586     22.756
Proposed method      23.9983  25.2518     27.0618      26.4739           25.2232  25.7074     25.619

TABLE III. SSIM EVALUATION

Method               Brick   Brick (90)  Face sketch  Face sketch (90)  Straw   Straw (90)  AVG.
Texture removal [7]  0.8157  0.8058      0.7111       0.6787            0.7494  0.7761      0.7561
Sparse coding [1]    0.8096  0.8115      0.7369       0.7364            0.7536  0.6650      0.7521
Proposed method      0.8911  0.9202      0.8831       0.8885            0.8285  0.8488      0.8767
For the 'straw' image, textures can be distinguished from similar rain structures with different edge directions. In contrast, the conventional sparse coding [1] and texture removal [7] methods remove the rain structures as well as the image structures, as shown in the blue boxes. For example, the brick's surface textures, the straw's textures, and the diagonal lines are removed. In the case of the sparse coding method [1], the face's details and contrast are decreased. Also, the rain structures are not removed completely for the 'brick' and 'straw' images. This indicates that the HOG descriptor used in [1] cannot discriminate between image structures and rain structures with similar edge directions. Thus, the rain dictionary part cannot be accurately separated from the whole dictionary. Also, the RTV measure used in [7] cannot distinguish rain structures from fine textures. Thus, fine textures are removed in the rain-removed images.
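For reference, the two full-reference scores reported below can be computed with scikit-image (a minimal sketch; data_range must match the image scale):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(clean, derained):
    """PSNR and SSIM against the original clean image (Tables II-III)."""
    psnr = peak_signal_noise_ratio(clean, derained, data_range=1.0)
    ssim = structural_similarity(clean, derained, data_range=1.0)
    return psnr, ssim
```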
Tables II and III show the PSNR and SSIM scores of the conventional and proposed methods for the synthetic images. In Tables II and III, round brackets indicate the rotation. As shown in these tables, the averaged PSNR and SSIM scores of the proposed method are higher than those of the conventional methods. This indicates that the proposed method is much stronger at representing objects' details and textures in the rain-removed images than the conventional methods [1,7]. Also, this result shows that the proposed method is more effective at removing rain structures from similar image structures.

V. CONCLUSIONS

A new rain removal model based on the shrinkage of the sparse codes is introduced in this paper. Direct use of the learned rain and non-rain dictionaries can generate unwanted edge artifacts
and detail loss. This observation led us to develop a new shrinkage-based sparse coding for rain removal. To realize this, a shrinkage map and a correlation matrix were generated based on the learned rain and non-rain dictionaries. The shrinkage map can make the sparse codes of the rain and non-rain dictionaries change little in the non-rain regions, thereby avoiding edge artifacts and detail loss. In the rain regions, the correlation matrix can find the signal-atoms of the non-rain dictionary that are highly correlated to the ones in the rain dictionary, so that the sparse codes corresponding to the non-rain dictionary can be shrunk in the rain regions. This leads to an improvement in the rain removal, especially for the rain regions. Experimental results showed that the proposed rain removal model is good at preserving image structures and removing rain structures. Moreover, it is expected that the proposed rain removal model can be directly applied to snow removal if a snow image database is provided.

REFERENCES
[1] L.-W. Kang, C.-W. Lin, and Y.-H. Fu, "Automatic single-image-based rain streaks removal via image decomposition," IEEE Transactions on Image Processing (TIP), vol. 21, no. 4, pp. 1742-1755, Apr. 2012.
[2] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision (IJCV), vol. 60, no. 2, pp. 91-110, Nov. 2004.
[3] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, Jun. 2005, pp. 886-893.
[4] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, Jun. 2009, pp. 1597-1604.
[5] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, "DeepDriving: Learning affordance for direct perception in autonomous driving," in Proc. IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, Dec. 2015, pp. 2722-2730.
[6] R. T. Tan, "Visibility in bad weather from a single image," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, Jun. 2008, pp. 1-8.
[7] L. Xu, Q. Yan, Y. Xia, and J. Jia, "Structure extraction from texture via relative total variation," ACM Transactions on Graphics (TOG), vol. 31, no. 6, article no. 139, Nov. 2012.
[8] J. Fehrenbach, P. Weiss, and C. Lorenzo, "Variational algorithms to remove stationary noise: applications to microscopy imaging," IEEE Transactions on Image Processing (TIP), vol. 21, no. 10, pp. 4420-4430, Oct. 2012.
[9] L. Gomez-Chova, L. Alonso, L. Guanter, G. Camps-Valls, J. Calpe, and J. Moreno, "Correction of systematic spatial noise in push-broom hyperspectral sensors: application to CHRIS/PROBA images," Applied Optics, vol. 47, no. 28, pp. F46-F60, Oct. 2008.
[10] P. Escande, P. Weiss, and W. Zhang, "A variational model for multiplicative structured noise removal," Journal of Mathematical Imaging and Vision, doi:10.1007/s10851-016-0667-3, 2016.
[11] J.-L. Starck, M. Elad, and D. L. Donoho, "Image decomposition via the combination of sparse representation and a variational approach," IEEE Transactions on Image Processing (TIP), vol. 14, no. 11, pp. 2675-2681, Oct. 2005.
[12] J.-H. Kim, C. Lee, J.-Y. Sim, and C.-S. Kim, "Single-image deraining using an adaptive nonlocal means filter," in Proc. IEEE International Conference on Image Processing (ICIP), Melbourne, VIC, Sept. 2013, pp. 914-917.
[13] S.-C. Pei, Y.-T. Tsai, and C.-Y. Lee, "Removing rain and snow in a single image using saturation and visibility features," in Proc. IEEE International Conference on Multimedia and Expo Workshops (ICME), Chengdu, July 2014, pp. 14-18.
[14] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 35, no. 8, Aug. 2013.
[15] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing (TSP), vol. 54, no. 11, pp. 4311-4322, Nov. 2006.
[16] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, May 2015.
[17] D.-Y. Chen, C.-C. Chen, and L.-W. Kang, "Visual depth guided color image rain streaks removal using sparse coding," IEEE Transactions on Circuits and Systems for Video Technology (CSVT), vol. 24, no. 8, pp. 1430-1455, Aug. 2014.
[18] Y. Luo, Y. Xu, and H. Ji, "Removing rain from a single image via discriminative sparse coding," in Proc. IEEE International Conference on Computer Vision (ICCV), Santiago, Dec. 2015, pp. 3397-3405.
[19] D. Eigen, D. Krishnan, and R. Fergus, "Restoring an image taken through a window covered with dirt or rain," in Proc. IEEE International Conference on Computer Vision (ICCV), Washington, Dec. 2013, pp. 633-640.
[20] X. Zhang, H. Li, Y. Qi, W. K. Leow, and T. K. Ng, "Rain removal in video by combining temporal and chromatic properties," in Proc. IEEE International Conference on Multimedia and Expo (ICME), Toronto, July 2006, pp. 461-464.
[21] J. Bossu, N. Hautiere, and J.-P. Tarel, "Rain or snow detection in image sequences through use of a histogram of orientation of streaks," International Journal of Computer Vision (IJCV), vol. 93, no. 3, pp. 348-367, July 2011.
[22] P. C. Barnum, S. Narasimhan, and T. Kanade, "Analysis of rain and snow in frequency space," International Journal of Computer Vision (IJCV), vol. 86, no. 2-3, pp. 256-274, July 2010.
[23] K. Garg and S. K. Nayar, "Detection and removal of rain from videos," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, June 2004, pp. 528-535.
[24] C.-H. Son and H. Choo, "Local learned dictionaries optimized to edge orientation for inverse halftoning," IEEE Transactions on Image Processing (TIP), vol. 23, no. 6, pp. 2542-2556, Apr. 2014.
[25] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing (TIP), vol. 15, no. 12, pp. 3736-3745, Dec. 2006.
[26] Z. Jiang, Z. Lin, and L. S. Davis, "Learning a discriminative dictionary for sparse coding via label consistent K-SVD," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, June 2011, pp. 1697-1704.
[27] C.-H. Son and X.-P. Zhang, "Rain removal via shrinkage of sparse codes and learned rain dictionary," in Proc. IEEE International Conference on Multimedia and Expo Workshops (ICME), Seattle, July 2016.
[28] S. P. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory (TIT), vol. 28, no. 2, pp. 129-137, Mar. 1982.
[29] R. Gonzalez and R. Woods, Digital Image Processing, Prentice Hall, 2002.
[30] C.-H. Son, H.-M. Park, and Y.-H. Ha, "Improved color separation based on dot-visibility modeling and color mixing rule for six-color printers," Journal of Imaging Science and Technology (JIST), vol. 55, no. 2, pp. 010505-010505-16, Jan.-Feb. 2011.
[31] J. Mairal, M. Elad, and G. Sapiro, "Sparse representation for color image restoration," IEEE Transactions on Image Processing (TIP), vol. 17, no. 1, pp. 53-69, Jan. 2008.
[32] B. Ophir, M. Lustig, and M. Elad, "Multi-scale dictionary learning using wavelets," IEEE Journal of Selected Topics in Signal Processing (JSTSP), vol. 5, no. 5, pp. 1014-1024, Sept. 2011.
[33] L. Zhang, L. Zhang, and A. C. Bovik, "A feature-enriched completely blind image quality evaluator," IEEE Transactions on Image Processing (TIP), vol. 24, no. 8, pp. 2579-2591, Aug. 2015.
[34] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing (TIP), vol. 13, no. 4, pp. 600-612, Apr. 2004.
Supplementary Material

1. Experimental results of the discriminative sparse coding method [18]

The discriminative sparse coding method [18] has been tested. The open source code from the author's (Hui Ji) website was downloaded and then tested after adjusting some parameters. The resulting images are given below.
Fig. 14. Resulting images (a)-(f) with the discriminative sparse coding method [18].
Fig. 15. (a) Input rain image and (b) resulting image with the discriminative sparse coding method [18].
As shown in Fig. 14, object details can be preserved, thanks to the use of the learned non-rain dictionary, similarly to the proposed method. However, rain structures cannot be removed. As mentioned on the previous page, the discriminative sparse coding method [18] needs to initialize the rain dictionary with prior knowledge about rain structures. In this method, it is assumed that rain
structures are close to straight lines. Based on this prior knowledge, the rain dictionary is initialized by using a motion kernel oriented along the dominant gradient orientation of the input rain image. However, rain patterns in the real world vary in size and shape. Therefore, this initialization can fail to remove various types of rain structures, as shown in Fig. 14 (an illustrative sketch of this initialization is given at the end of this subsection).

Moreover, the discriminative sparse coding method [18] needs to solve a challenging nonconvex optimization, as pointed out by its authors. Our experiment found that pixel saturation and clipping artifacts appear on the rain-removed images. For example, in the red box of Fig. 15(b), the pixel brightness becomes too dark compared to the input rain image of Fig. 15(a). Thus, the greedy pursuit algorithm used in [18] needs to be more stable and robust, irrespective of the input rain images. Even though a multi-block alternating optimization technique can be used instead of the greedy pursuit algorithm, in that case the discriminative sparse coding method becomes too slow to converge to true solutions, as already indicated in [18]. It is thus hard to find a solution with fast convergence for this non-convex optimization problem.

In conclusion, as shown in the resulting images of these new experiments, the discriminative sparse coding method can fail to remove various types of rain patterns, due to its initialization and greedy pursuit algorithm. In contrast, our paper presents a new shrinkage-based sparse coding for rain removal, which is significantly different from the discriminative sparse coding method. The resulting images show that our shrinkage-based sparse coding approach is superior to the discriminative sparse coding method.
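For concreteness, the following Matlab sketch illustrates the kind of straight-line initialization described above; the atom size, the number of atoms, and the use of fspecial('motion') are our assumptions for exposition, not the authors' released code.

  % Estimate the dominant gradient orientation of the input rain image.
  [gx, gy] = imgradientxy(rainImg, 'prewitt');
  [~, gdir] = imgradient(gx, gy);
  theta = mode(round(gdir(:)));           % dominant orientation in degrees

  % Build initial rain atoms from motion kernels near that orientation.
  atomSize = 8; nAtoms = 64;              % illustrative sizes
  D0 = zeros(atomSize^2, nAtoms);
  for k = 1:nAtoms
      len = randi([3 atomSize]);          % vary the streak length
      kernel = fspecial('motion', len, theta + randn);
      atom = imresize(kernel, [atomSize atomSize]);
      D0(:, k) = atom(:) / norm(atom(:)); % unit-norm initial atom
  end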
2. More comparisons between the proposed method and conventional methods [1,7]
Fig. 16. Resulting images: original rain images (first column), rain-removed images using texture removal [7] (second column), rain-removed images using sparse coding [1] (third column), and rain-removed images using the proposed method (last column).
3. Parameter setting of the bounded representation error

The bounded representation error $\varepsilon$, as shown in (7), can be set manually or adaptively, according to the input rain images. This bounded representation error is needed to control the sparsity, i.e., $\|\boldsymbol{\alpha}_i\|_0$. In the image denoising method using sparse coding [25], $\varepsilon$ is related to the noise level. In other words, if the noise level is low, $\varepsilon$ is set to a smaller value to preserve the details of the reconstructed patch during the sparse coding. On the contrary, if the inserted noise is severe, $\varepsilon$ is set to a higher value to loosely reconstruct the input patch. Similarly, we can expect that $\varepsilon$ is related to the amount of rain structures. In the case of rain removal, the amount of rain structures can be defined as the average of the absolute spatial gradient values calculated over the rain regions, i.e., the patches satisfying
$s_i \geq TH_s$. This averaged gradient value can be mapped to the manually tuned bounded representation error $\varepsilon$ via a linear function. Certainly, other fitting methods (e.g., regression) can be used. In our experiment, the fitting function is given by
$$
\varepsilon =
\begin{cases}
90.7441\,\big(f_{avg}(|\nabla_x \mathbf{x}|,\, |\nabla_y \mathbf{x}|) - 0.1107\big) + 3, & f_{avg} \geq 0.1107 \\
3, & \text{otherwise}
\end{cases}
\qquad (12)
$$
where $\nabla_x \mathbf{x}$ and $\nabla_y \mathbf{x}$ denote the column vectors filled with the vertical and horizontal gradients, respectively, and $f_{avg}$ is their averaged absolute value. If the input image is normalized to [0,1], $\varepsilon$ should be scaled by 1/255. In (12), if the absolute gradient average is less than 0.1107 (i.e., the minimum amount of rain structures), $\varepsilon$ is set to 3. The 'Prewitt' edge operator was used to calculate the gradients.
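A minimal Matlab sketch of this adaptive rule is given below, assuming a grayscale image img scaled to [0,255] and a logical mask rainMask of the detected rain regions; the function name and the clamped linear form follow the reconstruction of (12) above, not released code.

  function epsOut = adaptiveEpsilon(img, rainMask)
      img = double(img);
      h  = fspecial('prewitt');            % Prewitt operator
      gy = imfilter(img, h,  'replicate'); % horizontal gradients
      gx = imfilter(img, h', 'replicate'); % vertical gradients
      fAvg = mean(abs([gx(rainMask); gy(rainMask)])); % average over rain regions
      if fAvg < 0.1107
          epsOut = 3;                      % minimum amount of rain structures
      else
          epsOut = 90.7441 * (fAvg - 0.1107) + 3;    % linear mapping of (12)
      end
  end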
4. Additional use of classifiers for final shrinkage map design

Rain regions predicted with a trained classifier, for example a support vector machine (SVM) or a deep convolutional neural network (DCNN), can additionally be used to generate the final shrinkage map. In the proposed method, masked rain images are used to learn the rain dictionary. Therefore, rain features can be extracted from the masked rain images via the HOG descriptor or a DCNN. Similarly, non-rain features can be extracted from natural images without rain structures. Then, an SVM or a DCNN can be trained with the two types of rain and non-rain feature sets. Next, given an input rain image, rain and non-rain regions with different binary labels are predicted with the trained classifier (SVM or DCNN) on a patch basis, and the predicted binary 'label' map is averaged with the final shrinkage map.
The Matlab functions 'extractHOGFeatures' and 'fitclinear' can be used to collect HOG features from images and to train the support vector machine, respectively, as sketched below.
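A minimal sketch of this pipeline is given next; the patch size, the HOG cell size, and all variable names (rainPatches, nonRainPatches, inputImg, shrinkageMap) are illustrative assumptions.

  % Collect HOG features from rain (label 1) and non-rain (label 0) patches.
  X = []; y = [];
  for k = 1:numel(rainPatches)
      X = [X; extractHOGFeatures(rainPatches{k}, 'CellSize', [4 4])]; y = [y; 1];
  end
  for k = 1:numel(nonRainPatches)
      X = [X; extractHOGFeatures(nonRainPatches{k}, 'CellSize', [4 4])]; y = [y; 0];
  end
  svmModel = fitclinear(X, y);             % linear SVM on HOG features

  % Predict a binary label map for the input rain image, patch by patch.
  p = 16;                                  % illustrative patch size
  labelMap = zeros(size(inputImg));
  for r = 1:p:size(inputImg,1) - p + 1
      for c = 1:p:size(inputImg,2) - p + 1
          f = extractHOGFeatures(inputImg(r:r+p-1, c:c+p-1), 'CellSize', [4 4]);
          labelMap(r:r+p-1, c:c+p-1) = predict(svmModel, f);
      end
  end
  finalMap = 0.5 * (labelMap + shrinkageMap); % average with the shrinkage map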
MatConvNet (http://www.vlfeat.org/matconvnet/) can be used to learn the CNN.
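For completeness, a hedged sketch of a small rain/non-rain patch classifier in MatConvNet's SimpleNN format is shown below; the 16x16 input size, filter counts, and layer choices are our assumptions, not a network from the paper.

  % Two-class CNN for 16x16 grayscale patches (SimpleNN format).
  net.layers = {};
  net.layers{end+1} = struct('type', 'conv', ...
      'weights', {{0.01*randn(5,5,1,20,'single'), zeros(1,20,'single')}}, ...
      'stride', 1, 'pad', 0);                    % 16x16x1 -> 12x12x20
  net.layers{end+1} = struct('type', 'relu');
  net.layers{end+1} = struct('type', 'pool', 'method', 'max', ...
      'pool', [2 2], 'stride', 2, 'pad', 0);     % 12x12x20 -> 6x6x20
  net.layers{end+1} = struct('type', 'conv', ...
      'weights', {{0.01*randn(6,6,20,2,'single'), zeros(1,2,'single')}}, ...
      'stride', 1, 'pad', 0);                    % 6x6x20 -> 1x1x2 scores
  net.layers{end+1} = struct('type', 'softmaxloss'); % training loss

  % For inference, swap the loss for a softmax and run a forward pass.
  net.layers{end} = struct('type', 'softmax');
  res = vl_simplenn(net, single(patch));
  scores = squeeze(res(end).x);                  % rain vs. non-rain scores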