Computational Intelligence and Data Fusion in Wireless Multimedia Sensor Networks - Research Article

A new clothing image retrieval algorithm based on sketch component segmentation in mobile visual sensors

International Journal of Distributed Sensor Networks 2018, Vol. 14(11) © The Author(s) 2018 DOI: 10.1177/1550147718815627 journals.sagepub.com/home/dsn

Haopeng Lei1 , Yugen Yi2, Yuhua Li3, Guoliang Luo4 and Mingwen Wang1

Abstract

Nowadays, state-of-the-art mobile visual sensor technology makes it easy to collect a great number of clothing images. Accordingly, there is an increasing demand for a new, efficient method to retrieve clothing images by using mobile visual sensors. Different from traditional keyword-based and content-based image retrieval techniques, sketch-based image retrieval provides a more intuitive and natural way for users to clarify their search need. However, this is a challenging problem due to the large discrepancy between sketches and images. To tackle this problem, we present a new sketch-based clothing image retrieval algorithm based on sketch component segmentation. The proposed strategy is to first collect a large-scale training dataset of clothing sketches and images and tag it with semantic component labels; we then employ a conditional random field model to train a classifier that segments a query sketch into different components. After that, several feature descriptors are fused to describe each component and capture the topological information. Finally, a dynamic component-weighting strategy is established to boost the effect of important components when measuring similarities. The approach is evaluated on a large, real-world clothing image dataset, and experimental results demonstrate the effectiveness and good performance of the proposed method.

Keywords
Sketch-based clothing image retrieval, component segmentation, mobile visual sensors

Date received: 22 May 2018; accepted: 20 October 2018 Handling Editor: Xiangjian He

Introduction

Clothing image retrieval has great research and commercial value due to the popularity of e-commerce applications. However, traditional keyword-based search methods cannot effectively address visual search or satisfy users' requirements. Keyword-based search methods can be cumbersome because many visual characteristics of clothes are too difficult or impractical to describe verbally using a few keywords. Besides, all clothing images in the database need to be annotated beforehand, which causes the retrieval accuracy of keyword-based methods to be influenced by a multiplicity of factors, that is, annotators' language, culture, habits, and so on. An alternative

way is content-based clothing image retrieval, which requires users to supply a clothes example image for retrieving visually similar images from the database. Although content-based retrieval methods can provide

1 School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
2 School of Software, Jiangxi Normal University, Nanchang, China
3 Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou, China
4 East China Jiaotong University, Nanchang, China

Corresponding author: Yugen Yi, School of Software, Jiangxi Normal University, Nanchang 330022, China. Email: [email protected]

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

visual assistance for the user, this query mode is limited in practical applications because the user probably does not have a suitable example image at hand. Therefore, a more intuitive and effective way to help these users find clothing images meeting their personal requirements and preferences is highly desired. With the rapid development of visual sensor technology, new kinds of mobile sensors such as smart cameras, mobile phones, tablets, and drones are appearing nowadays.1–3 Through these mobile sensors, users can connect directly to the Internet and conveniently store and collect massive image data from various sources.4 Moreover, these mobile sensors, for example, smart phones and tablets, have friendly human–computer interfaces on which users can easily control the device by touching or sketching on the screen, well suited even for those who are not computer savvy. Therefore, sketching has become easy and ubiquitous, and we can sketch on different kinds of mobile sensors, such as mobile phones, cameras, and even smart watches. More importantly, this effective mode of human–machine communication provides a natural and user-friendly tool to query clothing images from a huge image repository. Consider a realistic scenario: a user, for example, a clothes buyer or fashion designer, needs to visually search the Internet or cloud servers to confirm whether there are suitable clothes that meet his needs. Using these mobile sensors, he can simply draw a query sketch on the interface and promptly upload it to the Internet or cloud servers to retrieve similar images from a large clothing image database. In other words, mobile visual sensors can help users both capture and query images. Besides, many potential applications of sketch-based clothing image retrieval can be considered, such as aiding fashion designers to inspire new clothes styling ideas, checking violation of existing clothes style design rights, and looking for the possibility of reusing existing clothes parts. For these reasons, there is a growing interest in the research community to develop an effective sketch-based clothing image retrieval system that searches clothing images by using mobile visual sensors. Although sketching on mobile sensors is an expressive and informative way to express people's search intention, it is still a great challenge to realize an effective sketch-based image retrieval (SBIR) system. First, it is inherently ambiguous to represent clothes using sketches, since a sketch is a coarse representation containing only partial information of a clothes image. Second, sketches naturally lack visual cues (e.g. without color and texture), which consequently makes applications of traditional image-oriented algorithms nontrivial.5 Moreover, the same clothes image can be drawn at varied levels of abstraction and deformation owing to different sketching styles of users, which

further affects the performance of the retrieval methods. To overcome these challenges, researchers have proposed various SBIR methods. Unfortunately, the existing retrieval methods6–9 generally follow the conventional content-based image retrieval paradigm, directly comparing sketch with image by adopting traditional image-oriented feature descriptors (e.g. histogram of oriented gradients (HOG), Scale-Invariant Feature Transform (SIFT), and shape context). These retrieval methods cannot effectively address the search demands of clothing images in most cases. Different from ordinary images, clothing images consist of basic components, that is, collar, sleeve, upper outer garment, and clothing accessories. Although clothing images vary in appearance and style, component structure analysis provides an effective way to capture the most important characteristics of a clothing style. This is because clothing components not only distinguish different clothing categories but also describe fine-grained details of clothes. In this work, we present a new sketch-based method for clothing image retrieval based on component segmentation of query sketches. At the off-line stage, we first collect a large-scale dataset of clothing sketches and clothing images as training data. Since our goal is to find the clothing images most similar to a query sketch, we create an image–sketch pair for each sketch. Then, we manually tag each image–sketch pair with semantic component labels and employ the conditional random field (CRF) model10 to train a classifier based on the tagged component labels. At the online stage, we use the trained CRF classifier to segment a new query sketch into different components so that similarities can be calculated between sketch components and the corresponding clothing image components. Furthermore, a dynamic component-weighting strategy is designed to improve retrieval accuracy. Finally, we can achieve fine-grained clothing image retrieval results for the query sketch. Extensive experiments are conducted to validate the effectiveness of our proposed retrieval algorithm, which achieves a higher retrieval accuracy compared with other traditional SBIR methods. The rest of this article is organized as follows. Section ''Related work'' discusses the related work. Section ''Proposed method'' describes the details of the proposed methodology. In section ''Experiments and discussion,'' the experimental results are discussed, while in section ''Conclusion,'' conclusions are drawn.

Related work

Currently, various types of mobile visual sensors such as cameras, mobile phones, and drones have been

released and applied to many different areas.1,2,11–15 A huge number of images of different kinds have been collected by these sensors. Thus, how to assist users in retrieving relevant images for domain-specific applications is urgently demanded. Among these kinds of images, clothing images are the most commonly used in our daily life. According to the different types of user queries, clothing image retrieval can be classified into three categories: keyword-based methods, content-based methods, and sketch-based methods. Keyword-based query methods have been widely used in practice. However, query-by-keywords can be cumbersome because many visual characteristics of an image are hard or impractical to describe verbally using just a few words. Moreover, all candidate images need to be pre-annotated, which incurs a large labor cost and is influenced by a multiplicity of factors, such as annotators' language, culture, and habits. Content-based retrieval methods expect users to supply an example image for retrieving visually similar images from a database. The performance of a content-based retrieval method largely depends on the image features extracted from the search candidate images.16 Park et al.17 used color information to characterize an image in their retrieval system. Xu et al.18 proposed a novel shape descriptor called contour flexibility to match different images. Han and Ma19 used a Gabor representation for texture image retrieval. Philbin et al.20 proposed a new content-based image retrieval method using an enhanced Bag-of-Words (BOW) model to learn feature descriptors. Zhang et al.21 incorporated spatial information into the BOW model. Although there is a huge volume of literature investigating content-based image retrieval methods, one key limitation of this kind of approach is that it is difficult for users to have an appropriate example query image at hand. However, with the popularity of touch-screen mobile devices, how to use human-drawn sketches to retrieve different media (e.g. image, video, three-dimensional (3D) model) has become an active research area.22–24 Sketch-based retrieval methods allow users to submit a sketch drawing illustrating their search intent. Such a way of expressing user query intent gives users much freedom to specify and clarify their search need. Thus, developing efficient feature descriptors is the most important part of sketch-based retrieval methods, and several sketch-based retrieval algorithms have been proposed recently. Chen et al.25 presented a system named Sketch2Photo that composes a realistic picture from a simple freehand sketch annotated with text labels. The system can automatically select suitable photos from the Internet and seamlessly stitch them into a new photo according to the sketch and text labels. Cao et al.26 proposed the MindFinder system, which is an interactive sketch-based image search engine for

million-level databases. Cao et al.8 also proposed a novel index structure and the corresponding raw contour-based matching algorithm to calculate the similarity between a sketch query and natural images. Sun et al.27 built a system to support query-by-sketch over two billion images, with raw edge pixels and the chamfer feature selected as the basic representation and matching in this system. Zhou et al.28 proposed a real-time image retrieval system that allows users to search target images whose objects are similar to the query in contour. Eitz et al.7 developed new descriptors based on the bag-of-features approach and introduced a benchmark for evaluating the performance of large-scale SBIR systems. Hu and Collomosse6 adopted a Bag-of-Visual-Words (BoVW) approach, incorporating the Gradient Field HOG (GF-HOG) descriptor for SBIR. Liu et al.24 proposed an interactive sketch-based interface to organize video clips, enabling common users to easily draw simple sketches that are automatically converted from keyframes of videos. Eitz et al.22 proposed a new sketch-based 3D model retrieval system based on the bag-of-features framework. They used the Gabor filter as a feature extraction tool for encoding information of the input sketch and two-dimensional (2D) views of 3D models. They also proposed the first large-scale exploration of human sketches,29 a dataset containing 20,000 unique sketches evenly distributed over 250 object categories. Saavedra and Bustos9 proposed an SBIR approach based on Learned KeyShapes (LKS), where the structure of the objects in the image is represented by using sketch tokens to detect KeyShapes. Wang et al.30 presented a deep convolutional neural network (DCNN) to learn deep features for SBIR. Very recently, Li et al.31 and Yu et al.5 proposed fine-grained SBIR to search shoes on the Internet. Li et al.31 defined a taxonomy of 13 discriminative attributes commonly possessed by shoes and addressed the fine-grained SBIR challenge by identifying discriminative attributes and parts. They developed a strongly supervised deformable part-based model to reduce the semantic gap between sketch and image. Yu et al.5 proposed a deep triplet ranking model for fine-grained SBIR, and they also contributed two new datasets, one of shoe sketches and the other of shoe images. These two works inspire us to develop a new clothing image retrieval method based on sketch input. Despite the huge volume of literature on general-purpose SBIR methods, the particular problem of sketch-based clothing image retrieval has been studied much less intensively. Different from general-purpose SBIR efforts, clothing image retrieval has its own demands. The goal of clothing image retrieval is to identify and rank relevant clothing images from a large-scale database. Yang et al.32 proposed a data-driven framework for clothing image co-parsing and segmentation. Liu et al.33 proposed a clothing retrieval method in which users

can take a photo of any fashionably dressed lady with a mobile device, and the system parses the photo and searches for clothing with similar styles from online shopping websites. Huang et al.34 proposed the Dual Attribute-Aware Ranking Network, which simultaneously integrates attributes and a visual similarity constraint into the retrieval feature learning; given a clothing image, the method can retrieve the same or attribute-similar clothing items from online shopping stores. Mizuochi et al.35 developed a clothing retrieval system considering local similarity, where users can retrieve desired clothes that are globally similar to one image and partially similar to another image. Since most salient visual features of clothing images can be extracted from components, our work is also inspired by natural image segmentation and 3D model segmentation methods. Wang et al.36 proposed a convolutional neural network to segment images. Jang and Kim11 proposed a strategy to extract multiple features representing neighboring circumstances from the input images. Shi et al.3 presented an affine invariant method to produce dense correspondences between uncalibrated wide-baseline images. Huang et al.37 presented a data-driven approach to segment sketches. This method performs segmentation and labeling simultaneously, by inferring a structure that best fits the input sketch, through selecting and connecting 3D components in a database. Kalogerakis et al.38 presented a data-driven approach to segment 3D meshes. They apply graphical models to the segmentation and labeling of 3D meshes and find the globally most probable configuration by optimizing a CRF objective function. Liu et al.39 proposed an effective 3D garment synthesis method that utilizes the style of garment components to create new 3D garment models automatically. Schneider and Tuytelaars40 proposed sketch segmentation using a CRF to find the most probable global configuration. They regard a sketch as a graph where the vertices are the strokes and the edges are the relations between them, and the CRF is used to describe probabilities on this graph. Different from their method, which adopts only SIFT descriptors during segmentation, our work defines and employs different kinds of feature descriptors to find a more accurate segmentation boundary. In summary, no specialized method for sketch-based clothing image retrieval has been presented in the literature so far. According to the characteristics presented in clothing images, we seek an effective and efficient sketch-based method to solve the issue of mobile sensor–based clothing retrieval. The method is based on clothing component segmentation and topological structure analysis. Clothing components are capable of distinguishing different clothing categories

as well as capturing characteristics of different clothing styles. We employ a supervised learning method to segment clothing images and sketches into meaningful components, and a group of global and local feature descriptors is used to describe detailed information of clothing components. Moreover, a dynamic component-weighting strategy is established to boost the effect of important components when measuring similarities. Various experiments on a real-world clothing image database demonstrate the good performance of the proposed retrieval method.

Proposed method

Overview

In this section, we describe our new sketch-based clothing image retrieval method based on query sketch component segmentation. Figure 1 illustrates the framework of the whole execution process of the newly proposed clothing image retrieval paradigm. The new algorithm consists of a feature database with an index structure, which is created off-line, and an online search engine with a sketch-based user interface. We outline the proposed sketch-based clothing image retrieval approach in the following steps (a high-level code sketch of this pipeline follows the list):

1. During the off-line stage, we first collect two large-scale datasets, one of clothing sketches and the other of clothing images, and classify them into different categories. Then, we create image–sketch correspondence pairs for training.
2. For the training dataset, we manually assign a set of component labels to each clothing image–sketch pair. We then design two kinds of feature descriptors, unary features and pairwise features, to capture information for each labeled component, and adopt the CRF model to train on the labeled dataset. After training, when a user draws a sketch for retrieval, the learned model can segment the query sketch into different components.
3. Both global and local features of each component of the query clothing sketch are extracted during the online stage, and these two types of features are combined into composite feature descriptors. For the clothing images in the database, the same composite feature descriptors are also extracted and stored in the feature indexing database.
4. When measuring similarity between the query clothing sketch and a clothing image, we compare their corresponding components by using the extracted composite feature descriptors. Moreover, we make use of the relationships between components to capture the topological structure of the clothing sketch and improve retrieval accuracy. Finally, the retrieved clothing images are returned to the user.

Figure 1. The framework of our proposed sketch-based clothing image retrieval method.
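The following minimal Python sketch mirrors the division of labor between the off-line and online stages listed above. Every function passed in (segment, describe, similarity) is a placeholder for a stage detailed in later sections, not a function from any released implementation.

```python
# High-level sketch of the proposed off-line/online pipeline (steps 1-4).
# The callables passed in are placeholders for the stages described later
# in this article; none of these names come from an existing codebase.

def build_index(images, segment, describe):
    """Off-line: segment each database image and store its composite descriptor."""
    return {img_id: describe(segment(img)) for img_id, img in images.items()}

def retrieve(query_sketch, index, segment, describe, similarity, top_k=8):
    """Online: segment the query sketch, describe its components, and rank images."""
    query_descriptor = describe(segment(query_sketch))
    scores = {img_id: similarity(query_descriptor, img_descriptor)
              for img_id, img_descriptor in index.items()}
    # Higher similarity first; return the identifiers of the top-k images.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```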

Data collection

Although clothing sketches and clothing images are from different domains (a clothing sketch is a simple line drawing with a high level of abstraction, while a clothing image contains a wealth of color and texture information), the notion of the subparts of the clothes is always the same. For instance, a shirt is composed of a collar and sleeves, and a pair of pants must be drawn with pant legs. If a clothing sketch and an image represent the same category of clothes, then they should have the same components. Thus, analyzing component structure can provide an effective way to capture the most important characteristics of a clothing style and bridge the semantic gap between clothing sketch and image. To segment clothing sketches into different components, we collect two large-scale datasets: one of clothing sketches and the other of clothing images. So far, no specialized dataset or benchmark for sketch-based clothing image retrieval has been presented in the literature. Although several well-known sketch datasets26,29,31 have been proposed, such as TU Berlin,29 which contains 20,000 hand-drawn sketches in 250 different categories, this sketch dataset is intended for general-purpose image search tasks and does not

contain any clothing sketches. Therefore, we built our own image and sketch dataset for training and evaluating the performance of our proposed algorithm. As clothes come in a wide variety of categories and each category has its unique style and characteristics, in an ideal scenario a user can simply draw a sketch to promptly retrieve similar images from a large clothing image database through the SBIR system. Thus, the collected clothing sketch dataset should cover most of the clothes categories that can be seen on the market. To construct the clothing image repository used in our experiments, we used different mobile visual sensors, that is, mobile phones, smart cameras, and drones, to take photos of clothing in different shopping malls; some images were crawled from several popular e-commerce websites, such as Amazon.com, tmall.com, JD.com, and vip.com. In total, we have collected more than 15,000 clothing images, and all images in this database were grouped into 32 main clothing categories, including T-shirt, skirt, dress, coat, suit, and jeans. Each main category may contain multiple clothing subcategories; for example, T-shirt can be divided into long-sleeve T-shirt, short-sleeve T-shirt, and sleeveless shirt. Generally speaking, we categorize the collected clothing images according to the classification criteria of the popular e-commerce websites mentioned above. However, note that our clothing image dataset is mainly used for sketch-based retrieval, so these images were not categorized based on colors, for example, black T-shirt and white T-shirt. Although it is impossible to cover all the


Figure 2. Some examples of clothing sketches and images collected in the database; all pictures are transmitted to the cloud server through mobile sensors.

categories of clothing images, our collected clothing dataset basically covers the most representative clothing categories. For the clothing sketches used in our experiments, we recruited volunteers to draw different categories of clothing sketches by using mobile sensors (e.g. mobile phones, smart cameras, and tablets). The volunteers include professional garment designers, non-expert college students, teachers, and housewives. We provide some clothing image examples and some keywords (e.g. short-sleeve skirt) to a volunteer. We let the volunteers look at the image examples for 20 s; then, we take away the images and ask the volunteers to draw a sketch according to the given keyword. After finishing sketching, we send the drawn sketches to the cloud server for storage through mobile sensors. We also searched for some sketches on the Internet by inputting keyword phrases such as ''clothing category + sketch'' (e.g. ''dress sketch''). Overall, the dataset contains 5320 clothing sketches. We classified them into the same categories as the collected images. Thus, one sketch can correspond to more than two clothing images when we create clothing image–sketch pairs for the component segmentation training process. Besides, there are some distinct characteristics of the clothing images and sketches captured using our mobile sensors. First, the clothing images in the database are usually captured in a studio environment with controlled lighting conditions, and images that contain a person or other objects are excluded. Moreover, irrelevant backgrounds are also eliminated, so the backgrounds of these clothing images are often clean. Second, the query sketches are captured by mobile sensors with unknown drawing

viewpoints, and the volunteers can draw the sketch from different views. Third, all the pictures, whether clothing images or sketches, are scaled to a uniform size and sent to the same cloud-server machine through the mobile sensor network for processing. Meanwhile, one can download or upload newly collected pictures from the server by using the mobile sensors. Examples of clothing sketches and images are shown in Figure 2.
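Because sketches and images share the same 32 category labels, training image–sketch pairs can be formed simply by matching categories, so that one sketch may pair with several images. The snippet below is a minimal illustration of this pairing step; the (id, category) record format is our own assumption, as the article does not specify a storage layout.

```python
from collections import defaultdict

def build_image_sketch_pairs(images, sketches):
    """Pair every sketch with all images of the same clothing category.

    `images` and `sketches` are assumed to be iterables of (item_id, category)
    tuples; the actual dataset layout is not prescribed by the article.
    """
    images_by_category = defaultdict(list)
    for image_id, category in images:
        images_by_category[category].append(image_id)

    pairs = []
    for sketch_id, category in sketches:
        # One sketch can correspond to more than two clothing images.
        for image_id in images_by_category.get(category, []):
            pairs.append((image_id, sketch_id))
    return pairs
```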

Component segmentation

We now describe our algorithm for segmenting and recognizing components for each clothing image–sketch pair. Because clothing sketches are sparse in visual cues (e.g. without color and texture), these unique properties make traditional image segmentation techniques inapplicable. Therefore, we use supervised learning based on the CRF model to segment each clothing image–sketch pair into meaningful components. First, we manually assign a set of component labels to each clothing image–sketch pair. Based on analyzing real clothing prototypes and the parts of the human body, we divide clothes into six main component categories: upper outer garment (e.g. T-shirt), lower outer garment (e.g. pants and skirt), body-connected garment (e.g. one-piece dress), sleeve, collar, and accessories (e.g. button, zip, and belt). Then, we label each clothing image–sketch pair according to these six component categories. Since a typical clothing sketch can be seen as a list of strokes, each represented as a sequence of points, we sample a series of points on each stroke and label these points with the corresponding component label. Meanwhile, we


Figure 3. An example of component label tagging of a clothing sketch; different components are shown in different colors.

need to guarantee that the number of sample points is large enough to capture sufficient information for each sketch, and the component boundaries should contain sample points. In our implementation, we sample at least 300 points for each sketch. An example of component label tagging of a clothing sketch is shown in Figure 3. For the clothing images, we do not label components on the images directly in the traditional way. Instead, we represent clothing images as a set of contours, because the contour representation can more compactly capture informative characteristics while preserving the structural properties of the image. We adopt the Canny operator to extract an edge map from the input clothing image and then detect contours from the derived edge map using the approach proposed in Arbelaez et al.41 Therefore, we can perform the same component labeling procedure on the clothing image as on the clothing sketch. Inspired by 3D model segmentation,38 we design two kinds of features, unary features and pairwise features, to capture segmentation information at the sample points. Unary features are used to predict the probability of a component label given a sample point on the contour. Pairwise features are used to find potentially indicative boundaries between two adjacent components. Before computing any features, we normalize all clothing image–sketch pairs to the same scale. The unary and pairwise feature extraction processes are described as follows. Our unary features include shape local width, average Euclidean distance (AED), angle of tangential direction from the central point, inner shape context histogram, and distance from the central point.

1. Shape local width. For each sample point p_i, we first compute the tangential line of this point according to its adjacent point; we then use a mask centered around the direction perpendicular to this tangential line, send several rays inside to the other side of the sketch, gather the intersection points, and measure the ray lengths. If the angle between a ray and the perpendicular of the tangential line is 10°, 20°, or 30°, we compute the length of that ray. Shape local width is then defined as the weighted average of the lengths of these rays. This yields 70 features.
2. Average Euclidean distance (AED). AED is invariant to rotation and translation, and we use this feature to measure how far each sample point is from the rest of the sketch. For example, sleeves usually have a higher AED than other components in a shirt sketch. The AED for each sample point is computed by averaging the Euclidean distances from that sample point to all the other sample points of the sketch.
3. Angle of tangential direction from the central point. This angle is invariant to rotation and can be used to capture local details of the contour around the sample point.
4. Inner shape context histogram. The inner shape context histogram utilizes the inner distance and inner angle to capture spatial relationships of the edge lines, which is well suited to representing complex internal structures of the sketch. The inner distance is defined as the length of the shortest path between a pair of sample points, where this path must be contained within the given object. The inner angle is defined as the tangential direction at the starting point of the shortest path connecting these two points. For each sample point, the inner shape context histogram is defined as a histogram of the relative coordinates to all the other sample points. We first calculate the relative inner distance and inner angle between the sample points and use a log-polar space to construct the histogram bins. In our implementation, the relative inner distance is uniformly partitioned into five intervals and the relative inner angle is uniformly partitioned into 12 intervals for the polar angle, yielding 60 features in total.
5. Distance from the central point. This distance is invariant to rotation and can also be used to capture local details of the contour around the sample point.
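As a concrete illustration of the simpler unary features, the sketch below computes the AED, the distance from the central point, and the angle between the local tangential direction and the direction towards the central point for an ordered array of contour sample points. It is only a rough sketch under our own assumptions about the input format, and it omits the shape local width and inner shape context features.

```python
import numpy as np

def simple_unary_features(points):
    """points: (N, 2) array of ordered contour sample points.
    Returns per-point AED, distance from the central point, and the angle
    between the local tangent and the direction towards the central point."""
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)

    # Average Euclidean distance (AED) from each point to all other points.
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    aed = pairwise.sum(axis=1) / (len(points) - 1)

    # Distance from the central point (rotation invariant).
    dist_center = np.linalg.norm(points - center, axis=1)

    # Tangent estimated from the two neighboring sample points
    # (wraps around at the ends, so it is exact only for closed contours).
    tangents = np.roll(points, -1, axis=0) - np.roll(points, 1, axis=0)
    to_center = center - points
    tangent_angle = np.arctan2(tangents[:, 1], tangents[:, 0])
    center_angle = np.arctan2(to_center[:, 1], to_center[:, 0])
    rel_angle = np.mod(tangent_angle - center_angle, 2 * np.pi)

    return np.stack([aed, dist_center, rel_angle], axis=1)
```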

Pairwise feature extraction is based on each pair of adjacent sample points p_i and p_j along the contour line of the sketch. When people draw a sketch online for retrieval, each sketch can be represented as a time-ordered sequence of points, with each point containing a 2D coordinate and a time stamp. So, we can define the adjacent point p_j for each sample point p_i according to this time-ordered sequence. Besides, we should remove the starting and end points if the contour of the sketch is closed. Since pairwise features are used to find potentially indicative boundaries between components, they should be more discriminative than unary features. Our pairwise features include shape local width difference, tangential direction difference, and concavity weight.

1. Shape local width difference. The boundary between two components exhibits a much larger shape local width difference than the interior of a component, so we can use it as one of the pairwise features. For each pair of adjacent sample points, we compute the absolute value of the difference between their corresponding shape local widths.
2. Tangential direction difference. The difference of tangential direction represents the curving degree of the contour. Similarly, we compute the absolute value of the tangential direction difference between each pair of adjacent sample points.
3. Concavity weight. Since the concave regions in a sketch can be regarded as candidates for component boundaries, we define the concavity weight to find concave regions. For each sample point p_i, we compute the corresponding curvature and curvature vector. If p_i and p_j are adjacent points, we define the concavity weight as the absolute value of the difference between their corresponding curvature and curvature vector. If p_i and p_j are non-adjacent points but lie on the same contour line, we define the concavity weight C(p_i, p_j) as the accumulation of the concavity weights between the adjacent points along the contour line, which can be defined as equation (1)

$$C(p_i, p_j) = \sum_{k=i}^{j-1} C(p_k, p_{k+1}) \quad \text{if } p_i \text{ and } p_j \text{ are non-adjacent} \qquad (1)$$

where p_k and p_{k+1} are the adjacent points along the contour line path between the non-adjacent points p_i and p_j. After unary and pairwise feature extraction for each sample point, the next step is to adopt the CRF model to train on our labeled dataset. Computing the component labels of all sample points involves minimizing the following objective function

$$E(c; x_i, y_{ij}, \theta) = \sum_i a_i E_1(c_i; x_i, \theta_1) + \sum_{i,j} b_{ij} E_2(c_i, c_j; y_{ij}, \theta_2) \qquad (2)$$

The objective function has two terms, the unary term E_1 and the pairwise term E_2, where E_1 measures the consistency between the unary features x of a sample point and its corresponding component label c, and can be defined using the log-likelihood function

$$E_1(c_i; x_i, \theta_1) = -\log P(c_i \mid x_i, \theta_1) \qquad (3)$$

where θ_1 is the parameter of the unary term E_1. The pairwise term E_2 measures the consistency of the component labels c_i and c_j between adjacent sample points given the pairwise features y_ij, which can be defined as

$$E_2(c_i, c_j; y_{ij}, \theta_2) = W(c_i, c_j)\left[-k \log P(c_i \neq c_j \mid y_{ij}, \theta_2) + m\right] \qquad (4)$$

where W is the penalty matrix of the component labels, and W(c_i, c_j) represents the compatibility of the component labels c_i and c_j. W is a symmetric matrix whose elements are initialized according to the adjacency of the component labels: non-adjacent labels are set to a large number and adjacent labels are set to 1. P(c_i ≠ c_j | y_ij, θ_2) represents the probability that the two component labels differ, k is the scale factor, and m is the smoothing factor. We use the CRF model to train the component segmentation parameters θ_1 and θ_2 of the objective function. After extracting the unary and pairwise features, we adopt the JointBoost classifier to train this model. JointBoost can automatically select useful features from among hundreds of possible features during the classification process; therefore, it can handle large numbers of input features for multiclass classification.42 We split the whole training dataset uniformly into five parts. Among them, four are used as the training set and the


rest are used as the validation set. The training process can be described as follows. We first apply a softmax transformation to the unary term E_1 and the pairwise term E_2

$$P(c = c_i \mid x_i, \theta_1) = \frac{\exp(H(x_i, c_i))}{\sum_{c \in C} \exp(H(x, c))} \qquad (5)$$

where H(x, c) is the weighted sum of the decision stumps h_m(x, c)

$$H(x, c) = \sum_{m=1}^{M} h_m(x, c) \qquad (6)$$

The decision stump h_m(x, c) can be defined as

$$h_m(x, c; \phi) = \begin{cases} a & \text{if } x_i^f > t \text{ and } c \in C \\ b & \text{if } x_i^f < t \text{ and } c \in C \\ k^c & \text{if } c \notin C \end{cases} \qquad (7)$$

Then, we can define an optimization function for the training process

$$\Psi(\phi_j) = \sum_{c_i \in C} \sum_{i=1}^{N} w_i^c \left(z_i^c - h_m(x_i, c_i; \phi_j)\right)^2 \qquad (8)$$

where φ_j is the parameter of the decision stump during the jth iteration and z_i^c is an indicator function: if the label of sample i equals c, z_i^c is set to 1; otherwise, it is set to −1. To minimize the optimization function, we compute the partial derivatives with respect to the parameters a, b, and k^c and set them to 0. Performing the optimization process iteratively, we update the weight of the training data w_i^c

$$w_i^c \leftarrow w_i^c \exp\left(-z_i^c\, h_m(x_i, c_i; \phi)\right) \qquad (9)$$

We optimize this objective function iteratively to finish the training process using the JointBoost classification algorithm. To predict P(c_i ≠ c_j | y_ij) in the pairwise term E_2, we perform the same training process. After obtaining the objective function parameters θ_1 and θ_2, we use an exhaustive method to test all possible combinations of the remaining parameters W, k, and m. In this exhaustive process, we adopt the graph-cut algorithm to refine the component boundaries and define the classification error as

$$E_c = \sum_i \frac{-z_i^{\hat{c}_i} + 1}{2} \qquad (10)$$

where \hat{c}_i is the output component label of the classification. If \hat{c}_i ≠ c_i, then z_i^{\hat{c}_i} = −1 and the value of E_c increases by 1. If \hat{c}_i = c_i, then z_i^{\hat{c}_i} = 1 and the value of E_c is unchanged. After the training process, the learned CRF model can predict the component label for each stroke line automatically when the user inputs a new clothing sketch. Therefore, we can use the segmented components in our retrieval system.
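To make the role of equations (2)–(4) concrete, the sketch below evaluates the CRF energy of one candidate labeling from the outputs of the trained unary and pairwise classifiers. The input formats are our own assumptions, the per-node and per-edge weights are fixed to 1, and the actual minimization (graph cuts in our pipeline) is not reproduced here.

```python
import numpy as np

def crf_energy(labels, unary_logprob, pairwise_diff_prob, edges, W, k=1.0, m=0.0):
    """Energy of equation (2) for one candidate labeling (all weights a_i, b_ij = 1).

    labels            : (N,) integer component label per sample point
    unary_logprob     : (N, L) array of log P(c | x_i, theta_1) from the unary classifier
    pairwise_diff_prob: dict {(i, j): P(c_i != c_j | y_ij, theta_2)} for adjacent points
    edges             : list of adjacent sample-point index pairs (i, j)
    W                 : (L, L) label-compatibility penalty matrix
    k, m              : scale and smoothing factors of equation (4)
    """
    n = len(labels)

    # Unary term, equation (3): E1 = -log P(c_i | x_i, theta_1).
    energy = float(np.sum(-unary_logprob[np.arange(n), labels]))

    # Pairwise term, equation (4), accumulated over adjacent sample points.
    eps = 1e-12
    for (i, j) in edges:
        p_diff = max(pairwise_diff_prob[(i, j)], eps)
        energy += W[labels[i], labels[j]] * (-k * np.log(p_diff) + m)
    return energy
```

A full segmenter would evaluate or approximately minimize this energy over candidate labelings, for example with the graph-cut refinement mentioned above.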

Component feature extraction

In this section, we describe how to design effective component feature descriptors to retrieve clothing images. After segmenting the components of the clothing sketch, we use composite features combining both global and local features extracted from the components. The global features capture the global exterior boundary of the clothing sketch and can tolerate a certain degree of stroke deformation, while the local features capture the discriminative interior content and details. These two types of features are combined into a composite feature descriptor for each sketch to obtain integrated information. Thus, two similar clothing images from the database can be further differentiated with higher confidence. Moreover, we make use of the relationships between components to obtain the topological structure when measuring similarity between clothing sketch and image. To extract the global features of the clothing sketch, Fourier descriptors, the spherical harmonics function,43 Zernike moments, and eccentricity are adopted. The Fourier descriptors and spherical harmonics function describe the contour information of the sketch. Zernike moments are derived from a set of complex Zernike polynomials, which form a complete orthogonal set over the unit circle. Let A_nm denote the Zernike moment of order n with repetition m; it is defined as

$$A_{nm} = \frac{n+1}{\pi} \sum_x \sum_y f(x, y)\, V_{nm}(x, y), \qquad x^2 + y^2 \le 1 \qquad (11)$$

where n is the degree of the polynomial, m is the repetition number, f(x, y) is the image function, and V_{nm}(x, y) can be written in polar coordinates as V_{nm}(ρ, θ), which is defined as

$$V_{nm}(\rho, \theta) = \left[\sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s (n-s)!\, \rho^{\,n-2s}}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\right] e^{jm\theta} \qquad (12)$$

Zernike moments are translation and scale invariant, and they are suitable for contour representation. Noting that the efficiency and precision of the shape representation are greatly influenced by the number of moments, we use the first 10 moments in our implementation. Eccentricity depicts the deviation of the shape from a circle and can be represented as

$$ecc = \frac{\left[\sum_{i=1}^{N}(x_i - x_c)^2 - \sum_{i=1}^{N}(y_i - y_c)^2\right]^2 + 4\left[\sum_{i=1}^{N}(x_i - x_c)(y_i - y_c)\right]^2}{\left[\sum_{i=1}^{N}(x_i - x_c)^2 + \sum_{i=1}^{N}(y_i - y_c)^2\right]^2} \qquad (13)$$


where {(x_i, y_i), i = 1, 2, ..., N} are the sample points along the contour of the sketch and (x_c, y_c) represents the central point. For the local features of a sketch component, we use a BOW method to extract and encode them at different resolutions. More specifically, the bounding box of each component of the clothing sketch is computed first, and it is then subdivided into increasingly finer sub-regions at resolutions 1, ..., l, ..., L such that level l has 2^l sub-regions along each axis direction, for a total of 2^l × 2^l sub-regions. For each resolution level l, we compute the gradient magnitude m(x, y) and gradient orientation θ(x, y) for every pixel in every sub-region and encode them into visual words, which can be defined as

$$m(x, y) = \left((I(x+1, y) - I(x-1, y))^2 + (I(x, y+1) - I(x, y-1))^2\right)^{\frac{1}{2}}$$
$$\theta(x, y) = \arctan\frac{I(x, y+1) - I(x, y-1)}{I(x+1, y) - I(x-1, y)} \qquad (14)$$

where I(x, y) denotes the intensity value at pixel (x, y) of the sketch. For every sub-region, the gradient orientations are quantized into nine bins covering 0°–180° (20° for each bin). So, we can represent each sub-region c_i (i = 1, 2, ..., 2^l × 2^l) with a nine-dimensional feature vector H(c_i) = [h_1(c_i), ..., h_k(c_i), ..., h_9(c_i)], where h_k(c_i) is defined as

$$h_k(c_i) = \#\{\theta(x, y) \in \mathrm{bin}(k)\} \qquad (15)$$

where #{·} denotes the number of pixels (x, y) in the sub-region c_i whose gradient orientation falls into bin(k). For each segmented component, we perform the same procedure to extract these global and local features. The global and local features are complementary to each other and together represent the most important visual characteristics of the clothing sketch.
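A minimal version of the local descriptor of equations (14) and (15) is sketched below: gradient magnitude and orientation are computed with central differences, and orientations are counted in nine 20° bins per sub-region at each pyramid level. It assumes the component has already been rasterized to a grayscale array, skips the visual-word quantization, and restricting the count to pixels with non-zero gradient is our own simplification.

```python
import numpy as np

def local_orientation_histograms(patch, levels=2, n_bins=9):
    """patch: 2D grayscale array of one rasterized clothing-sketch component.
    Returns the concatenated 9-bin orientation histograms of the
    2^l x 2^l sub-regions at each pyramid level l."""
    I = patch.astype(float)

    # Central differences, equation (14).
    dx = np.zeros_like(I)
    dy = np.zeros_like(I)
    dx[:, 1:-1] = I[:, 2:] - I[:, :-2]
    dy[1:-1, :] = I[2:, :] - I[:-2, :]
    mag = np.sqrt(dx ** 2 + dy ** 2)
    ori = np.degrees(np.arctan2(dy, dx)) % 180.0   # orientation in [0, 180)

    feats = []
    h, w = I.shape
    for level in range(levels + 1):
        cells = 2 ** level
        ys = np.linspace(0, h, cells + 1, dtype=int)
        xs = np.linspace(0, w, cells + 1, dtype=int)
        for yi in range(cells):
            for xi in range(cells):
                sub_ori = ori[ys[yi]:ys[yi + 1], xs[xi]:xs[xi + 1]]
                sub_mag = mag[ys[yi]:ys[yi + 1], xs[xi]:xs[xi + 1]]
                # Equation (15): count orientations per 20-degree bin,
                # keeping only pixels with a non-zero gradient magnitude.
                hist, _ = np.histogram(sub_ori[sub_mag > 0], bins=n_bins, range=(0, 180))
                feats.append(hist)
    return np.concatenate(feats)
```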

Similarity computation

In this section, we describe how to measure the similarity between a query clothing sketch and a clothing image by making use of components. We first combine the global and local features extracted from each component into a composite feature descriptor and define the similarity between a component of the clothing sketch and a component of the clothing image as

$$Sim_{comp} = 1 - \left(w_g\, Dis_g(c_s, c_i) + w_l\, Dis_l(c_s, c_i)\right) \qquad (16)$$

where w_g and w_l are the weights of the global and local features and w_g + w_l = 1. In our implementation, we set w_g = 0.4 and w_l = 0.6. Dis_g(c_s, c_i) and Dis_l(c_s, c_i) represent the distances of the global and local features, respectively, which can be computed using the cosine distance

$$Dis(c_s, c_i) = 1 - \frac{V(c_s) \cdot V(c_i)}{\lVert V(c_s)\rVert\, \lVert V(c_i)\rVert} = 1 - \frac{\sum_{j=1}^{n} V(c_s)_j \times V(c_i)_j}{\sqrt{\sum_{j=1}^{n} \left(V(c_s)_j\right)^2}\, \sqrt{\sum_{j=1}^{n} \left(V(c_i)_j\right)^2}} \qquad (17)$$

where V(c_s) and V(c_i) represent the composite feature vectors of components from the clothing sketch and the clothing image, respectively. In addition, the spatial relationships between components provide the topological information of the clothing sketch, and similar styles of clothes have similar component topological structures. Therefore, we characterize the topological information by creating a component graph: the components are regarded as nodes of the graph, and edges connect adjacent components. We assume that different pairs of adjacent component labels need to incur different penalties. For example, suppose a clothing sketch and an image both have a shirt-like upper outer garment component, but the component adjacent to the sketch's upper outer garment is a pant, while the component adjacent to the image's upper outer garment is a skirt; then we reduce their score and apply a larger penalty. According to this assumption, we estimate the similarity of the topological information between clothing sketch and image by counting the number of similar components that exist in both of them

$$Sim_{top} = \sum_{comp} \left(w_c\, R(c_s, c_i) + w_{adj(c)}\, R(adj(c_s, c_i))\right) \qquad (18)$$

where R(c_s, c_i) is the count of matching component labels between the clothing sketch c_s and the clothing image c_i, and R(adj(c_s, c_i)) is the count of matching adjacent component labels. w_c and w_adj(c) are the weights of the matching component label and the matching adjacent component label, respectively, which are defined by measuring the proportion of the number of sample points in this component to the total number of sample points in the whole sketch

$$w_{comp_i} = \frac{count(samp(comp_i))}{\sum_{comp} count(samp(comp))} \qquad (19)$$
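The component similarity of equations (16) and (17) and the topological similarity of equations (18) and (19) can be prototyped roughly as below. The dictionary-based component records are our own assumption, a single per-component weight stands in for both w_c and w_adj(c) of equation (18), and w_g = 0.4, w_l = 0.6 follow the setting above.

```python
import numpy as np

def cosine_distance(u, v):
    # Equation (17): 1 - cosine similarity of two composite feature vectors.
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def component_similarity(sketch_comp, image_comp, wg=0.4, wl=0.6):
    # Equation (16): weighted global + local feature distances.
    dg = cosine_distance(sketch_comp["global"], image_comp["global"])
    dl = cosine_distance(sketch_comp["local"], image_comp["local"])
    return 1.0 - (wg * dg + wl * dl)

def topology_similarity(sketch_comps, image_comps):
    """Equations (18)-(19): reward matching component labels and matching
    adjacent labels, weighted by each component's share of sample points.
    Each component is assumed to be a dict with 'label', 'adjacent'
    (a set of neighboring labels), and 'n_points' fields."""
    total_points = sum(c["n_points"] for c in sketch_comps)
    image_labels = {c["label"] for c in image_comps}
    image_adjacent = {c["label"]: c["adjacent"] for c in image_comps}

    sim = 0.0
    for c in sketch_comps:
        weight = c["n_points"] / total_points          # equation (19)
        if c["label"] in image_labels:
            sim += weight                               # matching component label
            if c["adjacent"] & image_adjacent[c["label"]]:
                sim += weight                           # matching adjacent labels
    return sim
```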

Through integrating the component feature similarity of equation (16) and the topological similarity of equation (18), we can define the final similarity measurement between the clothing sketch S and the clothing image I as follows

$$Sim(S, I) = Sim_{top}(c_s, c_i) + \sum_{comp} Sim_{comp}(c_s, c_i) \qquad (20)$$

Then, when the user submits a query clothing sketch to the search engine, the proposed online search algorithm returns relevant clothing images using the similarity measurement of equation (20). The combination of component feature similarity and topological similarity in our algorithm not only improves the accuracy of the whole retrieval system but also enhances the effectiveness of component usage, leading to an overall improvement of the whole system. To summarize, the proposed online search algorithm is described in Algorithm 1.

Algorithm 1: Sketch-based clothing image retrieval algorithm
Input: Clothing image database I_m, clothing sketch database S_n, and the user query clothing sketch S.
Output: Ranked clothing image list L according to similarity to sketch S.
1  Create clothing image–sketch correspondence pairs for I_m and S_n
2  for each clothing image–sketch pair do
3      use the Canny operator to extract an edge map;
4      sample a series of points along the edge;
5      for each sample point p_i do
6          assign a component label c_i to sample point p_i;
7          extract the unary feature descriptors x_i;
8          if p_i and p_j are adjacent points then
9              extract the pairwise feature descriptors y_ij;
10     input x_i and y_ij to the CRF model E(c; x_i, y_ij, θ);
11     initialize the weight w_i^c and the decision stump h_m(x_i, c_i; φ);
12     while the stopping criterion w_i^c ≤ ε is not met do
13         update the training weight w_i^c ← w_i^c exp(−z_i^c h_m(x_i, c_i; φ));
14 Segment the query clothing sketch S and the clothing images I from the database into different components c;
15 for the sketch S and each clothing image I do
16     for each component c do
17         extract the global and local feature descriptors G_f and L_f, respectively;
18         compute the component similarity Sim_comp = 1 − (w_g Dis_g(c_s, c_i) + w_l Dis_l(c_s, c_i));
19         compute the topology similarity Sim_top = Σ_comp (w_c R(c_s, c_i) + w_adj(c) R(adj(c_s, c_i)));
20         combine the component and topology similarities: Sim(S, I) = Sim_top(c_s, c_i) + Σ_comp Sim_comp(c_s, c_i);
21 Sort all clothing images according to the similarity;
22 Return the ranked list L to the user.

Experiments and discussion

In this section, we give the image database used for evaluation purposes, the experimental setup, and the experimental results. In order to verify the effectiveness of the proposed method, we implemented a prototype sketch-based clothing image retrieval system, and all experiments are performed on a PC server with Ubuntu 14.10, an Intel E5-2620v4 2.2G 10C CPU, an nVidia TESLA K40 12 GB GPU, and 12 GB DDR4 memory. The software used includes Visual Studio 2015, OpenCV 3.2, and Matlab R2015b. The prototype retrieval system is used to verify the effectiveness of our approach, and it is composed of two parts: an off-line component segmentation process and an online searching procedure. In the off-line component segmentation process, we manually assign a set of component labels to each clothing image–sketch pair and use the CRF model to train on the labeled repository. Even though this off-line processing may take several hours, the training process effectively improves the performance of the online searching stage. After the off-line training stage, the prototype retrieval system can segment any query clothing sketch into different components automatically. In addition, the retrieval system provides a convenient way for users to directly draw or upload an existing hand-drawn sketch through mobile sensors.

Experimental database

To construct the clothing image repository used in our experiments, we used different mobile sensors, that is, mobile phones, cameras, and drones, to capture clothing images in different shopping malls, and we also crawled some images from several popular e-commerce websites. In total, we have collected more than 15,000 clothing images, which are split into 32 categories. For the clothing sketches used in our experiments, volunteers were recruited to draw clothing sketches. To make our clothing sketch dataset contain different kinds of

drawing styles, the volunteers include not only professional garment designers but also non-expert college students. We provide a clothing category name (e.g. suits) and a clothing image example to a volunteer and ask the volunteer to draw a similar sketch. All sketches are drawn directly on the mobile sensors and sent to a central server for storage. Besides, to produce a great variety of clothing sketches, we also searched for some clothing sketches on the Internet by using the keywords ''clothing category + sketch'' (e.g. ''dress sketch''). We have collected 5320 clothing sketches in total. Thus, one sketch can correspond to more than two clothing images when we create clothing image–sketch pairs for the component segmentation training process. More information on our experimental database can be found in section ''Data collection.''

Experimental results

In order to evaluate the performance of the proposed algorithm, we carried out a dedicated session of image retrieval experiments on the above database. For each search session, we hope to find clothing images from the image dataset that appear visually similar to the input query clothing sketch. Such tasks are equivalent to the real use scenario in which customers in an e-commerce shopping setting try to find particular clothing images meeting their personal visual appearance requirements. Figure 4 illustrates a series of query examples using our method: the left side shows the query clothing sketch and the right side shows the retrieved clothing images. The figure gives retrieval examples for different kinds of clothes. It can be seen from these examples that, whether for a suit, T-shirt, long-sleeve coat, or short-sleeve jumpsuit, the top eight clothing images retrieved by our method basically belong to the same class as the input query sketch. The same conclusion can be obtained in the retrieval experiments on the other classes. Through the performance evaluation and retrieval results, we can see that our component features are able to effectively represent the characteristics of clothing images. In our experiments, a query session finishes in less than 2.5 s on average with our prototype system against the image dataset of over 15,000 images. To comprehensively evaluate the performance of the proposed algorithm, we also conduct a series of experiments to compare the proposed approach with other state-of-the-art retrieval approaches. Since most of the existing SBIR methods focus on retrieving natural images, few SBIR methods concentrate on the particular problem of clothing image search. Different from natural images, clothing images are often captured by mobile phones, cameras, and other kinds of mobile sensors in a studio

environment from specific viewing angles that can best reveal the visual appearance and appeal of the clothes. For a fair comparison, we implement several general-purpose SBIR approaches and use the same query clothing sketches to retrieve clothing images from our database. We consider the method of standard HOG (SHOG) within a bag-of-words framework,7 GF-HOG,6 Edgel Index,8 Learned KeyShapes (LKS),9 and deep features.30 We choose these methods as the baselines in our study because they have been widely used or are popular in recent image retrieval research; for example, Edgel Index8 has been used in the notable MindFinder search engine26 to search million-level image databases. Among them, GF-HOG6 and SHOG7 are typical BoW-based methods; both employ a BoW strategy but with different feature descriptors. Edgel Index8 and LKS9 are contour-feature-based methods. The deep feature method30 inputs the sketch into a DCNN, as with images, to learn feature vectors for SBIR. We adopt the standard Precision/Recall and Mean Average Precision (MAP), which are widely used performance metrics, to measure retrieval effectiveness. For each query in a category C which contains N clothing images, Precision is the ratio of the number of relevant images retrieved to the number of images retrieved, and Recall is the ratio of the number of relevant images retrieved to the number of relevant images in the database, which can be described as follows

$$Precision = r(L)/L, \qquad Recall = r(L)/N$$

Average precision (AP) considers the order of the retrieved images and is computed as the area under the precision–recall curve for each query

$$AP = \frac{\sum_{r=1}^{N} P(r) \times rel(r)}{L}$$

where rel(r) is an indicator function equal to 1 if the item at rank r is a relevant image and zero otherwise. MAP is obtained by averaging the AP over all queries. Figure 5(a) illustrates the precision–recall curves comparing our proposed method with the other methods, where the x-axis indicates the rate of precision and the y-axis indicates the rate of recall. Figure 5(b) presents the MAP score, that is, the average percentage of relevant images appearing in the top K retrieval results returned by an algorithm. From the experimental results, we can conclude that our proposed method outperforms the other methods and has greater retrieval power. The precision does not drop quickly, and the precision rates and MAP


Figure 4. Examples of retrieval results using our proposed method. The first column shows the query clothing sketches and the remaining columns show the top eight retrieved clothing images. (a) Long pants, (b) sleeveless dress, (c) T-shirt, (d) suit, (e) long-sleeve coat, (f) turndown-collar coat, (g) jacket with hat, and (h) short-sleeve jumpsuit.

scores are the best in most query cases. This means that the idea of segmenting the clothing sketch into different

meaningful components and exploiting component structural information is surely effective for clothing


Figure 5. Performance comparisons of the proposed method with other methods. (a) Precision and recall curve. (b) MAP score with different number of top-K retrieved images.

image retrieval and is more robust than other state-of-the-art retrieval approaches.
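For reference, the precision, recall, AP, and MAP values reported above can be computed as in the following sketch; it assumes the ranked results of each query are available as a list of 0/1 relevance flags, and it normalizes AP by the length of the returned list, as in the formula given earlier.

```python
def precision_recall(ranked_relevance, n_relevant):
    """ranked_relevance: 0/1 flags for the retrieved list (top L results)."""
    hits = sum(ranked_relevance)
    return hits / len(ranked_relevance), hits / n_relevant

def average_precision(ranked_relevance):
    # AP: precision at each rank holding a relevant image, summed and
    # normalized by the length L of the returned list (as in the text).
    hits, score = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            score += hits / rank
    return score / len(ranked_relevance)

def mean_average_precision(per_query_relevance):
    # MAP: mean of AP over all queries.
    return sum(average_precision(q) for q in per_query_relevance) / len(per_query_relevance)
```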

Discussion

Among these SBIR methods, we can see that Edgel Index8 is a raw contour-based retrieval method and can be computed quickly, but its accuracy is lower than that of BOW-based methods such as SHOG and GF-HOG, because Edgel Index has difficulty handling clothing sketches with large distortion. SHOG features7 are a popular and powerful classic for image retrieval; SHOG improves the traditional HOG descriptor by storing only the most dominant sketched feature lines. GF-HOG6 is extracted locally at the pixels of the Canny edge map, and a BoW codebook is then formed to quantize the relative location and spatial orientation of the sketches or the Canny edges of the images. The retrieval performance of GF-HOG6 is slightly better than conventional HOG features, but it still lacks the capability to handle input sketches with complex internal structures. LKS9 uses sketch tokens to detect KeyShapes and performs better than the other hand-crafted feature baselines. However, this method ignores the detailed information of

sketch components and is highly sensitive to the stylistic variation of the query sketch. Compared to the traditional features, we notice that deep features30 outperform the other SBIR methods. DCNNs are a powerful tool in the area of general-purpose image classification and retrieval. However, there are still several challenges in learning deep sketch features: (1) existing DCNNs and other deep learning methods are designed for processing color images. Unlike images, hand-drawn sketches lack color and texture, two important visual features, and there is a big semantic gap between image and sketch even when they belong to the same subject. Some studies have demonstrated that directly employing deep learning methods produces little improvement over hand-crafted features.5 (2) Most deep learning methods require a large amount of training data to avoid overfitting, given millions of model parameters; however, our collected clothing sketch and image datasets are far too small for training a deep learning model. (3) The parameter settings of a deep learning model can greatly affect the final retrieval results, and users need to carefully investigate the pre-training and fine-tuning steps and the structure of the network. In addition, the training time is proportional to the total amount of training data. Nevertheless, we believe that a deep learning network could achieve better performance given more training data or more elaborate procedures, such as using annotated semantic information to help extract better deep feature representations for sketches and images. In summary, the abovementioned general retrieval methods do not reflect the topological structure of clothes or capture clothing style details, so general retrieval methods are not well suited to clothing image retrieval. As the results of general retrieval algorithms applied to this specific area cannot meet users' needs for clothing image retrieval, the purpose of this article is to seek an efficient retrieval method for clothing images and to find a component-based method that can reflect the structural information and style characteristics of the clothes, overcoming the shortcomings of traditional general-purpose SBIR methods. Since the clothing images captured by mobile visual sensors are pictures of man-made products, a clothing image is highly artificial: the shapes of its basic components are regular and determined. Although our method can effectively describe clothing images, it still has room for improvement. When mobile visual sensors capture a blurry clothing image or the input sketch contains too much noise, our method would need more effective features to handle these cases. In addition, the method does not use pattern and texture features to

Lei et al.

15

distinguish different styles of clothes which are important factors for users choosing the clothes.
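To make the component-based matching idea concrete, the following minimal sketch computes a weighted sum of per-component similarities. The component names, the cosine measure, and the fixed weight values are illustrative assumptions and do not reproduce our exact dynamic weighting scheme.

```python
# Minimal component-weighted similarity (illustrative assumptions throughout).
import numpy as np

def component_similarity(query_feats, image_feats, weights):
    """query_feats / image_feats: dict mapping component name -> feature vector.
    weights: dict mapping component name -> importance weight (summing to 1)."""
    score = 0.0
    for comp, w in weights.items():
        q, f = query_feats.get(comp), image_feats.get(comp)
        if q is None or f is None:        # component missing in the sketch or the image
            continue
        cos = float(np.dot(q, f) / (np.linalg.norm(q) * np.linalg.norm(f) + 1e-8))
        score += w * cos
    return score

# Hypothetical usage: emphasise the collar and sleeves of the query sketch.
weights = {"collar": 0.4, "sleeve": 0.35, "body": 0.25}        # stand-in weights
query = {c: np.random.rand(128) for c in weights}              # assumed 128-D descriptors
candidates = [{c: np.random.rand(128) for c in weights} for _ in range(5)]
ranking = sorted(range(len(candidates)),
                 key=lambda i: component_similarity(query, candidates[i], weights),
                 reverse=True)
print(ranking)
```

In practice, the weights would be derived dynamically from the query sketch (for instance, from the relative size or amount of detail drawn for each component) rather than fixed as in this toy example.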

Conclusion

In this article, we introduce a novel sketch-based clothing image retrieval algorithm based on sketch component segmentation in mobile visual sensors. The method first uses mobile sensors to collect a large-scale set of clothing sketches and images as the training dataset and creates a sketch–image pair for each clothing sketch. Then, we manually tag each sketch–image pair with semantic component labels and employ a CRF model to train a classifier on the tagged component labels. Once training is finished, any query clothing sketch can be segmented into its components automatically. After that, the global and local features of each segmented component are extracted. Furthermore, our method uses the spatial adjacency relationships of components to describe the topological information and designs a dynamic component-weighting strategy to boost the effect of important components. Finally, fine-grained, part-level retrieval results for the query sketch can be achieved. Experimental results demonstrate the satisfactory performance of this method and its advantages over state-of-the-art peer methods.

Although the proposed method achieves better performance than previous works, we still need to consider how to retrieve and segment new categories of clothing sketches when labeling is sparse or skewed. As future work, we would like to develop a fast, parallelized implementation of clothing component segmentation and incorporate a user feedback mechanism into our method to improve its performance.
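As a rough illustration of the segmentation stage summarised above, the snippet below trains only the unary (per-segment) term of such a labeling model with an off-the-shelf classifier on toy geometric features. The label set and feature choices are hypothetical, and the pairwise consistency terms of the actual CRF are omitted.

```python
# Unary term of a CRF-style sketch segmenter (illustrative features and labels only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

LABELS = ["collar", "sleeve", "body", "pocket"]        # hypothetical component set

def segment_features(points):
    """points: (N, 2) array of a stroke segment's normalised x/y coordinates."""
    centre = points.mean(axis=0)
    extent = points.max(axis=0) - points.min(axis=0)
    length = np.linalg.norm(np.diff(points, axis=0), axis=1).sum()
    return np.concatenate([centre, extent, [length]])  # 5-D toy descriptor

# Training data: one feature row per manually annotated stroke segment.
X_train = np.vstack([segment_features(np.random.rand(20, 2)) for _ in range(200)])
y_train = np.random.choice(len(LABELS), size=200)      # stand-in for the manual tags

unary = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

# At query time each segment receives a label; in a full CRF, predict_proba scores
# would serve as unary potentials combined with pairwise adjacency potentials.
query_segment = segment_features(np.random.rand(20, 2)).reshape(1, -1)
print(LABELS[unary.predict(query_segment)[0]])
```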

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is jointly supported by the National Natural Science Foundation of China (61762050, U1504608, 61602222, and 61602221), the Jiangxi Natural Science Foundation (20161BAB212043), and the Science and Technology Research Project of the Jiangxi Provincial Department of Education (No. GJJ160280).

ORCID iD
Haopeng Lei https://orcid.org/0000-0002-1988-4155

References
1. Zhou Z, Wu QJ, Huang F, et al. Fast and accurate near-duplicate image elimination for visual sensor networks. Int J Distrib Sens N. Epub ahead of print 27 February 2017. DOI: 10.1177/1550147717694172.
2. Gao R, Wen Y, Zhao H, et al. Secure data aggregation in wireless multimedia sensor networks based on similarity matching. Int J Distrib Sens N 2014; 10(4): 494853.
3. Shi F, Gao J and Huang X. An affine invariant approach for dense wide baseline image matching. Int J Distrib Sens N. Epub ahead of print 19 December 2016. DOI: 10.1177/1550147716680826.
4. Mustafa S, Songfeng L, Zaid A, et al. Efficient encrypted image retrieval in IoT-cloud with multi-user authentication. Int J Distrib Sens N. Epub ahead of print 26 February 2018. DOI: 10.1177/1550147718761814.
5. Yu Q, Yang Y, Liu F, et al. Sketch-a-net: a deep neural network that beats humans. Int J Comput Vision 2017; 122(3): 411–425.
6. Hu R and Collomosse J. A performance evaluation of gradient field HOG descriptor for sketch based image retrieval. Comput Vis Image Und 2013; 117(7): 790–806.
7. Eitz M, Hildebrand K, Boubekeur T, et al. Sketch-based image retrieval: benchmark and bag-of-features descriptors. IEEE T Vis Comput Gr 2011; 17(11): 1624–1636.
8. Cao Y, Wang C, Zhang L, et al. Edgel index for large-scale sketch-based image search. In: Proceedings of the 24th IEEE conference on computer vision and pattern recognition, CVPR 2011, Colorado Springs, CO, 20–25 June 2011, pp.761–768. New York: IEEE.
9. Saavedra JM and Bustos B. Sketch-based image retrieval using keyshapes. Multimed Tools Appl 2014; 73(3): 2033–2062.
10. Lafferty JD, McCallum A and Pereira FCN. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th international conference on machine learning, ICML 2001, Williamstown, MA, 28 June–1 July 2001, pp.282–289. San Francisco, CA: Morgan Kaufmann Publishers Inc.
11. Jang SW and Kim GY. A multiple feature-based image-switching strategy in visual sensor networks. Int J Distrib Sens N 2015; 11(7): 143754.
12. Li Z, Bi J and Chen S. Traffic prediction-based fast rerouting algorithm for wireless multimedia sensor networks. Int J Distrib Sens N 2013; 9(5): 176293.
13. Han C, Sun L, Xiao F, et al. A camera nodes correlation model based on 3D sensing in wireless multimedia sensor networks. Int J Distrib Sens N 2012; 8(11): 154602.
14. Wang L, Yang Y, Zhao W, et al. Network-coding-based energy-efficient data fusion and transmission for wireless sensor networks with heterogeneous receivers. Int J Distrib Sens N 2015; 10(3): 351707.
15. Li Z, Kong W, Zhan Y, et al. A novel image mosaicking algorithm for wireless multimedia sensor networks. Int J Distrib Sens N 2013; 9(10): 719640.
16. Xu X, He L, Lu H, et al. Deep adversarial metric learning for cross-modal retrieval. World Wide Web. Epub ahead of print 4 April 2018. DOI: 10.1007/s11280-018-0541-x.

17. Park BG, Lee KM and Lee SU. Color-based image retrieval using perceptually modified Hausdorff distance. Eurasip J Image Vide 2008; 2008(1): 263071.
18. Xu C, Liu J and Tang X. 2D shape matching by contour flexibility. IEEE Trans Pattern Anal Mach Intell 2009; 31(1): 180–186.
19. Han J and Ma KK. Rotation-invariant and scale-invariant Gabor features for texture image retrieval. Image Vision Comput 2007; 25(9): 1474–1481.
20. Philbin J, Isard M, Sivic J, et al. Descriptor learning for efficient retrieval. In: Proceedings of the 11th European conference on computer vision, ECCV 2010, Crete, 5–11 September 2010, pp.677–691. Berlin: Springer.
21. Zhang Y, Jia Z and Chen T. Image retrieval with geometry-preserving visual phrases. In: Proceedings of the 24th IEEE conference on computer vision and pattern recognition, CVPR 2011, Colorado Springs, CO, 20–25 June 2011, pp.809–816. New York: IEEE.
22. Eitz M, Hildebrand K, Boubekeur T, et al. Sketch-based 3D shape retrieval. In: Proceedings of the international conference on computer graphics and interactive techniques, SIGGRAPH 2010 (Talks Proceedings), Los Angeles, CA, 26–30 July 2010. New York: ACM.
23. Li B, Lu Y, Johan H, et al. Sketch-based 3D model retrieval utilizing adaptive view clustering and semantic information. Multimed Tools Appl 2017; 76(24): 26603–26631.
24. Liu YJ, Ma CX, Fu Q, et al. A sketch-based approach for interactive organization of video clips. ACM T Multim Comput 2014; 11(1): 2.
25. Chen T, Cheng MM, Tan P, et al. Sketch2photo. ACM T Graphic 2009; 28(5): 124.
26. Cao Y, Wang H, Wang C, et al. MindFinder: interactive sketch-based image search on millions of images. In: Proceedings of the 18th ACM international conference on multimedia, SIGMM 2010, Firenze, 25–29 October 2010, pp.1605–1608. New York: ACM.
27. Sun X, Wang C, Xu C, et al. Indexing billions of images for sketch-based retrieval. In: Proceedings of the 21st ACM international conference on multimedia, Barcelona, 21–25 October 2013, pp.233–242. New York: ACM.
28. Zhou R, Chen L and Zhang L. Guess what you draw: interactive contour-based image retrieval on a million-scale database. In: Proceedings of the 20th ACM international conference on multimedia, SIGMM 2012, Nara, Japan, 29 October–2 November 2012, pp.1343–1344. New York: ACM.
29. Eitz M, Hays J and Alexa M. How do humans sketch objects? ACM T Graphic 2012; 31(4): 124.
30. Wang X, Duan X and Bai X. Deep sketch feature for cross-domain image retrieval. Neurocomputing 2016; 207: 387–397.
31. Li K, Pang K, Song YZ, et al. Synergistic instance-level subspace alignment for fine-grained sketch-based image retrieval. IEEE T Image Process 2017; 26(12): 5908–5921.

32. Yang W, Luo P and Lin L. Clothing co-parsing by joint image segmentation and labeling. In: Proceedings of the 2014 IEEE conference on computer vision and pattern recognition, CVPR 2014, Columbus, OH, 24–27 June 2014, pp.3182–3189. New York: IEEE.
33. Liu S, Song Z, Wang M, et al. Street-to-shop: cross-scenario clothing retrieval via parts alignment and auxiliary set. In: Proceedings of the 2012 IEEE conference on computer vision and pattern recognition, CVPR 2012, Providence, RI, 16–21 June 2012, pp.1335–1336. New York: IEEE.
34. Huang J, Feris R, Chen Q, et al. Cross-domain image retrieval with a dual attribute-aware ranking network. In: Proceedings of the 2015 IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, MA, 8–10 June 2015, pp.1062–1070. New York: IEEE.
35. Mizuochi M, Kanezaki A and Harada T. Clothing retrieval based on local similarity with multiple images. In: Proceedings of the 22nd ACM international conference on multimedia, SIGMM 2014, Orlando, FL, 3–7 November 2014, pp.1165–1168. New York: ACM.
36. Wang F, Huang S, Shi L, et al. The application of series multi-pooling convolutional neural networks for medical image segmentation. Int J Distrib Sens N. Epub ahead of print 21 December 2017. DOI: 10.1177/1550147717748899.
37. Huang Z, Fu H and Lau RWH. Data-driven segmentation and labeling of freehand sketches. ACM T Graphic 2014; 33(6): 175.
38. Kalogerakis E, Hertzmann A and Singh K. Learning 3D mesh segmentation and labeling. ACM T Graphic 2010; 29(4): 102.
39. Liu L, Su Z, Luo X, et al. Automatic 3D garment modeling by continuous style description. In: Proceedings of the 2013 international conference on computer graphics and interactive techniques Asia posters, SIGGRAPH Asia 2013, Hong Kong, 19–22 November 2013, p.15. New York: ACM.
40. Schneider RG and Tuytelaars T. Example-based sketch segmentation and labeling using CRFs. ACM T Graphic 2016; 35(5): 151.
41. Arbelaez P, Maire M, Fowlkes C, et al. Contour detection and hierarchical image segmentation. IEEE Trans Pattern Anal Mach Intell 2011; 33(5): 898–916.
42. Torralba A, Murphy KP and Freeman WT. Sharing visual features for multiclass and multiview object detection. IEEE Trans Pattern Anal Mach Intell 2007; 29(5): 854–869.
43. Min P, Chen J and Funkhouser T. A 2D sketch interface for a 3D model search engine. In: Proceedings of the 29th international conference on computer graphics and interactive techniques, SIGGRAPH 2002, San Antonio, TX, 21–26 July 2002, p.138. New York: ACM.
