CIIT INTERNATIONAL JOURNAL OF DATA MINING AND KNOWLEDGE ENGINEERING

1. Distributed Search in P2P Networks through Secure-Authenticated Content Management Systems
L. Ramesh, V. Praveen Kumar and K. Vijay Kumar

2. An Efficient K-Means Clustering Algorithm for Large Data
K. Srinivasa Rao, K. Kiran Kumar and P. Srinivasa Rao

3. Comparative Analysis of Different Noise Sequence Embedded Color Image Watermarking Techniques in Spatial Domain

4. An Enhanced Algorithm for Mining Color Images - A Novel Approach
C. Lakshmi Devasena, R. Radha Krishnan and Dr. M. Hemalatha

5. An Efficient Algorithm for Mining Frequent K-Item Sets for Association Rule Mining in Large Databases
N. Kavitha and S. Karthikeyan

8. A Proposed New Algorithm for Hierarchical Clustering Suitable for Video Data Mining
D. Saravanan and Dr. S. Srinivasan

9. Moving Region in Spatial Temporal Data Warehousing
Dr. V. Karthikeyani, I. Shahina Begam, K. Tajudin and I. Parvin Begam

11. A Survey on Visual Cue Based Data Area Identification in Unsupervised Web Data Extraction
M. Priya and S. Jamuna Rani

12. Storing and Indexing Spatial Data in P2P Systems
S. Imavathy, S. Mahalakshmi, R. Uma Maheswari, T. Vaishnavi and P. Vennila


Automatic Image Classification Using SVM Classifier

Abstract---In this world of fast computing, automation plays an important role. In image retrieval, automation is a long-standing goal: giving an image as a query and retrieving the relevant images is a challenging research area. Automation requires an automatic learning technique to predict the result. In this paper we propose a design for automatic image classification. For classification we use the Support Vector Machine classifier, a supervised learning technique, to classify the images automatically without any manual work. The attributes used for classification are low-level features such as the color and texture of an image. To extract the features from an image we use the standardized MPEG 7 color and texture descriptors, from which we create a 34-byte DCE chunk that is used to classify the images.

I. INTRODUCTION

An image is equivalent to a thousand words; that is the reason why we have an enormous number of image databases. Although much research on retrieval has been proposed [1][2][3], we have not yet reached an effective way of retrieving images. The main reason is the lack of a unique feature for each and every image, whereas for human identity biometric features such as the eyes and the thumb impression provide uniqueness. Image retrieval is also imperfect because, unlike literal data, image data cannot be made generic: images come in different formats, and the low-level features of the same image in different formats are completely different. So there is a need to fill the semantic gap between low-level and high-level features. Once the images in a database have been classified, the retrieval procedure becomes easier. As classifying billions or trillions of images on the web is not feasible, in this paper we standardize a way of classifying two different sets of images using SVM, through which a kind of automation can be implemented for the images of the remaining domains. The paper is organized as follows: we first describe the MPEG 7 features used, then the general concept of SVM, and then how the classification is done.

Manuscript received on June 17, 2011, review completed on July 01, 2011 and revised on July 04, 2011. R.I. Minu, Research Scholar at Anna University of Technology, Trichy, Tamil Nadu, India. Phone: 91-9443529372. Dr. K.K. Thyagharajan, Professor and Principal of RMK College of Engineering and Technology, Chennai, Tamil Nadu. Digital Object Identifier No: DMKE072011006

II. RELATED WORK

The works in [16] and [12] use MPEG 7 descriptors as low-level features and apply SVM, KNN and fuzzy classifiers for image classification. Those papers do not explain how the MPEG 7 descriptors of different byte sizes were combined, and the results for the low-level features are not specified. In [14] only the canine group of an animal ontology is taken for retrieval. In contrast to those papers, our proposal takes two different domains, the flower and the sport ontology features. The low-level feature description and the classification are discussed in the sections below.

III. MPEG 7 FEATURE EXTRACTOR

MPEG 7 is an ISO/IEC standard [4] for describing multimedia content using standard audio and video descriptors. Table 1 lists the visual descriptors specified by the MPEG 7 standard. As shown in Figure 1, instead of extracting general features such as those used for Content Based Image Retrieval from the collection of images or from the query image, we use the standard MPEG 7 Color, Texture and Shape descriptors.

TABLE 1: MPEG 7 VISUAL DESCRIPTORS

COLOR             TEXTURE               SHAPE           MOTION
Dominant Color    Texture Browsing      Contour Shape   Camera Motion
Color Layout      Homogeneous Texture   Region Shape    Motion Trajectory
Scalable Color    Edge Histogram                        Parametric Motion
Color Structure                                         Motion Activity
GoF/GoP

Most of the related work [5] uses only one of these features; here we use all three to improve efficiency. In our proposed system each image is represented by a chunk value that consists of all three descriptor values.

A. COLOR Descriptor

The color space used in MPEG 7 is either RGB or HSV. Here we convert RGB images into HSV images. The reason is that not all the images we collect will have been taken by an expert photographer, so the RGB values differ because of dull illumination; to avoid this problem we convert to the HSV model. Equation (1) gives the conversion from the RGB model to the HSV model, and Figure 1 shows the transformation of an RGB image into the HSV space.

\[
M_x = \max(R, G, B), \qquad M_n = \min(R, G, B)
\]
\[
V = M_x, \qquad
S = \begin{cases} 0 & \text{if } M_x = 0 \\ \dfrac{M_x - M_n}{M_x} & \text{otherwise} \end{cases}
\]
\[
H = \begin{cases}
\dfrac{(G - B) \times 60}{M_x - M_n} & \text{if } R = M_x \\[2ex]
\left( \dfrac{B - R}{M_x - M_n} + 2.0 \right) \times 60 & \text{if } G = M_x \\[2ex]
\left( \dfrac{R - G}{M_x - M_n} + 4.0 \right) \times 60 & \text{if } B = M_x
\end{cases}
\qquad H \leftarrow H + 360 \text{ if } H < 0
\tag{1}
\]

Fig. 1: RGB to HSV Conversion
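The conversion in Equation (1) can be sketched in Python as follows. This is a minimal illustration, assuming 8-bit RGB input; the function name `rgb_to_hsv` and the scaling of S and V to the range [0, 1] are choices made for this sketch and are not taken from the paper.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to (H in degrees [0, 360), S in [0, 1], V in [0, 1])."""
    # Assumption: inputs are integers in 0..255; they are normalized first.
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    mx, mn = max(r, g, b), min(r, g, b)
    delta = mx - mn

    v = mx
    s = 0.0 if mx == 0 else delta / mx

    if delta == 0:
        h = 0.0  # gray pixel: hue is undefined, 0 is used by convention
    elif mx == r:
        h = (60.0 * (g - b) / delta) % 360.0  # modulo adds 360 when negative
    elif mx == g:
        h = 60.0 * (2.0 + (b - r) / delta)
    else:
        h = 60.0 * (4.0 + (r - g) / delta)
    return h, s, v


if __name__ == "__main__":
    for rgb in [(255, 0, 0), (0, 255, 0), (0, 0, 255), (128, 128, 128)]:
        print(rgb, rgb_to_hsv(*rgb))
```

Because the hue depends only on the relative order and spread of the three channels, uniformly dimming a pixel changes V but leaves H unchanged, which is the property the illumination argument above relies on.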

1) Dominant Color Descriptor

Among the color descriptors, Dominant Color is best suited to local image regions. For a given image a maximum of 8 dominant colors is identified, and each is labeled with a unique number. The feature vector of the dominant colors is calculated as

\[
F = \{ (C_i, P_i, V_i), S \}, \qquad i = 1, 2, \ldots, N
\tag{2}
\]

where
$C_i$ = the i-th dominant color
$P_i$ = percentage, in 5 bits
$V_i$ = color variance, in 3 bits
$S$ = spatial coherency, in 5 bits
$N$ = total number of quantized colors in a region, in 3 bits

Figure 3 shows the MPEG 7 Dominant Color visual descriptor for a given flower image. With the dominant color descriptor alone we cannot determine any hypothesis behavior, so the Color Layout Descriptor is also used in our approach. As shown in Figure 2, for the given input image 6 dominant colors are identified; their corresponding feature vector is given in XML, since MPEG 7 produces its output in XML format. From the dominant color descriptor only the descriptor color index is used for our chunk value.

2) Color Layout Descriptor

In the Color Layout Descriptor [5] the image is partitioned into 8 x 8 blocks and the dominant color of each block is determined. For each 8 x 8 block the Discrete Cosine Transform (DCT) of the Y, Cr and Cb components is computed and quantized to the required number of bits, and the values are tabulated in matrix form using zigzag scanning. Equation (3) is the general equation of the DCT:

\[
I_{xy} = c_x c_y \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f(i, j)
\cos\frac{(2i + 1) x \pi}{2M} \cos\frac{(2j + 1) y \pi}{2N}
\tag{3}
\]

where $f(i, j)$ is the value at position $(i, j)$ of the $M \times N$ block and $c_x$, $c_y$ are the normalization factors.

By Equation (4), the DCT value calculated for the first position of the block is called the DC coefficient, and by Equation (5) the other values are called AC coefficients. In compression techniques [7] such as JPEG, the DC coefficient represents the average value of the block and is the most significant coefficient for reconstructing the image. So here we take only that value for all three color components Y, Cr and Cb; this concept is illustrated in Figure 3.
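The relation between a block and its DC coefficient can be checked with a small Python sketch of the two-dimensional DCT of Equation (3). The orthonormal scaling used here (the factor is sqrt(1/M) for index 0 and sqrt(2/M) otherwise) is an assumption of this sketch, since the normalization factors are not spelled out in the text; the function name `dct2` is also illustrative.

```python
import math


def dct2(block):
    """Two-dimensional DCT of an M x N block given as a list of rows."""
    m, n = len(block), len(block[0])
    out = [[0.0] * n for _ in range(m)]
    for x in range(m):
        # Assumed orthonormal normalization factor for the row index.
        cx = math.sqrt((1.0 if x == 0 else 2.0) / m)
        for y in range(n):
            cy = math.sqrt((1.0 if y == 0 else 2.0) / n)
            total = 0.0
            for i in range(m):
                for j in range(n):
                    total += (block[i][j]
                              * math.cos((2 * i + 1) * x * math.pi / (2 * m))
                              * math.cos((2 * j + 1) * y * math.pi / (2 * n)))
            out[x][y] = cx * cy * total
    return out


if __name__ == "__main__":
    flat = [[10] * 8 for _ in range(8)]  # a uniform 8 x 8 block
    coeffs = dct2(flat)
    print("DC coefficient:", coeffs[0][0])
```

With this scaling the DC coefficient of an 8 x 8 block equals 8 times the block mean, and a uniform block has no AC energy, which is why keeping only the DC value preserves the average color of each block.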

Fig. 3: Input image and sample DCT coefficients of the CLD for the Y, Cb and Cr components

The value $I_{0,0}$ is the DC coefficient; only these values are used in our system for the chunk value.

B. TEXTURE Descriptor

Repeated patterns in an image are called texture. In MPEG 7 there are three different ways to represent a texture, as shown in Table 1. Among them, the Texture Browsing descriptor is used for browsing heterogeneous texture patterns. Each of the three texture vectors requires 12 bits. For the Homogeneous Texture descriptor the vector includes the energy, mean, standard deviation and similar texture-analysis measures:

\[
T = \{ t_1, t_2, t_3, t_4, t_5, a_1, \ldots, a_n \}
\tag{6}
\]

In the Texture Browsing descriptor, these vectors provide details about the regularity of the texture, the direction of each sub-block and its scale.

The Edge Histogram descriptor is used to determine the edges of the image. By default the image is divided into 4 x 4 sub-images, and the edge magnitude of each sub-image is determined by comparing it with the standard edges provided by MPEG 7: the vertical edge, the horizontal edge, the 45-degree edge, the 135-degree edge and the non-directional edge.

Fig. 4: MPEG 7 EHD output and the edge magnitudes of a sub-image (Mv = 1, Mh = 0, Md-45 = 2, Md-135 = 2, Mnd = 3)

Let $I_k(i, j)$ represent the sub-image block. The filter coefficients for each edge type are denoted by:

$fc_v(k)$ = vertical edge filter
$fc_h(k)$ = horizontal edge filter
$fc_{d45}(k)$ = 45-degree edge filter
$fc_{d135}(k)$ = 135-degree edge filter
$fc_{nd}(k)$ = non-directional edge filter

The vertical edge magnitude for a sub-block $(i, j)$ is given by

\[
M_v(i, j) = \left| \sum_{k=1}^{4} I_k(i, j) \times fc_v(k) \right|
\tag{7}
\]
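The edge magnitude of Equation (7) can be sketched in Python for all five edge types. The filter coefficients below are the values commonly cited for the MPEG 7 edge histogram and are an assumption of this sketch, since the text does not list them; the names `EDGE_FILTERS`, `edge_magnitudes` and `dominant_edge` are illustrative. Each block is given as the mean intensities of its four quadrants in the order top-left, top-right, bottom-left, bottom-right.

```python
import math

# Assumed filter coefficients, ordered (top-left, top-right, bottom-left, bottom-right).
EDGE_FILTERS = {
    "vertical": (1.0, -1.0, 1.0, -1.0),
    "horizontal": (1.0, 1.0, -1.0, -1.0),
    "diagonal_45": (math.sqrt(2), 0.0, 0.0, -math.sqrt(2)),
    "diagonal_135": (0.0, math.sqrt(2), -math.sqrt(2), 0.0),
    "non_directional": (2.0, -2.0, -2.0, 2.0),
}


def edge_magnitudes(quadrants):
    """Return |sum_k I_k * fc(k)| for each of the five edge filters."""
    return {
        name: abs(sum(value * coeff for value, coeff in zip(quadrants, coeffs)))
        for name, coeffs in EDGE_FILTERS.items()
    }


def dominant_edge(quadrants, threshold=0.0):
    """Name of the strongest edge type, or None if no magnitude exceeds the threshold."""
    magnitudes = edge_magnitudes(quadrants)
    best = max(magnitudes, key=magnitudes.get)
    return best if magnitudes[best] > threshold else None


if __name__ == "__main__":
    print(dominant_edge((10, 0, 10, 0)))   # bright left half, dark right half
    print(dominant_edge((10, 10, 0, 0)))   # bright top half, dark bottom half
```

A block whose left half is bright and right half is dark gives the largest response to the vertical filter, while a uniform block gives zero for every filter and is counted as having no edge.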

The magnitudes of all other edges are calculated likewise, as shown in Figure 4. In this paper we propose extracting the Dominant Color Descriptor (DCD), the Color Layout Descriptor (CLD) and the Edge Histogram Descriptor (EHD) for the classification of images.

Fig. 5: Layout of the DCE chunk: DCD (D1, D2, ..., D8; 24 bytes), CLD (Y, Cr, Cb; 2 bytes), EHD (B1.M, B2.M, ..., B16.M)

As shown in Fig 5, the first 24 bytes of the chunk hold the dominant colors of the image. In MPEG 7 a total of 8 dominant colors is extracted; each dominant color Dn takes a value from (0 0 0) to (255 255 255), that is 3 bytes, so the dominant colors contribute 24 bytes in total. In the Color Layout Descriptor we concentrate only on the DC coefficient values, which take 2 bytes in total. The last bits hold the Edge Histogram values, where each block value is 16 bits of data. Fig 6 shows the DCE chunk values for different images. These values can be used to generate a quadratic equation for the SVM classifier.
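How the three descriptors might be concatenated into one feature list can be sketched in Python. This is only an illustration of the DCD, CLD, EHD ordering described above: the function name `build_dce_chunk`, the zero padding of missing dominant colors and the sample values are assumptions, and the exact byte packing used by the authors is not specified in the text.

```python
def build_dce_chunk(dominant_colors, cld_dc, ehd_blocks, max_colors=8):
    """Concatenate DCD, CLD and EHD values into one flat feature list."""
    if len(dominant_colors) > max_colors:
        raise ValueError("at most %d dominant colors are expected" % max_colors)

    # Assumption: images with fewer than 8 dominant colors are padded with
    # (0, 0, 0) so that the DCD part always holds 8 * 3 = 24 values.
    padding = [(0, 0, 0)] * (max_colors - len(dominant_colors))
    chunk = []
    for r, g, b in list(dominant_colors) + padding:
        chunk.extend((r, g, b))

    chunk.extend(cld_dc)       # DC coefficient values of the color layout
    chunk.extend(ehd_blocks)   # one edge-histogram value per block
    return chunk


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    colors = [(255, 255, 255), (191, 192, 192)]
    chunk = build_dce_chunk(colors, [18, 34], list(range(16)))
    print(len(chunk), chunk[:6])
```

Keeping the three parts at fixed positions means that every image yields a vector of the same length, which is what a classifier such as SVM needs as input.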

Fig. 6: DCE Chunk Values (for each sample image: the dominant colors D1 to D8 of the DCD, the Y, Cr and Cb values of the CLD, and the block values E1 to E16 of the EHD)

The Support Vector Machine classifier [7, 8, 9] is a supervised learning algorithm. For a new input image we have a set of input features but no deterministic idea about the output, so the classifier must predict it. In this proposal we need a learning algorithm to determine whether a given input image is a flower image or a sports image from its feature vector as percepts; for this we use the SVM classifier. A learning algorithm studies its environment from its past history of input-output data sets, which are called the training data sets. The prediction therefore depends upon the hypothesis space, or the function used. SVM, more generally called a kernel machine, uses a complex, nonlinear function as an effective hypothesis function [10, 11].

Let $(X_i, Y_i)$ be the attributes in our training examples. Assume $X_i$ belongs to a hypothesis space $R^D$; then $Y_i$ is $-1$ if the example does not belong to the class, or $+1$ if it does. So,

\[
Y_i (W X_i + b) \geq 0
\tag{8}
\]

which determines the hyperplane [9]. From the above statement, the main issue in SVM is to find an effective kernel function for separating the different attributes of the training sets according to their features. The optimal solution is derived in SVM using quadratic programming (QP).
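The condition in Equation (8) can be checked for a candidate hyperplane with a few lines of Python. This is a minimal sketch for the linear case only; the names `decision_value`, `satisfies_constraint` and `classify`, and the sample weights, are illustrative and not taken from the paper.

```python
def decision_value(w, b, x):
    """Signed score W.X + b of a feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def satisfies_constraint(w, b, samples):
    """True if Y_i * (W.X_i + b) >= 0 holds for every (X_i, Y_i) pair."""
    return all(y * decision_value(w, b, x) >= 0 for x, y in samples)


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical hyperplane and labeled feature vectors.
    w, b = [1.0, -1.0], 0.0
    samples = [([2.0, 0.0], 1), ([0.0, 2.0], -1)]
    print(satisfies_constraint(w, b, samples), classify(w, b, [3.0, 1.0]))
```

Many hyperplanes can satisfy the condition for the same training set; the quadratic-programming step selects among them.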

A. Quadratic Programming

Optimization [13] can be done by linear programming or by quadratic programming. If we have only linear constraints we use linear programming, for example with $2X_1 + 3X_2 + X_3$: in this simple equation, by choosing appropriate values for the variables $X_1$, $X_2$ and $X_3$ we can get an optimal solution.

In quadratic programming the equation is quadratic in nature:

\[
F = 2X_1^2 + 3X_2^2 + X_3^2
\tag{9}
\]

To determine the values of $(X_1, X_2, X_3)$ we require a kernel function, say $K$:

\[
K = H(X_1, X_2, X_3)
\tag{10}
\]

where $H$ is a hypothesis function. To determine these values in quadratic programming:

\[
F_{\min} = \tfrac{1}{2} X^T H X + C^T X + a
\tag{11}
\]

The SVM is mostly stated in matrix or vector form. Here $H$ is the symmetric matrix of the given quadratic equation, $C$ is the vector of the attributes used and $a$ is a constant. With reference to quadratic programming, for the optimal solution of Equation (8) $W$ should be minimized as given by [15]:

\[
\Phi(W) = \tfrac{1}{2} W^T W
\tag{12}
\]

As in Equation (11), here $W$ is minimized. Fig 7 [10] shows an SVM classification of two data classes, drawn in magenta and blue. Like the colored dots, the MPEG 7 features would be densely distributed and can be classified using the SVM algorithm.
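The idea of minimizing the norm of W subject to the classification constraints can be illustrated with a small linear SVM trained in Python. This sketch swaps the quadratic-programming solver for plain sub-gradient descent on the standard soft-margin objective (half the squared norm of W plus a hinge penalty with margin 1), so it is not the authors' solver; the function names, the hyperparameters and the toy data are all assumptions made for illustration.

```python
def train_linear_svm(samples, c=1.0, lr=0.01, epochs=2000):
    """Minimize 0.5 * w.w + c * sum(max(0, 1 - y * (w.x + b))) by sub-gradient descent."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)   # gradient of 0.5 * w.w is w itself
        grad_b = 0.0
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for k in range(dim):
                    grad_w[k] -= c * y * x[k]
                grad_b -= c * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    # Toy two-dimensional data standing in for two image classes.
    samples = [([2.0, 2.0], 1), ([3.0, 3.0], 1), ([2.0, 3.0], 1),
               ([0.0, 0.0], -1), ([1.0, 0.0], -1), ([0.0, 1.0], -1)]
    w, b = train_linear_svm(samples)
    print(w, b, [predict(w, b, x) for x, _ in samples])
```

In practice a dedicated QP or SVM library would replace this loop; the sketch only shows how the norm term and the constraint violations trade off during training.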

The MPEG 7 chunk value described in Figure 5 can be used as the data set, from which we generate a complex quadratic equation as shown in Equation (13); by applying SVM quadratic programming to it we can optimize and classify the images.

\[
255 D_1 + 175 D_2 + 255 D_3 + \cdots + Y_{14} + Cr_{17} + Cb_{7} + \cdots
\tag{13}
\]

V. IMAGE CLASSIFICATION

Human perception in analyzing and classifying images is not equivalent to that of a machine. As we need a generalized hypothesis for the classification of images, here we consider only two domains of image collection, namely a sports image collection and a flower image collection. In related work [14] the authors consider only one domain, and most work has been done on sports image classification. As shown in Fig 8, the main components of this proposal are the MPEG 7 feature extractor and the SVM classifier.

As per the figure, the collection of images in the local database is classified as either flower images or sports images using the technique discussed in the previous section. We tried this concept by simply giving a flower image as the query; the relevant flower images retrieved are shown in Fig 9. This is a halfway implementation of the proposal, as some irrelevant images are also classified as flower images.

VI. CONCLUSION & FUTURE WORK

Image retrieval is one of the ongoing research areas; the main reason for this evolution is that the number of images on the internet is increasing exponentially, so the techniques proposed earlier cannot be applied to this amount of data. In this paper we propose automatic image classification of two different groups of images, flower images and sports images. Our major research work is on the retrieval of images by giving both an image and text as input, for which we use a multi-modal ontology searching technique in which the domain ontology of the selected domain, say the flower and sports image ontology, is matched with its feature ontology. For this, a preprocessing step that classifies the two kinds of images is required, and that step is what this paper proposes. In forthcoming research we will try to integrate this ontology for a better image retrieval system.

REFERENCES

[1] Ying Liu, Dengsheng Zhang, Guojun Lu and Wei-Ying Ma, "A survey of content based image retrieval with high-level semantics", Elsevier Science Inc. Pattern Recognition, Vol 40, Issue 1, pp 262-282, January 2007.

[2] Nidhi Singhai and Prof. Shishir K. Shandilya, "A Survey on: Content Based Image Retrieval Systems", Int. Journal of Computer Application, Vol 4 - No 2, July 2010.

[3] Yihun Alemu, Jong bin Koh, Muhammed Ikram, Dong Kyoo Kim, "Image Retrieval in Multimedia database: A Survey", Fifth International Conf. on Intelligent Information Hiding and Multimedia Signal Processing, 2009.

[4] B.S. Manjunath and W.Y. Ma, "Texture features for browsing and retrieval of image data", IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 18, pp. 837-842, Aug. 1996.

[5] ISO/IEC JTC1/SC29/WG11 N6828, Palma de Mallorca, "MPEG-7 Overview (version 10)", October 2004.

[6] Ivica Dimitrovski, Suzana Loskovska, Gorgi Kakasevski and Ivan Chorbev, "Video content Based Retrieval System", IEEE The International Conference on "Computer as a Tool", pp. 978-983, Sep 07.

[7] Khalid Sayood, "Introduction to Data Compression", Morgan Kaufmann Publishers, Second Edition, 2000.

[8] Lei Zhang, Fuzong Lin, Bo Zhang, "Support Vector Machine Learning for Image Retrieval", IEEE International conference, 2001.

[9] Haiying Guan, Sameer Antani, L. Rodney Long and George R. Thoma, "Bridging the semantic gap using ranking SVM for image retrieval", IEEE conference, 2009.

[10] Patrik Ndjiki, Oleg Novychny and Thomas Wiegand, "Merging MPEG 7 descriptors for image content analysis", IEEE conference, 2004.

[11] Chengcui Zhang, Xin Chen et al, "A Multiple instance learning approach for content based image retrieval using One class support vector machine", IEEE conference, 2005.

[12] Evaggelos Spyrou, Giorgos Stamou, Yannis Avrithis and Stefanos Kollias, "Fuzzy support vector machine for image Classification Fusion MPEG7 visual Descriptors", Acemedia.org, 2005.

[13] Stuart Russell, Peter Norvig, "Artificial Intelligence: A Modern Approach", Second Edition, 2003.

[14] Huan Wang, Song Liu and Liang-Tien Chia, "Does ontology help in image retrieval? - A Comparison between keyword, Text ontology and multimodality ontology approach", MM'06, ACM, 2006.

[15] Siddarth Jonathan JB et al, "SQUINT SVM for identification of relevant sections in web page for web search", 2009 Second International Conference on Intelligent Computation Technology and Automation.

[16] Evaggelos Spyrou, Herve Le Borgne, Theofilos Mailis, Eddie Cooke, Yannis Avrithis and Noel O'Connor, "Fusing MPEG-7 visual descriptors for image classification", Artificial Neural Networks: Formal Models and Their Applications - ICANN 2005 (2005), pp. 847-852.

R.I. Minu was born in India on December 16, 1982. She received her BE degree in Electronics & Communication Engineering from Bharathidasan University in 2004. She received her ME degree in Computer Science Engineering from Anna University in 2007. She is at present a PhD scholar in Information and Communication Engineering at Anna University of Technology, Trichy. Her research areas are Image Retrieval, Artificial Intelligence, Machine Learning and Semantic Web.

Dr. K.K. Thyagharajan received his B.E. degree in Electrical and Electronics Engineering from PSG College of Technology (Madras University). He received his M.E. degree in Applied Electronics from Coimbatore Institute of Technology and a Post Graduate Diploma in Computer Applications from Bharathiar University. He received his Ph.D. degree (Multimedia Streaming) in Information and Communication Engineering from College of Engineering Guindy, Anna University. He has written 5 books in Computing. His book "Flash MX 2004", published by McGraw Hill (INDIA), has been recommended as a text / reference book by many universities. He has published more than 30 papers in National and International Journals and Conferences. He is a grant recipient of Tamil Nadu State Council for Science and Technology. His biography has been published in the 25th Anniversary Edition of Marquis Who's Who in the World Directory. He has been invited as chairperson and has delivered special lectures in many National and International conferences and workshops. He is a reviewer for many International Journals and Conferences. His current interests are Multimedia Networks, Mobile Computing, Web Services, Data Mining, e-learning, Image Processing, Microprocessors and Microcontrollers. He has guided 10 M.E. projects, and 9 students are now doing their Ph.D. under him in the areas of Multimedia, Image Processing and Data Mining. He is a life member of Computer Society of India and Chairman of the ISTE chapter of RMK College of Engineering and Technology.


Published by Coimbatore Institute of Information Technology at #16, 1st Floor, Sathyamoorthy Road, Ramnagar, Coimbatore-641009, www.ciitresearch.org, and printed by Mrs. Y. Dhanabagyam at CiiT Printing Dept, Coimbatore-12. Tel/Fax: (+91) 422 4377821.
