Research Publications

Dr. Vishal Jain, Associate Professor
Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), A-4, Paschim Vihar, New Delhi-63
Tel./Fax: 011-25275055 | Mobile: 9899997263
Emails (Work): [email protected]; [email protected]
Email (Personal): [email protected]
Website: http://vishaljain.webs.com/

Connect at:
Google Group: https://groups.google.com/forum/#!forum/drvishaljain
Google Scholar: https://scholar.google.co.in/citations?user=6eeU1BYAAAAJ&hl=en
ORCID: http://orcid.org/0000-0003-1126-7424
ResearcherID: http://www.researcherid.com/rid/E-4675-2017
LinkedIn: https://in.linkedin.com/in/vishaljain83
INDEX

S. No. | Particulars | Page No.

1. CV | 01

Publications (International Journal)

2. FUZZY-PRoPHET: A Novel Routing Protocol for Opportunistic Network | 16
3. Differential Analysis of Token Metric and Object Oriented Metrics for Fault Prediction | 23
4. An Efficient Face Detection and Recognition for Video Surveillance | 31
5. A Discussion about Upgrading the Quick Script Platform to Create Natural Language based IoT Systems | 41
6. Lexical, Ontological & Conceptual Framework of Semantic Search Engine (LOC-SSE) | 45
7. Decision based Cognitive Learning using Strategic Game Theory | 52
8. Use of Ontology to Secure the Cloud: A Case Study | 59
9. Evaluation and Validation of Ontology Using Protégé Tool | 63
10. Ontology Engineering and Development Aspects: A Survey | 75
11. Development and Visualization of Domain Specific Ontology using Protégé | 86
12. A Survey of Intrusion Detection Systems and Secure Routing Protocols in Wireless Sensor Networks | 93
13. Perspective of Database Services for Managing Large-Scale Data on the Cloud: A Comparative Study | 99
14. Mapping between RDBMS and Ontology: A Review | 108
15. Analysis of RDBMS and Semantic Web Search in University System | 115
16. Mining in Ontology with Multi Agent System in Semantic Web: A Novel Approach | 133
17. Ontology Based Information Retrieval Model in Semantic Web: A Review | 143
18. Role of Ontology with Multi-Agent System in Cloud Computing | 149
19. Artificial Intrusion Detection Techniques: A Survey | 155
20. Incremental Learning Approach for Enhancing the Performance of Multi-Layer Perceptron for Determining the Stock Trend | 162
21. Improving Statistical Multimedia Information Retrieval (MIR) Model by using Ontology and Various Information Retrieval (IR) Approaches | 171
22. A Brief Overview on Information Retrieval in Semantic Web | 182
23. A Framework to Convert Relational Database to Ontology for Knowledge Database in Semantic Web | 189
24. Ontology Development and Query Retrieval using Protégé Tool | 193
25. Ontology Based Information Retrieval in Semantic Web: A Survey | 202
26. Architecture Model for Communication between Multi Agent Systems with Ontology | 210
27. An Approach for Information Extraction using Jade: A Case Study | 216
28. Implementation of Multi Agent Systems with Ontology in Data Mining | 222
29. Implementation of Data Mining in Online Shopping System using TANAGRA Tool | 229
30. Implementation of Knowledge Mining with Ontology | 241
31. Information Retrieval through Multi-Agent System with Data Mining in Cloud Computing | 244
32. Multi Agent Driven Data Mining for Knowledge Discovery in Cloud Computing | 249

Publications (International Conference)

33. Detection and Prevention of Black Hole Attacks in Cluster based Wireless Sensor Networks | 254
34. A Novel Approach for Precise Search Results Retrieval based on Semantic Web Technologies | 259
35. Effective & Efficient Digital Advertisement Algorithms | 265
36. Evolution of FOAF and SIOC in Semantic Web: A Survey | 274
37. EasyOnto: A Collaborative Semi Formal Ontology Development Platform | 285
38. Ontology Development Using Hozo and Semantic Analysis for Information Retrieval in Semantic Web | 296
39. Ontology Based Pivoted Normalization using Vector-Based Approach for Information Retrieval | 302
40. Ontology Based Web Crawler to Search Documents in the Semantic Web | 308
41. Information Retrieval through Semantic Web: An Overview | 316

Publications (National Conference)

42. Comparative Study of Search Engine and Semantic Search Engine: A Survey | 321
43. Cloud Computing in Trust Building Knowledge Discovery for Information Retrieval | 330

(Full citations for each entry appear in the Publications section of the CV below.)
DR. VISHAL JAIN
220/19 Onkar Nagar-B, Tri Nagar, Delhi - 110035
+91-9899997263 | http://vishaljain.webs.com | [email protected]

CURRICULUM VITAE
Career Goal

To excel in the field of IT by converting my innovative ideas and acquired skills into executable education values in a highly cordial and professional environment for continuous learning and improvement. I am also willing and eager to learn new technologies to enhance my knowledge.

Educational Qualification

- Ph.D. (Ontology Based Information Retrieval in Semantic Web) from the Department of Computer Science and Engineering (CSE), Lingaya's University, Faridabad, under the guidance of Dr. S. V. A. V. Prasad, Dean, Lingaya's University, Faridabad
- M.Tech (CSE) from University School of Information Technology, G.G.S. Indraprastha University, Delhi
- MBA (HR) from Shobhit University, Meerut, U.P.
- MCA from Apar India College of Management and Technology, Delhi, affiliated to Sikkim Manipal University
- DOEACC "A" Level from A. M. Informatics, New Delhi
- DOEACC "O" Level from A. M. Informatics, New Delhi

Additional Qualification

- Post Graduate Diploma in Computer Software Training from A. M. Informatics, New Delhi
- Advance Diploma in Computer Software Training from ET&T, New Delhi
- Diploma in Business Management from All India Institute of Management Studies, Chennai
- Diploma in Programming from Oxford Computer Education, New Delhi

Professional Qualification (Certifications)

- Microsoft Certified Professional (MCP): cleared two modules, 070-210 and 070-215
- Cisco Certified Network Administrator (CCNA)
Professional Membership

- CSI Delhi State Students' Coordinator, Computer Society of India (2014-2016)
- Life Membership of "Computer Society of India" (CSI)
- Life Membership of "Indian Society for Technical Education" (ISTE)
- Membership of "International Association of Engineers" (IAENG)
- Membership of "International Association of Computer Science and Information Technology" (IACSIT)
- Membership of "Computer Science Teachers Association" (CSTA)

Experience Gained

- Presently working as Associate Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, affiliated to G.G.S.I.P. University and approved by AICTE, since July 2017.
- Worked as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, affiliated to G.G.S.I.P. University and approved by AICTE, from August 2010 to June 2017.
- Worked in Guru Premsukh Memorial College of Engineering (GPMCE), Delhi, affiliated to G.G.S.I.P. University and approved by AICTE, from July 2004 to July 2008.

Responsibilities at BVICAM

1. Teaching subjects: FIT, OOPS (C++), DCN, ACN, along with practicals.
2. Guide to MCA students for their final semester project.
3. Institute's activity report compilation for AICTE, DTTE, GGSIPU, BVCO.
4. CSI and ISTE Students' Branch activities.
5. Organizing FDPs, training programmes and social development programmes.
6. Publicity Chair and Special Session Coordinator, INDIACom, IEEE International Conference on "Computing for Sustainable Global Development", held at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA).
7. Assistant Editor of "BIJIT - BVICAM's International Journal of Information Technology" (ISSN 0973-5658), published by Springer and indexed at Index Copernicus, EBSCO, J-Gate, Cabell's Directory of Computer Science and Business Information System, USA, and Directory of Open Access Journals (DOAJ), Sweden.
8. Training and Placement Coordinator for BVICAM, New Delhi (August 2010 to July 2015).

Computer Literacy

Operating System: Windows (Client and Server)
Languages Known: C, C++, JAVA, C#, VB.Net
RDBMS: FoxPro, SQL Server
Tools: Protégé, MS-Office, .Net Framework, Packet Tracer, Boson NetSim
Web Development: HTML, Jena Framework, Pellet Reasoner

Citations Indices

Google Scholar: Citations 184; h-index 07; i10-index 05
ResearchGate RG Score: 9.30
ORCID ID: 0000-0003-1126-7424
Thomson Reuters Researcher ID: E-4675-2017
Scopus ID: 57192707657

Publications

Books

1. Vishal Jain, Ashish Khanna, Deepak Gupta, "Success Mantra for IT Interview", Bhavya Books Publication, ISBN 978-81-927946-5-5, 354 pages, 2013.

International Journal - 33

1. Sujeet Pandey, Puneet Tomar, Lubna Luxmi Dhirani, D. M. Akbar Hussain, Vishal Jain, Nisha Pandey, "Design of Energy Efficient Sinusoidal PWM Waveform Generator on FPGA", International Journal of Signal Processing, Image Processing and Pattern Recognition (IJSIP), Vol. 10, No. 10, October 2017, page no. 49-58, ISSN 2005-4254, indexed at Scopus, INSPEC, IET (UK), ProQuest (UK), EBSCO (USA), Open J-Gate (USA).

2. Khaleel Ahmad, Muneera Fathima, Vishal Jain, Afrah Fathima, "FUZZY-PRoPHET: A Novel Routing Protocol for Opportunistic Network", International Journal of Information Technology (IJIT), Vol. 9, No. 2, Issue 18, June 2017, page no. 121-127, ISSN 2511-2104, published by Springer and indexed at INSPEC, IET (UK), ProQuest (UK), EBSCO (USA), Open J-Gate (USA), DOAJ. [Journal is Listed at Sr. No. 49273 in UGC Approved List of Journals]
3. Ishleen Kaur, Gagandeep Singh Narula, Vishal Jain, "Differential Analysis of Token Metric and Object Oriented Metrics for Fault Prediction", International Journal of Information Technology (IJIT), Vol. 9, No. 1, Issue 17, March 2017, page no. 93-100, ISSN 2511-2104, published by Springer and indexed at INSPEC, IET (UK), ProQuest (UK), EBSCO (USA), Open J-Gate (USA), DOAJ. [Journal is Listed at Sr. No. 49273 in UGC Approved List of Journals]

4. Dipti Mishra, Mohamed Hashim Minver, Bhagwan Das, Nisha Pandey and Vishal Jain, "An Efficient Face Detection and Recognition for Video Surveillance", Indian Journal of Science and Technology, Volume 9, Issue 48, December 2016, page no. 1-10, ISSN 0974-6846, indexed at Thomson Reuters, Web of Science, Scopus, EBSCO, Index Copernicus, DOAJ, J-Gate; SJR: 1.3. [Journal is Listed at Sr. No. 24 in UGC Approved List of Journals]

5. Gaurav Verma, Harsh Agarwal, Shreya Singh, Shaheem Nighat Khinam, Prateek Kumar Gupta and Vishal Jain, "Design and Implementation of Router for NOC on FPGA", International Journal of Future Generation Communication and Networking (IJFGCN), Vol. 9, No. 12, December 2016, page no. 263-272, ISSN 2233-7857, indexed at Thomson Reuters Emerging Sources Citation Index (ESCI), EI Compendex, ProQuest, ULRICH, DOAJ, J-Gate, and Cabell Directory. [Journal is Listed at Sr. No. 22814 in UGC Approved List of Journals]

6. Anirudh Khanna, Bhagwan Das, Bishwajeet Pandey, DMA Hussain, and Vishal Jain, "A Discussion about Upgrading the Quick Script Platform to Create Natural Language based IoT Systems", Indian Journal of Science and Technology, Volume 9, Issue 46, December 2016, page no. 1-4, ISSN 0974-6846, indexed at Thomson Reuters, Web of Science, Scopus, EBSCO, Index Copernicus, DOAJ, J-Gate; SJR: 1.3. [Journal is Listed at Sr. No. 24 in UGC Approved List of Journals]

7. Gagandeep Singh Narula, Usha Yadav, Neelam Duhan and Vishal Jain, "Lexical, Ontological & Conceptual Framework of Semantic Search Engine (LOC-SSE)", BIJIT - BVICAM's International Journal of Information Technology, Issue 16, Vol. 8, No. 2, July-December 2016, ISSN 0973-5658, indexed at INSPEC, IET (UK), ProQuest (UK), EBSCO (USA), Open J-Gate (USA), DOAJ. [Journal is Listed at Sr. No. 9769 in UGC Approved List of Journals]

8. Uttam Singh Bist, Manish Kumar, Anupam Baliyan, Vishal Jain, "Decision based Cognitive Learning using Strategic Game Theory", Indian Journal of Science and Technology, Volume 9, Issue 39, October 2016, page no. 1-7, ISSN 0974-6846, indexed at Thomson Reuters, Web of Science, Scopus, EBSCO, Index Copernicus, DOAJ, J-Gate; SJR: 1.3. [Journal is Listed at Sr. No. 24 in UGC Approved List of Journals]

9. Gagandeep Singh Narula, Dr. Vishal Jain, Dr. S. V. A. V. Prasad, "Use of Ontology to Secure the Cloud: A Case Study", International Journal of Innovative Research and Advanced Studies (IJIRAS), Vol. 3, No. 8, July 2016, page no. 148-151, ISSN 2394-4404. [Journal is Listed at Sr. No. 45791 in UGC Approved List of Journals]
10. Vishal Jain and Dr. S. V. A. V. Prasad, "Evaluation and Validation of Ontology Using Protégé Tool", International Journal of Research in Engineering & Technology, Vol. 4, No. 5, May 2016, page no. 1-12, ISSN 2321-8843, indexed at Thomson Reuters' ResearcherID, ORCiD, Scribd, Mendeley, Epernicus, Google Scholar, Index Copernicus, getCITED, Issuu, Academia.edu, Research Bib, Internet Archive, Publication lists, OAJI, SSRN, DRJI.

11. Usha Yadav, Gagandeep Singh Narula, Neelam Duhan, Vishal Jain, "Ontology Engineering and Development Aspects: A Survey", International Journal of Education and Management Engineering (IJEME), Hong Kong, Vol. 6, No. 3, May 2016, page no. 9-19, ISSN 2305-3623, indexed at Google Scholar, CrossRef, Scirus and CNKI Scholar.

12. Usha Yadav, Gagandeep Singh Narula, Neelam Duhan, Vishal Jain, B. K. Murthy, "Development and Visualization of Domain Specific Ontology using Protégé", Indian Journal of Science and Technology, Vol. 9, No. 16, April 2016, page no. 1-7, ISSN 0974-6846, indexed at Thomson Reuters, Web of Science, Scopus, EBSCO, Index Copernicus, DOAJ, J-Gate; SJR: 1.3. [Journal is Listed at Sr. No. 24 in UGC Approved List of Journals]

13. Prachi Dewal, Gagandeep Singh Narula and Vishal Jain, "A Survey of Intrusion Detection Systems and Secure Routing Protocols in Wireless Sensor Networks", International Journal for Research in Emerging Science and Technology, Vol. 3, No. 1, January 2016, page no. 16-20, ISSN 2349-7610, indexed at EBSCO, ISSUU, DRJI, Index Copernicus (ICV 2014: 48.63, Standardized Value: 5.35); impact factor: 2.173. [Journal is Listed at Sr. No. 46122 in UGC Approved List of Journals]

14. Narinder K. Seera, Vishal Jain, "Perspective of Database Services for Managing Large-Scale Data on the Cloud: A Comparative Study", International Journal of Modern Education and Computer Science (IJMECS), Vol. 7, No. 6, June 2015, ISSN 2075-017X, indexed at EBSCO, ISSUU, DRJI, Index Copernicus; impact factor: 0.13. [Journal is Listed at Sr. No. 45572 in UGC Approved List of Journals]

15. Vishal Jain and Dr. S. V. A. V. Prasad, "Mapping between RDBMS and Ontology: A Review", International Journal of Scientific & Technology Research (IJSTR), France, Vol. 3, No. 11, November 2014, ISSN 2277-8616, indexed at EBSCO, ISSUU, DRJI, Index Copernicus; impact factor: 0.67. [Journal is Listed at Sr. No. 48175 in UGC Approved List of Journals]

16. Vishal Jain and Dr. S. V. A. V. Prasad, "Analysis of RDBMS and Semantic Web Search in University System", International Journal of Engineering Sciences & Emerging Technologies (IJESET), Volume 7, Issue 2, October 2014, page no. 604-621, ISSN 2231-6604, indexed at DOAJ, J-Gate; impact factor: 1.02.

17. Vishal Jain and Dr. S. V. A. V. Prasad, "Mining in Ontology with Multi Agent System in Semantic Web: A Novel Approach", The International Journal of Multimedia & Its Applications (IJMA), Vol. 6, No. 5, October 2014, page no. 45-54, ISSN 0975-5578, listed in the Australian Research Council (ARC) Journal Ranking and indexed at EBSCO, DOAJ, J-Gate, ProQuest.

18. Vishal Jain, Dr. S. V. A. V. Prasad, "Ontology Based Information Retrieval Model in Semantic Web: A Review", International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE), Volume 4, Issue 8, August 2014, page no. 837-842, ISSN 2277-128X, indexed at EBSCO, ISSUU, DRJI, Index Copernicus; ISRA Journal Impact Factor: 2.080. [Journal is Listed at Sr. No. 48958 in UGC Approved List of Journals]

19. Vishal Jain, Dr. S. V. A. V. Prasad, "Role of Ontology with Multi-Agent System in Cloud Computing", International Journal of Sciences: Basic and Applied Research (IJSBAR), Jordan, Volume 15, No. 2, page no. 41-46, ISSN 2307-4531, indexed at Ulrich's Periodicals Directory, Google Scholar, Directory of Open Access Journals (DOAJ), Microsoft Academic Research; Impact Factor: 0.33.

20. Ashutosh Gupta, Bhoopesh Bhati and Vishal Jain, "Artificial Intrusion Detection Techniques: A Survey", International Journal of Computer Network and Information Security (IJCNIS), Hong Kong, Vol. 6, No. 9, September 2014, ISSN 2074-9104, indexed at Thomson Reuters, Web of Science, EBSCO, ProQuest, DOAJ, Index Copernicus. [Journal is Listed at Sr. No. 47547 in UGC Approved List of Journals]

21. Basant Ali Sayed Ali, Abeer Badr El Din Ahmed, Alaa El Din Muhammad El Ghazali and Vishal Jain, "Incremental Learning Approach for Enhancing the Performance of Multi-Layer Perceptron for Determining the Stock Trend", International Journal of Sciences: Basic and Applied Research (IJSBAR), Jordan, page no. 15-23, ISSN 2307-4531.

22. Vishal Jain, Gagandeep Singh Narula, "Improving Statistical Multimedia Information Retrieval (MIR) Model by using Ontology and Various Information Retrieval (IR) Approaches", International Journal of Computer Applications, 94(2):27-30, May 2014, ISSN 0975-8887, published by Foundation of Computer Science, New York, USA. [Journal is Listed at Sr. No. 44570 in UGC Approved List of Journals]

23. Vishal Jain, "A Brief Overview on Information Retrieval in Semantic Web", International Journal of Computer Application, RS Publication, Issue 4, Volume 2 (March-April 2014), page no. 86-91, ISSN 2250-1797; International Impact Factor: 2.52 (included in master list 2013); ICV: 6.1.

24. Vishal Jain, Dr. Mayank Singh, "A Framework to Convert Relational Database to Ontology for Knowledge Database in Semantic Web", International Journal of Scientific & Technology Research (IJSTR), France, Vol. 2, No. 10, October 2013, page no. 9-12, ISSN 2277-8616, indexed at EBSCO, ISSUU, DRJI, Index Copernicus; impact factor: 0.675. [Journal is Listed at Sr. No. 48175 in UGC Approved List of Journals]
25. Vishal Jain, Dr. Mayank Singh, "Ontology Development and Query Retrieval using Protégé Tool", International Journal of Intelligent Systems and Applications (IJISA), Hong Kong, Vol. 5, No. 9, August 2013, page no. 67-75, ISSN 2074-9058, DOI: 10.5815/ijisa.2013.09.08, indexed at Thomson Reuters (Web of Science), EBSCO, ProQuest, DOAJ, Index Copernicus. [Journal is Listed at Sr. No. 45588 in UGC Approved List of Journals]

26. Vishal Jain, Dr. Mayank Singh, "Ontology Based Information Retrieval in Semantic Web: A Survey", International Journal of Information Technology and Computer Science (IJITCS), Hong Kong, Vol. 5, No. 10, September 2013, page no. 62-69, ISSN 2074-9015, DOI: 10.5815/ijitcs.2013.10.06, indexed at Thomson Reuters, Web of Science, EBSCO, ProQuest, DOAJ, Index Copernicus. [Journal is Listed at Sr. No. 49100 in UGC Approved List of Journals]

27. Vishal Jain, Dr. Mayank Singh, "Architecture Model for Communication between Multi Agent Systems with Ontology", International Journal of Advanced Research in Computer Science (IJARCS), Volume 4, No. 8, May-June 2013, page no. 86-91, ISSN 0976-5697, indexed at EBSCO HOST, Index Copernicus, DOAJ; ICV: 5.47. [Journal is Listed at Sr. No. 2503 in UGC Approved List of Journals]

28. Gagandeep Singh, Vishal Jain, Dr. Mayank Singh, "An Approach for Information Extraction using Jade: A Case Study", Journal of Global Research in Computer Science (JGRCS), Vol. 4, No. 4, April 2013, page no. 186-191, ISSN 2229-371X; impact factor (2012): 0.60. [Journal is Listed at Sr. No. 48221 in UGC Approved List of Journals]

29. Vishal Jain, Gagandeep Singh, Dr. Mayank Singh, "Implementation of Multi Agent Systems with Ontology in Data Mining", International Journal of Research in Computer Application and Management (IJRCM), May 2013, page no. 108-114, ISSN 2231-1009, indexed at Index Copernicus, DOAJ, J-Gate, Ulrich's, EBSCO; Poland IC Value: 5.09. [Journal is Listed at Sr. No. 44341 in UGC Approved List of Journals]

30. Vishal Jain, Gagandeep Singh, Dr. Mayank Singh, "Implementation of Data Mining in Online Shopping System using TANAGRA Tool", International Journal for Computer Science Engineering (IJCSE), USA, January 2013, page no. 47-58, ISSN 2278-9979; impact factor (2012): 2.91.

31. Vishal Jain, Mahesh Kumar Madan, "Implementation of Knowledge Mining with Ontology", International Journal of Computer Science & Engineering Technology (IJCSET), Vol. 3, No. 7, July 2012, page no. 251-253, ISSN 2229-3345, indexed at Index Copernicus, DOAJ, J-Gate. [Journal is Listed at Sr. No. 47314 in UGC Approved List of Journals]

32. Vishal Jain, Mahesh Kumar Madan, "Information Retrieval through Multi-Agent System with Data Mining in Cloud Computing", International Journal of Computer Technology and Applications (IJCTA), Volume 3, Issue 1, January-February 2012, page no. 62-66, ISSN 2229-6093, indexed at Scopus, Index Copernicus, DOAJ, J-Gate, Ulrich's, EBSCO; ICV: 5.17. [Journal is Listed at Sr. No. 43388 in UGC Approved List of Journals]

33. Vishal Jain, Mahesh Kumar Madan, "Multi Agent Driven Data Mining for Knowledge Discovery in Cloud Computing", International Journal of Computer Science & Information Technology Research Excellence, Vol. 2, Issue 1, January-February 2012, page no. 65-69, ISSN 2250-2734. [Journal is Listed at Sr. No. 48079 in UGC Approved List of Journals]

International Conference - 14

34. Ishleen Kaur, Gagandeep Singh Narula and Vishal Jain, "Identification and Analysis of Software Quality Estimators for Prediction of Fault Prone Modules", INDIACom-2017, 4th 2017 International Conference on "Computing for Sustainable Global Development", 01st-03rd March 2017, held at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA); indexed at Scopus, DBLP through IEEE Xplore.

35. Prachi Dewal, Gagandeep Singh Narula and Vishal Jain, "Detection and Prevention of Black Hole Attacks in Cluster based Wireless Sensor Networks", 10th INDIACom; INDIACom-2016, 3rd 2016 International Conference on "Computing for Sustainable Global Development", 16th-18th March 2016, ISBN 978-9-3805-4421-2, page no. 3399-3403, held at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA); indexed at Scopus, DBLP through IEEE Xplore.

36. Usha Yadav, Gagandeep Singh Narula, Neelam Duhan and Vishal Jain, "A Novel Approach for Precise Search Results Retrieval based on Semantic Web Technologies", 10th INDIACom; INDIACom-2016, 3rd 2016 International Conference on "Computing for Sustainable Global Development", 16th-18th March 2016, ISBN 978-9-3805-4421-2, page no. 1357-1362, held at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA); indexed at Scopus, DBLP through IEEE Xplore.

37. Vishal Assija, Anupam Baliyan and Vishal Jain, "Effective & Efficient Digital Advertisement Algorithms", CSI-2015; 50th Golden Jubilee Annual Convention on "Digital Life", held on 02nd-05th December 2015 at New Delhi, published by Springer under ICT Based Innovations, Advances in Intelligent Systems and Computing, ISBN 978-981-10-6602-3, page no. 83-91; indexed at Scopus and DBLP.

38. Gagandeep Singh Narula, Usha Yadav, Neelam Duhan and Vishal Jain, "Evolution of FOAF and SIOC in Semantic Web: A Survey", CSI-2015; 50th Golden Jubilee Annual Convention on "Digital Life", held on 02nd-05th December 2015 at New Delhi, published by Springer under Big Data Analytics, Advances in Intelligent Systems and Computing, ISBN 978-981-10-6619-1, page no. 253-263; indexed at Scopus and DBLP.
39. Usha Yadav, B K Murthy, Gagandeep Singh Narula, Neelam Duhan and Vishal Jain, "EasyOnto: A Collaborative Semi Formal Ontology Development Platform", CSI-2015; 50th Golden Jubilee Annual Convention on "Digital Life", held on 02nd-05th December 2015 at New Delhi, published by Springer under Nature Inspired Computing, Advances in Intelligent Systems and Computing, ISBN 978-981-10-6746-4, page no. 1-11; indexed at Scopus and DBLP.

40. Prachi Dewal, Gagandeep Singh Narula, Anupam Baliyan and Vishal Jain, "Security Attacks in Wireless Sensor Networks: A Survey", CSI-2015; 50th Golden Jubilee Annual Convention on "Digital Life", held on 02nd-05th December 2015 at New Delhi, published by Springer under ICT Based Innovations, Advances in Intelligent Systems and Computing, ISBN 978-981-10-6602-3; indexed at Scopus and DBLP.

41. Vishal Jain, Gagandeep Singh, Dr. Mayank Singh, "Ontology Development Using Hozo and Semantic Analysis for Information Retrieval in Semantic Web", 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013), held on December 9-11, 2013, ISBN 978-1-463-6101-9, page no. 113-118, organized by Jaypee University of Information Technology, Waknaghat, Shimla, Himachal Pradesh, INDIA; proceedings published by IEEE and indexed at Scopus, DBLP through IEEE Xplore.

42. Vishal Jain, Dr. Mayank Singh, "Ontology Based Pivoted Normalization using Vector-Based Approach for Information Retrieval", IEEE Co-Sponsored 7th International Conference on Advanced Computing and Communication Technologies (ICACCT), in association with Inderscience Publishers, UK, and IETE, technically co-sponsored by the Computer Society Chapter, IEEE Delhi Section, held on 16th November 2013, organized by Asia Pacific Institute of Information Technology SD India, Panipat, India; proceedings published by Inderscience Publishers.

43. Vishal Jain, Dr. Mayank Singh, "Ontology Based Web Crawler to Search Documents in the Semantic Web", Wilkes100 - Second International Conference on Computing Sciences, in association with the International Neural Network Society and the Advanced Computing Research Society, held on 15th and 16th November 2013, organized by Lovely Professional University, Phagwara, Punjab, India; proceedings published by Elsevier Science.

44. Gagandeep Singh, Vishal Jain, "Information Retrieval through Semantic Web: An Overview", Confluence 2012, held on 27th and 28th September 2012, page no. 114-118, at Amity School of Engineering & Technology, Amity University, Noida.

45. Vishal Jain, Sanjay Kr. Malik, Sanjiv Agrawal, Neeraj Seth, "Developing and Deploying Ontologies in Semantic Web", ICDM 2010, IMT Ghaziabad.

46. Vishal Jain, Sanjay Kr. Malik, "Using Ontologies in Web Mining for Information Extraction in Semantic Web", ISCET 2010, RIMT - Institute of Engineering and Technology, Gobindgarh.
47. Vishal Jain, Sanjay Kr. Malik, "Developing and Uses of Ontologies in Semantic Web", IISN 2010, ISTK, Klawad.

National Journal - 01

48. Vishal Jain, Sanjay Kr. Malik, Pankaj Lathar, "Ontology: Development, Deployment and Merging Aspects in Semantic Web: An Overview", IMS Manthan: The Journal of Innovations, published by IMS Noida and Publishing India Group, June 2010, Vol. 5, No. 1, page no. 23-28, ISSN 0974-7141. [Journal is Listed at Sr. No. 47568 in UGC Approved List of Journals]

National Conference - 03

49. Vishal Jain, Gagandeep Singh and Dr. Mayank Singh, "Comparative Study of Search Engine and Semantic Search Engine: A Survey", NCACT-2013, held on 30th March 2013, page no. 57-61, at Department of Computer Science & Applications, M.D. University, Rohtak.

50. Vishal Jain, Mahesh Kumar Madan, "Cloud Computing in Trust Building Knowledge Discovery for Information Retrieval", CTNGC 2012, held on 20th October 2012, page no. 30-32, at IT Department, Institute of Technology & Science (ITS), Mohan Nagar, Ghaziabad; proceedings published by International Journal of Computer Applications (IJCA), USA, with impact factor (2011) 0.81.

51. Vishal Jain, Sanjay Kr. Malik, Arun Prakash Agrawal, Sanjiv Agrawal, "Developing Ontologies using Protégé 3.1 and 3.4 Beta in Semantic Web", ETSNT'09, Amity University.

Awards

1. Awarded for "Continuous contribution towards quality research and having more than hundred research citations" in the year 2016 by BVICAM, New Delhi.
2. Awarded the "Young Active Member Participant Award 2012-13" by the Computer Society of India (CSI).
3. Awarded for "Securing S Grade in Internet and Web Technologies", January 2002, DOEACC 'O' Level examination, by A. M. Informatics, Punjabi Bagh, New Delhi.

FDP/Workshop Organized/Attended: One Week / Two Weeks Duration

1. One Week Faculty Development Program on "Emerging Trends in Computer Science and IT", organized by IEEE Computer Society, Delhi Section & IIPC (AICTE) of BVICAM, 24th-29th July 2017, at BVICAM, New Delhi.
2. Two Weeks Faculty Development Programme on "Emerging Research Trends in Computer Science and IT", organized by IEEE Computer Society, Delhi Section & IIPC (AICTE) of BVICAM, 15th-27th May 2017, at BVICAM, New Delhi.
3. One Week Faculty Development Program on "Emerging Trends in Computer Science and IT", organized by IEEE Computer Society, Delhi Section & IIPC (AICTE) of BVICAM, 25th-30th July 2016, at BVICAM, New Delhi.
4. Two Weeks Faculty Development Program on "Big Data Analytics", sponsored by DST, 06th-17th June 2016, at Manav Rachna International University, Faridabad, Haryana.
5. Two Weeks Faculty Development Program on "Emerging Trends in Computer Science and IT", organized by IEEE Delhi Section & IIPC (AICTE) of BVICAM, 16th-27th May 2016, at BVICAM, New Delhi.
6. One Week Faculty Development Program on "Capacity Building to Enhance Research Output", conducted 21st-25th July 2014 at BVICAM, New Delhi.
7. Two Weeks Faculty Development Program on "Information and Cyber Security Management", sponsored by AICTE, 22nd December 2013 to 5th January 2014, at Department of Engineering, College of Technology, G. B. Pant University of Agriculture and Technology, Pant Nagar, Uttarakhand.
8. One Week Faculty Development Program on "Advanced Communication Techniques", 10th-15th June 2013, at NIT Delhi, New Delhi.
9. Two Weeks Faculty Development Program on "Software Quality Engineering and 21st Century Teaching Learning System", sponsored by AICTE, 18th February to 2nd March 2013, at BVICAM, New Delhi.
10. One Week Short Term Course (STC) on "Wireless Network", conducted by the Computer Science Department, October 10-15, 2010, at National Institute of Technical Teachers' Training and Research (NITTTR), Chandigarh.
11. Two Weeks Training Program in "Cyber Crime Investigations and Computer Forensics", conducted by University School of Information Technology, GGSIPU, Delhi, July 2009.

FDP/Workshop Organized/Attended: Less than One Week Duration

1. Two Days Faculty Development Programme on "Leveraging Technologies for Quality Education", conducted by BVICAM, New Delhi, in association with ISTE Delhi Section, 27th-28th January 2017, at BVICAM, New Delhi.
2. One Day National Workshop on "Importance and Use of Copyrights, Patent, Citation and Impact Factor in Research", conducted by Lingaya's University, Faridabad, 25th January 2014, at Lingaya's University, Faridabad.
3. Two Days Workshop on "Neural Networks and its Implementation with MATLAB", conducted by BVIMR, New Delhi, 17th-18th January 2014, at BVIMR, New Delhi.
4. One Day Faculty Development Program on "Data Mining and Social Media Analytics - Emerging Trends and Challenges", conducted by BVICAM, New Delhi, 21st December 2013, at BVICAM, New Delhi.
5. Two Days Faculty Development Program on "Website Development through PHP", conducted by BVICAM, New Delhi, 9th-10th November 2013, at BVICAM, New Delhi.
6. One Day Faculty Development Program on "High Performance Computing using Oracle 12c", conducted by BVICAM, New Delhi, 16th November 2013, at BVICAM, New Delhi.
7. Four Days Refresher Course on "Cooperative Policy and Development for Faculty Members of Indian Universities", 2nd-5th July 2013, at National Center for Cooperative Education (NCCE), New Delhi.
8. Two Days Faculty Development Program on "ICT Enabled 21st Century Teaching Learning System", at Faculty Convention-2012 of ISTE Delhi Section, conducted by BVICAM, New Delhi, 24th-25th November 2012, at BVICAM, New Delhi.
9. One Day Faculty Development Program on "Emerging Technologies", conducted by BVICAM, New Delhi, and sponsored by TCS, 21st July 2012, at BVICAM, New Delhi.
10. One Day Faculty Development Program on "Emerging Technologies", conducted by BVICAM, New Delhi, and sponsored by TCS, 3rd March 2012, at BVICAM, New Delhi.
11. One Day Faculty Development Program on "Leveraging Technology for Quality Education", 28th May 2011, at BVICAM, New Delhi.
12. One Day National Workshop on "Leveraging Technologies using ERP & Open Source Technology", conducted by Technia Institute of Advanced Studies, Delhi, 25th March 2011.
13. One Day Faculty Development Program on "Leveraging Technology for Quality Education", conducted by BVICAM, New Delhi, and sponsored by TCS, 18th September 2010, at BVICAM, Delhi.
14. One Day Workshop on "Technotriks 2010", conducted by The Institution of Electronics and Telecommunication Engineers (IETE), at USIT, GGSIPU, March 6, 2010.

Conferences / Seminars Organized

1. Publicity Chair and Special Session Coordinator, 11th INDIACom; INDIACom-2017, 4th 2017 International Conference on "Computing for Sustainable Global Development", 01st-03rd March 2017, held at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA).
2. Publicity Chair and Special Session Coordinator, 10th INDIACom; INDIACom-2016, 3rd 2016 International Conference on "Computing for Sustainable Global Development", 16th-18th March 2016, held at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA).
3. Convention Secretariat Member and Special Session Coordinator, CSI-2015; 50th Golden Jubilee Annual Convention on "Digital Life", held on 02nd-05th December 2015 at New Delhi, approved by Springer AISC Series.
4. Coordinator, One Day CSI Young Talent Search in Computer Programming - 2013, First Level Regional Competition, 26th July 2015, at BVICAM, New Delhi.
5. Publication Co-chair and Special Session Coordinator, 9th INDIACom; INDIACom-2015, 2nd 2015 International Conference on "Computing for Sustainable Global Development", 11th-13th March 2015, held at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA).
6. Coordinator, One Day CSI Young Talent Search in Computer Programming - 2013, First Level Regional Competition, 27th July 2014, at BVICAM, New Delhi.
7. Coordinator, One Week Free Computer Training Programme for School Students, 23rd-27th June 2014, at BVICAM, New Delhi.
8. Co-Coordinator, One Day "IET PATW-2014", 19th April 2014, at BVICAM, New Delhi.
9. Local Organizing Chair and Special Session Coordinator, 8th INDIACom; INDIACom-2014, 2014 International Conference on "Computing for Sustainable Global Development", 5th-7th March 2014, held at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA).
10. Jt. Convener, Two Days CSI National Students' Convention 2014, 07th-08th March 2014, at BVICAM, New Delhi.
11. Jt. Convener, One Day CSI Delhi State Students' Convention, 11th November 2013, at BVICAM, New Delhi.
12. Co-Coordinator, One Day DeitY-sponsored workshop on "ESDM with Industrial Visit", 10th August 2013, at BVICAM, New Delhi.
13. Coordinator, One Day CSI Young Talent Search in Computer Programming - 2013, First Level Regional Competition, 28th July 2013, at BVICAM, New Delhi.
14. Co-Coordinator, One Day "IET PATW-2013", 16th March 2013, at BVICAM, New Delhi.
15. Jt. Convener, One Day NSC-2013 (National Student Convention), 25th February 2013, at BVICAM, New Delhi.
16. Organizing Committee Member, Two Days National Conference, INDIACom 2013 - Computing for Nation Development, 23rd-24th February 2013, at BVICAM, New Delhi.
17. Coordinator, One Day "ISTE SRMC - 2012" Competition, 5th November 2012, at BVICAM, New Delhi.
18. Jt. Convener, One Day NSC-2012 (National Student Convention), 25th February 2012, at BVICAM, New Delhi.
19. Organizing Committee Member, Two Days National Conference, INDIACom 2012 - Computing for Nation Development, 23rd-24th February 2012, at BVICAM, New Delhi.
20. Jt. Convener, One Day NSC-2011 (National Student Convention), 12th March 2011, at BVICAM, New Delhi.
21. Organizing Committee Member, Two Days National Conference, INDIACom 2011 - Computing for Nation Development, 10th-11th March 2011, at BVICAM, New Delhi.

Seminars / Conferences Attended

1. One Day "Women Empowerment and Social Development Seminar", 18th July 2013, at India Habitat Centre, conducted by Bharati Vidyapeeth's Institute of Management and Research (BVIMR), New Delhi.
2. One Day "Oracle Higher Education Seminar", at Oberoi, New Delhi, Tuesday, December 18, 2012, conducted by Oracle Corporation.
3. One Day "Viburnix Certified Alumni Administrator Program", Saturday, October 13, 2012, at International Habitat Center, conducted by Saviance Technologies, Gurgaon.
4. Two Days All India Seminar on "Emerging ICT: A Tool for Nation Development", organized by IET, Delhi, at Engineer's Bhawan, Delhi, February 17-18, 2012.
5. One Day National Workshop on "Leveraging Technologies using ERP & Open Source Technology", conducted by Technia Institute of Advanced Studies, Delhi, 25th March 2011.
6. One Day National Symposium on "Software 2.0: Emerging Competencies", January 6, 2011, conducted by CSI, Delhi Chapter, at India International Centre, Lodhi Road, New Delhi.
7. An Evening Lecture on "Information Security" by Leon Strous, January 6, 2011, conducted by CSI, Delhi Chapter, at India International Centre, Lodhi Road, New Delhi.
8. One Day "Annual Students' Convention (ASC-2010)", organized by ISTE (Indian Society for Technical Education), hosted on 25th September 2010 at BVICAM, New Delhi.
9. One Day National Conference on "ICT: An Engine for Inclusive Social Growth", 28th August 2010, conducted by CSI, Delhi Chapter, at Constitution Club of India, New Delhi.
10. One Day "Delhi's Youth Brigade Programme", launched by the Hon'ble Chief Minister, Govt. of Delhi; attended as Student Representative of Guru Gobind Singh Indraprastha University, July 29, 2010.

Subjects Taught

Undergraduate: Introduction to Programming, OOPS, Data Structure
Post Graduate: FIT, OOPS, Data Communication and Networking, Advanced Computer Networks

Research Area

Semantic Web, Web Technology, Computer Networks, Ad-hoc Networks

(DR. VISHAL JAIN)
Int. j. inf. tecnol. DOI 10.1007/s41870-017-0021-z
ORIGINAL RESEARCH
FUZZY-PRoPHET: a novel routing protocol for opportunistic network

Khaleel Ahmad (1), Muneera Fathima (1), Vishal Jain (2), Afrah Fathima (1)
Received: 20 February 2017 / Accepted: 3 June 2017 / © Bharati Vidyapeeth's Institute of Computer Applications and Management 2017
Abstract: Opportunistic networks are a subclass of delay-tolerant networks (DTNs). A DTN does not require an end-to-end path to forward a message. In opportunistic networks, nodes may have partial or zero knowledge about the network, which makes it difficult to obtain clear information about the nodes in the network. Opportunistic network applications suit situations with high tolerance of long delays, high error rates, and the like; routing in such a network is a challenge. In this paper we propose Fuzzy-PRoPHET, a novel routing protocol for opportunistic networks. The best-suited models, PRoPHET and fuzzy logic, are combined to design an efficient routing protocol. Fuzzy logic takes all values between 0 and 1, which makes collecting the best nodes easier: membership functions are calculated to choose the next hop. Fuzzy logic is a strong mathematical tool for representing uncertainty, so we can achieve better results and increase throughput with minimum communication delay.

Keywords: Opportunistic Networks; Routing Protocol; PRoPHET; FUZZY-PRoPHET

Corresponding author: Khaleel Ahmad, [email protected]
Muneera Fathima: [email protected]; Afrah Fathima: [email protected]; Vishal Jain: [email protected]
(1) School of CS and IT, Maulana Azad National Urdu University, Hyderabad, India
(2) Bharati Vidyapeeth's Institute of Computer Applications and Management, Delhi, India

1 Introduction

Mobile ad hoc networks (MANETs) are self-organizing networks: nodes move randomly and communicate only when another node comes into range. Data dissemination in such a network can be done in two ways, point-to-point and broadcast; broadcast is usually chosen to spread data over the whole network [1]. If no node comes into range for a long time, the message is buffered until another node does. Opportunistic networks (Oppnets) are a form of MANET in which nodes get connected opportunistically. The network topology varies rapidly due to node mobility and signal interference [2], and this highly dynamic nature imposes several challenges; because of it, traditional routing protocols are not applicable to Oppnets [3]. Oppnets work well when nodes are in range, so we need to design an efficient routing protocol to improve performance. In MANETs the source and destination are connected wirelessly [4], so there is every possibility that they are not always in range of each other, and routing becomes crucial due to the dynamic nature of the network. A node may not have any knowledge about the other nodes present in the network; in many cases, however, nodes hold some information such as user behaviour or encounter history, and connecting nodes using this property is called transitivity. Transitivity is high where a person is present for a long time, such as an office, home, or park. In Oppnets the destination may not be just one hop from the source, so these are multi-hop networks; the nodes that come between source and destination are called intermediate nodes. Many routing protocols are designed based on history, i.e. they are context aware. On this basis we have designed a novel routing protocol, Fuzzy-PRoPHET, which is based on the
PRoPHET routing protocol, since PRoPHET is a good routing protocol when nodes are in range. In PRoPHET, the next hop is selected depending on its delivery predictability: the higher the delivery predictability, the better the node's chances of being a good forwarder. We apply fuzzy logic to the delivery predictability, since fuzzy logic can take all values between 0 and 1. The delivery predictability is fuzzified and its membership functions are calculated; depending on the membership function, the next hop is selected. The node with the minimum membership value has the highest chance of being a good forwarder.
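To make this selection rule concrete, the following Python sketch fuzzifies a delivery predictability with a triangular membership function and applies the minimum-membership rule stated above. The paper does not define the membership function, so the triangular shape, its breakpoints, and the function names here are illustrative assumptions, not the paper's specification.

    # Hypothetical sketch of the fuzzification step; the triangular membership
    # function and its breakpoints are assumptions, not taken from the paper.
    def triangular_membership(p, low=0.0, peak=0.5, high=1.0):
        # Map a delivery predictability p in [0, 1] to a membership value.
        if p <= low or p >= high:
            return 0.0
        if p <= peak:
            return (p - low) / (peak - low)
        return (high - p) / (high - peak)

    def choose_next_hop(neighbours):
        # neighbours: dict of node id -> delivery predictability for the
        # destination. Per the text above, the node with the minimum
        # membership value is preferred as the next hop.
        return min(neighbours, key=lambda n: triangular_membership(neighbours[n]))

For instance, choose_next_hop({'b': 0.9, 'c': 0.4}) returns 'b': a predictability of 0.9 maps to membership 0.2, against 0.8 for node c.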
1.1 Literature survey

Amin Vahdat and Becker [5] proposed epidemic routing, in which a message spreads like a disease through the whole network so as to improve throughput and reduce communication delay. Boldrini et al. [6] proposed a history based routing protocol that considers user behavior and history for routing. Burgess et al. [7] proposed MaxProp, which prioritizes both forwarded and dropped packets; prioritization is done based on the history of nodes or any knowledge about intermediate nodes. Nabhani and Bidgoli [8] proposed AFRON, an adaptive fuzzy routing protocol that prioritizes the buffer in order to increase throughput. Malik et al. [9] proposed a fuzzy based routing protocol for DTNs: compared with shortest-path routing, performance can degrade due to obstacles in realistic settings, and a fuzzy logic controller is used to improve realistic routing. Paresha and Patel [10] designed a protocol for DTNs in which fuzzy logic is applied to the PRoPHET routing protocol to prioritize messages in the buffer and thereby increase throughput. Chaudhary et al. [11] proposed a fuzzy logic based intrusion detection system (IDS) using the Mamdani fuzzy model, with which black holes can be detected; due to the dynamic nature of MANETs there cannot be a single definitive fuzzy IDS, and further research is being carried out to design a model that can classify normal and malicious nodes in the network.

1.2 PRoPHET: probabilistic routing protocol using history of encounters and transitivity

Lindgren et al. [12] proposed a probabilistic protocol that introduces a probabilistic metric called delivery predictability, P(a,b) ∈ [0, 1], maintained at every node a for each known destination node b. Unlike epidemic routing, PRoPHET exchanges the delivery predictability along with a summary vector with the receiver node; the summary vector is used to decide which messages to request from the other node, based on the forwarding strategy (Fig. 1).

Fig. 1 The way a source contacts its destination node in PRoPHET

Calculation of delivery predictability has three stages:

1. Update: the metric is updated whenever a node is encountered, so that nodes are always up to date: P(a,b) = P(a,b)_old + (1 − P(a,b)_old) · P_init, where P_init ∈ (0, 1] is an initialization constant.

2. Aging: if a pair of nodes does not meet for a long time, the delivery predictability is aged; the longer the gap, the lower the probability of being a good forwarder: P(a,b) = P(a,b)_old · γ^k, where γ ∈ (0, 1) is the aging constant and k is the number of time units elapsed since the last encounter.

3. Transitivity: if node a frequently encounters node b, and node b frequently encounters node c, then node c is a good forwarder of messages for node a: P(a,c) = P(a,c)_old + (1 − P(a,c)_old) · P(a,b) · P(b,c) · β, where β ∈ [0, 1] is a scaling constant that decides the impact of transitivity on delivery predictability.

In opportunistic networks, forwarding is not straightforward: in the absence of knowledge, any node may have to be presumed the destination. In PRoPHET, selection of the next node is based on delivery predictability. Each node keeps its own delivery predictabilities, so when a node wants to transmit a message, the delivery predictabilities of the local nodes are collected and compared; the nodes with a higher delivery predictability than the transmitter (sender) are the good forwarders, and the transmitter sends the message to such a node. At the receiving end the node may accept or reject the message; if it needs the message, it accepts it and updates its delivery predictability. Selecting a receiver with a high delivery predictability is done to increase the throughput of the network. After sending a message to the next node, the source does not delete it until it reaches the destination; messages are buffered at the source in FIFO order, and if the buffer at a receiver is full, the message is deleted.
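For concreteness, the three update rules above can be sketched in a few lines of Python; the constants P_INIT, GAMMA and BETA below are illustrative placeholders, not values taken from the paper.

```python
# Sketch of the three PRoPHET delivery-predictability updates.
P_INIT = 0.75   # initialization constant, in (0, 1]
GAMMA = 0.98    # aging constant, in (0, 1)
BETA = 0.25     # transitivity scaling constant, in [0, 1]

def on_encounter(p_ab: float) -> float:
    """Update stage: applied when node a encounters node b."""
    return p_ab + (1.0 - p_ab) * P_INIT

def age(p_ab: float, k: int) -> float:
    """Aging stage: decay after k time units without an encounter."""
    return p_ab * (GAMMA ** k)

def transitive(p_ac: float, p_ab: float, p_bc: float) -> float:
    """Transitivity stage: a meets b often and b meets c often, so a->c improves."""
    return p_ac + (1.0 - p_ac) * p_ab * p_bc * BETA
```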
To avoid such scenarios, 'better nodes' are chosen: selected nodes within range that are travelling towards the destination can be considered 'better nodes' or 'best nodes'. In PRoPHET, the local 'best nodes' are selected and used for forwarding.

1.3 Proposed work

The proposed protocol is Fuzzy-PRoPHET.

1.4 Fuzzy-PRoPHET
Fig. 2 Range versus membership function
In opportunistic routing, complexity usually arises from the absence of knowledge about the nodes in the network, or from only partial knowledge in some cases; partial knowledge gives rise to uncertainty. The main theory applied to uncertainty over the years is probability theory, and since randomness is a central aspect of opportunistic networks, probabilistic reasoning applies here. Fuzzy set theory, introduced by Zadeh [13], is an outstanding mathematical tool for uncertainty, and we apply fuzzy logic in our network: unlike a crisp set, a fuzzy set admits intermediate values and therefore gives more accurate results. In PRoPHET, hop selection is based on the delivery predictability of the node: after the delivery predictabilities are calculated, the nodes whose values exceed a threshold are selected by the source node as the best forwarders. This set of nodes with high delivery predictability is a crisp set, which makes decisions in binary form; it cannot be used directly and must first be converted to a fuzzy set, a step called fuzzification. Here we divide the fuzzy range into five parts, as depicted in Table 1 and Fig. 2; the ordering of the range is reversed so as to achieve high throughput at short range. The geometry of fuzzy sets involves both domains: the universal set X = {x1, …, xn} with elements x1, …, xn, and the range [0, 1] of the mappings Mn: X → [0, 1] [14].
A fuzzy set takes the form A = {(x, µA(x)) | x ∈ X, 0 ≤ µA(x) ≤ 1}, where µA(x) is the membership function of the element x of the universal set. We need to calculate the membership function for the set. For our routing protocol, the designed fuzzification membership function is

µF(x) = 1 / (1 + x²)^√n,   (1)

where x is the corresponding element of the set and n is the total number of elements in the set, also called its cardinality. After the process completes, the fuzzy set is converted back to a crisp set, since the calculation of delivery predictability is done on crisp values; this conversion is called defuzzification. To defuzzify, a second membership function is used:

µDF(x) = [1 / (1 + x)^√n]^(1/2).   (2)

These are the membership functions for fuzzification and defuzzification, respectively, in Fuzzy-PRoPHET.

1.5 Block diagram of Fuzzy-PRoPHET

The source is the sender node; it selects a few nodes depending on their delivery predictability, and taken together these nodes form a crisp set. Fuzzification is then applied, turning it into a fuzzy set; the fuzzy inference engine acts as the interface between crisp and fuzzy, and the rule we supply is the division of the fuzzy range. Depending on the membership function, the next hop is chosen in Fuzzy-PRoPHET. After the next hop is selected, the message is forwarded towards the destination; the set is then defuzzified and the same process repeats.

Rule base: the fuzzy range 0–1 is classified in reverse order to obtain more nodes in the shortest range (Figs. 3, 4):

0–0.2 is Very high
0.21–0.4 is High
0.41–0.6 is Medium
0.61–0.8 is Low
0.81–1.0 is Very low
Table 1 Classification of fuzzy range

Range      Distance         Membership function
0–0.2      Very high (VH)   1
0.21–0.4   High (H)         0.9
0.41–0.6   Medium (M)       0.7
0.61–0.8   Low (L)          0.5
0.81–1.0   Very low (VL)    0.4
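Table 1 and the rule base amount to a simple range-to-membership lookup; a minimal sketch:

```python
def range_membership(value: float):
    """Map a value in [0, 1] to its distance label and membership (Table 1)."""
    bands = [(0.20, "Very high (VH)", 1.0),
             (0.40, "High (H)", 0.9),
             (0.60, "Medium (M)", 0.7),
             (0.80, "Low (L)", 0.5),
             (1.00, "Very low (VL)", 0.4)]
    for upper, label, mf in bands:
        if value <= upper:
            return label, mf
    raise ValueError("value must lie in [0, 1]")

print(range_membership(0.35))   # ('High (H)', 0.9)
```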
1.6 Algorithm of Fuzzy-PRoPHET routing protocol

Suppose node A wants to forward a message to another node B; it follows the steps given below:

1. First, determine the delivery predictability.
2. Node A selects local nodes (the crisp set) based on high delivery predictability.
3. Fuzzification process.
4. Fuzzy sets are formed, and the next hop is chosen depending on the membership function µF(x) = 1/(1 + x²)^√n.
5. The range is checked:
   5.1 If the range falls between 0.0 and 0.2, go to step 6.
   5.2 If the range is between 0.21 and 0.4, go to step 6.
   5.3 If the range is between 0.41 and 0.6, go to step 6.
   5.4 If the range is between 0.61 and 0.8, go to step 6.
   5.5 If the range is from 0.81 to 1.0, go to step 6.
   5.6 Check whether the message is delivered within 15 s:
       5.6.1 If the message is delivered within 15 s, go to step 1.
       5.6.2 Else go to step 10.
6. Finally, the node is selected depending on the minimum membership function, PD = 1/MF.
7. Message forwarding after path selection.
8. Defuzzification process.
9. Formation of the crisp set via µDF(x) = (1/(1 + x)^√n)^(1/2).
10. Stop.

1.7 Implementation

PRoPHET is a probability based routing protocol in which the next hop is chosen depending on delivery predictability. If we apply a crisp set for selecting the next forwarder on the interval [0, 1] with a threshold of 0.5, values below 0.5 count as low delivery predictability and values above 0.5 as high; the nodes in the range 0–0.5 are then not considered at all, which can hurt both throughput and reliability. To avoid this, fuzzy logic is applied to PRoPHET, giving Fuzzy-PRoPHET. When fuzzy logic is applied, a few in-range nodes are selected according to delivery predictability and then fuzzified; the fuzzy stage considers all the nodes with high delivery predictability and gives an accurate result, so communication delay is reduced, which directly increases network throughput.

Case I: suppose the threshold is > 0.5; the crisp set of delivery predictabilities is {0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1}. Performing fuzzification with membership function (1) gives the fuzzy set shown in Fig. 5: X = {0.43, 0.37, 0.32, 0.28, 0.24, 0.20, 0.17, 0.15, 0.13, 0.11}. Defuzzifying with membership function (2) gives the crisp set shown in Fig. 6: X = {0.56, 0.6, 0.64, 0.67, 0.7, 0.74, 0.78, 0.8, 0.81, 0.84}.
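A minimal sketch of the fuzzification and defuzzification steps of Eqs. (1) and (2); run on the Case I crisp set, it reproduces the paper's fuzzy and defuzzified sets up to small rounding differences.

```python
import math

def fuzzify(crisp):
    """Eq. (1): mu_F(x) = 1 / (1 + x^2)^sqrt(n), n = cardinality of the set."""
    n = len(crisp)
    return [1.0 / (1.0 + x * x) ** math.sqrt(n) for x in crisp]

def defuzzify(fuzzy):
    """Eq. (2): mu_DF(x) = (1 / (1 + x)^sqrt(n))^(1/2)."""
    n = len(fuzzy)
    return [(1.0 / (1.0 + x) ** math.sqrt(n)) ** 0.5 for x in fuzzy]

# Case I: threshold > 0.5
crisp = [0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]
fuzzy = [round(v, 2) for v in fuzzify(crisp)]
print(fuzzy)                                    # close to {0.43, 0.37, ..., 0.11}
print([round(v, 2) for v in defuzzify(fuzzy)])  # close to {0.56, 0.60, ..., 0.84}
```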
Fig. 3 Block diagram of Fuzzy-PRoPHET
Fig. 4 Flow chart of Fuzzy-PRoPHET
Fig. 5 Delivery predictability (D.P) vs membership function (M.F) (Fuzzification)
Fig. 6 Membership function vs delivery predictability (Defuzzification)
Fig. 7 Delivery predictability vs membership function (Fuzzification)
Fig. 8 Membership function vs delivery predictability (Defuzzification)
Fig. 9 Delivery predictability vs membership function (Fuzzification)
Fig. 10 Membership function vs delivery predictability (Defuzzification)
Case II: suppose the threshold delivery predictability is > 0.7; the crisp set is {0.72, 0.77, 0.82, 0.87, 0.92, 0.97}. Applying fuzzification gives the fuzzy set X = {0.26, 0.23, 0.2, 0.17, 0.14, 0.12}, depicted in Fig. 7. Applying defuzzification to this set gives the crisp set X = {0.69, 0.71, 0.74, 0.77, 0.81, 0.83}, depicted in Fig. 8.
Case III: suppose the threshold delivery predictability is > 0.6; the crisp set is X = {0.62, 0.64, 0.66, 0.68, 0.7, 0.72, 0.74, 0.76, 0.78, 0.8, 0.82, 0.84, 0.86, 0.88, 0.9, 0.92, 0.94, 0.96, 0.98, 1}. Applying fuzzification (depicted in Fig. 9) gives the fuzzy set X = {0.23, 0.22, 0.20, 0.18, 0.16, 0.15, 0.14, 0.13, 0.12, 0.11, 0.10, 0.09, 0.08, 0.078, 0.07, 0.064, 0.06, 0.053, 0.05, 0.045}. Defuzzifying (depicted in Fig. 10) gives the crisp set X = {0.62, 0.64, 0.66, 0.68, 0.71, 0.72, 0.74, 0.75, 0.77, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9}. From the above cases it is clear that delivery predictability is inversely proportional to the membership function.

1.8 Delivery predictability ∝ 1/membership function

In Fuzzy-PRoPHET the next hop is thus chosen depending on its membership function: the smaller the membership function, the better the chances that the node is a good forwarder.

2 Conclusion and future work

In this paper we aimed at an efficient routing protocol, Fuzzy-PRoPHET. Fuzzy logic is a powerful tool for resolving uncertainty; the uncertainty arising from a node's partial or zero knowledge of the network can be compensated by it. Compared with PRoPHET, we obtain more accurate results, by which throughput increases, and the fuzzy range for each node is divided using the concept of the shortest path to reduce communication delay. This paper gives a mathematical formulation of Fuzzy-PRoPHET; in the near future we will simulate it, which should give more accurate results.

References

1. Mohan SK, Reddy AV (2009) Data dissemination in mobile computing environment. BIJIT-BVICAM's Int J Inf Technol 1:57–60
2. Sánchez M, Manzoni P (2001) ANEJOS: a Java based simulator for ad hoc networks. Future Gener Comput Syst 17(5):573–583
3. Huang CM, Lan KC, Tsai CZ (2008) A survey of opportunistic networks. In: 22nd International Conference on Advanced Information Networking and Applications Workshops (AINA Workshops 2008)
4. Hasti A (2011) Study of impact of mobile ad-hoc networking and its future applications. BIJIT-BVICAM's Int J Inf Technol 4:439–444
5. Vahdat A, Becker D (2000) Epidemic routing for partially connected ad hoc networks. Technical Report CS-200006, Duke University
6. Boldrini C, Conti M, Passarella A (2008) Exploiting users' social relations to forward data in opportunistic networks: the HiBOp solution. Pervasive Mob Comput 4(5):633–657
7. Burgess J, Gallagher B, Jensen D, Levine BN (2006) MaxProp: routing for vehicle-based disruption-tolerant networks. In: Proceedings of IEEE INFOCOM 2006, 25th IEEE International Conference on Computer Communications
8. Nabhani P, Bidgoli AM (2012) Adaptive fuzzy routing in opportunistic network (AFRON). Int J Comput Appl 52(18):7–11
9. Malik N, Gupta S, Bhushan B (2015) A fuzzy based routing protocol for delay tolerant network. Int J Grid Distrib Comput 8(1):11–24
10. Paresha DJ, Patel NB (2016) The routing PRoPHET protocol in DTN based on fuzzy mechanism. Int J Sci Res Dev 4(4):1125–1128
11. Chaudhary A, Tiwari VN, Kumar A (2014) Analysis of fuzzy logic based intrusion detection systems in mobile ad hoc networks. BIJIT-BVICAM's Int J Inf Technol 6:690–696
12. Lindgren A, Doria A, Schelen O (2003) Probabilistic routing in intermittently connected networks. In: Fourth ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc 2003), USA. ISBN 1581136846
13. Zadeh LA (2009) Toward extended fuzzy logic – a first step. Fuzzy Sets Syst 160(21):3175–3181
14. Kosko B (1990) Fuzziness vs. probability. Int J Gen Syst 17(2–3):211–240
Int. j. inf. tecnol. (March 2017) 9(1):93–100 DOI 10.1007/s41870-017-0004-0
ORIGINAL RESEARCH
Differential analysis of token metric and object oriented metrics for fault prediction Ishleen Kaur1 • Gagandeep Singh Narula2 • Vishal Jain3
Published online: 23 February 2017 Bharati Vidyapeeth’s Institute of Computer Applications and Management 2017
Abstract Due to the scarcity of resources, testing every module of a large project is not possible, so identifying fault prone modules can be used for better resource utilization. This paper applies a proposed token metric and the available object oriented metrics to forecast the fault prone modules of an open source project, and compares the results from both sets of metrics to assess the effectiveness of the proposed approach. The results conclude that (1) the proposed metric can be used for identifying fault prone modules with accuracy equivalent to that of the object oriented metrics; (2) the token metric and the object oriented metrics show opposite trends in precision and recall; (3) the proposed metric is better suited to projects involving high risk.

Keywords Decision tree · Fault prone · CK metrics · Software testing · Token metric

Ishleen Kaur1 [email protected]
Gagandeep Singh Narula2 (corresponding author) [email protected]
Vishal Jain3 [email protected]

1 IIT Delhi, New Delhi, India
2 CSIR-NISTADS, New Delhi, India
3 Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, India

1 Introduction

In software engineering, exhaustive testing is not possible: each module of the software cannot be tested completely, owing to the scarcity of resources as we move into the testing phase. However, modules that go untested may degrade software quality at the customer site. A better solution is to allocate the available resources to those parts of the software system where there is a high probability of faults being present; these parts are called fault prone modules. Fault prone modules can cause failure of the system and may lead to customer dissatisfaction, so there is a great need to recognize them. Various studies have been conducted to identify the fault prone modules of a software system, most of them using software metrics as fault proneness estimators; static attributes such as lines of code, cyclomatic complexity and Halstead metrics extracted from the modules are used for this purpose [1–4]. The advancement of information technology has changed the dynamics of life and society as well as software development [5]. Requirement and code metrics have also been used for the prediction of fault prone modules [1, 6]. Singh et al. [7] conducted an experiment to predict fault prone modules using statistical and machine learning classifiers, with the decision tree achieving the best prediction results. But the use of all these metrics has not achieved strong results in the past. In [8], a token based metric building on the spam filter based approach of [9] was proposed for the prediction. Since the token based approach is measured from the words of the source code, it reduces the overhead of extracting many different metrics while still giving fair prediction results, and it enhanced the existing fault prone prediction process; on manual testing, the proposed approach provides better results for the prediction. Non-functional requirements such as maintainability and portability neither have any exact specification nor any metrics specifying the objectives of the requirements [10, 11].
A continuous effort has been made toward the prediction of fault prone modules [12, 13]. Basili et al. [14] conducted a study utilizing object oriented metrics for the identification of fault prone modules. Aman [15] used lines of comments for the prediction of fault proneness in small-size modules. Mizuno and Hata [16] performed an empirical comparison of the complexity metrics and the text feature metrics that are part of the fault prone filtering process proposed in [9, 17]; the results showed high F1 measure and recall for the text feature metrics, and high accuracy and precision for the complexity metrics, so the two kinds of metrics show opposite trends in the evaluation measures. The objective of this study is to analyze the results of fault prediction using object oriented metrics and the token based metric. For the object oriented metrics we use the CK metrics proposed by Chidamber and Kemerer [18–20]; the reason is that most of today's projects are implemented in object oriented languages, and the CK metrics are the most popular metrics for predicting fault prone modules in object oriented software. We use an open source project to compare prediction with both sets of metrics: the proposed token metric and the CK metrics. The rest of the paper is organized as follows: Sect. 2 explains the research methodology used in the study; Sect. 3 describes the dataset and the evaluation measures; Sect. 4 details the experiments performed and the results acquired; lastly, Sect. 5 gives the conclusion of the study.
2 Methodology

The research methodology used in this study is divided into two parts: using the token based approach, and using object oriented metrics.

2.1 Using token metric

The proposed token based approach is an extension and improvement of the current fault prone filtering method. The token metric is calculated by extracting tokens from the modules of the training and target projects. Figure 1 explains the mining of tokens from the modules of the training project: tokens are extracted separately from the faulty and the non-faulty modules, and the two token files are stored separately in the database.

Fig. 1 Preparation of token metrics of the training project

Figure 2 explains the classification of the modules of the target project. Every module of the target project is compared with the token files of the faulty and non-faulty modules of the training project, and the token metric is computed using Eq. (1), where N_F is the number of tokens the target module shares with the faulty token file of the training project and N_NF is the number it shares with the non-faulty token file:

Token metric = N_F / (N_F + N_NF).   (1)

Fig. 2 Classification of modules of the target project (for each module i, extract its tokens Ta, compare them with the faulty tokens TF and non-faulty tokens TNF, compute the token metric PT, and classify i as fault prone if PT > th, the threshold, otherwise as non-fault prone)

The token metric ranges from 0 to 1 and is compared with a selected threshold: modules with a token metric higher than the threshold are considered fault prone, otherwise non-fault prone. Thus each module is classified as a faulty or a non-faulty module based on the token metric.

The tokenization process takes the source code of the modules and first removes the comments; all three types of comments are removed: single-line, multi-line and Javadoc. Then all separators that have no effect on the fault proneness of the modules, such as brackets and semicolons, are removed. Next, every user-defined method is replaced by the keyword "Function" along with its return type, every variable by the keyword "Variable" along with its type, and every constructor name of a class by the keyword "Constructor". After this preprocessing of the identifier names used in the modules, each expression is replaced by its postfix evaluation. The postfix evaluation is important because removing brackets would otherwise make differently bracketed expressions indistinguishable: for example, (a + n) − 1*d, a + (n − 1)*d and a + n − (1*d) may all yield different results, yet once all separators including parentheses are removed they lead to the same tokens; their postfix evaluations, in contrast, differ. Finally, tokens are generated from two neighboring words of the source code; only neighboring words are used in this study, in order to keep the space and time complexity low. The tokenization process can thus be summarized as:

1. Remove comments and separators from the source code.
2. Replace all methods with "Function_<return type>".
3. Replace all variables with "Variable_<type>".
4. Replace constructor names with "Constructor".
5. Convert expressions into postfix expressions.
6. Generate tokens using two neighboring words of the source code.

Table 1 describes the tokenization process used for the study; there, 'read' is a method with return type int and 'a' is a variable of type char.

Table 1 Tokenization process

Source: public int read (char a)
Tokens: public int | int Function_int | Function_int char | char Variable_char
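A minimal Python sketch of this pipeline, assuming Java-like source text; the helper names and regular expressions are illustrative simplifications of the procedure above (identifier replacement and postfix conversion are omitted), not the authors' implementation.

```python
import re

def tokenize(source: str) -> set:
    """Simplified steps 1-6: strip comments and separators, then pair
    neighboring words into two-word tokens."""
    source = re.sub(r"//.*|/\*.*?\*/", " ", source, flags=re.DOTALL)  # comments
    source = re.sub(r"[(){};,\[\]]", " ", source)                     # separators
    words = source.split()
    return {" ".join(pair) for pair in zip(words, words[1:])}

def token_metric(target_tokens: set, faulty: set, non_faulty: set) -> float:
    """Eq. (1): NF / (NF + NNF), counted over shared tokens."""
    nf = len(target_tokens & faulty)
    nnf = len(target_tokens & non_faulty)
    return nf / (nf + nnf) if (nf + nnf) else 0.0

# Classification rule: fault prone if the metric exceeds the threshold.
is_fault_prone = token_metric(tokenize("int x = a + b;"), {"int x"}, {"a b"}) > 0.5
```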
2.2 Using object-oriented metrics

For the object oriented metrics extracted from the source code of the modules, the CK metrics proposed by Chidamber and Kemerer have been chosen [18]. The CK metrics suite consists of the following six metrics.

2.2.1 Weighted methods per class (WMC)

WMC measures the sum of the complexities of the methods in a class, as given in Eq. (2), where n is the number of methods in the class:

WMC = Σ_{i=1}^{n} C_i.   (2)

By default the complexity of each method is 1, so WMC can be read as the number of methods in the class.

2.2.2 Coupling between objects (CBO)

CBO is the number of classes to which a class is coupled. Coupling measures the degree of interdependence between classes.

2.2.3 Depth of inheritance tree (DIT)

DIT measures the depth of the class in the inheritance hierarchy; it indicates how many methods or attributes the class may inherit.

2.2.4 Lack of cohesion of methods (LCOM)

LCOM describes the dissimilarity of the methods in a class. Cohesion measures the degree of intra-dependence within a module of a class.

2.2.5 Number of children (NOC)

NOC indicates the number of classes that inherit the methods and attributes of a class; it measures the number of immediate subclasses of the class in the inheritance hierarchy.

2.2.6 Response for class (RFC)

RFC is the number of methods that can be invoked in response to a message to an object of the class. It is calculated by counting the methods of the class together with the methods that can be invoked by the methods of the class.
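To make the definitions concrete, here is a toy representation of a class and two of the metrics, WMC (with the default complexity of 1 per method) and CBO; the data structure is invented purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ClassInfo:
    methods: list                                   # names of methods in the class
    coupled_to: set = field(default_factory=set)    # classes this class depends on

def wmc(cls: ClassInfo, complexity: dict = None) -> int:
    """Eq. (2) with C_i = 1 by default: WMC reduces to the method count."""
    complexity = complexity or {m: 1 for m in cls.methods}
    return sum(complexity[m] for m in cls.methods)

def cbo(cls: ClassInfo) -> int:
    """Number of classes to which this class is coupled."""
    return len(cls.coupled_to)

c = ClassInfo(methods=["read", "write", "close"], coupled_to={"Buffer", "Log"})
print(wmc(c), cbo(c))   # 3 2
```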
3 Datasets and evaluation measures

3.1 Collection of source code and metrics

For the experiment, the open source project jEdit has been used. The reason for choosing this project is twofold:

• Open source: the source code and the metrics are easily available.
• Object oriented: using object oriented metrics for the prediction requires a project written in an object oriented language, and jEdit is written in Java.

We have used two versions of the project, jEdit 4.0 and jEdit 4.1. The source code is collected from GitHub [21], an online repository containing the source code of many open source projects. The CK metrics for the experiment are collected from the Promise repository [22], a collection of publicly available datasets; Fig. 3 shows a snapshot of the metrics collected from it. The modules collected from GitHub are matched with the modules from the Promise repository, and we had to omit the files for which this linking could not be performed. Table 2 shows the description of the modules used for the prediction.

3.2 Evaluation measures

To evaluate the results of the experiment, four evaluation measures are used: precision, recall, F1 measure and accuracy, all computed from a confusion matrix (Table 3). TP (true positives) is the number of modules correctly classified as fault prone; TN (true negatives) is the number of modules correctly predicted as non-fault prone; FP (false positives) is the number of modules predicted as fault prone that are actually non-faulty; FN (false negatives) is the number of faulty modules classified as non-faulty.

Precision is the ratio of modules correctly predicted as fault prone to all modules predicted as fault prone:

Precision = TP / (TP + FP).   (3)

Recall is the ratio of modules correctly predicted as fault prone to the actually faulty modules:

Recall = TP / (TP + FN).   (4)

Accuracy is the ratio of correctly predicted modules to the total number of modules:

Accuracy = (TP + TN) / (TP + FP + TN + FN).   (5)

F-measure is the harmonic mean of precision and recall:

F-measure = (2 × Recall × Precision) / (Recall + Precision).   (6)
Table 3 Confusion matrix

                    Forecasted non-fault prone   Forecasted fault prone
Actual non-faulty   TN                           FP
Actual faulty       FN                           TP
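These four measures are a direct transcription of Eqs. (3)–(6); as a sketch, using the token metric confusion matrix reported later in Table 4:

```python
def evaluation_measures(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Precision, recall, accuracy and F-measure per Eqs. (3)-(6)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * recall * precision / (recall + precision)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f_measure": f_measure}

# Token metric confusion matrix (Table 4): TP=40, TN=133, FP=23, FN=9.
# Gives precision 0.635, recall 0.816, accuracy ~0.844, F-measure ~0.714,
# matching the paper's reported values up to rounding.
print(evaluation_measures(tp=40, tn=133, fp=23, fn=9))
```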
Fig. 3 Snapshot of the metrics collected from the Promise repository

Table 2 Dataset

Project version   #Faulty   #Non-faulty   #Faulty used   #Non-faulty used
jEdit 4.0         75        218           56             174
jEdit 4.1         79        221           49             156
4 Implementation and results
jEdit 4.0 is used as the training project and jEdit 4.1 as the target project.

4.1 Using token metric

In this experiment, tokens are extracted from the modules of both the training project and the target project. The tokens of each target module are compared with the tokens of the faulty and non-faulty modules of the training project, and the token metric is computed from the counts of matching tokens using Eq. (1). Figure 4 shows a snapshot of the code, written in the NetBeans IDE, that predicts the probability of a module being faulty.

Fig. 4 Snapshot of the code from the NetBeans IDE

Figure 5 shows a small snapshot of an input sample program and the output token file for the same program. The token file is saved under the same name as the original file with the suffix "-tokenfile"; for example, the file "FahrenheittoCelcius" has the token file "FahrenheittoCelcius-tokenfile". Figure 5 depicts the code after all separators and comments have been removed, the identifiers replaced, and the tokens generated from two neighboring words, as explained in Sect. 2.

Fig. 5 Snapshot of the input and output of tokenization

The threshold selected in this study for the classification is 0.5: modules with a token metric higher than 0.5 are considered fault prone, while modules with a token metric of 0.5 or less are considered non-fault prone. Table 4 gives the confusion matrix for fault prediction using the token based metric.

Table 4 Results using the proposed token-based metric

                    Predicted non-fault prone   Predicted fault prone
Actual non-faulty   133                         23
Actual faulty       9                           40

Applying Eqs. (3)–(6) to Table 4:

Precision = 40 / (40 + 23) = 0.635
Recall = 40 / (40 + 9) = 0.816
Accuracy = (40 + 133) / (133 + 40 + 9 + 23) = 0.843
F-measure = (2 × 0.816 × 0.635) / (0.816 + 0.635) = 0.71

4.2 Using CK metrics

A decision tree is a supervised graphical technique used to classify modules into two classes, fault prone and non-fault prone. In a decision tree, each internal node represents an attribute, while the leaf nodes denote the class labels. The tree prepared from the training project is given in Fig. 6; we have used the J48 decision tree of Weka for the classification. The confusion matrix for the classification using the decision tree is shown in Table 5.

Fig. 6 Decision tree

Table 5 Results using CK metrics

                    Predicted non-fault prone   Predicted fault prone
Actual non-faulty   145                         11
Actual faulty       20                          29

Precision, recall, accuracy and F-measure are computed with the same formulas used for the token metric:

Precision = 29 / (29 + 11) = 0.725
Recall = 29 / (29 + 20) = 0.6
Accuracy = (29 + 145) / (145 + 29 + 11 + 20) = 0.848
F-measure = (2 × 0.6 × 0.725) / (0.6 + 0.725) = 0.652
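The classification above is done with Weka's J48, a C4.5 implementation. Purely as an illustration, an analogous tree can be fitted on CK metric vectors with scikit-learn's CART classifier; the feature values below are toy numbers, not the jEdit data.

```python
from sklearn.tree import DecisionTreeClassifier

# CK metric vectors [wmc, cbo, dit, lcom, noc, rfc] per module (toy values);
# labels: 1 = faulty, 0 = non-faulty.
X_train = [[12, 5, 2, 30, 1, 40], [3, 1, 1, 2, 0, 8], [25, 9, 3, 80, 2, 66]]
y_train = [1, 0, 1]

# scikit-learn's CART tree (entropy criterion) as a stand-in for J48.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X_train, y_train)

print(tree.predict([[10, 4, 2, 25, 1, 35]]))  # predicted class for a new module
```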
4.3 Results analysis

Table 6 shows the comparison of the results of the prediction in both experiments.

Table 6 Evaluation results

            CK metrics   Token metric
Precision   0.725        0.635
Recall      0.6          0.816
Accuracy    0.848        0.843
F-measure   0.652        0.71

It can be seen from the
results that the token metric has higher recall and F-measure in the prediction of fault prone modules, while the CK metrics provide higher precision; accuracy is almost the same for both sets of metrics. Figure 7 gives a graphical representation of the results. Higher recall means that the majority of the fault prone modules are covered by the prediction, while higher precision means that, of the modules predicted as fault prone, the majority are classified correctly. The accuracy using the CK metrics is marginally higher than with the token metric. Thus the CK metrics and the token metric show opposite trends in the prediction of fault prone modules. Similar results were observed in [16], where the fault prone filtering technique and the CK metrics were compared; there, however, markedly higher accuracy was reported for the CK metrics than for the metric obtained from fault prone filtering, whereas in our case there is no significant difference in accuracy. This may be because the proposed token metric achieves higher accuracy than the existing fault prone filtering technique, making its accuracy as high as that obtained with the CK metrics.

Fig. 7 Comparison of results

5 Conclusion and future scope

The identification of fault prone modules can be of great aid in the testing process by directing the resources to the most fault prone modules. In this study, the proposed token metric is compared with the traditional object oriented metrics, the CK metrics, for predicting fault prone modules, in order to evaluate our proposed approach. The token metric proved to be equally significant in the prediction: it has higher recall and F-measure than the existing set of metrics, and it achieves accuracy almost equal to that of the CK metrics. The token metric is thus an important aid for foreseeing fault prone modules in high risk projects. Future work involves studying the combination of the token metric and the complexity metrics, in order to combine their effects on the opposite trends of results experienced in this study.

References

1. Jiang Y et al (2007) Fault prediction using early lifecycle data. In: ISSRE 2007, 18th IEEE Symposium on Software Reliability Engineering, pp 237–246
2. Marshima MR, Teo NHI, Yusop NSM, Mohamad NS (2011) Fault prediction model for web application using genetic algorithm. In: International Conference on Computer and Software Modeling (IPCSIT), vol 14
3. Jin C, Jin S, Ye J (2012) Artificial neural network-based metric selection for software fault-prone prediction model. IET Softw 6(6):479–487
4. Tamanna et al (2011) Efficiency metrics. BVICAM's Int J Inf Technol (BIJIT) 3(2):377–381
5. Singh VB et al (2012) Open source software reliability growth model by considering change-point. Int J Inf Technol (IJIT) 4(1):405
6. Sandhu PS, Brar AS, Goel R, Kaur J (2011) A model for early prediction of faults in software systems. In: 2nd International Conference on Computer and Automation Engineering (ICCAE), vol 4, pp 281–285
7. Singh Y, Kaur A, Malhotra R (2010) Prediction of fault-prone software modules using statistical and machine learning methods. Int J Comput Appl (IJCA) 1(22):8–15
8. Kaur I, Kapoor N (2016) Token-based approach for cross project prediction of fault prone modules. In: International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT), pp 514–520
9. Mizuno O, Ikami S, Nakaichi S, Kikuno T (2007) Spam filter based approach for finding fault-prone software modules. In: Proceedings of the International Workshop on Mining Software Repositories
10. Bokhari et al (2010) A comparative study of software requirement tools for secure software development. BVICAM's Int J Inf Technol (BIJIT) 2(2):207–216
11. Jiang Y, Cukic B, Menzies T (2008) Cost curve evaluation of fault prediction models. In: Proceedings of the 19th International Symposium on Software Reliability Engineering, USA, pp 197–206
12. Yu L (2012) Using negative binomial regression analysis to predict software faults: a study of Apache Ant. Int J Inf Technol Comput Sci (IJITCS) 8:63–70
13. Pighin M, Marzona A (2013) Fault persistency and fault prediction in optimization of software release. Int J Inf Technol Comput Sci (IJITCS) 7:15–23
14. Basili VR, Briand L, Melo WL (1995) A validation of object-oriented design metrics as quality indicators. Technical Report, University of Maryland
15. Aman H (2012) An empirical analysis of the impact of comment statements on fault-proneness of small-size modules. In: 19th Asia-Pacific Software Engineering Conference (APSEC), vol 1, pp 362–367
16. Mizuno O, Hata H (2010) An empirical comparison of fault-prone module detection approaches: complexity metrics and text feature metrics. In: 34th Annual IEEE Computer Software and Applications Conference, Seoul, pp 248–249
17. Mizuno O, Kikuno T (2007) Training on errors experiment to detect fault-prone software modules by spam filter. In: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC-FSE), pp 405–414
18. Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493
19. Bishnu PS, Bhattacharjee V (2012) Software fault prediction using quad tree-based k-means clustering algorithm. IEEE Trans Knowl Data Eng 24(6)
20. Kumar G et al (2014) Optimization of component based software engineering model using neural network. BVICAM's Int J Inf Technol (BIJIT) 6(2):405–414
21. GitHub. Available: https://github.com/
22. Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: PROMISE 2010, 6th International Conference on Predictive Models in Software Engineering. Available: http://promisedata.org
Indian Journal of Science and Technology, Vol 9(S1), DOI: 10.17485/ijst/2016/v9iS1/109672, December 2016
ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645
An Efficient Face Detection and Recognition for Video Surveillance

Dipti Mishra1*, Mohamed Hashim Minver2, Bhagwan Das3, Nisha Pandey4 and Vishal Jain5

1 ECE Department, Pranveer Singh Institute of Technology, Kanpur - 209305, Uttar Pradesh, India; [email protected]
2 Addalaichenai National College, Sri Lanka; [email protected]
3 University Tun Hussein Onn Malaysia, Malaysia; [email protected]
4 Gyancity Research Lab, India; [email protected]
5 Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi - 110063, Delhi, India; [email protected]
Abstract In this paper, a comprehensive scheme is proposed for unconstrained joint face detection and recognition in video sequences for surveillance systems. Unlike conventional video based face recognition techniques, emphasis is laid on the acquisition of a pose constrained training video database, followed by the extraction of well aligned face images from the training videos. We propose a new Indian Faces Video Database (IFVD) to demonstrate the performance of the approach, especially in the challenging environment of the varying skin color and texture of faces from the Indian subcontinent. Our approach produces successful face tracking results on over 86% of all videos. The good tracking performance induces high recognition rates: 85.86% on Honda/UCSD and over 77.49% on IFVD. The proposed technique is robust and aims to develop a unified framework that addresses the challenges of varying head orientation, pose and illumination in a highly integrated fashion, so as to benefit from the interdependence between the high fidelity face detection phase and the subsequent recognition phase.
Keywords: Adaboost, Classification, Face Detection, Face Recognition, Kalman Tracking, Manifold Learning, SVM
1. Introduction
In recent years, face detection and recognition have attracted significant research interest, driven by the demand for robust face recognition techniques in surveillance and authentication applications. Video surveillance is used to catch mischievous persons who are unauthorized for access, so we need a biometric system that utilizes biological information such as fingerprints, iris or facial information. Spatio-temporal face recognition involves several aspects: detection, tracking, face preprocessing for illumination correction, and recognition based on appearance manifolds. In the literature, several schemes have been proposed in a variety of applications, as mentioned in [1]. However, a key drawback of such work is that the framework adopted for developing the requisite mathematical models and algorithms typically considers only a single component at a time while strictly controlling the other components, often manually. For instance, applications considering face recognition [1] use databases containing facial mug-shot images that are often manually aligned and cropped; practical detection and tracking systems, however, seldom output well aligned images. Schemes that focus on individual components thereby let errors propagate through the other components, which is one of the principal reasons why they fail to perform satisfactorily in real-world scenarios. Thus
it is important to design and study the performance of these components comprehensively as parts of a real scenario, rather than optimizing the individual components, which tends to ignore the critical interdependencies. The beauty of this principle is that it is substantially easier for the user to provide training videos of the persons than manually cropped and registered images; the system should then be able to extract suitably aligned face images from these video sequences. The quality of the images has a direct impact on recognition performance. Second, the training videos should capture and represent a large variety of in-plane orientations and out-of-plane pose variations, which allows the face models [2, 3] to provide a richer description of each subject's face. A novel algorithm is proposed for high fidelity detection of both the location and the orientation of the face using the OpenCV detector, which facilitates the extraction of well registered face images from the training video set. For recognition, the appearance manifolds framework of [2] is subsequently employed to construct face models based on these appropriately registered images; furthermore, pose constraints are imposed during the training video acquisition. A new Indian Faces Video Database (IFVD) has also been constructed to capture the vast diversity in the faces of Indian subjects, and it strives to represent maximal pose variations. The tracking and recognition procedure for the test videos follows Lee et al. [4], with a few modifications in order to achieve more robust and smoother detection performance. The performance of the proposed system is demonstrated using the Honda/UCSD and IFVD video databases.
2. Literature Survey

For face detection, Viola and Jones proposed a scheme [5] that continues to be a popular frontal face detector. The work was later extended [6] to detect faces with varying poses, such as pan and tilt, and with in-plane and out-of-plane orientations. Viola-Jones detectors based on Haar-like rectangular features, learnt using Adaboost [7], were employed for different combinations of pose and in-plane rotation in order to draw the final bounding box around the face; the problem is that the training time on the videos and the complexity are very large. Moreover, a drawback of that scheme is that it runs face detection in every frame, rather than using face tracking initialized by a face detector, and it therefore fails to exploit the prior information available in the form of the spatio-temporal continuity of the video sequence. The visual tracker proposed earlier in [8] is a popular tracking algorithm employing a subspace based adaptive appearance model that is learnt online, rather than tracking a fixed target; this was enhanced in [1] to deal with the problem of drift, which occurs when the appearance model adapts to non-targets. While these schemes provide efficient solutions for tracking faces, they are computationally complex. In contrast, our algorithm offers a simpler solution that also addresses pose variation while avoiding drift by using reinforcements from the face detector. Other commonly used appearance based techniques for face tracking in videos are based on mean shift [9] and particle filters [10], both of which are pixel-intensity based approaches; our algorithm instead employs an adaptive motion model in the form of a Kalman filter, which complements the pixel-intensity based face detector more naturally. Evidence integration is also an important aspect of face recognition in real-world videos. The work in [3] proposes a technique for face recognition from video sequences using high dimensional probability density estimation followed by density matching with the Kullback-Leibler divergence [11]; a major drawback of that scheme, however, is that it is offline. The probabilistic appearance manifold framework [3] offers an efficient online evidence integration technique for video based face recognition, and this work was further extended in [4] to also incorporate face tracking using appearance manifolds. However, appearance manifolds are not available during the extraction of faces from the training videos, and this is the main gap that motivated the current work. The proposed scheme has increased robustness to pose variation together with smoother tracking and recognition performance. Typically, the OpenCV detector is employed to detect and locate frontal poses; the bounding box is then extracted from the video frame, and face preprocessing is applied for illumination correction to enhance the facial features needed for recognition. The processed face is then resized, and its features are fed to classifiers for recognition.
3. Face Detection

The OpenCV detector detects the spatio-temporal location of a face in a video frame. It is based on the Viola-Jones detector, which uses Haar-like rectangular
features. The limitation of this detector is that it can only detect frontal faces, and in a crowded environment it also detects some non-faces, which is undesirable. Most of the non-faces detected by the OpenCV detector are deprived of skin color pixels; to eliminate them, we incorporate a skin filter, as shown in Figure 1, which discards a detected object if it does not contain a sufficient number of skin pixels. We have implemented both a color based and a Gaussian model based skin filter to detect skin pixels. The color based skin filter uses the particular parameters for the RGB color space defined in [12]. For the skin pixels, the skin ratio is calculated as the ratio of the total number of skin pixels to the total number of pixels in the face; a detection whose skin ratio is above a threshold is finally treated as a face. The threshold is learnt by running the filter on various databases: the ChokePoint video database, the Honda/UCSD video database and the Indian face video database. The limitations of this filter are that luminance is present in the color space, which depends on the position of the light source, and that only certain skin colors can be detected, so false positives become high. The luminance is removed by using the Gaussian model based skin filter, which uses a chromatic color space, i.e. normalized RGB:

r = R / (R + G + B) and b = B / (R + G + B),

where R, G and B are the red, green and blue channel pixel values of the RGB image. The Gaussian model for skin detection is used as discussed in [13]. The model is built from a large skin dataset [14] covering various age groups (young, middle and old) and racial groups; the total learning sample size is 245057, of which 50859 are skin samples and the rest are non-skin samples. The skin samples are passed through a low pass filter to reduce noise in the dataset. According to [13], the distribution of skin color is modeled with a Gaussian N(µ, σ):

µ = (1/N) Σ_{i=0}^{N−1} x_i,   (1)

σ = (1/N) Σ_{i=0}^{N−1} (x_i − µ)(x_i − µ)^T,   (2)

where x_i = [r, b]^T is the chromatic pixel vector of the i-th skin sample and N is the number of skin pixels. Skin pixels detected with the Gaussian model based skin filter are shown in Figure 2. With this filter, detection is relatively faster, false positives are low, and detection can be performed over a wide variety of skin colors.
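A compact sketch of the Gaussian skin model of Eqs. (1) and (2); the likelihood form in skin_likelihood is a standard Gaussian evaluation (up to a constant), and any decision threshold on it is an assumption, since the paper learns its thresholds from the databases.

```python
import numpy as np

def fit_skin_model(skin_rgb: np.ndarray):
    """Fit N(mu, sigma) over chromatic coordinates x_i = [r, b]^T (Eqs. 1-2).
    skin_rgb: array of shape (N, 3) with R, G, B skin samples."""
    s = skin_rgb.astype(np.float64).sum(axis=1)
    x = np.stack([skin_rgb[:, 0] / s, skin_rgb[:, 2] / s], axis=1)  # [r, b]
    mu = x.mean(axis=0)
    sigma = np.cov(x, rowvar=False, bias=True)   # (1/N) sum (x-mu)(x-mu)^T
    return mu, sigma

def skin_likelihood(pixel_rgb, mu, sigma) -> float:
    """Unnormalized Gaussian likelihood of a pixel being skin in (r, b) space."""
    R, G, B = map(float, pixel_rgb)
    x = np.array([R, B]) / (R + G + B)
    d = x - mu
    return float(np.exp(-0.5 * d @ np.linalg.inv(sigma) @ d))
```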
The skin filter is then followed by a PCA filter to check whether the detected object is a face or a non-face. This filter uses pose faces for the comparison between face and non-face: each image of the GTAV face database is cropped around the face (as shown in Figure 3), and the mean of all these images is calculated to obtain the full frontal face. Each pose face image (N × N) is converted to a column vector (N² × 1), and all five images are accumulated in a matrix T; the size of this matrix is then reduced using PCA to cut down the computation, yielding the projection matrix фi. The object to be detected is likewise converted to a column vector and resized using PCA, yielding the test projection matrix фt. The Euclidean distance between фi and фt is then calculated, and the smallest of the five distances is compared with a threshold (learnt over the various databases) to decide whether the object is a face or a non-face.

4. Face Tracking

Facial poses can be broadly classified into two types, frontal and non-frontal (as shown in Figure 4); the non-frontal poses are further classified into in-plane and out-of-plane poses. The OpenCV detector cannot detect non-frontal faces, so detecting them requires tracking; tracking is also needed at times when the illumination over the face changes. Efficient recognition requires faces showing all facial features, and frontal and in-plane poses contain all facial features, so we have implemented a face tracker to handle these non-frontal poses. Mean shift tracking is implemented: a non-parametric (kernel based) tracker that uses color histograms. It is an iterative mechanism that compares the color histograms of the target object and the candidate region until the similarity coefficient, the Bhattacharyya coefficient, is maximized (as shown in Figure 5). A Gaussian kernel of size equal to the target window is used, and the kernel values are applied to the target window; the color histograms of the target model and of the search window are then calculated. The Bhattacharyya coefficient ρ(y) is used to calculate the similarity score between the color histograms:

ρ(p(y), q) = Σ_{u=0}^{n−1} √(p_u(y) q_u),   (3)

where p(y) is the n-bin histogram at search location y, q is the n-bin histogram of the target, and n is the number of histogram bins. The larger the value of ρ(y), the larger the
similarity score, and the search window moves in the direction of increasing Bhattacharyya coefficient until it is maximized. Mean shift tracking can track non-frontal faces, but the orientation problem (in-plane poses) is still not addressed, and recognition can be further improved by resolving it; we therefore need a new tracker that handles both the tracking of non-frontal faces and the orientation problem. The proposed scheme (shown in Figure 6) localizes faces in the video frame using bounding boxes. Each bounding box is characterized by a quadruple [x, y, s, θ], where (x, y) denotes the coordinates of the center of the bounding box from the top left corner of the frame, s denotes the mean of the width and height of the bounding box, and θ is the angle by which the bounding box is rotated with respect to the horizontal. The algorithm leverages the spatio-temporal characteristics of a typical surveillance video captured by a static camera. Since the algorithm is online, it can be assumed without loss of generality that the bounding box of frame i − 1 is available while localizing the face in frame i. Typical modern surveillance cameras record high quality video at 25–30 frames per second, so the region of interest in frame i is substantially narrowed, to within a few pixels around µ_{i−1}. Employing this prior information, N candidate bounding boxes are generated for frame i by random sampling from a multivariate Gaussian distribution centered at µ_{i−1} (N = 20 worked best):

ρ_1, ρ_2, ρ_3, ρ_4, …, ρ_N ~ N(µ, σ),   (4)

where ρ_i is a candidate bounding box for orientation estimation, µ is the previous bounding box [x, y, s, θ] with s = (w + h)/2, w and h are the width and height of the detected face, and σ = diag[16, 16, 5, 144].
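A sketch of this candidate sampling step in NumPy; the previous-box values passed in the usage line are illustrative, not data from the paper.

```python
import numpy as np

def sample_candidates(prev_box, n=20, seed=0):
    """Eq. (4): draw N candidate boxes [x, y, s, theta] from a Gaussian
    centered at the previous bounding box, with the paper's sigma."""
    rng = np.random.default_rng(seed)
    cov = np.diag([16.0, 16.0, 5.0, 144.0])   # sigma = diag[16, 16, 5, 144]
    return rng.multivariate_normal(mean=np.asarray(prev_box, dtype=float),
                                   cov=cov, size=n)

candidates = sample_candidates([120.0, 80.0, 48.0, 0.0])   # 20 boxes, shape (20, 4)
```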
Then, to tackle the tracking of non-frontal poses, we implemented template matching, a technique for finding a template image within a source image using a matching score. One of the most common measures, comparing the similarity of different patches of the input image with the template, is the SAD (sum of absolute differences) [15]. Let I_i and I_t denote the intensities of the input and template images respectively; the matching score at image coordinate (x, y) is

SAD(x, y) = Σ_{i=1}^{T} Σ_{j=1}^{T} |I_i(x + i, y + j) − I_t(i, j)|,   (5)

for a T × T template. The coordinate with the best matching score (for SAD, the minimum) is the estimate of the location of the template within the input image. Figure 7 shows the results of template matching. For a sufficiently large N, one of the candidate bounding boxes is oriented in the same direction as the actual face with high probability. Note that if the face is oriented at an angle θ measured counter-clockwise from the horizontal, rotating the frame by 90° − θ makes the face vertical. The bounding box locations are rectified using OpenCV and template matching on the rotated frame. Template matching alone, however, is not an efficient tracker, since it does not work on rotated faces (the template can never be matched), so we implemented a Kalman filter to correct the bounding boxes of template matching for rotated frames using past observations. Kalman filtering is a recursive analytical technique for estimating time dependent physical parameters in the presence of noise; it uses the equations of kinematics to predict the movement of a pixel in the next frame from the past statistical variations in the measurements. It comprises two stages, prediction and correction; the block diagram of the working of the Kalman filter is shown in Figure 8. The assumptions of the face tracking model are a velocity model, with the process noise (w_i) and the measurement noise (v_i) taken as Gaussian distributed such that

E[w_k w_l^T] = Q_k for k = l, and 0 otherwise,
E[v_k v_l^T] = R_k for k = l, and 0 otherwise,
E[w_k v_l^T] = 0 for all k, l,

where Q_k and R_k are symmetric positive semi-definite matrices. In the velocity model, the system state is b_i = [position, velocity]. The bounding box is [x, y, s, θ], where (x, y) is the position of the face from the top left corner of the frame, s is the mean of the width and height, and θ is the face orientation angle. The system state vector for the Kalman filter is therefore

b_i = [x, y, s, θ, ẋ, ẏ, ṡ, θ̇]^T, where [ẋ, ẏ, ṡ, θ̇] = [1, 1, 0.1, 1].

The initial error covariance matrix P_0 is given a high value to get a good estimate, i.e. P_0 = I_8 · 10^4.
ance is taken as equal for all components of state vector i.e. R= 42.25.I4. The estimated bounding box bi- using ith
quite large, a single linear support vector classifier yields poor results. To overcome this problem, we employ multiple weak linear classifiers integrated together using the principle of Adaboost18 to obtain a strong classifier. Each of the weak classifiers is obtained by learning one linear SVM on a subset of the training data obtained by sampling from a given discrete distribution over the training samples. The final classifier is a linear combination of the SVMs trained in each iteration, with the weighting coefficient set as a decreasing function of the classification error. The confidence score of the classifier is used as the metric for choosing the best candidate bounding box. Using the SVM classifier each of the N bounding boxes is scored. Final Score Hk(z) with the maximum value is chosen as best frontal face estimate of the previous detected face.
bounding boxes generated from Gaussian distribution is = bi− Abi + wk
, where state transition matrix A is given as
I I A = 4 4 04 I 4 And the process noise is a ay a a wi = x , , s , θ , ax , a y , as , aθ . 2 2 2 2 Updating
the
error
covariance
equation
as
T
−
= Pi APi −1 A + Q , Where Q is the process error covaT
riance matrix i.e. E{wi wi } Now the Kalman filter gain is given by
= K Pi − H T ( HPi − H T + R ) −1
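To make the recursion concrete, the following is a minimal NumPy sketch of this constant-velocity Kalman cycle, covering the prediction and gain computation above and the correction equations given next. The matrices A, H, R and P_0 follow the settings in the text; the value of Q, the example numbers and the function names are our own illustrative assumptions, not the authors' code.

import numpy as np

I4 = np.eye(4)
A = np.block([[I4, I4], [np.zeros((4, 4)), I4]])  # state transition [[I4, I4], [04, I4]]
H = np.hstack([I4, np.zeros((4, 4))])             # output transition H = [I4, 04]
Q = np.eye(8) * 1e-2                              # process covariance E{w w^T} (assumed value)
R = 42.25 * I4                                    # measurement covariance R = 42.25 * I4
P = np.eye(8) * 1e4                               # P0 = I8 * 10^4

def predict(b, P):
    b_minus = A @ b                # b_i^- = A b_(i-1); process noise enters via Q
    P_minus = A @ P @ A.T + Q      # P_i^- = A P A^T + Q
    return b_minus, P_minus

def correct(b_minus, P_minus, b_meas):
    K = P_minus @ H.T @ np.linalg.inv(H @ P_minus @ H.T + R)   # Kalman gain
    b = b_minus + K @ (b_meas - H @ b_minus)                   # cf. eq. (6)
    P = (np.eye(8) - K @ H) @ P_minus                          # cf. eq. (7)
    return b, P

# One frame: predict from the previous state, then correct with the
# bounding box measured by template matching, [x, y, s, theta].
b = np.array([100., 80., 40., 0., 1., 1., 0.1, 1.])
b_minus, P_minus = predict(b, P)
b, P = correct(b_minus, P_minus, b_meas=np.array([103., 82., 41., 5.]))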
The measurement stage is updated by

b_i^+ = b_i^- + K (b_i^m - H b_i^-)    (6)

where H is called the output transition matrix and is given as H = [I_4, 0_4], b_i^+ is the corrected value of the system state matrix, and b_i^m is the measured system state matrix, i.e. b_i^m = [x, y, s, θ]. So the updated error covariance is given as

P_i = (I - K H) P_i^-    (7)

where P_i^- is the estimated error covariance matrix. The proposed detailed algorithm for template matching and Kalman filtering is shown in Figure 9, and its results are shown in Figure 10 and Figure 11 respectively.

The proposed QoR score is a measure of the extent of the match between the alignment of the bounding box and that of the enclosed face. In order to characterize this, a linear classifier is trained on the Head Pose Database16, which consists of 2790 labeled face images of 15 people with varying degrees of pan and tilt. The faces with either pan or tilt in the range [-15°, +15°] are chosen as the positive training samples, while the remaining images are used as negative samples. The classifier is trained on the HoG (Histogram of Oriented Gradients) features (z)17 extracted from the images in the two classes. Since the pose variation (as shown in Figure 11) within these two classes is itself quite large, a single linear support vector classifier yields poor results. To overcome this problem, we employ multiple weak linear classifiers integrated together using the principle of Adaboost18 to obtain a strong classifier. Each of the weak classifiers is obtained by learning one linear SVM on a subset of the training data obtained by sampling from a given discrete distribution over the training samples. The final classifier is a linear combination of the SVMs trained in each iteration, with the weighting coefficient set as a decreasing function of the classification error. The confidence score of the classifier is used as the metric for choosing the best candidate bounding box. Using the SVM classifier, each of the N bounding boxes is scored, and the bounding box whose final score H_k(z) has the maximum value is chosen as the best frontal face estimate of the previously detected face:

H_k(z) = \sum_{i=1}^{T} \alpha_i h_i(z)    (8)

where H_k(z) is the final score for the k-th bounding box, α_i are the weights, h_i(z) = a^T z + b is the classifier trained on the HoG features of an individual person, z is the HoG feature of the person (1116×1), T is the number of weak classifiers (thirty), and k is the index of the bounding boxes [1, 2, ..., N].
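As an illustration of equation (8), here is a rough scikit-learn sketch (not the authors' code) of boosting linear SVMs: T weak SVMs are trained on subsets sampled from a discrete distribution over the training data, weighted by a decreasing function of their error, and their combined confidence scores a candidate box. The resampling and re-weighting details are our assumptions; labels are taken as y in {+1, -1} (frontal vs. non-frontal).

import numpy as np
from sklearn.svm import LinearSVC

def train_boosted_svms(X, y, T=30, rng=np.random.default_rng(0)):
    n = len(y)
    dist = np.full(n, 1.0 / n)                    # discrete distribution over samples
    weak, alpha = [], []
    for _ in range(T):
        idx = rng.choice(n, size=n, p=dist)       # sample a training subset
        clf = LinearSVC().fit(X[idx], y[idx])
        err = np.mean(clf.predict(X) != y)
        a = 0.5 * np.log((1 - err) / max(err, 1e-9))   # weight decreases with error
        margins = y * clf.decision_function(X)
        dist *= np.exp(-a * np.sign(margins))     # up-weight misclassified samples
        dist /= dist.sum()
        weak.append(clf)
        alpha.append(a)
    return weak, alpha

def score(z, weak, alpha):
    # Confidence H_k(z) of eq. (8); the candidate box with the largest score wins.
    return sum(a * clf.decision_function(z.reshape(1, -1))[0]
               for clf, a in zip(weak, alpha))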
Face preprocessing is also needed to enhance the facial features under the following conditions in order to obtain good recognition results. There may be illumination variation when the light source is changing, or illumination change caused by pose variations, so there is a strong need to normalize the illumination of the face. The illumination correction algorithm is based on Weber's law19, which states that humans perceive any stimulus relative to the background rather than in absolute terms:

k = ΔI / I    (9)

where ΔI is the increment threshold, I is the initial stimulus intensity, and k is the Weber fraction (which remains constant as I varies). Now, according to the Lambertian reflectance model20, a face image is represented as

F(x, y) = R(x, y) I(x, y)    (10)

where F(x, y) is the image pixel value, R(x, y) is the reflectance value and I(x, y) denotes the illuminance at each pixel. The Weber Law Descriptor (WLD) is given by
\epsilon(x_c) = \arctan( \alpha \sum_{i=0}^{p-1} \frac{x_c - x_i}{x_c} )    (11)

where α is an adjusting parameter (magnifying/shrinking the intensity difference between neighboring pixels), p is the number of neighboring pixels, x_c is the center pixel and x_i is a neighboring pixel. WLD applied to a face image F(x, y) gives an illumination-invariant representation of F known as the "Weber-Face" (WF):

WF(x, y) = \arctan( \alpha \sum_{i \in A} \sum_{j \in A} \frac{F(x, y) - F(x - i\Delta x, y - j\Delta y)}{F(x, y)} )    (12)

where A = {-1, 0, 1}. The illumination component is commonly assumed to vary slowly, which gives us

I(x - i\Delta x, y - j\Delta y) \approx I(x, y)    (13)

so the equation becomes

WF(x, y) = \arctan( \alpha \sum_{i \in A} \sum_{j \in A} \frac{R(x, y) - R(x - i\Delta x, y - j\Delta y)}{R(x, y)} )    (14)

After applying Weber's law, the reflectance becomes constant over the face image and the illumination-corrected image is obtained, as can be seen in Figure 12.

5. Face Recognition
Appearance manifolds2 are low-dimensional subspace based representations for the face images of a person, suitable for drastic pose variations. Each person is represented by a collection of PCA subspaces (each of which corresponds to a particular pose) and the corresponding probabilities of transition from one pose to another. The manifolds are constructed from the images extracted from the training videos. We used the distance metric proposed in2 to find the distance of a new image from the appearance manifolds of each individual in the training database. Manifolds represent a space which resembles Euclidean space near each point and is non-linear in nature. The collection of linear subsets is called the pose subspace; each pose subspace is an affine plane computed through PCA, and the connectivity between the pose subspaces is learnt over the training videos. Let M_k be the manifold for the k-th person (shown in Figure 13); M_k is the collection of pose subspaces C_k^i, where i = 1...N for N pose subspaces. To recognize the person we calculate the minimum Hausdorff distance between the manifold M_k and the test image I:

k^* = \arg\min_k d_H(I, M_k)    (15)

where d_H is the Hausdorff distance between the image I and M_k. The Hausdorff distance measures the greatest of all distances from a point in one set to the closest point in the other set:

d_H(X, Y) = \max\{ \sup_{x \in X} \inf_{y \in Y} d(x, y), \; \sup_{y \in Y} \inf_{x \in X} d(x, y) \}    (16)

where X, Y are two different sets and d(x, y) is the distance between their points. The distance between the manifold and the image is given by

d_H(I, M_k) = \int_{M_k} d(x, I) \, P_{M_k}(x | I) \, dx    (17)

where P_{M_k}(x | I) is the conditional probability of x being the optimal point on M_k given I. Using the total probability theorem,

P_{M_k}(x | I) = \sum_{i=0}^{p} P_{C_k^i}(x | I) \, P(C_k^i | I)

where P_{C_k^i}(x | I) is the conditional probability of x belonging to pose subspace C_k^i in manifold M_k, and P(C_k^i | I) is the probability that C_k^i contains the x that has minimal distance from I. Using both of the above equations we get

d_H(I, M_k) = \sum_{i=0}^{p} P(C_k^i | I) \int_{M_k} d(x, I) \, P_{C_k^i}(x | I) \, dx    (18)

P(C_k^i | I_t, I_{0:t-1}) = \alpha \, P(I_t | C_{k_t}^i) \sum_{j=1}^{P} P(C_{k_t}^i | C_{k_{t-1}}^j) \, P(C_{k_{t-1}}^j | I_{t-1}, I_{0:t-2})    (19)

where α is a normalization factor which ensures that \sum_i P(C_k^i | I_t, I_{0:t-1}) = 1, and P(I_t | C_k^i), the probability that the face image I_t belongs to C_k^i, can be conveniently calculated as

P(I_t | C_k^i) = \gamma_{k_t} \exp( -\frac{d_H^2(I_t, C_k^i)}{2\sigma^2} )    (20)

where \gamma_{k_t} is a normalization factor that ensures \sum_{i=1}^{P} P(I_t | C_k^i) = 1, and P(C_{k_t}^i | C_{k_{t-1}}^j) is the transition probability of x lying in C_k^i in the current frame given that in the previous frame it belonged to C_k^j. The face recognition algorithm is given in Figure 14.
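A compact sketch of the recognition rule of equations (15)-(18), under simplifying assumptions of ours: each pose subspace is stored as a PCA mean and orthonormal basis, the point-to-subspace distance is the projection residual, and a uniform stand-in replaces P(C_k^i | I). All names are illustrative, not the authors' implementation.

import numpy as np

def subspace_distance(img, mu, B):
    # Distance from an image vector to the affine PCA plane {mu + B c}.
    r = img - mu
    return np.linalg.norm(r - B @ (B.T @ r))

def manifold_distance(img, subspaces, probs=None):
    d = np.array([subspace_distance(img, mu, B) for mu, B in subspaces])
    if probs is None:                        # uniform P(C_k^i | I) as a stand-in
        probs = np.full(len(d), 1.0 / len(d))
    return float(probs @ d)                  # cf. eq. (18)

def recognise(img, manifolds):
    # manifolds: {person_id: [(mu, B), ...]};  k* = argmin_k d_H(I, M_k), eq. (15)
    return min(manifolds, key=lambda k: manifold_distance(img, manifolds[k]))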
6. Results
We have experimented on two databases: the Honda/UCSD video database4 and the Indian Face Video Database (IFVD). The Honda database provides a standard video database with a wide range of poses for evaluating face tracking/recognition algorithms; the resolution of each video sequence is 640×480, and recognition is performed using 9 training videos and 1-2 test videos per training video. IFVD characterizes the effect of structured pose variations (glasses, beard, moustache and hair styles); it consists of 30 training videos (one per person) and 38 test videos, with significant pose variations at a resolution of 710×576. In face detection we experimented with two algorithms: (i) Viola-Jones and (ii) Viola-Jones with filters; Table 1 shows the variation in results. In face tracking we compared two algorithms, mean shift (MS) tracking and Template Matching and Kalman filter based tracking (TMKL), as shown in Table 2.

Figure 1. Skin filter.
Figure 2. Gaussian based skin filter.
Figure 3. Cropped images at different angles.
Figure 4. Different types of poses.

In face recognition, the appearance manifold is applied on top of the two trackers, mean shift and TMKL; the results are shown in Table 3. Most of the faces detected by mean shift but not by TMKL are non-frontal, and the recognition rate of frontal faces is higher than that of non-frontal ones. Face recognition is based on the presence of facial features like the size and location of the nose, eyes, mouth etc.; non-frontal faces lack many of these features, which results in low recognition. Table 4 shows the difference in recognition rate between frontal and non-frontal faces.

Table 5 contains the training and testing times of the system for varying image sizes. The sluggish increase in training time is due to the presence of many other stages, like detection and preprocessing, which are not heavily dependent on feature size. The sudden increase in testing time is because the nearest-neighbour classifier is heavily dependent on the size of the feature vector, as it takes norms of vectors. The software can be built on many platforms, such as Python, C++ and MATLAB. The video used for testing is 12 s long with a resolution of 720×576; the run times on Python, C++ and MATLAB are 25.57, 20.39 and 95.985603 seconds respectively. C++ and OpenCV have some compatibility problems with the GUI.

7. Conclusions
An end-to-end integrated face detection, tracking and recognition system has been proposed in this paper. The framework developed emphasizes the extraction of appropriately aligned faces from a pose constrained training video database. This has been shown to lead to a significant enhancement in the performance of the video based face detection and recognition system. A new IFVD has been developed to test the performance of the proposed scheme, along with an existing video database. The proposed system shows robust tracking and recognition performance in the presence of orientation, pose and illumination variation with minimal system complexity.
Figure 5. Histogram coefficient comparison by Bhattacharya.
Figure 9. Face Tracking Algorithm.
Figure 6. Proposed Algorithm.
Figure 10. Face Tracking results using Kalman Filter.
Figure 7. Results of template matching.
Figure 8. Block diagram operation of Kalman filtering.
Figure 11. Different poses at different angles of pan and tilt.
Table 2. Face Tracking results of the two algorithms (%)
Database                 Algorithm        Accuracy
Honda/UCSD Database      MS tracking      84
Honda/UCSD Database      TMKL tracking    82.4
Indian Video Database    MS tracking      87
Indian Video Database    TMKL tracking    86.4
Table 3. Face Recognition results of the tracking algorithms (%)
Database                 Algorithm        Accuracy
Honda/UCSD Database      MS tracking      78.67
Honda/UCSD Database      TMKL tracking    85.86
Indian Video Database    MS tracking      70.67
Indian Video Database    TMKL tracking    77.49

Figure 12. Weber faces obtained after applying Weber's law (original images vs. Weber-Face images).
Table 4. Recognition results of Frontal vs Non-Frontal poses (%)
Feature Size    Frontal Face    Non-Frontal Poses
24×24           76.27           8.3
32×32           77.39           6.3
48×48           75.96           7.14

Figure 13. Appearance Manifold for the kth person.
Table 5. Running time for various feature sizes
Feature Size    Training    Testing
24×24           5.88 hrs    1.21 s
32×32           5.96 hrs    2.44 s
48×48           6.03 hrs    7.53 s
8. References
1. Kim M, Kumar S, Pavlovic V, Rowley H. Face tracking and recognition with visual constraints in real world videos. IEEE Conference on Computer Vision and Pattern Recognition; 2008 Jun. p. 1-8.
2. Lee KC, Ho J, Yang MH, Kriegman D. Video-based face recognition using probabilistic appearance manifolds. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003). 2003; 1:I-313.
3. Shakhnarovich G, Fisher JW, Darrell T. Face recognition from long-term observations. Computer Vision - ECCV, Springer. 2002; 851-65.
4. Lee KC, Ho J, Yang MH, Kriegman D. Visual tracking and recognition using probabilistic appearance manifolds. Computer Vision and Image Understanding. 2005; 99(3):303-31.
5. Viola P, Jones M. Robust real-time object detection. International Journal of Computer Vision. 2001; 4:34-47.
Figure 14. Face Recognition Algorithm at testing phase.
Table 1. Face Detection results of the two algorithms (%)
Database                 Algorithm                  Accuracy
Honda/UCSD Database      Viola-Jones                88.4
Honda/UCSD Database      Viola-Jones and Filters    94.1
Indian Video Database    Viola-Jones                86.5
Indian Video Database    Viola-Jones and Filters    93.7
6. Jones M, Viola P. Fast multi-view face detection. Mitsubishi Electric Research Lab TR-2003-96. 2003; 3:14.
7. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory, Springer. 2008; 23-37.
8. Ross DA, Lim J, Lin RS, Yang MH. Incremental learning for robust visual tracking. International Journal of Computer Vision. 2008; 77(1-3):125-41.
9. Comaniciu D, Ramesh V, Meer P. Real-time tracking of non-rigid objects using mean shift. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000). 2000; 2. p. 142-9.
10. Nummiaro K, Koller-Meier E, Van Gool L. An adaptive color-based particle filter. Image and Vision Computing. 2003; 21(1):99-110.
11. Kullback S. Information Theory and Statistics. 1968.
12. Peer P, Kovac J, Solina F. Human skin color clustering for face detection. EUROCON 2003, International Conference on Computer as a Tool. 2003.
13. Yang J, Waibel A. A real-time face tracker. CMU CS Technical Report.
14. Bhatt R, Dhall A. Skin Segmentation Dataset. UCI Machine Learning Repository.
15. Dawoud NN. Fast template matching method based on optimized sum of absolute difference.
16. Gourier N, Hall D, Crowley JL. Estimating face orientation from robust detection of salient facial structures. FG Net Workshop on Visual Observation of Deictic Gestures, FGnet (IST-2000-26434), Cambridge, UK. 2004; 1-9.
17. Dalal N, Triggs B. Histograms of oriented gradients for human detection. IEEE Conference on Computer Vision and Pattern Recognition. 2005; 1:886-93.
18. Schapire RE, Singer Y. Improved boosting algorithms using confidence-rated predictions. Machine Learning. 1999; 37:297-336.
19. Chen J, Shan S, He C, Zhao G, Pietikainen M, Chen X, Gao W. WLD: A robust local image descriptor. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2010; 32(9):1705-20.
20. Lambert JH. Photometria sive de mensura et gradibus luminis, colorum et umbrae. Eberhard Klett. 1760.
Indian Journal of Science and Technology, Vol 9(46), DOI: 10.17485/ijst/2016/v9i46/106917, December 2016
ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645
A Discussion about Upgrading the Quick Script Platform to Create Natural Language based IoT Systems
Anirudh Khanna1*, Bhagwan Das2, Bishwajeet Pandey3, DMA Hussain4 and Vishal Jain5
1 Chitkara University, Kalu Jhanda - 174103, Himachal Pradesh, India; [email protected]
2 University Tun Hussein Onn Malaysia, Malaysia; [email protected]
3 Gyancity Research Lab, India; [email protected]
4 Aalborg University Denmark; [email protected]
5 Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi – 110063, India; [email protected]
Abstract
Objectives: With the advent of AI and IoT, the idea of incorporating smart things/appliances into our day-to-day life is becoming a reality. The paper discusses the possibilities and potential of designing IoT systems which can be controlled via natural language, with the help of Quick Script as a development platform. Methods/Statistical Analysis: Quick Script (or QS) is an open-source, easy to learn tool made by our team of student developers for programming virtual conversational entities. This paper focuses on a discussion of how some improvements can be made in the underlying implementation of QS, and on the resulting uncomplicated and simple platform which can be used to create natural language based IoT systems. It explores the architecture/design pattern required for creating such systems. Findings: This exploration reveals how the idea of turning a simple NLP tool into a handler for IoT systems can be implemented, and where all the necessary changes/additions are to be made. The benefits of this include extending the power of controlling, and even programming (to some extent), to the user end, as well as providing a simple intermediary that makes communication between man and his machines a little more natural. Application/Improvements: It has always been a fantasy in movies to have appliances and gadgets work according to our speech inputs in real time. We humans have always tried to take complete advantage of technologies for living better and working more productively; the idea behind this paper drives toward the same cause. The applications of any natural language based service can be endless, ranging from home to industry. With speech based interaction, this will even help physically disabled people.
Keywords: Artificial Intelligence, Internet of Things, Natural Language Processing, Quick Script, Smart Devices
1. Introduction
The Internet of Things (IoT) is a system of interrelated computing devices and electronic appliances that are identified uniquely on a network and have the ability to transfer and receive data over the network and to perform actions according to the given data smartly, without requiring constant human supervision. It is to be noted that, once developed and presented for use, working with IoT systems does not remain as complex as it sounds in the above definition. In fact, IoT encourages more efficient ways of interaction between humans and various devices. It has drastically changed the ways in which interaction happens between humans and machines as well as between machines and machines, and has pushed the concept of smart living to a new level. IoT, together with the other emerging Internet developments, will act as the backbone of the digital economy and society1. IoT has even been used for active contour modeling in object tracking2.
There is much research going on to find ways to apply IoT in network security3 and computer memory4. The Internet of Things describes how the ideas and thoughts of human beings can create technological or electrical connections between things themselves and people, which can benefit them and create ease in their lives5. In the earlier days, this interaction was mostly localized: it was not possible to control appliances over a network, and the use of switches and similar interfaces was the only way to turn machines ON/OFF or to control them. Then came the concept of smart devices and Internet of Things enabled homes, where every appliance is given an IP address and, with the power of the Internet, we are able to control our devices from anywhere in the world with any device that has a connection. But what now? What about the next level of human and machine interaction? We foresee that it is interaction in natural language which will take hold in the future. And developments are already happening in this field; take personal assistants like Siri and Cortana, for example. Controlling cell phones just by talking to them was a distant dream two decades back. If it is possible to control a smart phone with natural language, why not control every device and every appliance in our homes and offices in such a way? By combining the concepts of Artificial Intelligence and the Internet of Things, it is definitely possible to convert today's "smart devices" into "intelligent devices". Chatbots are a prolific example of Natural Language Processing (NLP) systems. There are some languages and development tools which are used to create chatbots, and they can be employed with IoT systems as well. The student team from Chitkara University developed QuickScript as one such platform for designing programs that interact with the computer in natural language. It is an open source language which focuses on simplicity and coherence while programming virtual conversational entities. It can be downloaded from its website6 or GitHub repository7. Figure 1 shows the Quick Script interface. The internal implementation of QuickScript is in the C language, and development of better versions is ongoing; these will be made available from time to time. Its syntax is minimalistic and simple and can easily be modified (upgraded) according to user needs. These are the reasons why QuickScript can become a perfect tool for adding natural language processing to IoT systems and smart appliances.
Figure 1. QuickScript Interface (in C implementation).
2. Quick Script Syntax
Every Quick Script program consists of lines of text called "entries". Each entry can (generally) be seen as a single unit of knowledge in the NLP system being designed. It is often mentioned that the power of Quick Script is its simplicity; to justify that, some light must be shed on the structure of Quick Script syntax. Each statement (entry) can be seen as two parts: one part (called the "prefix") which tells the purpose of that very statement, i.e. how it acts in the program, and a second part (called the "content") which consists of the actual content (knowledge) to be stored in the knowledge base. Entries can be classified into various types based on their prefix. They can be patterns (matched against a user input), responses (text the chatbot replies with when a pattern is found), comments, SRAI statements, learn commands etc. The simplest set of entries is a group of pattern-response pairs, which surely can support a crude form of conversation but is practically not sufficient for a perfect chatbot. Still, it is enough to give a fair idea about programming in QuickScript:

>> HELLO
## Hi! Good to see you.
>> WHO ARE YOU
## I am a chatbot written in QS.
>> ARE YOU FEMALE
## I am a male chatbot.

Figure 2 shows the resulting conversation based on the above given code. Figure 3 briefly displays some types of QuickScript entries with examples.
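The real QuickScript engine is implemented in C; purely to make the entry structure concrete, a toy Python sketch of the pattern-response lookup might look like this. The normalization rules and fallback reply are our assumptions, not part of QuickScript.

ENTRIES = [
    (">> HELLO", "## Hi! Good to see you."),
    (">> WHO ARE YOU", "## I am a chatbot written in QS."),
    (">> ARE YOU FEMALE", "## I am a male chatbot."),
]

def reply(user_input):
    text = user_input.strip().upper().rstrip("?!.")
    for pattern, response in ENTRIES:
        if pattern[3:] == text:              # drop the ">> " prefix
            return response[3:]              # drop the "## " prefix
    return "Sorry, I do not know that yet."

print(reply("hello"))          # -> Hi! Good to see you.
print(reply("Who are you?"))   # -> I am a chatbot written in QS.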
Figure 2. Screenshot of conversation in the chatting interface.
Figure 3. Various types of entries with examples (refer to the QuickScript documentation file for details8).

3. Quick Script to be used with IoT Enabled Systems
Due to its simple syntax and uncomplicated implementation, Quick Script can readily be upgraded with newer types of entries that can be used in programming NLP for smart appliances. As shown in Figure 4, the NLP created in Quick Script can be interfaced with an existing NLP system. Also, with some necessary additions in the underlying programming, Quick Script can be directly employed to control the functions of smart devices. Now the question arises: what upgrades could be implemented in QuickScript to make it suitable for programming any such thing in real life? While a number of new kinds of entries must be added to make it work satisfactorily at an industrial level, just for a glimpse of how it would look, we can consider adding simple {ON} and {OFF} queries, implemented as in the code given below:

>> TURN ON THE FAN PLEASE
{ON}
>> TURN OFF THE FAN PLEASE
{OFF}
>> WILL YOU SWITCH OFF THE LED
{OFF}

Figure 4. Conceptual diagram for a natural language based smart system.

Here, the simplicity of the syntax is the most important thing to note. All the underlying programming for turning an appliance ON or OFF is completely hidden from the designer of the IoT system, supporting abstraction. In fact, this will enable a wider number of people to easily design some of their own IoT appliances themselves. Moreover, with the new External Learning feature of Quick Script (see the complete Quick Script documentation8 for details), the end user can make modifications to the existing NLP database and also add entries of his own; he just has to learn a few simple syntax rules and commands of Quick Script.
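Such an upgrade implies a small dispatch layer: when a matched pattern carries an {ON}/{OFF} action, the engine calls into a device layer instead of printing a reply. The sketch below is a Python illustration of that idea only; the gpio_write function and pin numbers are hypothetical placeholders, not part of QuickScript.

ACTIONS = {
    "TURN ON THE FAN PLEASE": ("fan", "ON"),
    "TURN OFF THE FAN PLEASE": ("fan", "OFF"),
    "WILL YOU SWITCH OFF THE LED": ("led", "OFF"),
}
PINS = {"fan": 17, "led": 23}        # hypothetical GPIO pin map

def gpio_write(pin, high):           # placeholder for a real GPIO driver call
    print(f"[gpio] pin {pin} -> {'HIGH' if high else 'LOW'}")

def handle(utterance):
    key = utterance.strip().upper().rstrip("?!.")
    if key in ACTIONS:
        device, state = ACTIONS[key]
        gpio_write(PINS[device], state == "ON")
        return f"Okay, the {device} is now {state}."
    return "Sorry, I cannot control that."

print(handle("Turn on the fan please"))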
4. Advantages There are various advantages in the above described ideas of utilizing NLP with Internet of Things. In fact, a 2010 study on IoT published by the IEEE Computer Society9 proposes “Smart-Object Typology”, where the features of such a system are divided into three parts - Awareness (ability to understand real world events and human activities), Representation (the programming model — in particular, programming abstractions) and Interaction (ability of interaction with the user). It is a proudly stated fact that QuickScript has the potential to enhance all of these features in an IoT system.
Figure 5 shows an example where the user may himself program the software to turn appliances ON and OFF with the help of the External Learning feature.
Figure 5. End users may be able to design/supervise a natural language IoT of their own.

5. Challenges
While discussing such an implementation is quite easy, real development poses some challenges:
• Portability: The current version of QuickScript is not portable enough to suit every machine, and working with the Internet of Things requires portability to a great extent. QuickScript is usable when operated from a computer system, but not yet on many portable devices.
• Security: IoT brings every appliance onto the Internet, and that means a great threat to security and privacy. Necessary security measures are not yet implemented in QuickScript, because it was never intended for such a purpose.

6. Acknowledgements
Sincere acknowledgement and thanks to the mentors who encouraged the development of the ideas that were discussed in this paper. Also, gratitude to the whole QuickScript team, which joined the open source QuickScript project and, with time, shaped it for the better.

7. References
1. Vermesan O, Friess P, Guillemin P, Gusmeroli S, Sundmaeker H, Bassi A, Jubert IS, Mazura M, Harrison M, Eisenhauer M, Doody P. Internet of things strategic research roadmap. Internet of Things: Global Technological and Societal Trends. 2011; 1:9-52.
2. Musavi SH, Chowdhry BS, Kumar T, Pandey B, Kumar W. IoTs enable active contour modeling based energy efficient and thermal aware object tracking on FPGA. Wireless Personal Communications. 2015 Nov; 85(2):529-43.
3. Singh D, Garg K, Singh R, Pandey B, Kalia K, Noori H. Thermal aware internet of things enable energy efficient encoder design for security on FPGA. International Journal of Security and its Application. 2015 Jun; 9(6):271-8.
4. Verma G, Moudgil A, Garg K, Pandey B. Thermal and power aware Internet of Things enable RAM design on FPGA. IEEE 2nd International Conference on Computing for Sustainable Global Development (INDIACom); 2015 Mar 11. p. 1537-40.
5. Bhatia EK, Ohri S, Kaur G, Dhankar M, Dabas S. Future perspective and current aspects of internet of things enable design. International Journal of Software Engineering and its Applications. 2015; 9(8):127-32.
6. QuickScript - Easy to learn language for Artificial Intelligence. Available from: http://anirudhkhanna.github.io/QuickScript
7. GitHub: anirudhkhanna - QuickScript. Available from: https://github.com/anirudhkhanna/QuickScript
8. GitHub: QuickScript/documentation. Available from: https://github.com/anirudhkhanna/QuickScript/tree/master/documentation
9. Kortuem G, Kawsar F, Sundramoorthy V, Fitton D. Smart objects as building blocks for the internet of things. IEEE Internet Computing. 2010 Jan; 14(1):44-51.
BIJIT - BVICAM’s International Journal of Information Technology Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA)
Lexical, Ontological & Conceptual Framework of Semantic Search Engine (LOC-SSE)
Gagandeep Singh Narula1, Usha Yadav2, Neelam Duhan3 and Vishal Jain4
Submitted in February, 2016; Accepted in July, 2016
Abstract - The paper addresses the problems of traditional keyword based search engines that process queries syntactically rather than semantically. In order to increase the degree of relevance and achieve a higher precision to recall ratio, it describes the proposed architecture of a Semantic Search Engine (SSE) which incorporates Google search results as input and processes them with the help of Semantic Web (SW) technologies. Modules to accomplish various tasks like query processing, importing existing ontologies and extraction of knowledge have been introduced in the proposed framework. At last, the PROMPT algorithm is applied to compare the query graph and the document graph, which leads to improved results that are presented to the user.
Index Terms - Semantic Web (SW), Ontology, PROMPT, Protégé 3.4.8, Jena, Resource Description Framework (RDF) and Knowledge Retrieval

1.0. INTRODUCTION
Traditional search engines are tools for retrieving information from massive sources on the web. The results are produced by performing keyword based search most of the time. The main drawback of such search engines is lack of relevance. To illustrate the problem, consider a query "Mobile phones with red cover" submitted to a traditional search engine. It produces relevant as well as irrelevant results related to the terms mobile phones, red, lotus, flower and cover. The search does not consider stop words and auxiliary verbs that reflect the meaning of the given statement; in the above query, the term "with" has lost its significance, due to which results are produced in the context of lotus and red flower. In order to reduce this ambiguity and perform intelligent search, the concept of the Semantic Web (SW) came into existence in 1996 as envisioned by Tim Berners-Lee [1]. The SW is defined as a global mesh of information in machine interpretable format [2]. It is practically not feasible to annotate the entire web content with semantic tags so that current search engines could behave like Semantic Search Engines (SSE). So, there is a need to develop a search engine that analyses the user query and produces meaningful results with higher precision and low recall.
The paper is organized into the following sections. Section 2 describes the objective and scope of the research carried out in the given paper. Section 3 presents a brief survey of research conducted in the context of the evolution of SSEs and their methodologies. Section 4 provides a bird's eye view of the Semantic Web layered architecture and a comparative analysis of the studied literature. Section 5 describes the proposed SSE framework along with its implementation. Section 6 validates the higher precision to recall ratio in comparison to GOOGLE. Section 7 concludes the given paper, followed by references.

2.0. OBJECTIVE, SCOPE & FINAL OUTCOME
Objective: "To enhance GOOGLE [2] search results with the help of Semantic Web technologies".
Scope: A user would be able to learn about semantic web technologies, semantic web tools, ontology development for knowledge representation, and storing that knowledge using some open source framework.
Final Outcome: The intended final outcome of the work carried out is precise and relevant search results, produced by enhancing GOOGLE search results with the employment of SW technologies.
3.0. RELATED WORK
Several studies have been conducted with the aim of building SSEs and ranking results, as follows. Debajyoti et al. [3] proposed a semantic search framework that produces relevant results by performing mapping between classes and instances with the help of RDF code. Fatima et al. [4] add a query optimizer, user interface and processor to their framework, but it too has some limitations. Zhang et al. [5] performed keyword based search by finding RDF files and comparing keywords with their contents. Swati et al. [6] proposed an information retrieval system in the context of the university domain, but it does not evaluate GOOGLE search results. Kumar et al. [7] made use of a mapper and a query processor for representation and scanning of keywords respectively. For a comparative analysis of these works, refer to Table 1.
1 Research Scholar, M.Tech (CSE), CDAC Noida; [email protected]
2 Assistant Professor, CDAC-Noida; [email protected]
3 Assistant Professor (CE), YMCA University of Science & Technology, Faridabad, India; [email protected]
4 Assistant Professor, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi; [email protected]
4.0. SW ARCHITECTURE
According to Kevin Kelly [8], the Semantic Web suffers from the fax effect, which means that development of the semantic web is costly and its technologies have not been fully utilized. But still, most researchers are trying their hands at this web technology to achieve machine-human interaction [8].
5.0. COMPONENTS OF PROPOSED SSE FRAMEWORK
The proposed framework, as outlined in Fig. 2, consists of three phases:
• Generation of a user query graph with the help of SW technologies.
• Generation of a document relation graph by analyzing GOOGLE search results.
• Comparison of source and target ontologies, which leads to improved results.
First Phase
(a) GUI: The interface on which the search is performed is treated as the main component of any search engine. In traditional search engines, queries are written by developers and results are matched with pre-defined keywords stored in databases; in the proposed work, an ontology is used as the backend of the interface. In the given framework, the input query is passed through the user interface as well as to the GOOGLE search engine. It is passed to the search engine in order to enhance the search results with the help of SW technologies.
(b) Designing/Importing existing ontology: The proposed framework uses PROTÉGÉ 3.4 beta [10] for importing an existing ontology related to the given domain. Protégé is an open-source tool for editing and managing ontologies. It is the most widely used domain-independent, freely available, platform-independent technology for developing and managing ontologies.
(c) Extracting knowledge from the given ontology: The Apache JENA framework can be used to represent the relationships between classes, properties and instances in the given ontology, leading to the formation of a knowledge base. JENA is a Java framework for building semantic web applications that provides a programmatic environment for RDF, RDFS and OWL and includes a rule based inference engine [11].
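The implementation here uses the Java-based Jena framework; purely as an illustration of the same knowledge-extraction step, an equivalent sketch in Python with rdflib could look as follows. The file name follows the ontology built in Section 5.2; the query simply lists object properties with their domains and ranges.

from rdflib import Graph, RDF, RDFS, OWL

g = Graph()
g.parse("Educational_institute.owl")       # ontology exported from Protege

for cls in g.subjects(RDF.type, OWL.Class):
    print("Class:", cls)

q = """
SELECT ?prop ?domain ?range WHERE {
  ?prop a owl:ObjectProperty .
  OPTIONAL { ?prop rdfs:domain ?domain }
  OPTIONAL { ?prop rdfs:range ?range }
}"""
for row in g.query(q, initNs={"owl": OWL, "rdfs": RDFS}):
    print(row.prop, row.domain, row.range)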
Figure 1: SW architecture [9]

Table 1: Comparative Analysis
Debajyoti et al. [3] - Pros: (a) Uses ontology to maintain semantic relationships among classes and instances rather than using NLP. (b) Values of properties can be computed from RDF code and displayed to the user. Cons: (a) No ranking of results is done.
Fatima et al. [4] - Pros: (a) A query optimizer scans keywords and matches them with words stored in the ontology database. Cons: (a) No updating of the ontology database. (b) The user interface is not connected to any semantic framework.
Zhang et al. [5] - Pros: (a) Combines Google search results with RDF and presents them in hierarchical fashion. (b) OntoSearch acts as a visualization tool and can be linked to other web ontology editor tools. Cons: (a) The synonym problem is not well addressed in this version of the tool.
Swati et al. [6] - Pros: (a) Uses the WordNet API for generation of semantically similar words. (b) Matches terms used in the user query with the designed ontology to produce a refined query. Cons: (a) Does not evaluate Google results.
Kumar et al. [7] - Pros: (a) Uses a Mapper to represent semantic results in textual format. (b) A query processor scans keywords and matches them with words stored in the ontology database. Cons: (a) No comparison and evaluation of IR performance. (b) Does not evaluate Google results.

Second Phase
The same user query is entered into the GOOGLE search engine and results are retrieved. These results are in the form of HTML (text) documents, so the relationships among those text documents are extracted by converting them into RDF documents. This is done with the help of the Text2RDF application.

Third Phase
This phase requires comparison of the target ontology graph and the source ontology graph. In both graphs, concepts are represented by nodes while relations are represented by edges. It is done with the help of the PROMPT [12] algorithm. The features of PROMPT are:
• Besides merging ontologies, it identifies locations for integration of ontologies and the types of operations to be performed, and it resolves conflicts.
• Interactive merging process, i.e. several choices are made by the user and PROMPT selects them automatically on the basis of user preferences.
• Handles conflicts like name conflicts, dangling references, redundant classes and slot value restrictions.
Figure 2: Proposed LOC-SSE framework

5.1. Pros of Proposed Approach
• The given framework evaluates GOOGLE search results in addition to the user query.
• The user interface is connected to a semantic framework, JENA, in order to retrieve knowledgeable results from the ontology.
• Relationships among classes, properties and instances are represented in the form of a user query graph.
• On the other hand, a document graph is created from the GOOGLE search results.
• Thus, the above methodology adds lexical, conceptual and ontological flavour to the proposed framework.

5.2. Implementation
The above approach is implemented as shown in the steps below. Consider the user query "List the faculties of CSE in IIIT Hyderabad".
Step 1 (a) User designed GUI: This form is drawn in NetBeans IDE 8.0.
Figure 3: Home Screen
(b) Showing data connectivity among Protégé, NetBeans IDE and Jena
Step 2: Designing of an ontology on the given domain (Educational_institute.owl).
Figure 4: Importing libraries & its successful execution
Figure 5: Educational_institute domain ontology

Step 3: Extracting knowledge from the given ontology.
Figure 6: Displaying properties and URIs of the Educational_institute ontology
From Fig. 5, a subsection of the target ontology is extracted on the basis of the query "List the faculties of CSE in IIIT Hyderabad".
Figure 7: Extracting target ontology portion using SPARQL query

Step 4: Creation of the knowledge base. This involves the generation of rules using the Semantic Web Rule Language (SWRL). Four rules are created that can lead to inferences related to the given query, including:
(i) Rule1 // Hod_is_AssoProf_whose_Name_is; its expression in SWRL is
CSE:isAssoProf(?A, ?S) ∧ CSE:isHod(?H, ?A) → CSE:hasName(?H, ?S)
(ii) Rule2 // AssoProf_is_senior_to_Lecturer_and_AsstProf; its expression in SWRL is
CSE:isAsstProf(?A, ?S) ∧ CSE:isLecturer(?L, ?A) → CSE:isAssoProf(?L, ?S)
(iii) Rule3 // AsstProf_for_TeachingFaculty; its expression in SWRL is
CSE:isTeachingFaculty(?G, ?F) → CSE:isAsstProf(?F, ?G)
Figure 8: Rules generated using SWRL
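The rules above are executed by the SWRL rule engine inside Protégé; as a language-neutral illustration of what Rule 1 infers, here is the same inference done by naive forward chaining over (subject, predicate, object) triples in Python. The individuals are toy values of ours, not from the ontology.

triples = {
    ("profX", "isAssoProf", "Dr. Rao"),
    ("deptCSE", "isHod", "profX"),
}

def apply_rule1(kb):
    # isAssoProf(?A, ?S) AND isHod(?H, ?A) -> hasName(?H, ?S)
    inferred = set()
    for (a, p1, s) in kb:
        if p1 != "isAssoProf":
            continue
        for (h, p2, a2) in kb:
            if p2 == "isHod" and a2 == a:
                inferred.add((h, "hasName", s))
    return inferred

print(apply_rule1(triples))   # {('deptCSE', 'hasName', 'Dr. Rao')}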
Step 5: Target ontology graph.
Figure 9: Target ontology graph

Step 6: Now the query is entered on Google, and it produces links of other faculties of IIT BHU and IIT Hyderabad in addition to IIIT Hyderabad.
Figure 10: GOOGLE search results page

Step 7: Results (documents). This involves conversion of the HTML links into semantic web resources like RDF so that an ontology can be created, which can be called "GoogleCSE.pprj" or "GoogleCSE.owl".
Figure 11: Conversion of HTML links of IIIT Hyd into RDF
Similarly, the links for IIT Hyd and IIT BHU can be converted into RDF.

Step 8: Document relation extraction. This involves designing an ontology from the above RDF results.
Figure 12: GoogleCSE ontology graph (source ontology)

Step 10: Comparison. This is done by comparing both ontologies using the PROMPT algorithm, where the source ontology is "GoogleCSE.owl" and the target ontology is "computerscience.owl".
Figure 13: Execution of PROMPT algorithm

Step 11: Improved results.
Figure 14: Enhanced Results
6.0. EVALUATION MEASURES
Sample query: "List the faculties of CSE in IIIT Hyderabad"

                 Google               Our system
Precision        7/21 = 0.33          9/21 = 0.42
Recall           7/16 = 0.43          9/16 = 0.56

Fig 15: Higher P to R ratio of our system than GOOGLE
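The figures above can be reproduced directly; as a quick check, assuming 21 results are inspected per system and 16 documents are relevant in total:

def precision_recall(relevant_retrieved, retrieved, relevant_total):
    # precision = relevant retrieved / retrieved
    # recall    = relevant retrieved / all relevant documents
    return relevant_retrieved / retrieved, relevant_retrieved / relevant_total

print(precision_recall(7, 21, 16))   # Google:     (0.333..., 0.4375)
print(precision_recall(9, 21, 16))   # our system: (0.428..., 0.5625)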
7.0. CONCLUSION AND FUTURE SCOPE
The given paper presents a Lexical, Ontological & Conceptual framework of a Semantic Search Engine (termed LOC-SSE), built with the help of semantic web technologies. The proposed system is implemented and evaluated on the basis of the precision-recall ratio. From the implementation and analysis of the proposed framework, it is concluded that the given system produces more accurate results as compared to Google. As future work, it can be extended by developing an agent based middleware search engine with the help of JADE (Java Agent Development Environment).

REFERENCES
[1]. Berners-Lee, Tim, James Hendler, and Ora Lassila, "The Semantic Web", Scientific American 284.5 (2001): 28-37.
[2]. Tim Berners-Lee, "The Semantic Web Revisited", IEEE Intelligent Systems, 2006.
[3]. Debajyoti Mukhoupadhyay, Aritra Banik, Jhalik Bhattacharya, "A Domain Specific Ontology Based Semantic Web Search Engine", 2011 IEEE 6th International Conference on Intelligent Systems and Artificial Intelligence, Jaypee University, Shimla, 11-14 February 2011.
[4]. Arooj Fatima, Cristina Luca, George Wilson, "New Framework for Semantic Search Engine", 2014 IEEE UKSim-AMSS 16th International Conference on Computer Modeling and Simulation, 26-28 March 2014, Cambridge, pages 446-451.
[5]. Yi Zhang, Wamberto Vasconcelos, Derek Sleeman, "OntoSearch: An Ontology Search Engine", in Proceedings of AI-2011, the Twenty-fourth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, Springer, pages 58-69.
[6]. Swati Rajasurya, S. Swamynathan, "Semantic Information Retrieval using Ontology in University Domain", 2012 IEEE 6th International Conference on Intelligent Systems and Artificial Intelligence, July 2012, Jaypee University, Shimla.
[7]. S.S. Kamath, Garima Meena, K. Kumar, "A Semantic Search Engine for Answering Domain Specific User Queries", 2014 IEEE International Conference on Communications and Signal Processing (ICCSP), 3-5 April 2014, pages 1097-1101.
[8]. Gagandeep Singh Narula, Dr. S.V.A.V. Prasad and Dr. Vishal Jain, "Use of Ontology to Secure the Cloud: A Case Study", International Journal of Innovative Research and Advanced Studies (IJIRAS), Vol. 3, Issue 8, July 2016, ISSN 2394-4404.
[9]. Gagandeep Singh, Vishal Jain, "Information Retrieval (IR) through Semantic Web (SW): An Overview", in Proceedings of CONFLUENCE - The Next Generation Information Technology Summit, 27-28 September 2012, pp. 23-27.
[10]. Gagandeep Singh, Vishal Jain, Dr. Mayank Singh, "Ontology Development Using Hozo and Semantic Analysis for Information Retrieval in Semantic Web", ICIIP-2013 IEEE Second International Conference on Image Information Processing, Jaypee University, Shimla, 9-11 Dec 2013.
[11]. https://jena.apache.org/
[12]. http://protegewiki.stanford.edu/wiki/PROMPT
[13]. Gagandeep Singh Narula, Dr. Subhan Khan, Yogesh, "DST's Mission Mode Program on Natural Resources Data Management System (NRDMS)", BIJIT - BVICAM's International Journal of Information Technology, Jan-June 2016, Vol. 8, No. 1, pages 973-978, ISSN 0973-5658, impact factor 0.605, indexed with IET (UK), INSPEC.
[14]. Meenu Dave, Mikku Dave, Y. S. Shisodia, "Cloud Computing and Knowledge Management as a Service", BIJIT - BVICAM's International Journal of Information Technology, July-Dec 2013, Issue 10, Vol. 5, No. 2, pages 619-622.
[15]. Usha Yadav, Gagandeep Singh Narula, Neelam Duhan, Vishal Jain and B. K. Murthy, "Development and Visualization of Domain Specific Ontology using Protégé", Indian Journal of Science & Technology (INDJST), Vol. 9, No. 16, April 2016, ISSN 0974-5645, indexed with Thomson Reuters (Web of Science), Scopus (Elsevier), Index Copernicus, SJR = 1.3.
[16]. Anil Kumar, Jaya Lakshmi, "Web Document Clustering for Finding Expertise in Research Area", BIJIT - BVICAM's International Journal of Information Technology, July-Dec 2009, Vol. 1, No. 2, Issue 2, pages 137-140.
[17]. S. Ajitha, T. V. Suresh Kumar & K. Rajnikanth, "Framework for Multi-agent Systems Performance Prediction", BIJIT - BVICAM's International Journal of Information Technology, Issue 12, July-Dec 2014, Vol. 6, No. 2, pages 774-778.
[18]. Parul Gupta, A. K. Sharma, "A Framework for Hierarchical Clustering Based Indexing in Search Engines", BIJIT - BVICAM's International Journal of Information Technology, Issue 6, July-Dec 2011, Vol. 3, No. 2, pages 329-334.
Indian Journal of Science and Technology, Vol 9(39), DOI: 10.17485/ijst/2016/v9i39/90671, October 2016
ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645
Decision based Cognitive Learning using Strategic Game Theory Uttam Singh Bist*, Manish Kumar, Anupam Baliyan and Vishal Jain Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi – 110063, India;
[email protected],
[email protected],
[email protected],
[email protected]
Abstract
Objective: To design a system that can make its own decisions independently. Methodology: In this paper, we propose a novel model of the cognitive learning process that works in a way similar to human learning. Findings: We show that the relationship between two individuals may change their judgment in the same surroundings. Researchers and scientists have been working together for more than six decades to develop an intelligent system comparable to a human. Decision making is not so elementary: each and every decision depends upon prior knowledge and decisions, and a slight change in the environment may change a decision from pros to cons, from good to bad. In such a dynamic environment, we need to develop a dynamic system that can change its decisions according to the environment. Game theory plays an important role in handling such dynamic decision making. We make decisions and learn through observation and experience, and then store the observed or concluded results in our knowledge base. Learning makes us more capable of producing sound decisions. When rule based systems define the relationship between one person and another, decisions can be made efficiently and consistently with that kinship. Application: This paper introduces a new version of the thinking capability of a machine in a dynamic environment using game theory. We trust that this paper will encourage a revolution in intelligent system design and modeling.
Keywords: Cognitive Computation, Cognitive Decision, Cognitive Learning, Cognitive Learning Process, Decision Based Cognitive Learning, Decision Based Learning, Strategic Game Theory
1. Introduction
In the epoch of advanced computing, many researchers and scientists are working together to develop a computing system whose intelligence is like a human's. Nearly every kid discovers how to respond and how to realize the strategies that build his or her decisions. At an appropriate stage of development a child picks up strategies with amazing speed: typically a child finds out many new strategies, for each and every kind of person, along with their correct usage, each day. The learning is efficient, in that a child does not need to deduce the same strategy repeatedly or to be corrected very often. This decision making must be easy, but we do not have effective theories that explain the phenomenon. The mystery intensifies when we acknowledge that children learn many new strategies for each kind of person without ever trying them. If kids do have knowledge of the generalizations, what is the form of such knowledge? We concentrate on the acquisition of inflectional morphology, where developmental data are abundant. We demonstrate a theory of how to create and use relationships and decision making in human behavioral generalizations. The theory consists of a performance model and an acquisition procedure. The performance model is stated in terms of an incrementally constructed constraint mechanism; the acquisition procedure abstracts a series of more and more sophisticated behavioral constraints from a chronological succession of lessons, and the performance model fills in the details of a behavior event by enforcing these constraints. Our particular system design is coherent with what you would expect of computer technologists: we think naturally in terms of buffers, bidirectional constraints, symbolic differences and greedy learning algorithms.
As you will see, each of those particular concepts came to take on an important function in our processing and learning system and in our ultimate goals. We present our learning model in the domain of psychology: the link between the structure of people and their kinship. Here, a key to faster learning is that the relations actually used in behaviors are merely a few of the possible weightings/responses/strategies; in each relationship only a few of the potential combinations of strategies may appear. We see that it is the sparseness of these spaces that makes the acquisition of regularities effective. But this is not possible until we have an eligible model for such decisions. Researchers and scientists are hunting for a learning technique that can provide intelligence to the system, while it is impossible to achieve an intelligent system without a model that resembles the human cognitive processes of remembering, deciding and learning. We need to organize a model which can take its own decisions about a problem. These decisions should be grounded upon the system's own opinions and weightings. Through this work we can design the best, and certainly an intelligent, system. In this report, we propose a novel model of cognitive learning which parallels human cognitive functions. The cognitive sciences play a most important role in this model, which is an improved cognitive learning process based upon decision theory and a knowledge based system.
2. Cognitive Process Model
2.1 Cognitive Process
According to the Department of Philosophy, Stanford University, "Cognitive science is the interdisciplinary study of mind and intelligence, embracing philosophy, psychology, artificial intelligence, neuroscience, linguistics and anthropology". Cognitive functions (e.g. anticipation and planning) can be realized by simulating the intelligent system's interaction with the world, implying different circumstances in different environments, through its decisions, actions and their consequences1. Such study requires a dependable and updated model which enables an intelligent system to update its knowledge about the world as well as about itself. If we only update knowledge about the world, the system will never know about its own condition. As changes in knowledge about the world make the system more intelligent than before, it becomes necessary that the system also knows about its own state of cognition. The system gains new knowledge of the world, stores it as new knowledge and compiles it to reason about that new knowledge. In this paper, we consider the system to be an expert system in its field, as each and every person has a specialization in something; the system might be an expert in molecular engineering, neurology, medicine, behaviorism or technology. The system must be able to make its own decisions via experience and/or decision theories (e.g. game theory). To build up such a system we need a model which can adapt according to the nature, behavior and circumstances of, and the behavior between, different objects2. Such cognitive functionality in an intelligent system should be dynamic in nature and able to identify the behavior of a person or agent, as it never remains the same. For this, the design model should possess the intelligence to learn its own decisions regarding the situations and behaviors that lie in the world. This is possible only when we have a solid and powerful decision theory and apply it to the scheme. The decision should be autonomous (not a preset decision), and the system should be able to alter its decisions according to the environment and circumstances. Knowledge adoption is a further step in the cognitive process for learning purposes: the system should keep all new knowledge of the world and store or update its knowledge as required. This will increase its power to interact with the world in real time with a proper decision, or sometimes an alternate decision.

2.2 Cognition
Cognition may be defined as the human mental process of knowing, with aspects such as consciousness, perception, reasoning and judgment3. From a computational perspective, cognition can be regarded as a collection of information which is processed and extracted by the human nervous system for reasoning and decision making. Human knowledge will help us gather more insight into the nature of cognitive decision support4. Cognitive mapping is a unified process of information for reasoning, decision making and knowledge adoption, i.e. of what does not yet exist in knowledge, and also of the appropriate conduct of a mechanism not yet covered by knowledge. A decision is always related to an action or sequence of actions based on causes and real-time world behavior.
As we know, behaviors and strategies never remain static in nature, so a decision should be based upon the dynamic behavior of the world. The system must have the smartness to identify the opponent's strategy and respond with a proper strategy of its own while maximizing its profit; this implies it must employ a greedy algorithm to maximize payoff. Thus we need a real-time system which takes decisions based upon the current positions and behavior of the world; this is how the model's naming and action-determination processes take place. A cognitive system means nothing without applying it to real-time situations or conditions of the world. The world provides the data that drive the information and knowledge through which we can justify the capability of acting, and of feeling able to do so, in a timely fashion. Decision making is a complex procedure, and the cognitive process is an important component of the decision based learning scheme for knowledge adoption. It always depends upon the conditions of the world, as it does for humans. Thus the model is the groundwork of knowledge and strategic decision making; to design a cognitive system with genuine autonomy that is capable of figuring things out before acting, such a model seems necessary.

2.3 Learning Process
In human learning, we actively sense the world around us by taking in new information, comparing it to our current understanding and negotiating meaning out of those interactions. The cognitive domain of human behavior is of central importance in the acquisition of both a first and a second language; the processes of perceiving, judging, recognizing and remembering are central to the job of internalizing a language. Learning is the procedure of acquiring material as discrete and relatively isolated entities that are relatable to cognitive structure only in an arbitrary and verbatim fashion, not permitting the formation of [meaningful] relationships. Every child learns how to infer and how to resolve a problem. At an appropriate stage of development a child learns solutions with amazing speed: typically a child learns many new ideas and their correct usage each day. The learning is efficient, in that a child does not need to figure out the same problem repeatedly or to be corrected very often. Thus, learning must be easy, but we do not have effective theories that explain the phenomenon. We "construct" our knowledge through experience, by doing. The human psyche is better equipped to assemble information about the world by operating within it than by learning about it, hearing lectures on it or studying abstract models of it. Learning is an active procedure of making sense of new information within the context of our own reasoning and experience. The learning process takes the role of some emotions, as the reinforcement function (happiness, sadness), into account; due to the complexity of the real environment, the definition of new emotions or the redefinition of old ones will be needed5. A learning task characterized by a very unbalanced distribution of examples into classes can be solved using the principle of dividing the example space into subspaces6. In the computing world, learning is broadly split into two categories: Supervised Learning and Unsupervised Learning. In Supervised Learning, the model specifies the effect one set of observations, called inputs, has on another set of observations, called outputs; in other words, the inputs are assumed to be at the beginning and the outputs at the end of the causal chain, though the models can include mediating variables between the inputs and outputs. In Unsupervised Learning, learning can proceed hierarchically from the observations to ever more abstract levels of representation; each additional hierarchy needs to learn only one step, and therefore the learning time increases (approximately) linearly with the number of levels in the model hierarchy. There are also two further types of learning: Explicit Learning and Implicit Learning. Explicit Learning makes the subjects reconstruct their knowledge system and mental models, whereas Implicit Learning has different outcomes compared with Explicit Learning7. In the cognitive process, knowledge and learning each have their own importance. The evolution of learning has been explained as a series of transformations of sets of spaces into experiences, rules and concepts8. Learning is the key process of knowledge adoption: in this process the subject acquires knowledge through his own experiences and observations.
3. Cognitive Process Model
In this paper our main objective is to propose a new model for cognitive learning with a decision based system. In this work we also try to find out the necessary steps a human takes to solve a problem and to store it, or not, into his knowledge. This model is a combination of cognitive processes, learning theory and decision theory. Decisions have to be taken by analyzing the conflicts and trade-offs [9].
3.1 Proposed Model for Cognitive Learning based on Decision Theory
Let us take an example: what are the necessary steps for a human to solve a problem?
1. Study the problem.
2. Comprehension: Comprehension is the process by which the cognitive model expresses the problem via prior knowledge; if its solution is already in knowledge, the prior knowledge is used to solve the problem, otherwise step 3 is used.
3. Elaboration: Elaboration is the process where the human rectifies the problem using prior knowledge together with the new problem, and tries to sort out the problem using his own rules/principles. Rectifying the problem refers to making the necessary changes in the problem so that he/she can easily understand it.
4. Decision: After elaboration, it becomes necessary to make a significant decision, based on the current knowledge and the elaborated problem, about which source would be the best one to solve the problem.
5. Learning: It becomes necessary to store the results for future use, much as in the judiciary, where each judgment is stored in the All India Reporter (AIR) for its year and recalled when the same kind of trial comes up for decision; to argue such a trial, the defense lawyer makes a reference to the same case decided in the past.
6. Retrieval of knowledge: A process through which prior knowledge is accessed to elaborate the problem, so that the problem can be understood properly; it is also useful for reusing a pre-computed result in the further solution.
7. Reconstruction: In this step the new problem creates new knowledge for the system which might be helpful in the future; this step covers knowledge updating, new knowledge construction, etc.
Our newly proposed model for the cognitive learning process flow is depicted in Figure 1.
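The following is a minimal, illustrative Python sketch of steps 1-7 under stated assumptions: the knowledge base is a plain dictionary mapping problems to solutions, and the helpers `comprehend`, `elaborate` and `decide` are hypothetical stand-ins for the corresponding cognitive stages; the paper does not prescribe an implementation.

```python
# Illustrative sketch of the proposed 7-step loop (hypothetical helpers,
# not an implementation prescribed by the paper).

knowledge = {}  # steps 5-7 store: problem -> learned solution

def comprehend(problem):
    """Step 2: try to express the problem via prior knowledge."""
    return knowledge.get(problem)          # retrieval of knowledge (step 6)

def elaborate(problem):
    """Step 3: reshape the problem with our own rules/principles."""
    return problem.lower().strip()         # stand-in for 'rectifying' the problem

def decide(candidates):
    """Step 4: pick the best source/solution for the elaborated problem."""
    return max(candidates, key=len) if candidates else None

def solve(problem):
    solution = comprehend(problem)         # step 2
    if solution is not None:
        return solution                    # already known
    elaborated = elaborate(problem)        # step 3
    candidates = [f"solution drafted for '{elaborated}'"]
    solution = decide(candidates)          # step 4
    knowledge[problem] = solution          # steps 5 and 7: learn / reconstruct
    return solution

print(solve("Study The Problem X"))
print(solve("Study The Problem X"))       # second call answered from knowledge
```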
3.2 Explanation of Model
Figure 1. New proposed model for cognitive learning process.
• Comprehension: When we meet a new person, we do not begin by sharing information about ourselves in the sense of starting at the first item and moving sequentially toward the last or most confidential one; we first predict what the person will mean to us. Prediction, which plays a key role in any cogent account of studying behavior and trustworthiness, can be understood as "the prior elimination of unlikely alternatives" or as "questions we ask the world". We not only predict but also observe a person's behavior and what that person will mean to us.
• Learning: Comprehension does not necessarily lead to learning - at least, not to learning of a meaningful, useful kind. Only once new information has been connected - by tying it to what is already known - do cognitive theorists say that the new information can be learned, through activities which enrich the connections between the new and the old knowledge. Learning takes place when the new information becomes a part of the existing knowledge about a person; the strategic behavior of that person is also updated. When refined and richly integrated, the new knowledge becomes meaningful and useful.
• Decision making: Decision making identifies the person's relation and trustworthiness; likewise, it identifies the strategy and the behavior (i.e., cooperative or non-cooperative) of the individual and which of our schemes would be suited for him/her.
• Recall and Reconstruction: People apparently do not store knowledge as long, complete relationships about all the people around them, but rather in a dynamic, interlinked network in which the elements have been analyzed into categories linked by multiple relationships that may be organized as schemas, scripts, narratives or other forms.
4. Algorithm Proposed using Game Theory
Cognitive science is the psychological study of human personality and social behavior. A cognitive component was obtained by applying the mutation technique followed in the evolutionary algorithm [10]. A detailed research study shows that early predictions of cognitive disorders are primarily linked only with the behavioral aspects of children, while their clinical reports or consecutive analytical test approaches are not much taken into consideration [11]. In this work we simulate the relationships between people with different behaviors in a network. The simulation also depicts the strategic behavior between different persons. This work helps us identify the appropriate strategy against an opponent, and it also helps us identify the nature of the opponent, i.e., whether friend, enemy or competitor. To apply this work we need to design an algorithm for the proposed model of the cognitive learning process, which also acquires decisions based on the dynamic strategies and behaviors between two or more persons.
4.1 Algorithm for Decision based Cognitive Learning using Strategic Game Theory
• Initialize the network of people.
• Identify the relationship with a person - known, unknown, in relation or enemy - through an expert system.
• Assign a weightage according to the nature or behavior of, or relation to, that person.
• Create a payoff matrix for the strategic play against that person.
• Use game theory to discover a suitable scheme to maximize the payoff with respect to that individual (i.e., the opponent).
• Keep all the gathered knowledge about that person in the knowledge database.
(A minimal code sketch of these steps appears after Section 4.2.)

4.2 Understanding the Model
As we can see in Figure 2, Rama and Ravana have the direct relation Enemy, but there is no direct relation between Hanuman and Ravana. The solid lines represent a direct relation between two people, while a dashed line represents an indirect relation between two people - one could say the relation is inherited from the persons directly related to that person. The first step is to initialize the network of all the people, i.e., to establish the essential relationships among the people as in Table 1. For example, Ravana is the enemy of Rama, and Rama is a friend of Hanuman; these can be found in the relation diagram. Some relation also lies between Ravana and Hanuman, denoted by the dashed line. To simulate the work we use some characters from the Ramayana (a holy Indian book). According to this book Lord Rama is a good-hearted man, whereas Ravana does not have a good heart. In summary, Rama, Lakshmana, Hanuman, Sugreeva and Jamvanta are all in Lord Rama's group - we could say they are well-wishers, or friends, of Lord Rama - whereas Meghnada, Kumbhkarana and Ravana are in the same opposing group. Ravana took Rama's wife, so Rama has to fight Ravana to take his wife back. In our second step we need to identify the relationship between Ravana and Hanuman. In this step we follow some pre-defined rules and gather knowledge about a person that does not yet exist in knowledge; we also assign the behavior, i.e., frank, trustworthy, untrustworthy or enemy. This step is a kind of knowledge adaptation, which is a hallmark of human intelligence. In the third step, we assign the weightage for the person, categorized as given in Table 2. In the fourth step, we obtain a strategy matrix for each kind of problem. Every problem has its own pay-off matrix, as for the Rama-Ravana strategies given in Table 3. We add the weightage of the person as per his behavior and get the new pay-off matrix: Table 1 and Table 2 are used to derive the new pay-off matrix, i.e., Table 4, as the resultant matrix after finding the weightage from Table 1 and Table 2 and applying it to Table 3. The major task here is to identify where we need to reduce the weightage and where to add it; this sometimes becomes complicated.
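A compact Python sketch of the six steps of Section 4.1, under stated assumptions: the relation network is a toy stand-in (Tables 1-4 below give the paper's actual weights and payoffs), and the solver simply picks the maximin pure strategy rather than the mixed-strategy solution reported later.

```python
# Sketch of the 6-step algorithm (toy values; Rama's payoffs are taken
# from the first components of Table 3 below).

network = {("Rama", "Ravana"): "Enemy",            # step 1: initialize network
           ("Rama", "Hanuman"): "Friend"}

relation_weight = {"Friend": 1, "Known": 0, "Unknown": -1, "Enemy": -2}

def identify(a, b):
    """Step 2: look up (or infer) the relationship between two people."""
    return network.get((a, b)) or network.get((b, a)) or "Unknown"

def adjusted_payoffs(payoffs, weight):
    """Steps 3-4: add the trust weightage to every payoff entry."""
    return [[cell + weight for cell in row] for row in payoffs]

def best_strategy(payoffs):
    """Step 5: maximin choice over pure strategies (simplified)."""
    worst = [min(row) for row in payoffs]
    return worst.index(max(worst))                 # index of the safest strategy

knowledge_db = {}                                  # step 6: remember what we learned

rel = identify("Rama", "Ravana")
matrix = adjusted_payoffs([[1, 0, -1], [2, 1, 3]], relation_weight[rel])
knowledge_db[("Rama", "Ravana")] = (rel, best_strategy(matrix))
print(knowledge_db)                                # {('Rama', 'Ravana'): ('Enemy', 1)}
```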
Figure 2. Relationship diagram between two persons.

Table 1. Relation trust-factor table
S. No. | Relation        | Weightage
1      | Brother, Sister | 2
2      | Friend          | 1
3      | Known           | 0
4      | Unknown         | -1
5      | Enemy           | -2
Table 2. Behavioral trust-factor table
S. No. | Behavior      | Weightage
1      | Trustworthy   | 1
2      | Unknown       | 0
3      | Untrustworthy | -1
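The two trust-factor tables translate directly into lookup dictionaries, and the weightage adjustment of the fourth step is then an element-wise addition. A minimal sketch follows; the paper does not spell out the exact rule for combining the relation and behavior weights, so the weight applied is left as a parameter (a weight of -1, the 'Untrustworthy' behavior factor, reproduces the Table 3 to Table 4 transformation shown below).

```python
# Tables 1 and 2 as lookup dictionaries.
relation_weight = {"Brother": 2, "Sister": 2, "Friend": 1,
                   "Known": 0, "Unknown": -1, "Enemy": -2}
behavior_weight = {"Trustworthy": 1, "Unknown": 0, "Untrustworthy": -1}

def apply_trust(payoff_pairs, weight):
    """Add a trust weight to both players' payoffs in every cell."""
    return [[(a + weight, b + weight) for (a, b) in row] for row in payoff_pairs]

# Table 3 (rows: Rama's strategies; columns: Ravana's strategies).
table3 = [[(1, 2), (0, 1), (-1, -2)],
          [(2, 3), (1, 2), (3, -1)]]

print(apply_trust(table3, behavior_weight["Untrustworthy"]))
# -> [[(0, 1), (-1, 0), (-2, -3)], [(1, 2), (0, 1), (2, -2)]]  (Table 4)
```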
Table 3. Strategy table for pay-off matrix (rows: Rama's strategies; columns: Ravana's strategies)
1,2 | 0,1 | -1,-2
2,3 | 1,2 | 3,-1

Table 4. Strategy table for pay-off matrix with relationship and behavior applied
0,1 | -1,0 | -2,-3
1,2 | 0,1  | 2,-2

In our fifth step, we solve the problem using game theory and find the strategy of the game, which is cooperative if the person is known or trustworthy or both, and non-cooperative if the person is unknown or an enemy. The sixth step simply stores the knowledge for future use, provided the person does not change his behavior or state of trustworthiness; otherwise the program is recompiled from step 2. The concern now is that Rama has a set of strategies to win over Ravana and, similarly, Ravana has different strategies to defeat Rama. The main problem is what happens when Rama or Ravana changes his current strategy - how does either of them change strategy to win the fight? This is the major concern in this fight. To select a strategy against the opponent's strategy we have to study the pay-off matrix of game theory. The simulated results of this study are provided here. The following assertions are passed to the rule based system, which compiles and elaborates them to find new rules and assertions - one could say it adapts new knowledge after processing these assertions:
(Rama is friend of Hanuman)
(Lakshmana is brother of Rama)
(Ravana is enemy of Rama)
(Megnada is son of Ravana)
(Kumbhkarana is brother of Ravana)
(Hanuman introduce Sugreeva to Rama)
(Sugreeva is friend of Jamvanta)
The new assertions obtained after processing these assertions in the rule based system are as follows:
(Hanuman is friend of Lakshmana)
(Lakshmana is friend of Hanuman)
(Hanuman is friend of Rama)
(Rama is enemy of Ravana)
(Megnada is enemy of Rama)
(Megnada is friend of Ravana)
. . . Done
After processing these rules on the system we can easily identify the relationship between any two persons. For example, if we look for the relationship between "Rama" and "Ravana", it should be "Enemy", and the trust factor should be "Untrustworthy":
(find-relation "Rama" "Ravana")
Now suppose that Rama has 2 strategies while Ravana has 3 different strategies to fight. The pay-off matrix for a known person is given below (rows: Rama's strategies; columns: Ravana's strategies):
1,2 | 0,1 | -1,-2
2,3 | 1,2 | 3,-1
According to Table 3, Rama can opt for either strategy 1 or strategy 2 to fight Ravana, while Ravana has 3 strategies to fight Rama. But their relationship is enemy to each other, so there should be some negative impact on the pay-off matrix. After compiling the trust factor into this matrix we get a new pay-off matrix:
0,1 | -1,0 | -2,-3
1,2 | 0,1  | 2,-2
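The paper does not name the rule engine behind `find-relation`; the following is a minimal Python sketch of the kind of forward chaining it describes, with two illustrative rules (relations are symmetric; a friend of my enemy is my enemy). The rule set and fact format are assumptions for illustration.

```python
# Minimal forward-chaining sketch (illustrative rules, not the paper's engine).
facts = {("Rama", "friend", "Hanuman"),
         ("Ravana", "enemy", "Rama"),
         ("Megnada", "friend", "Ravana")}

def step(facts):
    derived = set()
    for (a, rel, b) in facts:
        derived.add((b, rel, a))                    # symmetry
        for (c, rel2, d) in facts:
            if rel == "enemy" and rel2 == "friend" and d == a:
                derived.add((c, "enemy", b))        # friend of my enemy = my enemy
    return derived - facts

while True:                                         # chain to a fixed point
    new = step(facts)
    if not new:
        break
    facts |= new

def find_relation(a, b):
    return next((rel for (x, rel, y) in facts if x == a and y == b), "Unknown")

print(find_relation("Megnada", "Rama"))             # -> enemy
```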
Therefore, the resultant matrix is shown in Table 4. The resulting strategy matrix is:
Strategy | Ravana | Rama
1        | 0.0503 | 1.0000
2        | 0      | 0
3        | 0.9497 | (not an option in Rama's strategy)
According to this solution, Ravana can opt for either his 1st or 3rd strategy to fight Rama, while Rama has to opt only for his 1st strategy to fight Ravana, as can be seen in the graph in Figure 3.

Figure 3. Resultant output for Ramayana.

5. Conclusion
This work helps us understand the cognitive learning process through game theory. It can motivate other learning processes through other strategic decision making systems. Game theory is the basic tool for making a strategic decision for a given problem, one which we humans can easily understand and interpret in order to take fast decisions. In future we can interpret human problems by introducing NLP into this learning process.

6. References
1. Cotterill R. Enchanted looms: Conscious networks in brains and computers. Cambridge, U.K.: Cambridge University Press; 2000.
2. Bellas F, Duro RJ, Faina A, Souto D. Multilevel Darwinist Brain (MDB): Artificial evolution in a cognitive architecture for real robots. IEEE Transactions on Autonomous Mental Development. 2010; 2(4):1-5.
3. Brasil LM, Azevedo FMD, Barreto JM, Noirhomme-Fraiture M. Complexity and cognitive computing. Lecture Notes in Computer Science. 1998; 1415:408-17.
4. Lu J, Niu L, Zhang G. A situation retrieval model for cognitive decision support in digital business ecosystems. IEEE. 2011; 60(3):1059-69.
5. Malfaz M, Alvaro CG, Barber R, Salichs MA. A biologically inspired architecture for an autonomous and social robot. IEEE Transactions on Autonomous Mental Development. 2011; 3(3):232-46.
6. Machova K, Paralic J. Basic principles of cognitive algorithm design. CiteSeer; 2004. p. 1-7.
7. Zhong Y. Study on cognitive decision support based on learning and improvement of mental models. ISECS International Colloquium on Computing, Communication, Control and Management; Guangzhou. 2008. p. 490-4.
8. Albus J, Lacaze A, Meystel A. Theory and experimental analysis of the cognitive processes in early learning. IEEE; USA. 1994.
9. Velumoni D, Rau SS. Cognitive intelligence based expert system for predicting stock markets using prospect theory. Indian Journal of Science and Technology. 2016 Mar; 9(10):1-6.
10. Thomas J, Kulanthaivel G. Classification of preterm birth with optimized rules by mutating the cognitive knowledge of PSO. Indian Journal of Science and Technology. 2015 Jul; 8(15):1-5.
11. Mythili MS, Shanavas ARM. Meta heuristic based fuzzy cognitive map approach to support towards early prediction of cognitive disorders among children (MEHECOM). Indian Journal of Science and Technology. 2016 Jan; 9(3):1-7.
International Journal of Innovative Research and Advanced Studies (IJIRAS) Volume 3 Issue 8, July 2016
ISSN: 2394-4404
Use Of Ontology To Secure The Cloud: A Case Study
Gagandeep Singh Narula, Research Scholar, M.Tech (CSE), CDAC Noida, India
Dr. S. V. A. V. Prasad, Professor, Electronics and Communication Department, Lingaya's University, Faridabad
Dr. Vishal Jain, Assistant Professor, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi
Abstract: The paper presents the layout of a secured ontology-based cloud computing architecture intended to secure information uploaded by multiple users on the cloud storage layer. It provides a solution for the retrieval of queries and provides fast responses to users by using the Web Ontology Language (OWL) and its description logics. In addition, it adds the flavor of the semantic web, discussing its evolution and the role of ontology in the development of the semantic web.
Keywords: Semantic web, Ontology, Cloud computing and Security.
I. INTRODUCTION

The term semantic web was coined by Sir Tim Berners-Lee. It came into existence with the aim of bridging the gap between humans and machines. Its architecture includes ontology, which can be called the spine of the semantic web. Ontologies are data models that represent the meaning of semantics in an expressive way. Ontologies are used to maintain relationships among real world entities belonging to a particular domain, and they have various definitions - philosophical, formal, explicit, specific, shared and many more. There are notations to express ontologies, called ontology languages, such as RDF and OWL; they have predefined syntaxes and logic based semantics that perform reasoning and manipulation with the help of ontologies. The semantic web is treated as the third generation of the web (Web 3.0), which focuses on the generation of metadata whose annotations are filled in machine understandable form. The difference between the current web and the semantic web can be illustrated by the example of a library. An old library full of books without a catalogue is the current web, while a modern computerized library with catalogues is the semantic web. Obviously, modern librarians work faster because they search catalogues directly rather than searching whole books. In a catalogue, results are retrieved on the basis of author's name, publisher's name, ISBN number etc. The most important feature of the modern library is that record field values are ordered and interpreted according to international standards like the MARC format. It uses vocabularies in the form of a concept hierarchy, such as the Dewey Decimal Classification (DDC) or the Universal Decimal Classification (UDC). These standards are vital for the dissemination of information in libraries. Similarly, on the web these standards are used as ontologies to capture the values of records; these values then act as metadata to maintain interoperability between standards. If there are multiple standards, they have to be mapped before information can be shared. Semantic web technologies like XML and RDF aim to create ontologies and metadata either from scratch or from existing ontologies. A new ontology can be created from an existing ontology by applying various ontology evaluation approaches like PROMPT, OntoMetric etc. Thus, it can be concluded that the semantic web is an application for the generation of metadata which enhances the results of the current web with the help of ontology.
There are various problems associated with the development of the semantic web. According to Kevin Kelly, it suffers from the fax effect, which means that the development of the semantic web is costly and its technologies have not been utilized fully. Still, most researchers are trying their hands at this web technology to achieve machine-human interaction. The rest of the paper is organized as follows: Section 2 makes readers aware of the evolution of the semantic web and invites them to move towards this research area. Section 3 defines the concept of ontology as one of the research areas. Section 4 describes the features and service models of cloud computing, followed by recent studies conducted in the context of cloud computing in Section 5. Section 6 presents an ontology-based secured framework to protect the data of multiple users from unauthorized access. Finally, the paper ends with conclusions and references.

II. SEMANTIC WEB

This section lets readers think about a few questions: why would the current web need any extension? Why are irrelevant results produced on the current web? The reason common to both questions is the knowledge gap between users and machines. The current web does not offer a mechanism that provides a deeper understanding of information. Various knowledge management solutions and technologies exist in the field of AI to deal with this, while missing information can be accessed with the help of ontologies. Ontologies can be social as well as formal. They are formal in that they maintain human-machine interaction to enable knowledge reasoning, while the social side confines itself to maintaining relationships between classes and properties of other ontologies. The semantic web aims to transform web documents into information; meaningful data is called information. It involves the creation of a common framework that leads to the sharing of data and its reuse among various applications. Applications of semantic technologies cover areas like data integration, knowledge discovery, resource discovery, classification of data and the design of intelligent systems.

III. ONTOLOGY

Ontology is treated as a formal, explicit specification of a shared conceptualization [1]. Besides its formal nature, philosophical aspects and handling of real world scenarios, it also acts as a medium of linking between humans and machines. Ontology in itself is a vast research area that includes mapping, merging, extraction, moving and evaluation of ontologies. Ontology evaluation approaches are classified into the following categories [2]:
• On the basis of comparing ontologies
• On the basis of usage and application of ontologies
• On the basis of a set of documents related to the domain ontology
• On the basis of human evaluation, in order to meet ontology requirements and compatibilities.
Components of an ontology include classes, properties, instances, inheritance functions, slots, frame values and sub classes. The relationship between classes and subclasses is defined by super-concept/sub-concept and is called the isa-hierarchy. Example: there is a class named Institute with sub classes IIT, IIIT and NITs; so IIT is represented as a subclass of Institute. There are social ontologies that help to achieve interoperability among social web applications in order to move from the social web to the semantic web. Some of them include FOAF, SIOC, XFN, GoodRelations, RSS Feeds and many more.

IV. CLOUD COMPUTING

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This definition from the National Institute of Standards and Technology has gained broad support from the industry. The NIST definition of cloud computing describes five essential characteristics, three service models and four deployment models.

FIVE ESSENTIAL CHARACTERISTICS

ON-DEMAND SELF SERVICE: Users are able to provision, monitor and manage computing resources as needed without the help of human administrators.
BROAD NETWORK ACCESS: Computing services are delivered over standard networks and heterogeneous devices.
RAPID ELASTICITY: IT resources are able to scale out and in quickly and on an as-needed basis.
RESOURCE POOLING: IT resources are shared across multiple applications and tenants in a non-dedicated manner.
MEASURED SERVICE: IT resource utilization is tracked for each application and tenant, typically for public cloud billing or private cloud chargeback.

THREE SERVICE MODELS

SOFTWARE AS A SERVICE (SAAS): Applications delivered as a service to end-users, typically through a Web browser. There are hundreds of SaaS offerings available today, ranging from horizontal enterprise applications to specialized applications for specific industries, and also consumer applications such as Web-based email.
PLATFORM AS A SERVICE (PAAS): An application development and deployment platform delivered as a service to developers, who use the platform to build, deploy and manage SaaS applications. The platform typically includes databases, middleware and development tools, all delivered as a service via the Internet. PaaS offerings are often specific to a programming language or APIs, such as Java or Python. A virtualized and clustered grid computing architecture is
often the basis for PaaS offerings, because the grid provides the necessary elastic scalability and resource pooling.
INFRASTRUCTURE AS A SERVICE (IAAS): Compute servers, storage and networking hardware delivered as a service. This infrastructure hardware is often virtualized, so virtualization, management and operating system software are also part of IaaS. Examples of IaaS are Amazon's Elastic Compute Cloud (EC2), Simple Storage Service (S3) and Rackspace.
Figure 1: Service Models [3]
V. LITERATURE SURVEY

Several studies have been performed by researchers, scientists and programmers in the context of cloud computing. Youseff et al. [4] describe aspects of cloud computing such as migration of data from cloud to cloud and setting up cluster nodes, theoretically only. Lee et al. [5] proposed an algorithm for handling and managing the distribution of resources; the work covers the processing of user requests not only in terms of processor size but also in accordance with service level agreements (SLAs). Fujiwara et al. [6] led to the development of a framework for a cloud forensic environment. Sim et al. [7] introduced an ontology-based retrieval method for accessing information from the cloud. Khafagy et al. [8] proposed a development view of an ontology-based file system for accessing and sharing information from multiple cloud service providers. The aforementioned studies depict the role of ontology in acquiring a secured cloud architecture. Parkin et al. [9] devised a secured ontology layout that looks after the implications caused by human behavioral interactions while dealing with management services.
VI. SECURITY WITH ONTOLOGY IN CLOUD COMPUTING

This section presents the layout of a secured ontology-based cloud computing architecture to secure information uploaded by multiple users on the cloud storage layer. Figure 2 depicts a schematic presentation of it.

Figure 2: Security Framework

The description of the modules is as follows:
• Agent Layer: It involves a user interface agent that acts as an interface between the user and multiple agents. Various agents can be used, such as a communicating agent, a flow agent etc.
• Cloud Data Storage (CDS): It is used to manage the data flow sent by multiple cloud users, taking services from various cloud service providers (CSPs) on a PAY PER USE policy. It holds a database in which all queries and related information are stored; this information is used further.
• Ontology Development and Generation of Rules: On the basis of the queries and information stored on the cloud server, an ontology is developed with the help of ontology development tools like Protégé [10], WebODE [11] and many more. Then rules are developed to derive inferences from a given query; this can be done using OWL and its description logic foundations [12].
• Management Services: The rules produced above are useful in handling management services like billing, usage of data, payment details etc.
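As an illustration of the ontology-and-rules module, the sketch below records users and their access rights as RDF triples and answers an authorization query with SPARQL, using the rdflib Python library. It is a minimal sketch: the namespace, class and property names (CloudUser, canAccess) are illustrative assumptions, not part of the paper's framework.

```python
# Minimal sketch of the CDS + rules idea: store facts about users and
# resources as RDF, then answer an authorization query with SPARQL.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/cloud#")   # illustrative namespace
g = Graph()
g.bind("ex", EX)

g.add((EX.alice, RDF.type, EX.CloudUser))     # a cloud user...
g.add((EX.alice, EX.canAccess, EX.bucket1))   # ...and her access right

# "Which resources may alice access?" - the kind of query the management
# services module would issue before billing or serving data.
q = """SELECT ?res WHERE { ex:alice ex:canAccess ?res . }"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.res)                            # -> http://example.org/cloud#bucket1
```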
VII. CONCLUSION AND FUTURE SCOPE

The paper presented a secured ontology-based cloud computing framework that includes four modules, viz. the agent layer, CDS, ontology development and management of services. It provides a solution for the retrieval of queries and gives fast responses to users by using OWL and its description logics. Ontology can be developed in formats like OWL DL, OWL Lite and OWL Full. As future work, a feedback framework can be developed that incorporates the use of various agents to gather knowledge from various cloud service providers in order to provide recommendations to cloud users on the basis of their interests.
REFERENCES

[1] Gagandeep Singh, Vishal Jain, "Information Retrieval (IR) through Semantic Web (SW): An Overview", in proceedings of CONFLUENCE 2012 - The Next Generation Information Technology Summit at Amity School of Engineering and Technology, September 2012, pp 23-27.
[2] Janez Brank, Marko Grobelink, Dunja Mladenic, "A Survey of Ontology Evaluation Techniques", Department of Knowledge Technologies, Jozef Stefan Institute, Jamova, 2000.
[3] Aderemi A. Atayero, Oluwaseyi Feyisetan, "Security Issues in Cloud Computing: The Potentials of Homomorphic Encryption", Journal of Emerging Trends in Computing and Information Sciences, Vol. 2, No. 10, October 2011.
[4] Youseff, L., Butrico, M. and Silva, D.D. 2008. Towards a Unified Ontology of Cloud Computing. Grid Computing Environments Workshop, GCE'08, pp 1-10.
[5] Ma, Y.B., Jang, S. and Lee, J.S. 2011. Ontology-Based Resource Management for Cloud Computing. In proceedings of ACIIDS(2), pp 343-352.
[6] Takahashi, T., Kadobayashi, Y. and Fujiwara, H. 2010. Ontological Approach toward Cybersecurity in Cloud Computing. In SIN '10: Proceedings of the 3rd international conference on Security of information and networks. ACM, New York, NY, USA.
[7] Han, T. and Sim, K. M. 2010. An Ontology-enhanced Cloud Service Discovery System. In proceedings of the International MultiConference of Engineers and Computer Scientists, vol. 1, IMECS 2010, March 17-19, 2010, Hong Kong.
[8] Feel, H. T., Khafagy, M. H. 2011. First International Symposium on Network Cloud Computing and Applications (NCCA), pp 9-13.
[9] Parkin, S. E., van Moorsel, A., and Coles, R.: An information security ontology incorporating human behavioural implications. In SIN '09: Proceedings of the 2nd international conference on Security of information and networks, pages 46-55, New York, NY, USA, 2009. ACM.
[10] Protégé. Available at: http://protege.stanford.edu/
[11] Corcho, O., Fernandez-Lopez, M., Gomez-Perez, A. and Vicente, O. 2002. WebODE: An Integrated Workbench for Ontology Representation, Reasoning and Exchange. Proc. of EKAW 2002, Springer LNAI 2473, pp 138-153.
[12] Zhihong, Z. and Mingtian, Z. 2003. Web Ontology Language OWL and its Description Logic Foundation. In proceedings of the Fourth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp 157-160.
[13] Talib, A. M., Atan, R., Abdullah, R., and Murad, M. A. A.: CloudZone: Towards an Integrity Layer of Cloud Data Storage Based on Multi Agent System Architecture. ICOS 2011, pp 127-132.
[14] Han, T. and Sim, K. M. 2010. An Ontology-enhanced Cloud Service Discovery System. In proceedings of the International MultiConference of Engineers and Computer Scientists, vol. 1, IMECS 2010, March 17-19, 2010, Hong Kong.
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN (E): 2321-8843; ISSN (P): 2347-4599 Vol. 4, Issue 5, May 2016, 1-12 © Impact Journals
EVALUATION AND VALIDATION OF ONTOLOGY USING PROTEGE TOOL

VISHAL JAIN (1) & S. V. A. V. PRASAD (2)

(1) Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad, India
(2) Professor, Electronics and Communication Department, Lingaya's University, Faridabad, India
ABSTRACT

Ontology is one of the most important constituents of the semantic web layered architecture. Without ontology, it is impossible to maintain relationships among real world entities. Various operations can be performed on ontologies, like merging, mapping, evaluation and validation. The paper classifies ontologies at various levels - lexical level, hierarchical level, syntactic level and design level. Besides this, a comparative study is provided of different methods for the evaluation and validation of ontologies, like PROMPT, OntoMetric, OntoClean and many more.
KEYWORDS: Semantic Web, Ontology, Evaluation, Protégé, Prompt

INTRODUCTION

The term semantic web was coined by Sir Tim Berners-Lee. It came into existence with the aim of bridging the gap between humans and machines. Its architecture includes ontology, which can be called the spine of the semantic web. Ontologies are data models that represent the meaning of semantics in an expressive way. They are used to maintain relationships among real world entities belonging to a particular domain, and they have various definitions - philosophical, formal, explicit, specific, shared and many more. There are notations to express ontologies, called ontology languages, such as RDF and OWL; they have predefined syntaxes and logic based semantics that perform reasoning and manipulation with the help of ontologies. The semantic web is treated as the third generation of the web (Web 3.0), which focuses on the generation of metadata whose annotations are filled in machine understandable form. The difference between the current web and the semantic web can be illustrated by the example of a library. An old library full of books without a catalogue is the current web, while a modern computerized library with catalogues is the semantic web. Obviously, modern librarians work faster because they search catalogues directly rather than searching whole books. In a catalogue, results are retrieved on the basis of author's name, publisher's name, ISBN number etc. The most important feature of the modern library is that record field values are ordered and interpreted according to international standards like the MARC format. It uses vocabularies in the form of a concept hierarchy, such as the Dewey Decimal Classification (DDC) or the Universal Decimal Classification (UDC). These standards are vital for the dissemination of information in libraries. Similarly, on the web these standards are used as ontologies to capture the values of records; these values then act as metadata to maintain interoperability between standards. If there are multiple standards, they have to be mapped before information can be shared. Semantic web technologies like XML and RDF aim to create ontologies and metadata either from scratch or from existing ontologies. A new ontology can be created from an existing ontology by applying various ontology evaluation approaches like PROMPT, OntoMetric etc. Thus, it can be concluded that the semantic web is an application for the generation of metadata which enhances the results of the current web with the help of ontology. There are various problems associated with the development of the semantic
web. According to Kevin Kelly, it suffers from the fax effect, which means that the development of the semantic web is costly and its technologies have not been utilized fully. Still, most researchers are trying their hands at this web technology to achieve machine-human interaction. Continuous efforts are being made by researchers to turn information systems into intelligent systems that encompass human interaction with machines. This has led to a focus on the semantic concepts of data, which hold interpretations of, and relationships with, other concepts. In recent years, various studies have been conducted by scientists and researchers to allow semantic web technologies to work in a distributed environment and to enable knowledge sharing of information in machine understandable format. It is necessary to validate and evaluate ontologies while building them, because ontology building is a task that requires working from the scratch of a project. Ontology evaluation is one of the key techniques in the area of the semantic web. An ontology can be evaluated for the particular domain in which it exists, but a domain-independent evaluation of an ontology is still hard to achieve. Ontologies can be evaluated on the basis of 3 levels:
• Scope of ontology
• Taxonomy view (whole view, isa-hierarchy)
• Adaptability of semantic web relations
Semantic Web

This section lets readers think about a few questions: why would the current web need any extension? Why are irrelevant results produced on the current web? The reason common to both questions is the knowledge gap between users and machines. The current web does not offer a mechanism that provides a deeper understanding of information. Various knowledge management solutions and technologies exist in the field of AI to deal with this, while missing information can be accessed with the help of ontologies. Ontologies can be social as well as formal. They are formal in that they maintain human-machine interaction to enable knowledge reasoning, while the social side confines itself to maintaining relationships between classes and properties of other ontologies. The semantic web aims to transform web documents into information; meaningful data is called information. It involves the creation of a common framework that leads to the sharing of data and its reuse among various applications. Applications of semantic technologies cover areas like data integration, knowledge discovery, resource discovery, classification of data and the design of intelligent systems.
Figure 1: “Evaluation of Semantic Web”
Ontology

Ontology is treated as a formal, explicit specification of a shared conceptualization. Besides its formal nature, philosophical aspects and handling of real world scenarios, it also acts as a medium of linking between humans and machines. Ontology in itself is a vast research area that includes mapping, merging, extraction, moving and evaluation of ontologies. Ontology evaluation approaches are classified into the following categories:
• On the basis of comparing ontologies
• On the basis of usage and application of ontologies
• On the basis of a set of documents related to the domain ontology
• On the basis of human evaluation, in order to meet ontology requirements and compatibilities.
Components of an ontology include classes, properties, instances, inheritance functions, slots, frame values and sub classes. The relationship between classes and subclasses is defined by super-concept/sub-concept and is called the isa-hierarchy. Example: there is a class named Institute with sub classes IIT, IIIT and NITs; so IIT is represented as a subclass of Institute. There are social ontologies that help to achieve interoperability among social web applications in order to move from the social web to the semantic web. Some of them include FOAF, SIOC, XFN, GoodRelations, RSS Feeds and many more.
Figure 2: “Ontology and its Constituents”
CLASSIFICATION OF ONTOLOGY EVALUATION APPROACHES

Ontology evaluation approaches are classified into the following categories:
• On the basis of comparing ontologies
• On the basis of usage and application of ontologies
• On the basis of a set of documents related to the domain ontology
• On the basis of human evaluation, in order to meet ontology requirements and compatibilities.
CURRENT APPROACHES IN ONTOLOGY EVALUATION AND VALIDATION

The growing interest in the semantic web and its technologies has led to the development of a huge number of ontologies that need to be evaluated and validated. The main approaches are stated below.

Evolution Based
As the word suggests, evolution specifies evolving from time to time. It is known that ontologies vary over time, which leads to the enhancement of knowledge and more precise results. This method tracks the changes and improvements made in an existing ontology when it is subjected to different versions. As per Noy, evolution in an ontology is caused mainly by three factors: a variable domain, changes in specification and changes in conceptualization.

Logical (Rule-based)
This approach makes use of rules for deducing inferences in a given ontology. When the object properties of two classes are different from each other, they are represented as (owl:differentFrom), or as disjoint from each other with (owl:disjointWith). The rule for a corresponding query follows some syntax: if the query is "My aunt is my father's sister", it is represented by a rule in the corresponding family ontology, written as: Rule1_//_MyAunt_is_my_father's_sister. (An illustrative sketch of such a rule follows below.)

Metric-Based (Feature-based)
This approach provides a quantitative scenario for a given ontology, as it tracks variations of the classes and properties of the source ontology and the target ontology. It also performs operations like union and intersection between two ontology schemas, which leads to a distributed percentage of instances of a given schema.
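As a concrete illustration of the rule-based approach, the sketch below encodes the aunt rule in plain Python over a tiny set of family facts. The fact format and helper names are assumptions for illustration; the paper only gives the rule's name.

```python
# Rule: hasFather(x, f) and hasSister(f, a)  =>  hasAunt(x, a)
facts = {("arjun", "hasFather", "ravi"),
         ("ravi", "hasSister", "mona")}

def infer_aunts(facts):
    """Apply the aunt rule once and return the newly derived triples."""
    fathers = {(x, f) for (x, p, f) in facts if p == "hasFather"}
    sisters = {(f, a) for (f, p, a) in facts if p == "hasSister"}
    return {(x, "hasAunt", a)
            for (x, f) in fathers
            for (f2, a) in sisters if f2 == f}

print(infer_aunts(facts))   # -> {('arjun', 'hasAunt', 'mona')}
```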
METHODS FOR EVALUATING ONTOLOGIES

OntoMetric
Why OntoMetric? Choosing an ontology for a new project from among various domains is one of the major problems faced by knowledge engineers. Until now, ontologies have been chosen on the basis of researchers' limited experience, but they should be chosen by taking their schemas into consideration.
What is OntoMetric? This method consists of a set of processes that the user must follow in order to determine the compatibility and selection of ontologies. It is used to select the optimal ontology among various domains and to make it compatible with the standards of the given project. OntoMetric is based on the Analytic Hierarchy Process (AHP), which also provides methods for the reuse of ontologies. AHP considers the dimensions that need to be checked before using an ontology. The features of AHP are:
• Content of the ontology
• Ontology implementation language
• Steps required for building the ontology
• Platform used
• Cost incurred in building the ontology
So, it is concluded that OntoMetric acts as a quantitative measure for every candidate ontology by using these dimensions.
OntoClean
Why OntoClean? Finding meta-relations among concepts is not an easy task; it requires cleaning of ontologies.
What is OntoClean? The properties of OntoClean are:
• Rigidity: It defines the links between a property and individuals. A property is said to be rigid iff it is vital for all its instances; it is non-rigid if it is not vital for some of its instances; and it is anti-rigid iff it is not vital for any of its instances.
• Unity: This property specifies that the parts of a schema are unified if they are found by joining instances to a common relation R, represented as ⟨R ∈ I1 ∨ I2 ∨ … ∨ In⟩.
There are two building blocks that play a vital role in the implementation of OntoClean:
• A set of axioms that specifies the requirements and constraints of the given ontology.
• A meta ontology, also called the schema hierarchy, which includes object properties and data type properties.
OntoQA
It is implemented in the form of a Java application that employs Sesame (an open source framework). It acts as an RDF repository in which various OntoQA components are used:
• Ontology: It finds the metric values of the ontology. The ontology schema holds the following elements, viz. Classes (C), Properties (P), Instances (I) and Inheritance Functions (HC). The knowledge base of the ontology holds the following elements, viz. Instances (I), the Class Instantiation Function (CF) and the Relationship Instantiation Function (RI).
• Ontology and Keywords: It uses WordNet to find synonyms of the terms used in the given ontology. It also uses the metric values calculated above to determine the overall quality value of the ontology.
• Keywords: OntoQA makes use of Swoogle, a crawler-based meta search engine that finds RDF and OWL documents in the context of the entered keywords.
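A minimal sketch of schema-level metrics in the spirit of OntoQA, computed from simple counts; the formulas shown follow the commonly cited OntoQA definitions (relationship richness = P / (P + H), inheritance richness = H / C), and the counts are toy values, not taken from a real ontology.

```python
# Toy schema counts: classes (C), non-inheritance properties (P),
# inheritance (subclass) links (H), instances (I).
C, P, H, I = 40, 25, 55, 300

# Relationship richness: share of relations that are not plain subclass links.
relationship_richness = P / (P + H)            # 0.3125

# Inheritance richness: average number of subclass links per class.
inheritance_richness = H / C                   # 1.375

# Average population: instances per class in the knowledge base.
average_population = I / C                     # 7.5

print(relationship_richness, inheritance_richness, average_population)
```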
Prompt
It is a plug-in that acts as a tool for comparing ontologies; it is a partial, not complete, algorithm for representing ontologies. With the help of this plug-in, the user can perform various functions on a given ontology, such as comparison, merging, mapping, and extracting features from a source ontology and moving them to a target ontology. It has various features that are used to provide suggestions and to reduce conflicts between ontologies. These features cover:
• Classes and slots used for merging
• Hierarchy of both schemas (PROMPT will give better suggestions if two classes are similar, because they are easier to merge)
• Attachment of slots with respect to classes
• Facets and their values (it is required to restrict the range of classes while merging their slots)
Besides this, PROMPT also helps to identify conflicts, which are among the following:
• Naming conflicts
• Null references
• Redundant classes
Working of the PROMPT Algorithm
Input: source ontology and target ontology. The steps are:
• A list of common classes is created and matched.
• An operation is performed by the user on the basis of PROMPT's suggestions.
• PROMPT performs operations automatically and lists extra changes to make to the ontology.
• A suggestion list is generated on the basis of the structure of the ontology.
• Conflicts that occur while merging both ontologies are determined and their solutions provided.
In our Protégé-based implementation, we use the Protégé component-based architecture to allow the user to plug in any term-matching algorithm.
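A pluggable term matcher of the kind mentioned above can be as simple as a string-similarity pass over the two class lists. The sketch below uses Python's standard difflib to propose merge candidates; the threshold and class names are illustrative assumptions, not PROMPT's actual matcher.

```python
# Toy term matcher: propose class pairs whose names are similar enough.
from difflib import SequenceMatcher

source_classes = ["Institute", "Professor", "Student"]
target_classes = ["Institution", "Prof", "Learner"]

def match_terms(source, target, threshold=0.6):
    """Return (source, target, score) suggestions, best score first."""
    pairs = [(s, t, SequenceMatcher(None, s.lower(), t.lower()).ratio())
             for s in source for t in target]
    return sorted((p for p in pairs if p[2] >= threshold),
                  key=lambda p: -p[2])

for s, t, score in match_terms(source_classes, target_classes):
    print(f"suggest merging {s!r} with {t!r} (similarity {score:.2f})")
```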
Figure 3: The Flow of the Prompt Algorithm. The gray boxes show the actions performed by the Prompt tool, and the white boxes show the actions performed by the user.
Modes of the PROMPT Tool
The PROMPT tab allows you to manage multiple ontologies in Protégé-2000. Using PROMPT you can:
• Merge two ontologies into one;
• Extract a part of an ontology;
• Move frames from an included to an including project;
• Compare two versions of the same ontology and create a merged version.
Except for move frames, these operations create a copy of the ontology in your working project and leave the
original project intact. In the case of move frames, however, both the included and including projects are changed.
Merge Mode
It lets users merge two existing ontologies and develop a new ontology from them; the original ontologies are left untouched. In merge mode, the Prompt tab has three sub-tabs:
• The Suggestions Tab
• The Conflicts Tab
• The New Operations Tab
With the help of these tabs, the merging and mapping processes can be done effectively.
Extract Mode
It lets users extract/retrieve a subset of the features of the knowledge base that form part of the source ontology. The retrieved ontology can either be saved as a new .owl file or moved into an existing project; the source ontology is left unchanged. In extract mode, the Prompt tab has two sub-tabs:
• The Suggestions Tab
• The New Operations Tab
The Conflicts tab does not appear in extract mode. Via these tabs, PROMPT guides you through the extract process, making suggestions based on the frames you
have already copied.
Moving Frames Mode
Moving frames mode allows us to move frames from an included to an including project. This is the only mode that alters the original ontology as well as the target ontology, so it is a good idea to make copies of both the including and included ontologies before enabling moving frames mode. There is an inclusion mechanism in Protégé that allows the reuse of ontologies and their frames from an existing project; this mode allows moving the frames of an existing ontology into the included project.
In moving frames mode, the Prompt tab has two sub-tabs:
• The Conflicts Tab
• The New Operations Tab
The Suggestions tab does not appear in moving frames mode. Via these tabs, PROMPT guides you through the moving frames process, identifying conflicts and proposing conflict-resolution strategies.
Compare Mode
This mode is only used for comparing two ontologies by finding frame changes in them. Like merge, compare uses a number of heuristics to make a best guess as to changes and correspondences. In this case, the heuristics behind the standard PROMPT merge can be modified: when the two projects to be merged are known to be different versions of the same ontology, Protégé can use stronger heuristics, suggesting, for example, that a single unmatched sibling of the same parent in the two different versions may be the same frame with a different name. By concentrating on the differences between two similar projects (rather than the similarities of two different projects, as in merge), compare can give much better results for the type of reconciliation required in version control.
PROMPT Operations
PROMPT is used to copy, merge, move and extract information from source ontologies to target ontologies and vice versa. Except for the move operation, the source ontology remains unchanged, because in the move operation the source ontology's classes are added into the current project. The operations available depend on our initial choice of mode for incorporating the source ontologies into the working project. The available operations for each type of PROMPT action are as follows:

Table 2: Modes of Operations in PROMPT
Mode               | Available Operations
Merge Mode         | Merge Classes, Merge Slots, Merge Instances, Copy Class, Copy Slot, Copy Instance, Remove Parent
Extract Mode       | Copy Class, Copy Instance, Copy Slot
Moving Frames Mode | Move Class, Move All Instances Of Class, Move Instance, Move Slot
Compare Mode       | View Only; No Operations Available
CONCLUSIONS & FUTURE SCOPE

This paper has discussed the different kinds of mismatches that can occur in ontology integration and has sketched the current solutions for reconciling those mismatches. It has also argued that mappings are crucial components of many applications. Much of the work on ontology mapping has been done in the context of a particular application domain.
ACKNOWLEDGEMENTS

I, Vishal Jain, would like to give my sincere thanks to Dr. M. N. Hoda, Director, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, for giving me the opportunity to do my Ph.D. at Lingaya's University, Faridabad.
REFERENCES

1. Nuno Silva and João Rocha: Ontology Mapping For Interoperability In Semantic Web. GECAD - Knowledge Engineering and Decision Support Research Group.
2. T. Berners-Lee. The semantic web. Scientific American, 284(5):35-43, 2001.
3. A. Valente, T. Russ, R. MacGregor, and W. Swartout. Building and (re)using an ontology for air campaign planning. IEEE Intelligent Systems, 14(1):27-36, 1999.
4. T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199-220, 1993.
5. J. Heflin, J. Hendler, and S. Luke. Coping with changing ontologies in a distributed environment. In Proceedings of the AAAI workshop on ontology management, 1999.
6. N. F. Noy and M. A. Musen. PROMPT: algorithm and tool for automated ontology merging and alignment. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), 2000.
7. Natalya F. Noy and Deborah L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford University, Stanford, CA, 94305.
8. H. Chalupsky. OntoMorph: A translation system for symbolic logic. In A. G. Cohn, F. Giunchiglia, and B. Selman, editors, KR2000: Principles of Knowledge Representation and Reasoning, pages 471-482, San Francisco, CA, 2000. Morgan Kaufmann.
9. H. Chalupsky. A translation system for symbolic knowledge. In Proceedings of the 7th International Conference on Principles of Knowledge Representation and Reasoning, 2000.
10. M. Klein. Combining and relating ontologies: an analysis of problems and solutions. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI-01), Workshop: Ontologies and Information Sharing, Seattle, USA, 2001.
11. P. Visser, D. M. Jones, T. Bench-Capon, and M. Shave. An analysis of ontological mismatches: Heterogeneity versus interoperability. In AAAI 1997 Spring Symposium on Ontological Engineering, Stanford, USA, 1997.
12. P. Visser, D. M. Jones, T. Bench-Capon, and M. Shave. Assessing heterogeneity by classifying ontology mismatches. In Proceedings of the International Conference on Formal Ontology in Information Systems (FOIS98), Trento, Italy, 1998.
13. G. Wiederhold. An algebra for ontology composition. In Proceedings of the 1994 Monterey Workshop on Formal Methods, pages 56-61, CA, USA, 1994.
14. P. Visser and T. Bench-Capon. On the reusability of ontologies in knowledge-system design. In Proceedings of the seventh International Workshop on Database and Expert Systems Applications, pages 256-261, 1996.
15. S. Melnik and S. Decker. A layered approach to information modeling and interoperability on the web. In Proceedings of the ECDL 2000 Workshop on the Semantic Web, Lisbon, Portugal, 2000.
16. J. Euzenat. Towards a principled approach to semantic interoperability. In A. Gomez-Perez, M. Gruninger, H. Stuckenschmidt, and M. Uschold, editors, Workshop on Ontologies and Information Sharing, IJCAI-01, Seattle, USA, 2001.
17. V. Chaudhri, A. Farquhar, R. Fikes, P. Karp, and J. Rice. OKBC: A programmatic foundation for knowledge base interoperability. In Proceedings of AAAI-98, pages 600-607, 1998.
18. S. Bowers and L. Delcambre. Representing and transforming model-based information. In Proceedings of the First Workshop on the Semantic Web at the Fourth European Conference on Digital Libraries, Lisbon, Portugal, 2000.
19. D. Calvanese, S. Castano, F. Guerra, D. Lembo, M. Melchiorri, G. Terracina, D. Ursino, and M. Vincini. Towards a comprehensive framework for semantic integration of highly heterogeneous data sources. In Proceedings of the 8th International Workshop on Knowledge Representation meets Databases (KRDB 2001), 2001.
20. D. Calvanese, D. G. Giuseppe, and M. Lenzerini. Ontology of integration and integration of ontologies. In Proceedings of the International Workshop on Description Logic (DL 2001), 2001.
21. B. Richardson, L. J. Mazlack. Approximate Ontology Merging For The Semantic Web. University of Cincinnati, Cincinnati, United States. Published in IEEE, 2004.
22. A. Alosoud, V. Haarslev, N. Shiri. Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, Canada. Published in Journal of Information Science XX(X), 2008, pp. 1-20. JIS-0759-v3.
23. A. Sheth and J. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3), 1990.
24. J. Madhavan, P. A. Bernstein, P. Domingos, and A. Halevy. Representing and reasoning about mappings between domain models. In Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence (AAAI 2002), pages 80-86, Edmonton, Alberta, Canada, 2002. AAAI Press.
25. A. Maedche, B. Motik, N. Silva, and R. Volz. MAFRA - a mapping framework for distributed ontologies. In Proceedings of the 13th European Conference on Knowledge Engineering and Knowledge Management EKAW, Madrid, Spain, 2002.
26. D. McGuinness, R. Fikes, J. Rice, and S. Wilder. An environment for merging and testing large ontologies. In Proceedings of the 7th International Conference on Principles of Knowledge Representation and Reasoning, Colorado, USA, 2000.
27. S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity flooding: A versatile graph matching algorithm and its
application to schema matching. In Proceedings of the International Conference on Data Engineering (ICDE), 2002.
28. N. F. Noy and M. A. Musen. PROMPT: algorithm and tool for automated ontology merging and alignment. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), 2000.
29. W. Litwin, L. Mark, and N. Roussopoulos. Interoperability of multiple autonomous databases. ACM Computing Surveys, 22(3):267-293, 1990.
30. R. Hull. Managing semantic heterogeneity in databases: A theoretical perspective. In Proceedings of the 16th ACM SIGACT SIGMOD SIGART Symposium on Principles of Database Systems (PODS'97), pages 51-61, 1997.
31. T. Catarci and M. Lenzerini. Representing and using interschema knowledge in cooperative information systems. Journal of Intelligent Cooperative Information Systems, 2(4):375-398, 1993.
32. C. Batini and M. Lenzerini. A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18(4), 1986.
33. E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal, 10:334-350, 2001.
34. D. McGuinness, R. Fikes, J. Rice, and S. Wilder. An environment for merging and testing large ontologies. In Proceedings of the 7th International Conference on Principles of Knowledge Representation and Reasoning, Colorado, USA, 2000.
35. N. F. Noy and M. A. Musen. PROMPT: algorithm and tool for automated ontology merging and alignment. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), 2000.
36. G. Stumme and A. Maedche. FCA-Merge: Bottom-up merging of ontologies. In Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-01, Seattle, USA, 2001.
37. D. Beneventano, S. Bergamaschi, F. Guerra, and M. Vincini. The MOMIS approach to information integration. In ICEIS 2001, Proceedings of the 3rd International Conference on Enterprise Information Systems, Portugal, 2001.
38. D. Beneventano, S. Bergamaschi, I. Benetti, A. Corni, F. Guerra, and G. Malvezzi. SI-Designer: A tool for intelligent integration of information. In 34th Annual Hawaii International Conference on System Sciences (HICSS-34). IEEE Computer Society, 2001.
39. N. F. Noy and M. A. Musen. Evaluating ontology-mapping tools: Requirements and experience. In Proceedings of the Workshop on Evaluation of Ontology Tools at EKAW'02 (EON2002), Siguenza, Spain, 2002.
AUTHORS
Vishal Jain completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is pursuing his Ph.D in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as Assistant Professor at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.
Dr. S. V. A. V. Prasad completed his M.Tech and Ph.D. He is presently working as Professor and Dean, School of Electrical Sciences. He has actively participated in and organized many refresher courses, seminars and workshops on ISO, RoHS, component technology, WEEE, organizational methods, time study, productivity enhancement, product feasibility and related topics. In the span of 25 years (1981 - 2007) he developed various products such as a 15 MHz dual oscilloscope, high voltage tester, VHF wattmeter, standard signal generator with AM/FM modulator, wireless beacon, high-power audio amplifier, wireless microphone and many more. He received awards for excellence in R&D in 1999 and 2004, and the National Quality Award in 1999, 2000, 2004 and 2006. He is a Fellow member of IEEE and a life member of ISTE, IETE, and the Society of Audio & Video Systems. He has published more than 90 research papers in various national and international conferences and journals. His research areas include wireless communication, satellite communication and acoustics, antennas, neural networks, and artificial intelligence.
I.J. Education and Management Engineering, 2016, 3, 9-19 Published Online May 2016 in MECS (http://www.mecs-press.net) DOI: 10.5815/ijeme.2016.03.02 Available online at http://www.mecs-press.net/ijeme
Ontology Engineering and Development Aspects: A Survey
Usha Yadav a, Gagandeep Singh Narula b, Neelam Duhan c and Vishal Jain d
a Research Scholar, Department of Computer Engineering, YMCA University of Science and Technology, Faridabad, India
b Research Scholar, M.Tech CSE (IV Sem), C-DAC, Noida
c Associate Professor, Department of Computer Engineering, YMCA University of Science and Technology, Faridabad, India
d Assistant Professor, Bharati Vidyapeeth's Institute of Computer Applications (BVICAM), New Delhi
* Corresponding author. E-mail addresses: [email protected], [email protected], [email protected], [email protected]
Abstract
Ontology can be defined as a hierarchical representation of classes, sub-classes, their properties and instances. It supports understanding the concepts of a given domain, deriving relationships and representing them in a machine-interpretable language. Ontologies are associated with different languages that are used in mapping of multiple ontologies. Several applications of ontologies have led towards realization of the semantic web. The current web (2.0) is approaching the semantic web (3.0), which performs intelligent search and stores results in distributed databases. The paper makes readers aware of various aspects of ontology: types of ontology, ontology development life cycle phases, activities involved in ontology development, and ontology engineering tools. Ontology engineering contributes to meaningful search and provides open source tools for building and deploying ontologies.
Index Terms: Semantic web, Ontology, RDF, OWL and Ontology Engineering.
© 2016 Published by MECS Publisher. Selection and/or peer review under responsibility of the Research Association of Modern Education and Computer Science.
1. Introduction
Ontology is considered the backbone of the semantic web. It is essential to understand the meaning of each term along with the classes, properties and instances associated with it. That is possible only if we have categorized information; this categorization of information in a hierarchical manner is termed Ontology. Ontology itself is associated with a wide range of concepts like merging, mapping, engineering and development, which are considered individual research areas. Ontology mapping and merging involve integration, aligning and re-using of data so that it can be used with existing web applications. Ontology engineering involves the use of automatic tools for managing, mapping and integrating ontologies in order to extract knowledge from them. An ontology mapping approach must fulfill the given requirements:
• It should be able to generate mappings automatically.
• The user should be able to accept, reject and add mappings.
• Mappings should be defined and organized systematically.
• Knowledge must be explored from derived mappings.
• Hierarchical relationships between concepts and instances must be considered (a small illustrative sketch of such a mapping follows the list).
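As an illustration of the kind of output such a mapping approach produces, the fragment below is a minimal sketch, not taken from the paper; the ontology URIs and all class and property names are hypothetical. It expresses a mapping between two sport ontologies as OWL axioms: an equivalence between two classes and a subsumption between two properties.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- The class Football in ontology A maps to the class Soccer in ontology B -->
  <owl:Class rdf:about="http://example.org/ontoA#Football">
    <owl:equivalentClass rdf:resource="http://example.org/ontoB#Soccer"/>
  </owl:Class>
  <!-- The property playsFor in A is a special case of memberOf in B -->
  <owl:ObjectProperty rdf:about="http://example.org/ontoA#playsFor">
    <rdfs:subPropertyOf rdf:resource="http://example.org/ontoB#memberOf"/>
  </owl:ObjectProperty>
</rdf:RDF>

A mapping tool meeting the requirements above would generate such axioms automatically and let the user accept, reject or extend them.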
The paper is organized into the following sections. Section 2 gives a brief overview of the semantic web in order to relate it with ontology. Section 3 presents various definitions of ontology and its types; it also defines classes, properties and instances for given sentences in OWL format, and highlights the importance of ontology in the semantic web. Section 4 describes the scope of ontology engineering by introducing the METHONTOLOGY approach. Section 5 extends this METHONTOLOGY approach by discussing the various activities needed in the development of ontologies. Section 6 presents a survey of various ontology engineering tools, followed by the conclusion and references.
2. Semantic Web (Futuristic Web 3.0)
The word semantic conveys "what it means" instead of just focusing on structure. The semantic web aims to discover the meaning of content in a form understandable by machines as well as humans. It is the futuristic web (web 3.0), whose target is to present the capabilities of agents in performing specific tasks. An agent is an entity, with or without a body, that works in an autonomous way by communicating with other agents to perform specific tasks. For instance, a mining agent is utilized for extracting data, while a facilitator agent is used for communication among agents. A few views about the nature of the semantic web are as follows: (a) The machine readable data view: the idea of the semantic web as envisioned by Tim Berners-Lee, in which data on the web is defined and linked in such a way that it can be understood by machines and utilized in various applications [1]. (b) The intelligent agents view: it involves the use of agents for retrieving and manipulating data in order to make the searching process intelligent. (c) The distributed database view: the information retrieved is stored in databases that need to be mapped to ontologies for knowledge representation of concepts. The semantic web provides flexibility by linking databases and deriving logic rules from them in order to make data more understandable. (d) The improved searching view: the semantic web employs agents that perform meaningful search and produce relevant results in the context of a given query. With the help of Web 3.0, it will become possible to have semantic-based search instead of traditional keyword-based search.
3. Definitions of Ontology
Guarino and Giaretta (1995) gave seven definitions of ontology (Fig 1) that can be used in different disciplines, viz. the philosophical discipline, formal and specified views, conceptualization of a system using logical theory, and many more. They stated that logical theory helps developers to build an ontology as "A logical theory which gives an explicit, partial account of a conceptualization." [3].
Fig.1. The Seven Definitions of Ontology
In other words, ontology can be defined as a hierarchical representation of classes, sub-classes, their properties and instances. It supports understanding the concepts of a given domain, deriving relationships among them, and representing them in a machine-interpretable language [2]. Ontology as a philosophical science concerns the structure of objects, events and properties in every area of reality. Ontologies are associated with different languages that are used in mapping of multiple ontologies. Initially RDF and XML were developed, in which XML specifies the syntax of content rather than its semantics, while RDF addresses the semantics of data. After these start-up languages, more expressive and well-defined languages came into existence. The most common ontology languages include OWL (Web Ontology Language) developed by W3C, DARPA Agent Markup Language (DAML) developed by the Defense Advanced Research Projects Agency (DARPA), OIL (Ontology Interface Language) developed in Europe, and DAML+OIL.
Below is an example of an OWL class specification expressing: Football and Basketball are sports; Football is not Cricket; Football is not Basketball.
<owl:Class rdf:about="#Football">
  <rdfs:subClassOf rdf:resource="#Sports"/>
  <owl:disjointWith rdf:resource="#Cricket"/>
  <owl:disjointWith rdf:resource="#Basketball"/>
</owl:Class>
Below is an OWL specification for a property: Gagan plays football.
<owl:ObjectProperty rdf:about="#plays"/>
<rdf:Description rdf:about="#Gagan">
  <plays rdf:resource="#Football"/>
</rdf:Description>
Below is an OWL specification for a property restriction: Basketball is only played by college students (using the illustrative property isPlayedBy).
<owl:Class rdf:about="#Basketball">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#isPlayedBy"/>
      <owl:allValuesFrom rdf:resource="#CollegeStudent"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
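The three fragments above are only loadable by an OWL tool inside a root element that declares the standard namespaces. A minimal wrapper would look like the following sketch; the base URI is illustrative and not from the original paper:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://example.org/sports">
  <!-- the class, property and restriction fragments shown above go here -->
</rdf:RDF>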
3.1. Types of Ontology [6]
Ontologies are generic knowledge bases that are sharable and acceptable. They are of different types:
Upper ontology: Existing well-defined structures of a given domain point towards upper ontology. The IEEE P1600 standard defines the Standard Upper Ontology (SUO), which covers 3D and 4D modeling dimensions of multiple ontologies, but the 3D and 4D visualization of ontologies has not been realized yet.
Heavy-weight and light-weight ontologies: Light-weight ontologies cover traditional search engines like Google and Yahoo, which present a hierarchical structure by taking only minor conceptual domains into consideration. Heavy-weight ontologies create a hierarchical structure by paying attention to each concept and its principles as per philosophical motives, and develop semantic relations among classes and sub-classes. An example of a heavy-weight ontology is an upper ontology that builds instance models.
Domain and task ontologies: Task ontologies specify features of the architecture of a given knowledge-based system, whereas domain ontologies specify features of knowledge related to a given domain on which various tasks like diagnosis, monitoring and design are performed. A task ontology provides a theory of all concepts/vocabularies used in the existing structure, while a domain ontology defines relationships among the classes, properties and instances used in the designed ontology.
3.2. Construction of Ontologies
Building ontologies manually
According to Noy and McGuinness [4], the following steps are required to develop an ontology manually (a small worked sketch follows the list):
1. Scope identification: An ontology is a knowledge base model consisting of classes, properties and their instances (CPI) related to a particular domain. So, it is viable to determine the scope of the given problem and then proceed further.
2. Elaborate resources: Specify the tools and editors used to build the ontology and formalize them.
3. Define taxonomy: After identification of scope and resources, the concepts are organized in hierarchical fashion; this is called a taxonomy. If football is a subclass of sport, then every instance of football must also be an instance of the sport class.
4. Define properties: Properties need to be specified from statements. Example: students read books; read is their property.
5. Define facets: This implies use of RDF Schema (rdfs) to express the desired ontologies.
6. Define instances: If book is a class and title is a subclass, then the contents of a given chapter are instances.
7. Verify for ambiguities: Ontologies need to be verified so that they cover the scope of the given domain.
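As a small worked illustration of steps 3, 4 and 6 above, the RDF/XML fragment below declares the sport/football taxonomy, a plays property, and one instance. It is not part of the original paper; the namespace and the instance name are illustrative.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns="http://example.org/sport#">
  <owl:Class rdf:about="http://example.org/sport#Sport"/>
  <!-- Step 3: taxonomy - Football is a subclass of Sport -->
  <owl:Class rdf:about="http://example.org/sport#Football">
    <rdfs:subClassOf rdf:resource="http://example.org/sport#Sport"/>
  </owl:Class>
  <!-- Step 4: a property whose values are sports -->
  <owl:ObjectProperty rdf:about="http://example.org/sport#plays">
    <rdfs:range rdf:resource="http://example.org/sport#Sport"/>
  </owl:ObjectProperty>
  <!-- Step 6: an instance of the Football class -->
  <Football rdf:about="http://example.org/sport#SundayLeagueMatch"/>
</rdf:RDF>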
Re-using existing ontologies
Ontologies available from third parties and existing online ontologies can be improved, re-used and modified with the help of web ontology editor tools like Web Protégé, Hozo, Knoodl, Vitro etc.
Fig.2. Gruber Ontology Structure
3.3. Role of Ontology in Semantic Web
Ontology is considered the backbone of the semantic web; it plays a vital role in representing meaningful information drawn from huge amounts of unstructured web data. In particular, it:
• resolves word sense disambiguation (WSD) problems;
• allows re-use and analysis of domain knowledge;
• is helpful in reasoning problems, classification and problem solving techniques;
• helps in achieving interoperability in the semantic web.
Fig.3. Moving Towards Semantic Web
4. Ontology Engineering
The automated use of tools for developing ontologies is called ontology engineering. It is an emerging area in knowledge management and semantic web development. In other words, ontology engineering is the set of activities performed during the various phases of an ontology's life: conceptualization, formalization, design and deployment. The main target of ontology engineering is to build models of a domain and interpret them in machine language. It also identifies concepts related to the target domain and derives relationships among them. Various ontology editor tools are available for building and deploying ontologies, such as Protégé, OntoEdit and Hozo; these tools provide manuals for the development of ontologies and their axioms. Ontology engineering is a time-consuming task that involves life cycle models. One example of a life cycle model is the METHONTOLOGY approach, which performs development-oriented activities (specifying requirements, conceptualization, design, formalization of the conceptual model related to a particular domain, and maintenance of ontologies), support or integral-oriented activities (knowledge gathering, integration and documenting of ontologies) and management activities [8]. The scope of ontology engineering varies widely and is not limited to philosophy, re-using and sharing knowledge, and designing ontologies.
5. Activities Involved in the METHONTOLOGY Approach for Ontology Development
Development of an ontology is as complex as measuring the quality of software. It requires attention to every minute detail of activities and tasks from plinth to paramount. The ontology engineer needs to go through all development methodologies and existing design principles before reaching conclusions. Development-oriented activities are sub-divided into pre-development, development and post-development processes that occur sequentially, while support-oriented activities are conducted in parallel with the development activities.
5.1. Management Activities
These activities are performed to identify the type of tasks and to verify that the designed ontology covers the required specifications.
Scheduling: This activity defines the timing of tasks, their dependencies and the allocation of resources.
Quality assurance: This activity assures that the quality of the designed ontology and its documentation satisfies user requirements.
5.2. Development-Oriented Activities
These activities are the backbone of the ontology development process. They comprise three sub-activities, pre-development, development and post-development, which are defined below.
Pre-development Activities
Type of environment: This specifies the type of application platform to be used for developing ontologies. It also includes the selection of ontology editors as tools for defining classes, properties and their instances.
Feasibility study: This checks whether the given ontology can be built in the given environment and complies with user requirements.
Development Activities
Specification phase: This phase consists of the following activities: domain vocabulary definition, identifying resources, identifying axioms, identifying relationships, identifying data characteristics, applying constraints and verification. Domain vocabulary definition defines the names and properties related to the given domain. Identifying resources means grouping different URIs into a single class. Axioms are structures that define the behavior of concepts. Identifying relationships means relating classes with their subclasses, properties and instances. Identifying data characteristics presents the features of resources and their relationships. Applying constraints defines the restrictions to be used with classes and properties. Verification means checking the proposed ontology model for correctness.
Conceptualization phase: This phase creates a model in the context of the given domain and presents knowledge at the knowledge level. Several strategies are available for conceptualization, viz. the top-down approach and the bottom-up approach. The top-down approach begins with a super class that is extended to refine the ontology structure; this approach is mostly used in the philosophical sciences. The bottom-up approach begins with databases that are sources of multiple data and then refines them to develop a suitable ontology; this process is followed by information extraction (IE) and ontology learning tools like Text-To-Onto.
Design phase: This phase proposes the physical structure of the designed ontology, based on the RDF model. An RDF statement is a triple of Resource, Property and Value. Example of an RDF statement: Potato is cultivated in brown soil with fertilizer value K567. Here potato is the resource, brown soil is the property, and K567 is the value (an RDF/XML sketch of this statement is given at the end of this subsection).
Formalization phase: This phase produces the ontology as output by using ontology tools. It transforms the conceptual model into a formal model that can be re-written in a suitable syntax.
Implementation phase: This implements the formalized ontology with the help of semantic languages (OWL, SPARQL) and executes it using a reasoner like the Pellet OWL reasoner.
Post-development Activities
Maintenance: Ontologies must be updated from time to time so that users get effective and up-to-date results; this leads to the continuous evolution of ontologies. Ontology maintenance approaches may be centralized or de-centralized. In the centralized approach, the whole control of maintaining the ontology lies with a single entity or group of entities. The de-centralized approach gives control to everyone involved in the maintenance team; it is faster and cheaper because tasks are divided among several entities. Well-defined guidelines must be provided for the maintenance of ontologies [8].
Re-using existing ontologies: Ontologies can be re-used by various applications for formalizing knowledge into an understandable form. A first ontology can re-use a second ontology by referencing elements of the second ontology in its axioms. The rules or axioms of existing ontologies are adapted to generate the new ontology; this is a much easier task than developing an ontology from scratch.
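The design-phase example above can be written down directly as RDF/XML. The fragment below is one natural encoding, not taken from the paper; the namespace and the property names cultivatedInSoil and fertilizerValue are hypothetical.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/agri#">
  <!-- Resource: Potato; Property: the soil it is cultivated in; Value: fertilizer value K567 -->
  <rdf:Description rdf:about="http://example.org/agri#Potato">
    <ex:cultivatedInSoil rdf:resource="http://example.org/agri#BrownSoil"/>
    <ex:fertilizerValue>K567</ex:fertilizerValue>
  </rdf:Description>
</rdf:RDF>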
Ontology merging obtains new ontology derived from several ontologies belonging to similar domain while ontology alignment is used for identifying mapping between source ontologies. Proper documentation is needed in order to reuse and integrate existing ontologies. Configuration management: It specifies identification, documentation, recording and reporting of different
versions of an ontology. Project management tools and ontology editing environments can be used for configuration management processes such as change request forms, change control and many more.
Fig.4. Activities of the Ontology Development Process
Table 1. Comparison of Ontology Engineering Tools
S.No | Features | Protégé | WebODE | OntoEdit
1 | Collaborative | Yes | Yes | No
2 | Backup recovery | No | Yes | No
3 | Querying | Yes | No | Yes/No
4 | Import | RDF(S), OWL | RDF, OWL, DAML+OIL | RDF, DAML+OIL
5 | Storage | Files, JDBC | JDBC | Files
6 | Exception handling | No | No | No
7 | Reasoner | Pellet | Prolog | OntoBroker
8 | Base language | OKBC | HTML and Java | F-Logic
9 | Merging | PROMPT | ODEMerge | Yes
10 | Implementation language | Java | Java | Java
6. Ontology Engineering Tools
Various tools that support ontology engineering were identified in Duineveld et al. (2000). Further evaluations of ontology evaluation frameworks were done in the EON 2002 workshop (Sure and Angele, 2002). KAON OImodeller (Bozsak et al., 2002; Motik et al., 2002) falls in the category of KAON ontologies; it provides easy integration of ontologies into an enterprise infrastructure, and the KAON suite includes KAON2, which supports a semantic web rule language (SWRL). Protégé (Noy et al., 2000) was the first ontology editor that works with RDF and OWL and comes with built-in plug-ins. WebODE (Arpirez et al., 2001) is an 'ontology engineering workbench' that provides various services for ontology engineering. OntoEdit (Sure et al., 2002, 2003) is an open source framework that integrates and manages services for different extensions of a given ontology; it also derives inferences on the basis of these extensions.
7. Conclusion and Future Scope
Ontology is considered the backbone of the semantic web. The paper describes nearly every aspect related to ontology: ontology development, ontology engineering, the activities needed in the METHONTOLOGY approach for ontology development, and ontology engineering tools. The activities cover the development, support and management of ontologies. These aspects can be used for the future development of knowledge bases, and ontologies can be re-used in different domains with the help of ontology engineering tools. Ontology engineering contributes to semantically directed search and helps in solving real-world problems.
References
[1] P. Lambrix, "Towards a Semantic Web for Bioinformatics using Ontology-based Annotation", in Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises, 2005, pp. 3-7.
[2] J. F. Sowa, "Ontology, metadata, and semiotics", Lecture Notes in AI #1867, Springer-Verlag, Berlin, 2000, pp. 55-81.
[3] Gagandeep Singh, Vishal Jain, "Information Retrieval (IR) through Semantic Web (SW): An Overview", in Proceedings of CONFLUENCE - The Next Generation Information Technology Summit, 27-28 September 2012, pp. 23-27.
[4] Natalya F. Noy and Deborah L. McGuinness, "Ontology Development 101: A Guide to Creating Your First Ontology", Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, March 2001.
[5] T. Gruber, "Towards principles for the design of ontologies used for knowledge sharing", Int. J. of Human and Computer Studies, 43(5/6):907-928, 1995.
[6] Riichiro Mizoguchi, "Tutorial on ontological engineering", New Generation Computing, 22(2004), 61-96, Ohmsha Ltd and Springer-Verlag, 2004.
[7] U. Dayal, H. Kuno, "Making the Semantic Web Real", IEEE Data Engineering Bulletin, Vol. 26, No. 4, pp. 4-7, 2003.
[8] Rudi Studer, Stephan Grimm, and Andreas Abecker (Eds.), "Semantic Web Services: Concepts, Technologies and Applications", Springer, 2007.
[9] Vishal Jain, Dr. Mayank Singh, "Ontology Based Information Retrieval in Semantic Web: A Survey", International Journal of Information Technology and Computer Science (IJITCS), MECS publishers, 2013, 10, 62-69.
[10] Amjad Farooq and Abad Shah, "Ontology Development Methodology for Semantic Web System", Pakistan Journal of Life and Social Sciences, Vol. 6, No. 1, May 2008, pp. 50-58.
[11] J. Mayfield, "Ontologies and text retrieval", Knowledge Engineering Review, 2007.
[12] Vishal Jain, Dr. Mayank Singh, "Ontology Development and Query Retrieval using Protégé Tool", International Journal of Intelligent Systems and Applications (IJISA), MECS publishers, 2013, 09, 67-75.
[13] Kaushal Giri, "Role of Ontology in Semantic Web", DESIDOC Journal of Library & Information Technology, Vol. 31, No. 2, March 2011, pp. 116-120.
[14] S. Luke, L. Spector, D. Rager and J. Hendler, "An Introduction to Ontology", in Proceedings of the First International Conference on Autonomous Agents (Agents 97), pp. 59-66, 1997.
[15] M. Preethi, Dr. J. Akilandeswari, "Combining Retrieval with Ontology Browsing", International Journal of Internet Computing (IJIC), Vol. 1, Issue 1, 2011.
[16] Abdeslem Dennai, Sidi Mohd., "Semantic Indexing of Web Documents Based on Domain Ontology", International Journal of Information Technology and Computer Science (IJITCS), MECS publishers, 2015, 02, 1-11.
[17] Berners-Lee, J. Lassila, "Ontologies in Semantic Web", Scientific American, May (2001), 34-43.
[18] Gagandeep Singh, Vishal Jain, Dr. Mayank Singh, "Ontology Development Using Hozo and Semantic Analysis for Information Retrieval in Semantic Web", in ICIIP-2013 IEEE Second International Conference on Image Information Processing, Jaypee University, Shimla, 9-11 Dec 2013.
[19] Xin Shi, Shurong Tong, "Ontology Mapping of Design Process Knowledge Based on Classification", International Journal of Education and Management Engineering (IJEME), MECS publishers, 2011, 5, 17-25.
Authors' Profiles
Usha Yadav received her B.E. in Information Technology with Honors from Maharshi Dayanand University, Rohtak in 2009 and M.Tech with Honors in Computer Engineering from YMCA University of Science and Technology, Faridabad in 2011. She is pursuing her Ph.D in Computer Engineering from YMCA University of Science and Technology, Faridabad. She is currently working as a Project Engineer in CDAC, Noida and has three years of experience. Her areas of interest are search engines, social web and semantic web.
Gagandeep Singh Narula received his B.Tech in Computer Science and Engineering from Guru Tegh Bahadur Institute of Technology (GTBIT), affiliated to Guru Gobind Singh Indraprastha University (GGSIPU), New Delhi. He is now pursuing an M.Tech in Computer Science from CDAC Noida, affiliated to GGSIPU. He has published research papers in various national and international journals and conferences. His research areas include Semantic Web, Information Retrieval, Data Mining and Knowledge Management. He is also a member of CSI.
Dr. Neelam Duhan received her B.Tech. in Computer Science and Engineering with Honors from Kurukshetra University, Kurukshetra and M.Tech with Honors in Computer Engineering from Maharshi Dayanand University, Rohtak in 2002 and 2005, respectively. She completed her PhD in Computer Engineering in 2011 from Maharshi Dayanand University, Rohtak. She is currently working as an Assistant Professor in Computer Engineering Department in YMCA University of Science and Technology, Faridabad and has an experience of about 12 years. She has published over 30 research papers in reputed
international Journals and International Conferences. Her areas of interest are databases, search engines and web mining.
Vishal Jain completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is pursuing his Ph.D in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently, he is working as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.
How to cite this paper: Usha Yadav, Gagandeep Singh Narula, Neelam Duhan, Vishal Jain, "Ontology Engineering and Development Aspects: A Survey", International Journal of Education and Management Engineering (IJEME), Vol. 6, No. 3, pp. 9-19, 2016. DOI: 10.5815/ijeme.2016.03.02
Indian Journal of Science and Technology, Vol 9(16), DOI: 10.17485/ijst/2016/v9i16/88524, April 2016
ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645
Development and Visualization of Domain Specific Ontology using Protege
Usha Yadav1, Gagandeep Singh Narula2*, Neelam Duhan3, Vishal Jain4 and B. K. Murthy1
1 Centre for Development of Advanced Computing (CDAC), B-30, Noida - 201307, Uttar Pradesh, India; [email protected], [email protected]
2 Computer Science and Engineering, Centre for Development of Advanced Computing (CDAC), B-30, Noida - 201307, Uttar Pradesh, India; [email protected]
3 Department of Computer Engineering, YMCA University of Science and Technology, Faridabad - 121006, Haryana, India; [email protected]
4 Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi - 110063, India; [email protected]
* Author for correspondence
Abstract
Background/Objectives: The research aims to explore differences among various ontology development tools and languages, and finally to develop and visualize an ontology for a specific domain. Methods/Analysis: A Railway Enquiry System (RES) ontology is developed with the help of the Protege tool and visualized using the TGViz tab. It involves the creation of various classes and their instances so that a person can find references for a query. Findings: The manuscript makes readers aware of the concept of the Semantic Web, because the search performed by today's search engines is based on keyword extraction, which leads to irrelevant and incomplete results marked by low precision and high recall. The developed ontology depicts a real-world railway reservation scenario: with it, a person can check seat availability, train fare details, PNR status and more. Improvements/Applications: The given ontology can be extended into a railway tracking web application using Web Ontology Language (OWL) and Semantic Web Rule Language (SWRL).
Keywords: Ontology, Ontology Tools and Languages, Protege, Semantic Web
1. Introduction
World Wide Web (www) is a distributed repository of millions of documents covering a wide range of multidisciplinary information; to extract and retrieve particular information from these documents is a cumbersome job. Two easily confused terms are associated with this task: Information Retrieval means retrieving information from millions of documents irrespective of whether the documents are relevant or not, while Information Extraction means extracting information from relevant documents only. WWW is the largest information construct and has gained various advancements ranging from web 1.0 to web 4.0. Web 1.0 is the first generation of the web: read-only and static [1]. Web 2.0 is the second generation of the web, known as the Social and Read/Write web [2]. Web 3.0 is considered the third generation of the web and is known as the Semantic Web (SW) [3]. Until then, machines are not clever, as they perform tasks on the basis of user input requirements. Web 4.0 is the fourth generation of the web and is known as the Symbiotic Web; it will make machines think in an intelligent way by reading the contents of the web and producing the information that loads a website faster [4]. In order to increase the degree of relevance, there is a need to move towards the Semantic Web (web 3.0) and ontology. In broad terms, the Semantic Web is known as a Global Information Mesh, consisting of annotated documents represented in a language friendly to humans as well as machines; it curtails the gap between humans and machines. Ontology represents relationships among classes, properties and instances in hierarchical fashion. Table 1 illustrates the differences among the various generations of the web.
The paper is organized as follows: Section 2 presents brief information about the Semantic Web and its layout. Section 3 explicitly defines ontology, ranging from its components to development tools and languages; in addition, a comparative study of various development tools and languages is provided. Section 4 presents a case study on a Railway Enquiry System (RES), whose ontology is developed with the help of the Protege tool.
2. Semantic Web (SW)
The idea of the SW was given by the inventor of the www, Tim Berners-Lee, in 1996; it targets converting present information into a machine-friendly language [5]. In simple words, it is a repository of information together with the languages involved in presenting such information.
2.1 Architecture
Its layout consists of the following components (an illustrative RDF fragment follows the list):
• Unicode and URI - Unicode represents each character uniquely, while a URI (Uniform Resource Identifier) identifies a resource in syntactical format.
• XML - Extensible Markup Language, which provides namespaces and schemas to define the structure of data on the web.
• Resource Description Framework (RDF) - used for describing information in the form of data models, which in turn consist of triples, viz. Subject, Predicate and Object. An example of RDF is given in Figure 1.
• RDFs - RDF Schema, which acts as a vocabulary language to represent and make inferences over RDF data models.
• Ontology - a set of terms used to describe a given domain and derive inferences from it.
• Logic and Proof - in this layer, agents can make inferences for finding the requirements of given resources with the help of inference systems [6].
• Trust - signifies assurance and the degree of loyalty to information [7].
Figure 1. Example of RDF.
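Since only the caption of Figure 1 survives in this copy, the fragment below is an illustrative stand-in, not the paper's own figure; the URIs and the property name operates are hypothetical. It shows one RDF triple, with the subject as an rdf:Description, the predicate as a property element, and the object as a resource.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <!-- Triple: <IndianRailways> <operates> <RajdhaniExpress> -->
  <rdf:Description rdf:about="http://example.org/IndianRailways">
    <ex:operates rdf:resource="http://example.org/RajdhaniExpress"/>
  </rdf:Description>
</rdf:RDF>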
2.1 Architecture Its layout consists of following components: • Unicode and URI - Unicode represent each character uniquely and provide intellectual style while URI is Uniform Resource Identifier that represents data in syntactical format. Figure 1. Example of RDF.
Table 1. A comparison among various generations of Web S.No
1.
2. 3.
2
Web 1.0.
Web 2.0.
Reading
Reading/ Writing
Web 3.0.
Read-writeexecute or Read-write-execportable concurrency personal web Focus on lifestream
Focus on communities and lifestreams.
RDF, RDFs, OWL
Middleware (WebOS)
Focus on Focus on companies communities HTML
XML, RRS, Wikis
Web 4.0.
Web Smart applications applications
Middleware and parallelized services
4.
Web forms
5.
Netscape
Google, Wikipedia
Dbpedia
--------
6.
It is like crawling
It is like walking
It is like running
It is running in highly supervised and intelligent way under supervision.
Vol 9 (16) | April 2016 | www.indjst.org
Figure 2. Stack Venn diagram of Semantic Web Architecture [7].
3. Ontology
The word Ontology is derived from two Greek words: onto, which means "being", and logia, which means "written or spoken discourse". Ontology has a wide range of definitions, ranging from philosophy to artificial intelligence. Ontology is abbreviated as FESC, which means a formal, explicit specification of a shared conceptualization [8].
3.1. Components of Ontology
• A set of concepts: these can be the nodes in the representation of ontologies.
• A set of properties: every node (a concept or class) may or may not have properties related to it; properties can also be summarized as the values of the concepts.
• A set of relational properties: these imply relationships between two or more concepts or nodes, generally creating a hierarchical path from one concept to another.
• Hierarchy of concepts: sub-concept/super-concept relationships.
• Hierarchy of properties: sub-property/super-property relationships.
• A subset of symmetric properties: a set of properties in a concept that have the same values and the same functionality.
• Transitive property relations: a transitive relation is defined as follows: if property A is related to property B and property B is related to property C, then property A is necessarily related to property C.
• Symmetry and inverse-symmetry relations among properties.
• Domain values related to properties: the domain defines the class at the level of the properties; concepts that share the same property values have the same domains.
• Range values related to properties: the range is a characteristic of the concepts, which can be an interval, a list of elements or simply a character.
• Minimum and maximum cardinality for each concept-property pair: in set theory, cardinality is the number of elements in a set; here, cardinality is a positive number associated with each concept, indicating how many properties are associated with that concept. Maximum and minimum cardinality constrain the range, discussed above, of the properties associated with any concept. (An OWL sketch of several of these components is given after Section 3.2 below.)
3.2. Basic Steps for Building Ontologies
• Determine scope: this includes defining the structure and values associated with the ontology.
• Consider re-use: existing ontologies can be re-used when defining the schema of a new ontology.
• Enumerate terms: clearly specify, in a structured list, all the terms that define the domain and range of the ontology.
• Define taxonomy: after specifying terms, it is necessary to organize them in hierarchical fashion. If A is a subclass of B, then every instance of A must be an instance of B.
• Define properties: it is most important to organize the properties that link the classes while organizing these classes in a hierarchy.
• Define facets: the ontology may only require the expressivity provided by RDF Schema, without using any of the additional primitives in OWL.
• Define instances: ontologies are used to organize sets of instances [9].
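A few of the components listed in Section 3.1 have direct OWL counterparts. The fragment below is a sketch, not from the paper, with all names hypothetical; it declares a transitive property, a symmetric property, an inverse pair, and a cardinality constraint.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- Transitive: if A isPartOf B and B isPartOf C, then A isPartOf C -->
  <owl:TransitiveProperty rdf:about="http://example.org/o#isPartOf"/>
  <!-- Symmetric: connectedTo holds in both directions -->
  <owl:SymmetricProperty rdf:about="http://example.org/o#connectedTo"/>
  <!-- Inverse pair: hasChild is the inverse of hasParent -->
  <owl:ObjectProperty rdf:about="http://example.org/o#hasChild">
    <owl:inverseOf rdf:resource="http://example.org/o#hasParent"/>
  </owl:ObjectProperty>
  <!-- Cardinality: every Train has exactly one trainNumber -->
  <owl:Class rdf:about="http://example.org/o#Train">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://example.org/o#trainNumber"/>
        <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:cardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>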
3.3 How to use Ontology
Usage of ontologies depends on the level assigned:
Level 1: As a vocabulary language for interaction among multiple agents in a distributed scenario.
Level 2: Represented as a database schema that holds information about classes, properties and instances; data can be retrieved easily from the database by accessing its schema.
Table 2. Steps for construction of ontologies
i. Determine scope; ii. Consider reuse; iii. Enumerate terms; iv. Define taxonomy; v. Define properties; vi. Define facets; vii. Define instances; viii. Check for anomalies.
Level 3: As a knowledge base created after deriving inference rules over the given ontology.
Level 4: For handling complex queries and datasets.
Level 5: Standardization:
• standardization of the structure of the ontology;
• standardization of the concept hierarchy;
• standardization of domain ontology components;
• standardization of the tasks performed on the ontology.
Level 6: For integration of ontologies into different systems like knowledge management, ERP systems, E-learning and many more.
3.4 Ontology Development Languages
The following are types of ontology languages used in the Semantic Web:
• LOOM [10]: one of the knowledge representation languages; it is based on description logics and rules to build concepts automatically.
• SHOE [11]: used to extract relevant information from web documents; it combines knowledge representation data and ontological features.
• OML [12]: Ontology Markup Language, treated as an extension of SHOE.
• XOL [13]: Ontology Exchange Language, based on XML and used for the development of ontologies in any tool.
• DAML+OIL [14]: DAML stands for DARPA Agent Markup Language and OIL stands for Ontology Interchange Language. It is used for achieving semantic interoperability among various resources.
• CycL [15]: one of the formal languages that uses predicate logic to define the concepts in a domain. It comes under the category of generic ontologies.
Table 3. A comparison among ontology development languages
Features | LOOM | SHOE | OML | XOL | DAML+OIL
Concept documentation | Yes | No | Yes | No | Yes
Class attributes | No | Yes | Yes | Yes | Yes
Instance attributes | Yes | Yes | Yes | Yes | Yes
n-ary relations | Yes | Yes | Yes | No | No
Cardinality constraints | Yes | No | No | No | Yes
Concept instances | Yes | Yes | Yes | Yes | Yes
Rules | Yes | Yes | Yes | No | No
3.5 Ontology Development Tools
In general, ontology development includes phases like specification, design and formalization; all these phases are treated as SDLC phases [16]. Table 4 lists the differences among various ontology editors [17].
Table 4. A comparison among various ontology editors
Tool | Version | Owner / Developer | Features / Limitation | Primary Language | FOSS (free open source software)
Adaptiva | - | Sheffield University | Knowledge acquisition | Java | Yes
SemanticWorks | 2008 sp1 | Altova | OWL + RDFS editor | Java | No
Conzilla | 2.2 | Knowledge Management Research Group | Concept browser | Java | Yes
HOZO | 5.01 | Osaka University | Role concept; user-friendly | Java | Yes
OWL Editor | 0.2.0.36 | Model Futures | Tree-based | Other | Yes
OntoTrack | - | Ulm University | Fast browsing and easy editing | Java | Yes/No
OWL-S Editor | 23 | Linkoping University | Semantic Web Services | Java | Yes
Protégé | 3.4 beta | Stanford Medical Informatics | Multiple inheritance | Java | Yes
SWOOP | 2.3 beta | MINDSWAP | Web-browser look and feel | Java | Yes
WebOnto | - | Open University | Knowledge Modelling | Java | Yes
Besides these, there are various versions of Protégé, such as 2000, 3.1, 3.2, 3.4, 3.4 beta, 4.0, 4.0 beta and 5.0 desktop. Table 5 lists the differences between the most common versions of Protégé [18].
4. Case Study
The paper presents a Railway Enquiry System (RES) ontology that describes the terms involved in a railway reservation system. A person can look up a train, check seat availability, or see the fare, but cannot book a ticket. The developed ontology is partial (it only shows the terms used in the ontology) and describes a real-world phenomenon: the Railway Enquiry System (RES).
4.1 Screen Shots
Tool used: Protege 3.4 beta. It was created at Stanford University [19] and acts as an open-source knowledge acquisition system written in Java [20]. In Figure 3, Railway Enquiry System is marked as the super class and consists of various sub-classes like Fare Enquiry, Find Your Train, PNR Status and Seat Availability. The Fare Enquiry class is further divided into classes like CLASS, Concession, Train Number etc. Figure 4 displays the slots of one of the classes, named CLASS, under Fare Enquiry of the RES ontology; it holds the type of values in CLASS, whether AC Chair, First AC, Second AC etc.
Figure 3. Super class-sub class hierarchy of RES ontology.
Table 5. Differences between Protégé 3.4.1 and 3.4 beta
Features | Protégé 3.4.1 | Protégé 3.4 beta
Compression algorithm for client-server communication | Yes | No
Memory leaks in database mode | Yes | No
Inheritance of browser slot patterns by subclasses | No | Yes
OWL file to OWL database conversion | Slow | Fast
Debug and performance | No | Yes
Support for Derby database | No | Yes
Protégé script console support for manipulation of ontologies | No | Yes
Database inclusion | No | Yes
Figure 4. Slots of class "CLASS" of RES ontology.
Figure 5 displays the references of the given ontology; for example, Fare Enquiry is a direct super-class of CLASS, which in turn has instances AC Chair Car, First AC and so on. Figure 6 displays the classes of the RES ontology as a graph using the TGViz tab. TGViz stands for Touch Graph Visualization; the tab visualizes the classes and instances in the developed ontology.
Figure 5. References of RES ontology.
Figure 6. Graph corresponding to RES Ontology using TGViz Tab.
4.2 Code Snippet
(RDF/XML source listing of the RES ontology.)
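Only the tail of the original RDF/XML listing survives in this copy, so the fragment below is a hedged reconstruction sketch rather than the paper's actual source: it assumes an illustrative base URI and shows only the class hierarchy and instances that Section 4.1 describes.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns="http://example.org/res#">
  <owl:Class rdf:about="http://example.org/res#RailwayEnquirySystem"/>
  <!-- Sub-classes of the RES super class -->
  <owl:Class rdf:about="http://example.org/res#FareEnquiry">
    <rdfs:subClassOf rdf:resource="http://example.org/res#RailwayEnquirySystem"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/res#FindYourTrain">
    <rdfs:subClassOf rdf:resource="http://example.org/res#RailwayEnquirySystem"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/res#PNRStatus">
    <rdfs:subClassOf rdf:resource="http://example.org/res#RailwayEnquirySystem"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/res#SeatAvailability">
    <rdfs:subClassOf rdf:resource="http://example.org/res#RailwayEnquirySystem"/>
  </owl:Class>
  <!-- CLASS under Fare Enquiry, with its instances -->
  <owl:Class rdf:about="http://example.org/res#CLASS">
    <rdfs:subClassOf rdf:resource="http://example.org/res#FareEnquiry"/>
  </owl:Class>
  <CLASS rdf:about="http://example.org/res#ACChairCar"/>
  <CLASS rdf:about="http://example.org/res#FirstAC"/>
  <CLASS rdf:about="http://example.org/res#SecondAC"/>
</rdf:RDF>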
5. Conclusion and Future Scope
Ontology is treated as the main constituent of the Semantic Web; it allows an explicit, well-defined understanding of concepts among agents and supports analysis of domain knowledge. The paper first describes the evolution of the www from web 1.0 to web 4.0, and then describes the concepts of the Semantic Web and ontology. In addition, differences among various ontology development tools and languages are listed. Lastly, the paper presents a case study on a Railway Enquiry System (RES), defining its classes, properties and instances by developing the ontology in Protege 3.4 beta and visualizing it using the TGViz tab. As future work, knowledge can be extracted from the developed ontology by importing it into an IDE like Eclipse, NetBeans or IntelliJ with the help of an open source framework like Jena or Sesame. A user GUI can be designed to help in document classification [21] as well as in promoting E-learning with the help of Semantic Web technologies [22].
6. References
1. Getting B. Basic Definitions: Web 1.0, Web 2.0, Web 3.0. 2007. Available from: http://www.practicalecommerce.com/articles/464-Basic-Definitions-Web-1-0-Web-2-0-Web-3-0
2. Murugesan S. Understanding Web 2.0. Journal IT Professional. 2007 Jul-Aug; 9(4):34-41.
3. Palmer SB. The Semantic Web: An Introduction. 2001 Sep. Available from: http://infomesh.net/2001/swintro
4. Hemnath. Web 4.0 - A New Web Technology. 2010 Dec. Available from: http://websitequality.blogspot.com/2010/01/web 40-new-web-technology.html
5. Lee TB. The Semantic Web. Scientific American; 2007 May.
6. Greenberg J, Sutton S, Campbell DG. Metadata: A fundamental component of the Semantic Web. Bulletin of the American Society for Information Science and Technology. 2003 Apr/May; 29(4):16-8.
7. Lee B, Lassila J. Ontologies in Semantic Web. Scientific American; 2001 May. p. 34-43.
8. Singh S, Jain V. Information Retrieval (IR) through Semantic Web (SW): An overview. Proceedings of CONFLUENCE 2012 - The Next Generation Information Technology Summit at Amity School of Engineering and Technology; 2012 Sep. p. 23-7.
9. Antoniou G, Von F. A Semantic Web Primer. London, England: The MIT Press, Cambridge; 2000.
10. MacGregor R. Inside the LOOM classifier. SIGART Bulletin. 1991; 2(3):70-6.
11. Luke S, Heflin J. SHOE 1.01 Proposed Specification. Parallel Understanding Systems Group. 2000 Feb. Available from: http://www.cs.umd.edu/projects/plus/SHOE/spec1.01.html
12. Kent RE. Conceptual Knowledge Markup Language. Proceedings of the Twelfth Workshop on Knowledge Acquisition, Modelling and Management; Banff, Alberta, Canada. 1999 Oct.
13. Karp R, Chaudhri V, Thomere J. XOL: An XML-based Ontology Exchange Language (version 0.4). Artificial Intelligence Center, SRI International; 1999 Aug.
14. Horrocks F, Van H. Reference description of the DAML+OIL (March 2001) Ontology Markup Language. 2001. Available from: http://www.daml.org/2001/03/reference.html
15. Corcho O, Fernandez-Lopez M, Gomez-Perez A. Methodologies, tools and languages for building ontologies. Where is their meeting point? Data and Knowledge Engineering. 2003 Jul; 46(1):41-64.
16. Singh G, Jain V, Singh M. Ontology development using Hozo and semantic analysis for information retrieval in Semantic Web. ICIIP-2013 IEEE Second International Conference on Image Information Processing; Jaypee University, Shimla. 2013 Dec 9-11.
17. Suresh K, Malik SK, Rizvi SAM. A case study on role of ontology editors. National Conference on Advancements in Information and Communication Technology (NCAICT); CSI, Allahabad. 2008 May 15-16.
18. Sivakumar R, Arivoli PV. Ontology visualization Protege tools - A review. IJAIT. 2011 Aug; 1(4):1-11.
19. Slimani T. Ontology development: A comparing study on tools, languages and formalisms. IJST. 2015 Sep; 8(24):1-12. Doi: 10.17485/ijst/2015/v8i34/54249. ISSN 0974-5645.
20. Jain V, Singh M. Ontology development and query retrieval using Protege tool. IJISA. Hong Kong. 2013 Aug; 5(9):67-75. ISSN No. 2074-9058. Doi: 10.5815/ijisa.2013.09.08.
21. Ramana AV, Reddy EK. OCCSR: Document classification by order of context, concept and semantic relations. Indian Journal of Science and Technology. 2015 Nov; 8(30):1-8. Doi: 10.17485/ijst/2015/v8i1/75398.
22. Manickasankari N, Arivazhagan D, Vennila G. Ontology based Semantic Web technologies in E-learning environment using Protege. Indian Journal of Science and Technology. 2014 Oct; 7(S6):64-7. Doi: 10.17485/ijst/2014/v7iS6/54591.
INTERNATIONAL JOURNAL FOR RESEARCH IN EMERGING SCIENCE AND TECHNOLOGY, VOLUME-3, ISSUE-1, JAN-2016
E-ISSN: 2349-7610
A Survey of Intrusion Detection Systems and Secure Routing Protocols in Wireless Sensor Networks
Prachi Dewal1, Gagandeep Singh Narula2 and Vishal Jain3
1 Research Scholar, M.Tech (IV Sem IT), C-DAC, Noida, India; [email protected]
2 Research Scholar, M.Tech (IV Sem CSE), C-DAC, Noida, India; [email protected]
3 Assistant Professor, Bharati Vidyapeeth's Institute of Computer Applications (BVICAM), New Delhi, India; [email protected]
ABSTRACT
Wireless Sensor Networks typically bridge the gap between the cyber world of computing and communications and the physical world. The proliferation of such digital devices in the ecosystem has resulted in prominent security concerns and management complexity issues. The attacks against any wireless ad-hoc and peer-to-peer networks can also be adapted into powerful attacks against wireless sensor networks. Many routing protocols for Wireless Sensor Networks have been proposed, but very few of them have been designed with security in mind. In this paper, various types of attacks in sensor networks and a few Intrusion Detection Systems are surveyed and presented briefly. Furthermore, this work also discusses a few secure routing protocols with Intrusion Detection mechanisms for wireless networks, which can help researchers to develop modified and new routing schemes.
Index terms: Wireless Sensor Networks (WSNs), Sensor Attacks, Intrusion Detection and Routing Mechanism.
1. INTRODUCTION
Wireless Sensor Networks (WSNs) are a new member of the wireless networks family and have been recognized as one of the most evolving technologies of the 21st century. The nodes of a WSN have special characteristics and are deployed for a series of specific needs. Sensor nodes collect information such as temperature, pressure, humidity, noise, light, etc. from the areas where they are located. So, Wireless Sensor Networks are comprised of wireless nodes that monitor physical or environmental conditions such as sound, temperature, motion and, now, behaviors as well. A mobile ad-hoc network (MANET) is a mobile wireless network of communication nodes, which have to cooperate for communication among themselves without the use of any preset infrastructure. The communicating nodes can be wireless sensor nodes; such individual nodes are capable of sensing their environments, processing the information data, and sending data to one or more collection points in a WSN [1].
WSNs have appealing features like low installation cost and unattended network operation. In WSNs the collected data is sent to the sink nodes, so Quality of Service (QoS) parameters like delay limit, packet loss, etc. are application dependent and acceptable within certain limits. For applications where stability and confidentiality are of prime importance, the security of such networks is a major concern. Security is difficult in WSNs because there are no gateways or switches at which to monitor the information flow. Therefore, to operate WSNs in a secure way, any kind of intrusion should be detected before intruders can harm the network and its information. In wireless sensor networks, the attack detection
problem is defined as identifying an intruder: an attacker who has injected false or repeated packets into the network and has control over the nodes [2]. Intrusion in a network can be achieved either passively or actively.
Intrusion Detection Systems (IDSs) provide information to the other supportive systems about the identification of the intruder, the location of the intruder, the date/time of the intrusion, the layer where the intrusion occurs, the type of intrusion activity and the type of attack. Such information is helpful in designing remedial measures against intrusion. So, IDSs play an important role in network security. The routing protocols in WSNs may differ as per the network architecture and application; the major categories include data-centric, hierarchical and location-based protocols. In some protocols energy awareness is an essential consideration, and in a few, security.
In the rest of the paper, the next section briefly classifies the attacks and their symptoms/signs for WSNs, on the basis of the various layers and attack categories. Section 3 presents the survey of IDSs for WSNs. Section 4 briefly discusses a few secure routing protocols with an Intrusion Detection mechanism for WSNs. Finally, the paper is concluded with remarks for future work in Section 5.
2. ATTACKS AND SIGNS OVER WSN
2.1. Attack Classification
In wireless sensor networks, a contender may inject malicious nodes into the communication network and introduce various attacks; these nodes are called compromised nodes. In passive attacks, the attackers are hidden and tap the communication link to collect data and damage the functioning of the network. Such attacks are of the eavesdropping, node malfunctioning, node tampering/destruction and traffic analysis types. In active attacks, an opponent actually affects the operation of the attacked network; such attacks may degrade and terminate networking services and can be detected. Several classifications of active attacks exist in the literature. Generally, active attacks are: network jamming, Denial-of-Service, black hole attack, wormhole attack, flooding and Sybil. A classification of attacks for WSNs, on the basis of the various layers and attack types, is shown in Table 1.
Table 1: Summary of Attacks over WSN
Layers | Active | Passive | Others
Physical | Jamming | Eavesdropping | Physical tampering
Data Link / MAC | WEP targeted | Traffic analysis, monitoring | -
Network | Wormhole, Black hole, Byzantine, Routing attacks, Information disclosure, Location disclosure, resource consumption, IP spoofing, Selective forwarding, Sybil | Snooping | -
Transport | Session hijacking, SYN flooding | - | -
Application | Repudiation, Viruses, Data corruption | - | -
The authors in [5] categorized network layer attacks in Sensor Networks. A few attacks within each layer, and their signs, are defined below.
2.2. Description of Attacks and their Signs
Jamming
In this attack, a harmful node purposefully tries to interfere with the transmission and reception of communication. To disrupt network communication, the malicious node emits a random signal of high strength.
Routing Attacks
In routing attacks, the attacker maliciously targets the route discovery or maintenance phase and does not follow the specification of the routing protocol. Such attacks include flooding attacks such as Hello flooding, RREQ flooding and ACK flooding; others are route table overflow, route cache poisoning, and route loops that target the route discovery phase. Certain attacks target particular routing protocols. Here only the Hello flood attack is defined.
INTERNATIONAL JOURNAL FOR RESEARCH IN EMERGING SCIENCE AND TECHNOLOGY, VOLUME-3, ISSUE-1, JAN-2016
Hello flood
E-ISSN: 2349-7610
IP spoofing
In this attack, a malicious node sends hello packets by
In such attacks a source IP address is forged and is used to
more powerfully than a general sensor node. The receiving
create a one-way connection with a remote host with the
nodes may assume that they are within the range of sender and
purpose of executing malicious code at the host. This attack
keep on forwarding their packets through this malicious node.
can be particularly hazardous if there exists relationship
The packets forwarded will be lost since they are not even
between the host and the victim‟s machine.
reaching to malicious node.
Traffic analysis
Selective forwarding
In this attack, the malicious node captures and investigates the
In this attack a node selectively forwards some
messages in order to deduct the information from various patterns in communication.
packets and drops certain packets. That node is malicious node.
Wormhole Attack
Denial of Service (DoS)
In this attack, an attacker collects the packets at one point
In this attack, the malicious node attempt to make resource
in the network, „tunnels‟ them to another point in the network,
unavailable to its users. DoS attacks can be attempted on
and then replays them into the network from that point [6]
multiple layers with different motives. Usually it is the intensive effort of a person or persons to prevent a Website or
web service from functioning with full efficiency. The effects
Black Hole Attacks
In this attack malicious node uses the routing protocol to
can be temporary or indefinite. The sites such as banks, credit
advertise itself being the shortest path to the destination node
card payment gateways, and even root name servers are targets
even though that is a false route. These nodes would drop all
for such attacks. SYN flooding attack is example of DoS
the data packets received that they need to forward during
attack at Transport Layer [9].
whole simulation. Such attack can be made by a single black
hole node or in Collaboration by such nodes.
Impersonation
An attacker puts himself between the sender node and the
receiver node and sobs information which is sent between two
Sinkhole Attacks
Sinkhole attacks are designed at network layer to attract
ends. Also, the attacker may masquerade as the sender node to
traffic to the malicious nodes so they can carry out malicious
communicate with the receiver node, or masquerade as the
operations. In this attack, malicious nodes make all nodes to
receiver node to reply to the sender node.
believe that they have the shortest paths to the base station, so
that all nodes will forward packets to the malicious node.
Replay attack
In this attack, identification or authentication information is
stored by means of some similar or copied web page, and then
Sybil Attacks [8]
In this attack, the malicious node takes more than one ID by
transmitted back to trick the receiver into operations which are
creating similar IDs randomly or takes the IDs of other legal
unauthorized like duplicate transaction in favor of attacker.
nodes. One malicious node has more than one ID. The node
Phishing attacks can be kept in this category.
can broadcast several times with high power, and other nodes will think of it as cluster-head and join in the cluster.
3. INTRUSION DETECTION SYSTEM
Therefore, the chances of malicious node being chosen as
Intrusion Detection Systems provide information to the other
cluster-head increases. After being chosen as cluster-head, the
supportive structure in a security system: like identified
malicious node collects the data from all other nodes, to
intruder its location, time, date, type of intrusion activity, type
destroy the data to ultimately destroy the network.
3. INTRUSION DETECTION SYSTEM

Intrusion Detection Systems provide information to the other supportive structures in a security system, such as the identified intruder, its location, the time and date, the type of intrusion activity, the type of attack and the layer where the intrusion occurred.
The information provided by IDSs would be very helpful in justifying the third line of defense, performing remedial and preventive measures against attacks. Thus, IDSs are vital for network security. A centralized IDS architecture provides sufficient protection against some security threats, while a distributed IDS is essential for some applications [10].

In [11] the differences between MANETs and WSNs are discussed and detailed information about intrusion detection systems is provided, along with a brief survey of intrusion detection systems proposed for MANETs and a discussion of the applicability of those systems to WSNs. The advantages and disadvantages of various IDS schemes are given.

Generally, IDSs are classified in two categories: network-based IDS and host-based IDS [12]. In a network-based IDS, the sensors are located at the end points (subnet borders) of the network to be monitored. The sensor captures all traffic coming to the subnet and inspects all packets to identify malicious ones. In a host-based system, a software agent resides in the sensor node to monitor all activities of the host on which it is installed.

The authors of [12] believe that a single firewall cannot solve the problem of intrusion entirely and that an intrusion detection system is helpful. A distributed IDS (DIDS) is presented in the work, whose co-operative agents are distributed across the network for incident analysis and for reporting any unintended activity. Finding the correct attack detection ratio and reporting to the firewall is left as future work.
Figure 1: Intrusion Detection System Structure (components: Monitor System, Sensor, Detector, Knowledge Base, Response Component)
A hidden Markov model is applied to detect any unusual activity in a sensor network in [13]. The proposed algorithm is a statistical approach to finding unusual readings; it is therefore a specific kind of IDS that focuses on the accuracy of the data gathered rather than the security of the nodes or the links.

Cooperative intrusion detection has been discussed in [14] and a solution for this problem has been proposed. Local detector modules are attached to the nodes and identify the intruder in a distributed way. For exposing the attacker, the authors presented necessary and sufficient conditions and an algorithm that is shown to work under a general threat model.

The SmartDetect WSN Team [15] presented the architecture of an intrusion detection application called SmartDetect for border and perimeter security. SmartDetect is implemented on TinyOS, and the implementation covers a suite of algorithms that address specific requirements of the intrusion detection application. Its features offer power-optimality, low cost, self-organization, reliable message delivery, security, low-false-alarm event detection, sleep-wake scheduling etc. A prototype of SmartDetect has been implemented and tested in an outdoor environment comprising 25-30 nodes.

Lei Li et al. [16] presented an intrusion detection model based on a hierarchical structure in wireless sensor networks. The model employs many nodes in joint collaboration with intrusion detection nodes running an anomaly detection algorithm. NS-2 has been used as the simulation platform. Although the results are promising, such IDSs are not very reliable in case any node fails to fulfill its responsibility.
The work in [26] proposed a Hybrid Intrusion Detection System (HIDS) consisting of an anomaly detection module and a misuse detection module. It aims to raise the detection rate and lower the false positive rate by combining the advantages of misuse detection and anomaly detection. In the first step, large numbers of packets are filtered by the anomaly detection module; in the second step, the misuse detection module completes the whole detection process. With the help of rules, the anomaly detection module, trained on the mode of normal behavior, decides whether the current behavior is normal. The misuse detection module then determines whether the current behavior is an attack, and a Back Propagation Network (BPN) is used to classify the attacks.
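To make the two-stage flow concrete, the following minimal Python sketch is our own illustration of a hybrid pipeline in the spirit of [26], not the authors' code: an anomaly stage first passes traffic that matches the learned normal profile, and a misuse stage examines the suspicious remainder. The paper classifies attacks with a Back Propagation Network; simple hand-written signatures stand in for it here, and the profile, thresholds and rules are all invented for the example.

```python
# Illustrative two-stage hybrid IDS sketch (invented thresholds and rules).
NORMAL_RATE = 10.0          # packets/sec learned from normal behaviour
ANOMALY_MARGIN = 3.0        # tolerated deviation before stage two runs
SIGNATURES = {"syn_flood": lambda p: p["syn"] and p["rate"] > 100}

def hybrid_ids(packet):
    # Stage 1: anomaly detection filters traffic close to the normal profile
    if abs(packet["rate"] - NORMAL_RATE) <= ANOMALY_MARGIN:
        return "normal"
    # Stage 2: misuse detection classifies the suspicious remainder
    for name, rule in SIGNATURES.items():
        if rule(packet):
            return f"attack: {name}"
    return "anomalous (unknown)"

print(hybrid_ids({"rate": 11.0, "syn": False}))   # filtered by stage 1
print(hybrid_ids({"rate": 250.0, "syn": True}))   # matched by stage 2
```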
4. SECURE ROUTING MECHANISM FOR WSNs

This section discusses a few secure routing protocols with an intrusion detection mechanism for WSNs. The IDS employed in routing protocols to detect attacks becomes either computationally intensive or computationally limited.

The authors in [18] proposed two approaches to improve the security of cluster-based sensor networks: authentication-based intrusion prevention and energy-saving intrusion detection. In the first approach, several authentication mechanisms were used for two common packet categories in generic sensor networks to save the energy of each node. In the second approach, several monitoring mechanisms were used to monitor cluster heads and member nodes according to their priority.

Shanshan Chen et al. [8] proposed a security mechanism against the Sybil attack based on the LEACH protocol. The work sets up a Sybil attack detection mechanism based on the received signal strength indicator (RSSI). The mechanism starts up when the number of cluster heads exceeds the optimal threshold.
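The RSSI intuition behind this mechanism can be illustrated with a short Python sketch (ours, not the code from [8]): identities whose received-signal-strength readings at several monitor nodes are nearly identical are suspected of coming from one physical Sybil node. The tolerance value and the readings below are invented for the example.

```python
# Flag identity pairs whose RSSI vectors at monitor nodes nearly coincide.
TOLERANCE = 2.0  # dB difference below which two RSSI vectors look the same

def suspected_sybil_groups(rssi_by_id):
    ids = list(rssi_by_id)
    groups = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            close = all(abs(x - y) < TOLERANCE
                        for x, y in zip(rssi_by_id[a], rssi_by_id[b]))
            if close:
                groups.append((a, b))
    return groups

readings = {"id1": [-40.1, -55.3, -62.0],   # RSSI at three monitor nodes
            "id2": [-40.5, -55.0, -61.7],   # nearly identical: same radio?
            "id3": [-70.2, -48.9, -66.4]}
print(suspected_sybil_groups(readings))      # [('id1', 'id2')]
```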
The authors of [19] analyzed the unusual features of WSNs and discussed the challenges of malicious node detection. A sinkhole attack detection algorithm is also proposed for WSNs based on CPU usage analysis of each node: the CPU usage of each sensor node is analyzed for consistency, and the basic idea is to detect the change-point at which the CPU usage departs from a threshold. MATLAB is used to evaluate the performance of the proposed algorithm.
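As a simplified illustration of this change-point idea (our sketch, not the algorithm from [19]), the Python below flags the sample at which a node's CPU usage jumps away from its recent moving average by more than a threshold, as a sinkhole node relaying attracted traffic would; the window size, threshold and readings are invented.

```python
# Moving-average change-point detector over per-node CPU usage samples.
from collections import deque

WINDOW, THRESHOLD = 5, 15.0   # samples of history, % CPU jump tolerated

def detect_change(samples):
    history = deque(maxlen=WINDOW)
    for t, cpu in enumerate(samples):
        if len(history) == WINDOW:
            baseline = sum(history) / WINDOW
            if abs(cpu - baseline) > THRESHOLD:
                return t  # index where CPU usage departs from its baseline
        history.append(cpu)
    return None

# A node busy relaying attracted traffic shows a sustained CPU jump.
print(detect_change([12, 14, 13, 12, 15, 13, 41, 45, 44]))  # -> 6
```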
The authors of [24] classify various methodologies for intrusions against wireless industrial sensor networks and propose an intrusion detection protocol and an intrusion prevention protocol. Besides, by adapting both detection and prevention to symmetric cryptography techniques, the scheme spends less time executing them.

Xiao and Chen [23] proposed a cluster-based WSN architecture in which three models, (i) an energy prediction model, (ii) a key management model and (iii) a flow prediction model, are applied to detect and prevent attacks in the cluster head election phase, the cluster formation phase and the routing (traffic) phase respectively. This study aimed at improving the security of WSN routing protocols. NS2 is used as the simulation tool.

5. CONCLUSION & FUTURE WORK

Today's networks, including WSNs, are increasingly prone to intrusions and security threats. As the nodes of a WSN have special characteristics and are deployed for a series of specific needs, stability, confidentiality and security are of prime importance in such networks. A lot of research work in the past has focused on developing IDSs for WSNs to meet the security needs of this special network.

In this article, a survey of common attacks on WSNs and their signs, in two categories (active and passive), is briefly given, and the attacks are also classified on the basis of networking layers.

As future work, the performance of WSNs will be tested and evaluated under attacks. Additionally, the performance of such networks can be evaluated after applying various IDSs, and the IDSs can be tested for detection accuracy. Furthermore, novel methods for intrusion detection in WSNs will be studied to arrive at more appropriate solutions against newer attack types.

REFERENCES
[1] T. Hara, V.I. Zadorozhny, and E. Buchmann, "Wireless Sensor Network Technologies for the Information Explosion Era", Studies in Computational Intelligence, vol. 278, Springer, 2010.
[2] Yang Liu, Kai Han, "Behavior-based Attack Detection and Reporting in Wireless Sensor Networks", Third International Symposium on Electronic Commerce and Security, IEEE Computer Society, 2010.
[3] A. Stetsko, L. Folkman, and V. Matyas, "Neighbor-based intrusion detection for wireless sensor networks", Faculty of Informatics, Masaryk University, May 2010.
[4] Anli Yu, Keqiu Li, Wanlei Zhou, Ping Li, "Trust mechanisms in wireless sensor networks: Attack analysis and countermeasures", Journal of Network and Computer Applications, Volume 35, Issue 3, May 2012.
[5] Chris Karlof, David Wagner, "Secure routing in wireless sensor networks: attacks and countermeasures", Ad Hoc Networks 1, Elsevier B.V., 2003.
[6] A. Perrig, Y-C Hu, and D. B. Johnson, "Wormhole Protection in Wireless Ad Hoc Networks", Technical Report, Dept. of Computer Science, Rice University, 2001.
[7] Wazir Zada Khan, Yang Xiang, Mohammed Y Aalsalem, and Quratulain Arshad, "Comprehensive Study of Selective Forwarding Attack in Wireless Sensor Networks", IJCNIS Vol. 3, No. 1, Feb. 2011.
[8] Shanshan Chen, Geng Yang, Shengshou Chen, "A Security Routing Mechanism against Sybil Attack for Wireless Sensor Networks", International Conference on Communications and Mobile Computing, IEEE Computer Society, 2010.
[9] Teodor-Grigore Lupu, "Main Types of Attacks in Wireless Sensor Networks", Recent Advances in Signals and Systems.
[10] Alvaro A. Cárdenas, Robin Berthier, Rakesh B. Bobba, Jun Ho Huh, Jorjeta G. Jetcheva, David Grochocki, and William H. Sanders, "A Framework for Evaluating Intrusion Detection Architectures in Advanced Metering Infrastructures", IEEE Transactions on Smart Grid, Vol. 5, No. 2, IEEE 2014.
[11] Ismail Butun, Salvatore D. Morgera, and Ravi Sankar, "A Survey of Intrusion Detection Systems in Wireless Sensor Networks", IEEE Communications Surveys & Tutorials, Vol. 16, No. 1, First Quarter 2014.
[12] Hongmin Cai, Naiqi Wu, "Design and Implementation of a DIDS", IEEE 2010.
[13] Doumit, S.S., Agrawal, D.P., "Self-organized criticality and stochastic learning based intrusion detection system for wireless sensor networks", Military Communications Conference, vol. 1, pp. 609-614, IEEE 2003.
[14] I. Krontiris, Z. Benenson, T. Giannetsos, F. Freiling and T. Dimitriou, "Cooperative intrusion detection in wireless sensor networks", Springer, pp. 263-278, 2009.
[15] The SmartDetect WSN Team, "SmartDetect: An Efficient WSN Implementation for Intrusion Detection", IEEE 2010.
[16] Lei Li, Yan-hui Li, Dong-yang Fu, Ming Wan, "Intrusion detection model based on hierarchical structure in wireless sensor networks", International Conference on Electrical and Control Engineering, IEEE 2010.
[17] Joyce Jose, M. Princy, Josna Jose, "Integrity Protecting and Privacy Preserving Data Aggregation Protocols in Wireless Sensor Networks: A Survey", IJCNIS Vol. 5, No. 7, June 2013.
[18] Chien-Chung Su, Ko-Ming Chang, Yau-Hwang Kuo, Mong-Fong Horng, "The new intrusion prevention and detection approaches for clustering-based sensor networks [wireless sensor networks]", Wireless Communications and Networking Conference, vol. 4, pp. 1927-1932, IEEE 2005.
[19] Changlong Chen, Min Song, and George Hsieh, "Intrusion Detection of Sinkhole Attacks in Large-scale Wireless Sensor Networks", IEEE 2010.
[20] Ibrahim S. I. Abuhaiba, Huda B. Hubboub, "Swarm Flooding Attack against Directed Diffusion in Wireless Sensor Networks", IJCNIS Vol. 4, No. 12, November 2012.
[21] Luigi Coppolino, Salvatore D'Antonio, Luigi Romano, and Gianluigi Spagnuolo, "An Intrusion Detection System for Critical Information Infrastructures Using Wireless Sensor Network Technologies", IEEE 2010.
[22] Nischay Bahl, Ajay K. Sharma, Harsh K. Verma, "Impact of Physical Layer Jamming on Wireless Sensor Networks with Shadowing and Multicasting", IJCNIS Vol. 4, No. 7, July 2012.
[23] Xiao Zhenghong, Chen Zhigang, "A Secure Routing Protocol with Intrusion Detection for Clustering Wireless Sensor Networks", International Forum on Information Technology and Applications, IEEE Computer Society, 2010.
[24] Sooyeon Shin, Taekyoung Kwon, Gil-Yong Jo, Youngman Park, and Haekyu Rhy, "An Experimental Study of Hierarchical Intrusion Detection for Wireless Industrial Sensor Networks", IEEE Transactions on Industrial Informatics, Vol. 6, No. 4, November 2010.
[25] Ibrahim S. I. Abuhaiba, Huda B. Hubboub, "Reinforcement Swap Attack against Directed Diffusion in Wireless Sensor Networks", IJCNIS Vol. 5, No. 3, March 2013.
[26] K.Q. Yan, S.C. Wang, S.S. Wang and C.W. Liu, "Hybrid Intrusion Detection System for Enhancing the Security of a Cluster-based Wireless Sensor Network", IEEE 2010.
[27] Abdoulaye Diop, Yue Qi, Qin Wang, "Efficient Group Key Management using Symmetric Key and Threshold Cryptography for Cluster based Wireless Sensor Networks", IJCNIS Vol. 6, No. 8, July 2014.
I.J. Modern Education and Computer Science, 2015, 6, 50-58. Published Online June 2015 in MECS (http://www.mecs-press.org/). DOI: 10.5815/ijmecs.2015.06.08

Perspective of Database Services for Managing Large-Scale Data on the Cloud: A Comparative Study

Narinder K. Seera, Assistant Professor, Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi (INDIA). Email: [email protected]
Vishal Jain, Assistant Professor, Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi (INDIA). Email: [email protected]

Abstract—The influx of Big Data on the Internet has raised the question for many businesses of how they can benefit from big data and how to use cloud computing to make it happen. The magnitude at which data is generated day by day is hard to believe and is beyond the scope of a human's capability to view and analyze, and hence there is an imperative need for data management and analytical tools to leverage this big data. Companies require a fine blend of technologies to collect, analyze, visualize and process large volumes of data. Big Data initiatives are driving urgent demand for algorithms to process data, accentuating challenges around data security with minimal impact on existing systems. In this paper, we present many existing cloud storage systems and query processing techniques for processing large-scale data on the cloud. The paper also explores the challenges of big data management on the cloud and related factors that encourage research work in this field.

Index Terms—Big Data, Query Processing, Cloud Computing, Distributed Storage

I. INTRODUCTION

The problem with a traditional database management system starts when the quantity of data grows beyond the storage capacity of the disk, the queries start trouncing the CPU for resources and the result sets exceed RAM. Database systems need to be re-engineered using innovative technologies to handle this growing volume of information. The rise of such problems in organizations facilitated the emergence of cloud data stores and big data.

Big data [1][2][3] refers to high-volume, high-velocity and high-variety information assets which require new forms of processing to facilitate enhanced decision making, insight discovery and process optimization. Big data may come from a variety of sources including web logs, business information, sensors, social media, remote devices and data collected through wireless sensor networks. Big data needs a cluster of servers for its processing, which the cloud can readily provide. Cloud-based big data services offer significant advantages in reducing the overhead of configuring and tuning your own database servers.

As the size of data increases exponentially, researchers have begun to focus on how 'big data' can potentially benefit the digital world of organizations [4]. Although managing the increasing complexity of data is critical to the success of the organization, new initiatives should deliberate on how to mine the information to generate high revenues for the business. Data-driven decision making requires the use of excellent technologies to capture, store and efficiently process big data, which is often unstructured [5]. Converting big data programs into successful activities that deliver meaningful business insight and provide sustained high-quality customer relationships can be costly, risky and sometimes unproductive.

There have been various techniques and algorithms devised for big data analytics [6]. Mining big data and applying effective algorithms to produce productive results is out of the scope of this paper and can be found in the existing literature in this field [7][8][9]. This survey intends to identify various existing systems for big data management, comparing their key features and the techniques they adhere to.

The remainder of this paper is organized as follows. We begin in Section II with the motivation behind the survey. Related work is given in Section III. A detailed discussion on the existing cloud-based database systems and the different query processing techniques is presented in Sections IV and V respectively. We discuss the opportunities and applications of big data in various fields in Section VI. At the end we conclude the paper with the prospect of future work in this field.

II. MOTIVATION OF THE SURVEY

In the early days of computing, flat file systems were used to organize the data pertaining to an organization. But due to the lack of standards and the decentralization of data in flat files, database management systems came into existence.
The main advantages offered by these systems are centralization of data, storage of relationships between different objects or entities, and easy retrieval of data from the databases. But these traditional relational database systems are not capable of processing large data sets like 10 TB of data or hundreds of GB of images. As a result, non-relational databases evolved that can handle large-scale data and can process terabytes or petabytes of data efficiently. They are also referred to as NoSQL databases [10]. The non-relational data model is not a replacement for the relational data model; they are alternatives to each other. The biggest difference between relational and non-relational data models is the underlying data structures used by them. The relational model stores the data in tabular form and uses the SQL language to fetch data from databases. On the other hand, non-relational databases do not rely on the concept of tables and keys; they require data manipulation techniques and processes to handle unstructured or semi-structured big data. The performance of these databases can be evaluated on three main aspects: elasticity, fault tolerance and scalability.

NoSQL (Not Only SQL) [11] is an approach to data management useful for very large sets of data which are dispersed over multiple nodes in the cloud. It encompasses a broad range of technologies and architectures that seek to resolve scalability and big data performance issues. A NoSQL database provides a mechanism for the storage and retrieval of unstructured data that is modeled using data structures other than the tabular structure used in relational databases. It provides finer control over availability and simplicity of design. The NoSQL approach is especially useful when:

- Data is unstructured.
- Data is stored remotely on multiple virtual servers in the cloud.
- Constraints and validation checks are not required.
- Dealing with a growing list of elements such as Twitter posts, internet server logs etc.
- Storing relationships among data elements is not important.

NoSQL databases can be roughly categorized into four categories: column-oriented, document-oriented, key-value based databases and graph data stores. Column-oriented databases store data in columns instead of rows. The goal is to efficiently read and write data from the hard disk in order to speed up the time it takes to return the result of a query. There are two main advantages of this approach: first, data is highly compressed, which permits aggregate functions to be performed very rapidly; second, data is self-indexing, so it uses less space. Document-oriented databases store semi-structured data in the form of records in documents, in a less rigid form. Documents are addressed in the database via a unique key that represents the document; this key can later be used to access the document from the database, and to speed up document retrieval the keys are further indexed. A key-value store is a non-trivial data model where the data is represented as a collection of key-value pairs. Graph models represent the data in tree-like structures with nodes and edges connecting each other through links (or relations). Fig. 1 summarizes the different categories of database management systems as relational and non-relational.

Fig. 1. Data models – Relational vs. NoSQL (the figure splits database management systems into relational systems, which handle structured data, and non-relational (NoSQL) systems, which handle unstructured data; NoSQL systems are further divided into column-oriented, document-oriented, key-value store and graph store)
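To make the four categories of Fig. 1 tangible, the short Python sketch below is our own illustration (not from the paper) of how one and the same user record might be laid out under each model, with plain dictionaries standing in for real stores; all keys and values are invented.

```python
# Key-value: an opaque value addressed only by its key
kv_store = {"user:42": b'{"name": "Asha", "city": "Delhi"}'}

# Document: semi-structured record addressed by a unique document key
doc_store = {"user:42": {"name": "Asha", "city": "Delhi", "tags": ["admin"]}}

# Column-oriented: values of one column kept together for fast scans
col_store = {"name": {"user:42": "Asha"}, "city": {"user:42": "Delhi"}}

# Graph: nodes plus edges (relations) connecting them
graph_store = {"nodes": {"user:42": {"name": "Asha"}, "city:delhi": {}},
               "edges": [("user:42", "LIVES_IN", "city:delhi")]}
```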
Current studies reveal that many researchers have proposed different systems that provide scalable services for data management on the cloud and offer high data processing capabilities [12]. Here we present the essentials of cloud-based database systems that are vital for both data storage and data processing.

Requisites of data management and data processing in the Cloud:

- High Performance: The workload is distributed among multiple nodes to perform high-speed computing. Researchers have shown that adding more servers to process the data linearly improves performance.
- Scalability: Cloud systems are flexible enough to scale up and scale down to meet demand [14].
- Availability of data.
- Resource sharing: On-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) is highly desirable, with minimal management effort or service provider interaction.
- Fault Tolerance: In case of failure of any node, the other available nodes should take over control and not let the whole system go down. Aggregation of applications and data is also an important aspect of cloud systems.
- Elasticity [15]: Cloud database systems are capable of acclimatizing to changes in the workload by wisely allocating and de-allocating the available resources on demand.
- Query Processing: Although various providers offer database services on the cloud, most of them do not support all the features of SQL, like joins and nested or complex queries. Cloud databases should be capable of handling users' SQL queries that may require results from different data sources.
- Representation of heterogeneous data [4]: Sheer size, huge volume and velocity are some of the terms that define Big Data. The main issue for cloud databases is to deal with such heterogeneous data: to design schemas for it and to provide services to manage and process it.
- Security and privacy [13][17]: As with other aspects of the cloud, a high-security computing infrastructure is required to secure the data, network and hosts in the cloud environment [16]. Physical security defines how one controls physical access to the servers that support the infrastructure. To provide data security on someone else's server in the cloud, two things can be done: either encrypt the sensitive and confidential data in a separate database or keep off-site backups. Network security can be implemented in terms of firewalls; EC2, for example, provides security groups that define traffic rules governing what traffic can reach virtual servers in the cloud. Host security describes how the host system is configured to handle attacks and how the effects of an attack on the overall system can be minimized. It is always good to have a complete suite of antivirus software or other tools with no security holes.
The existence of a variety of systems and processing techniques to support big data and to analyze it forms the foundation of this paper and motivated us to give a comprehensive overview of the different kinds of cloud-based database systems with their key features.
III. RELATED WORK
A large amount of work in the area of database systems has already been done by various researchers in this field. The work in the existing literature spans from query processing techniques to query optimization, from storage systems to data access methods, and from centralized systems to distributed data processing systems. Today hundreds of cloud-based database systems exist, which can be roughly categorized along two main dimensions: the storage system aspect and the query processing aspect.

On the storage system aspect, the widely accepted file storage available on the cloud is distributed file storage, which allows many clients to access the same data simultaneously from remote machines. Two such widely accepted distributed file storage systems are GFS (Google File System [18]) and HDFS (Hadoop Distributed File System [19]). Google App Engine [20][21] and Amazon EC2 [22] are two widely used data stores which are used to manage data through web applications. Application developers and many technologists from different industries are taking advantage of Amazon Web Services [23] to meet the challenges of storing digital information. Amazon Web Services offer an end-to-end range of cloud computing resources to manage big data by reducing costs and gaining a competitive advantage.

On the processing aspect, some database systems provide Database-as-a-Service (DaaS) [24] capabilities with either full SQL query support or NoSQL support. Examples of systems that fully support SQL include Amazon RDS, Cloudy and Microsoft SQL Azure. Popular NoSQL databases are SimpleDB [42], MemCacheDB, Voldemort, Cassandra [46] etc., and companies using NoSQL include Netflix, LinkedIn, Twitter etc. Amazon SimpleDB does not fully support SQL; it supports only a subset of SQL queries. Most NoSQL systems use MapReduce techniques [26] for parallel query processing [25], where a query is divided into a smaller set of operations and the answers of the sub-queries are joined back to get the end result. Although this reduces query processing time by parallelizing the tasks and offers high scalability, it has certain limitations too; [27] and [28] identified some drawbacks of MapReduce techniques and proposed alternative solutions to overcome them. A comparison between MapReduce systems and parallel databases [29][30] highlights the following points:
- MapReduce systems do not take advantage of indexing procedures for selective access to data. It is desirable to optimize the data model to improve data retrieval performance.
- Parallel database systems use the relational data model to manage data, where applications can use SQL programs, whereas MapReduce systems are schema-free and users are free to code their own map and reduce functions.
- MapReduce does not have query optimization plans to minimize data transfer across nodes, which results in lower performance and efficiency. Parallel systems are designed to minimize query processing time by carefully distributing the task among the available machines. MapReduce systems are also weak in load balancing: the partitioning and distribution of the input data set among n nodes does not always guarantee an equal share of work.
- Compared to parallel systems, MapReduce systems are more fault tolerant and gracefully handle node failure by assigning the process to another available node.
Many researchers have designed cloud-based database systems to overcome the limitations of traditional systems, such as managing huge volumes of unstructured or semi-structured data, storing data on distributed storage, processing data present on multiple nodes (parallel processing), and providing high availability and scalability to clients. But most of them do not support the features of the full relational data model and they lack the querying capabilities of the powerful SQL language. In contrast, parallel database systems are robust, high-performance data processing platforms designed to handle clusters with a static number of nodes, but their main drawback is that they lack the elasticity feature, which is an essential aspect of cloud-based systems.

These days distributed data processing has become a major requirement for all businesses and companies. Almost all major DBMS vendors offer distributed data processing to support various web services and new applications that run on the cloud. The major reason for the exponential growth of distributed and parallel processing systems lies in the potential benefits they offer, such as managing large-scale data and providing scalable, high-performance services to applications. Several cloud service providers offer Platform-as-a-Service (PaaS) [31] solutions that eliminate the need to configure databases manually, thus reducing the maintenance work of the organization. The primary providers of such big data platforms are Amazon, Google and Microsoft.
IV. CLOUD BASED DATABASE SYSTEMS

In this section we present some cloud-based database systems with their key components and features. These features form the basis of the survey and are summarized in Table 1 given at the end of the paper.

epiC [32] is an elastic and efficient power-aware data-intensive cloud-based system that supports both OLAP and OLTP jobs [33]. It has the potential of parallel systems in that it has a processing engine as well as a storage engine. It has three main components: the Query Interface, the Elastic Execution Engine (E3) and the Elastic Storage System (ES2). The main function of the Query Interface is to determine the type of query fired by the user and to monitor its processing status. If the query is an analytical query, it is forwarded to the OLAP controller, where it is processed via parallel scans; if it is a simple select query, the OLTP controller processes it using indexing and query optimization plans. Both the OLAP and OLTP controllers use the underlying ES2 to provide transactional support. The Elastic Execution Engine has a master node that takes analytical tasks from the OLAP controller and distributes them among all the available nodes for parallel execution. The Elastic Storage System has further sub-modules for data partitioning, load-adaptive replication, transaction management and managing secondary indexes.
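As a toy illustration of this routing idea (our sketch, not epiC's actual interface; the classification rule and all names are invented), the Python below forwards aggregate-style queries to an OLAP path and simple selects to an OLTP path:

```python
# Hypothetical dispatcher mimicking a query interface that separates
# analytical (OLAP) work from short transactional (OLTP) work.
def classify(query: str) -> str:
    q = query.lower()
    # crude stand-in for real query-type detection
    return "OLAP" if ("group by" in q or "sum(" in q or "avg(" in q) else "OLTP"

def dispatch(query: str) -> str:
    if classify(query) == "OLAP":
        return f"OLAP controller: parallel scan for -> {query}"
    return f"OLTP controller: index lookup for -> {query}"

print(dispatch("SELECT avg(price) FROM orders GROUP BY region"))
print(dispatch("SELECT * FROM orders WHERE id = 7"))
```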
SQLMR [34] is mainly designed to bridge the gap between traditional SQL-like queries and the data processing capabilities of MapReduce. It combines the scalability and fault tolerance of MapReduce with the easy-to-use SQL language. There are four main components of SQLMR:

- An SQL-to-MapReduce Compiler, which takes SQL queries from users and translates them into a sequence of MapReduce jobs.
- A Query Result Manager, which maintains a result cache to store the results of queries. When a query is fired, the cache is first examined for existing results; otherwise the query is parsed to generate optimized MapReduce code.
- A Database Partitioning and Indexing Manager, which manages data files and indexes.
- The Optimized Hadoop System framework, which is responsible for the distributed processing of large data sets on clusters.
Distributed Key-Value Store (KVS) [29] is another distributed data store on the cloud which uses a special data structure to store data on multiple servers. It is composed of data items and associated keys that redirect read and write requests to the appropriate servers. The advantages offered by a distributed KVS include high performance and speed, aggregation processing (in which multiple data items are aggregated to produce results) and resistance to failure, as multiple copies of the same data are kept on multiple servers.

BigTable [35] is a distributed storage system for managing voluminous structured data with a simple data model. It does not support all the features of relational databases but tries to fulfill all the major requirements of cloud systems, such as high availability, performance and scalability. Well-known applications of BigTable are Google Earth, Google Finance, personalized search, web indexing etc. [36]. BigTable is basically a sparse and persistent multidimensional sorted map which is indexed by a row key, a column key and a timestamp. Column keys are grouped into sets called column families, where each column family can have one or more named columns of the same type. Each row in the table contains one or more named column families. Each cell in BigTable contains multiple versions of the same data, and each version is assigned a unique timestamp (a toy sketch of this indexing scheme is given after the component list below). BigTable uses Google's file system to store BigTable data, logs and data files. It has three major components:

- A Master Server, which assigns tablets to tablet servers. A tablet is a dynamic partition of rows, called a row range. The master keeps track of the workload of the tablet servers and of the tablets which are unassigned.
- Tablet Servers, which manage and handle read or write requests to the set of tablets assigned to them. Tablet servers can be dynamically added or removed depending upon the workload.
- A library, which is linked to every client.
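The indexing scheme just described, a sparse map from (row key, column key, timestamp) to versioned values, can be mimicked in a few lines of Python; the toy in-memory sketch below is our illustration with invented names and data, not Google's implementation:

```python
import time

class ToyBigTable:
    """Toy stand-in for a sparse (row key, column key, timestamp) map."""
    def __init__(self):
        self.cells = {}  # (row_key, column_key) -> {timestamp: value}

    def put(self, row_key, column_key, value, ts=None):
        ts = ts if ts is not None else time.time()
        self.cells.setdefault((row_key, column_key), {})[ts] = value

    def get(self, row_key, column_key):
        # return the most recent version, mirroring timestamp versioning
        versions = self.cells.get((row_key, column_key), {})
        return versions[max(versions)] if versions else None

t = ToyBigTable()
t.put("com.example/index", "anchor:home", "Example Home", ts=1)
t.put("com.example/index", "anchor:home", "Example Home v2", ts=2)
print(t.get("com.example/index", "anchor:home"))  # latest version wins
```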
BigIntegrator [37] facilitates queries that access data from both relational databases and cloud databases. To achieve this functionality it has two plug-ins, called absorbers and finalizers. For each kind of data source the system has a separate wrapper module that generates queries for it. For example, the RDBMS wrapper accepts a query from the system and translates it into an SQL query which is sent back to the underlying relational database. Similarly, the BigTable wrapper converts the query into a corresponding GQL query for the BigTable data store. The BigIntegrator wrapper module has further sub-modules with defined functions (importer, absorbers, finalizers and interfaces) to carry out query processing. The results returned by the different data sources are joined to get the end result.
Table 1. Characteristics of existing cloud-based systems

- epiC [32]. Data storage used: ES2 (Elastic Storage System). Processing engine used: E3 (Elastic Execution Engine). Query language support: SQL-like language. Data access methodology: distributed secondary indexes (B+ trees and multidimensional indexes). Query processing technique: parallel scans for OLAP queries; indexing and local query optimization for OLTP queries. Features: high performance and availability.
- SQLMR [34]. Data storage used: Database Partitioning and Indexing Manager. Processing engine used: SQL-to-MapReduce Compiler. Query language support: SQL programs translated into MapReduce jobs. Data access methodology: database partitioning and indexing. Query processing technique: Query Result Manager used to cache results. Features: high performance, scalability and availability.
- Distributed KVS [29]. Data storage used: Distributed Data Manager. Data access methodology: distributed keys used to access values. Query processing technique: aggregation processing. Features: highly scalable.
- BigTable [35]. Data storage used: Distributed Storage System (GFS). Processing engine used: GAE (Google Application Engine). Query language support: GQL (supports query by row key). Data access methodology: row keys, column keys and timestamps. Features: highly scalable.
- BigIntegrator [37]. Data storage used: cloud databases and relational databases. Processing engine used: BigIntegrator Query Processor. Query language support: SQL / GQL. Data access methodology: algebraic expressions and access filters. Query processing technique: RDBMS wrappers and BigTable wrappers. Features: highly scalable.
- Optique [38]. Data storage used: shared databases. Processing engine used: Asynchronous Execution Engine; DISPEL used. Query language support: SQL. Data access methodology: OBDA (ontology-based data access). Query processing technique: distributed query optimization and processing. Features: elastic and scalable query execution.
- Microsoft SQL Azure [41]. Data storage used: cloud-based storage system. Query language support: T-SQL. Data access methodology: key-value access. Features: high availability and performance.
- Others, like MongoDB, Cassandra, DynamoDB [42][46][47]. Data storage used: cloud-based storage system. Query language support: NoSQL. Data access methodology: key-based access. Query processing technique: map and reduce techniques. Features: highly scalable.
The BigTable wrapper has two components, for the server and for clients. The BigTable wrapper server is a web application that accepts client requests, translates them into GQL and returns the results to the BigTable wrapper client.

The Optique [38] platform has a distributed query processing engine to provide execution of complex queries on the cloud. The system is developed to deal with two different scenarios: first, to process queries that need to access terabytes of temporal data coming from sensors together with data stored in relational databases; second, to process queries that access data from multiple databases with different schemas. For this purpose, it uses the OBDA (Ontology Based Data Access) [39][40] architecture, which has four major components:

- A Query Formulation component, to formulate queries.
- An Ontology and Mapping Management component, to bootstrap ontologies and mappings during the installation of the system.
- A Query Transformation component, which rewrites users' queries into queries over the underlying data sources.
- A Distributed Query Optimization and Processing component, which optimizes and executes the queries produced by the Query Transformation component.

Microsoft Azure SQL Database [41] provides relational database-as-a-service capabilities to application developers and users. The application code and database can reside in the same physical location or in a distributed environment with tightly coupled servers and storage systems. To make the system work efficiently, it is divided into three layers: the client layer, the services layer and the platform layer. The client layer is used by the application to interact directly with the database. The services layer acts as a gateway between the client layer and the platform layer; it manages all the connections between the users' application and the physical servers that contain the data. The platform layer is the main layer, where the data actually resides; it consists of many instances of SQL Server.
Besides the above-mentioned cloud database systems, there are several other systems that have been successfully adopted by various companies. Some of them include Amazon SimpleDB, CouchDB, MongoDB [42], Cassandra, Splunk, Apache HadoopDB etc. A precise comparison of the features and data models deployed by these systems is given in Table 1.
V. QUERY PROCESSING LANGUAGES AND TECHNIQUES

In this section we briefly discuss the types of query processing methods available on the cloud to manage big data.

Map-Reduce techniques [25][27][32][44]: MapReduce is a programming paradigm that processes massive amounts of unstructured data in parallel across a distributed cluster of processors. It offers no benefit over non-distributed architectures. This technique is organized around two functions, map and reduce. The "map" function divides a large data set of input values into smaller data sets, where individual elements are broken down into a number of key-value pairs. These elements are then sorted by their key and sent to the same node, where a "reduce" function is used to merge the values of the same key into a single result. As shown in Fig. 2, in the splitting phase a given input is first divided into three sets by three map functions, map1, map2 and map3, where they are processed. The sort function then sorts these smaller sets of input by their keys and sends them to the next phase, called the merge phase, where the reduce functions merge the values of the same key into a single output.
Fig. 2. Map-Reduce functionalities (the figure shows input data divided among map1, map2 and map3 in the splitting phase, sorted by key, and merged by reduce functions into the output data in the merging phase)
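A compact, runnable Python illustration of this split/sort/merge flow (our sketch, using word count as the example; it is not taken from the paper):

```python
from itertools import groupby
from operator import itemgetter

def map_fn(chunk):
    # "map" phase: break an input chunk into (key, value) pairs
    return [(word, 1) for word in chunk.split()]

def reduce_fn(key, values):
    # "reduce" phase: merge all values that share the same key
    return key, sum(values)

def mapreduce(chunks):
    pairs = []
    for chunk in chunks:                      # splitting phase: one map per chunk
        pairs.extend(map_fn(chunk))
    pairs.sort(key=itemgetter(0))             # sorting phase: group pairs by key
    return [reduce_fn(k, [v for _, v in grp]) # merging phase
            for k, grp in groupby(pairs, key=itemgetter(0))]

print(mapreduce(["big data on the cloud", "big cloud", "data data"]))
```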
SQL (Structured Query Language): SQL is the most widely used and powerful query language designed to handle data in relational database management systems. It has been universally accepted by a wide range of traditional relational database systems as well as cloud-based database systems. Some of the database systems that use SQL are Oracle, Microsoft SQL Server, Ingres, SimpleDB etc. Although it is originally based on the notions of relational algebra and tuple calculus, it also has procedural capabilities to write complete programs that retrieve data from databases. There are two approaches to using SQL with applications:

1. Call SQL commands from within a host language like C# or Java. For this purpose, special APIs (Application Program Interfaces) are created, as in JDBC (a minimal sketch follows this list).
2. Embedded SQL: embed SQL in the host language, where a preprocessor converts the SQL statements into special API calls and then a regular compiler is used to compile the program.
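For instance, the first approach looks like the minimal Python sketch below, where the standard-library sqlite3 module stands in for a JDBC-style API; the table and data are invented for the example.

```python
import sqlite3

# SQL commands issued from a host language through an API
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Asha",))
for row in conn.execute("SELECT id, name FROM users"):
    print(row)
conn.close()
```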
GQL (Google Query Language): GQL is an SQL-like language which is used to fetch entities and keys from the 'Google Application Engine' (GAE) [20][21] data store. It does not support joins, for two main reasons. First, joins become inefficient when queries span more than one machine. Second, the same query can process a large number of records at almost the same speed.

Others: Even though most database systems use SQL, most of them also have their own additional proprietary extensions that are used only on their systems. For example, Transact-SQL (T-SQL) is Microsoft's and Sybase's proprietary extension to SQL and is essential to Microsoft SQL Server: all applications that communicate with an instance of SQL Server do so by sending Transact-SQL statements to the server, regardless of the user interface of the application. Besides standard SQL, T-SQL includes procedural programming, local variables, temporary objects, system and extended stored procedures, scrollable cursors, various string handling, date processing and mathematical functions, and changes to the delete and update statements. There are other SQL-like languages, such as HiveQL, which is used by the Hadoop [44][19] Hive [45] database system. This language also allows traditional map-reduce programmers to plug in their custom map and reduce functions when it is not convenient to express the logic in HiveQL. It is also the primary data processing method for Treasure Data, a cloud platform that permits its users to store and analyze data on the cloud; it manages its own Hadoop cluster, which accepts queries from users and executes them using the Hadoop MapReduce framework.
VI. OPPORTUNITIES AND APPLICATIONS OF BIG DATA

Today almost all organizations use Big Data for business intelligence, analyzing inside and outside business data to perform risk assessment, brand management, customer acquisition and many other managerial tasks [4]. Besides all these activities, Big Data has also proved its importance in various other fields.

In the field of astronomy, with the advent of the Sloan Digital Sky Survey [48], the astronomer's job has been greatly transformed, from taking pictures of astronomical bodies to finding and analyzing interesting objects and patterns in the sky. SDSS is one of the most ambitious and influential surveys in the history of astronomy; it obtained deep, multi-colored images of galaxies, stars, quasars etc. and created 3-dimensional maps of those captured objects.

Big Data has also greatly influenced the field of biological sciences, or bioinformatics [49]. Genetic research is now driving towards Big Data to seek solutions for acquiring and generating the volumes of DNA-sequence data that are basic to genetic study. With suitable analytical tools, genetic research may answer questions ranging from curing cancer to developing superior crop varieties, increased treatment efficiency and medical advancements.

Big Data has also marked significant benefits in the field of research and education [50][51]. Educating the individual student is one of the biggest advantages of technology, and big data helps teachers personalize learning. Using Big Data for data-driven teaching increases transparency and accountability in evaluating trends in education. Digital systems facilitate real-time assessment for mining information. Students can be given personalized quizzes and lessons that try to find their weaknesses, and the answers can be analyzed to track whether students have mastered the concepts. A pattern of wrong answers gives clues on why students selected the incorrect answers. This allows teachers to mine the learning patterns of their students, pinpoint the problem areas and do a better job.

The field of urban planning [54][55], or the geographical sciences, is also not untouched by the potential benefits of Big Data. Big data produced by so many places and processes (geographic data) contains either explicit or implicit spatial information which can be mapped and analyzed to provide new insights into urban developments. Geo-located information contains data accessed from sensors, social networking sites, online booking of various transport modes, mobile phone usage, online credit and debit card transactions etc. One of the major applications of Big Data in the geographical sciences is in the transport sector, where it aims to control congestion in road traffic so as to reduce pollution and accidents.

As for financial systems [50][56][57], the banking and finance industries are also taking a business-driven and realistic approach to big data. For financial institutions big data is imperative for four main reasons. First, companies require larger market data sets and finer granularity of data to make their forecasts. Secondly, they need to leverage large amounts of consumer data across multiple service delivery channels to uncover consumer behavior patterns. Third, they aim to improve enterprise transparency and auditability. And lastly, they must deal with the acute stress of economic uncertainty while seeking new revenue opportunities.
VII. CONCLUSION & FUTURE SCOPE

In this paper we compared the features of various distributed database systems that use underlying cloud storage to store data, and parallel database systems which work on clusters of nodes based on a shared-nothing architecture. We also compared the approaches followed by various relational and NoSQL database management systems in terms of their query processing strategies, and found that many systems exist that aim to fill the gap between SQL queries and the map-reduce paradigm by converting users' queries into map-reduce tasks. We concluded that although the map-reduce technique speeds up query processing by parallelizing the tasks and provides scalability, it offers limited performance over different classes of problems and few facilities for ad-hoc queries. It neither uses any specific data model to manage data nor any indexing method to access data; therefore, how to increase the efficiency of map-reduce in terms of I/O costs is still a research problem that needs to be addressed. Precisely, different systems target different design problems and opt for different processing methods, but realizing the potential of parallel systems with map-reduce programming is a new hope for the next decade.

We also discussed cloud computing, which is a perfect match for Big Data as it provides high availability of resources and scalability. The technology innovations in the field of data mining are remarkable and will continue to prevail in the next generation of computing. Demand is arising to devise effective algorithms to extract knowledge and interesting patterns from big data. The extensive exploitation of Big Data and data mining in the digital world will definitely harmonize the growth of each to become a dominant technology of the future.

REFERENCES
[1] Steve LaValle, Eric Lesser, Rebecca Shockley, Michael S. Hopkins and Nina Kruschwitz, "Big data, Analytics and the Path from Insights to Value", December 2010.
[2] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, Angela Hung Byers, "Big data: The next frontier for innovation, competition, and productivity", May 2011.
[3] Divyakant Agarwal, S. Das, S. E. Abbadi, "Big Data and Cloud Computing: Current State and Future Opportunities", EDBT 2011, March 22-24, 2011, Uppsala, Sweden.
[4] Arup Dasgupta, "Big Data: The future is in Analytics", Geospatial World, April 2013.
[5] Divyakant Agrawal, Elisa Bertino, Michael Franklin, "Challenges and Opportunities with Big Data".
[6] Van Renesse, R., Birman, K.P., Vogels, W., "Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining", ACM Trans. Comput. Syst. 21(2), 2003.
[7] A. N. Paidi, "Data mining: Future trends and applications", International Journal of Modern Engineering Research, vol. 2, Issue 6, Nov-Dec 2012, pp. 4657-4663.
[8] Venkatadri M., L. C. Reddy, "A review on data mining from past to the future", International Journal of Computer Applications (0975-8887), Volume 15, No. 7, Feb 2011.
[9] Hans-Peter Kriegel, Karsten M. Borgwardt, Peer Kröger, Alexey Pryakhin, Matthias Schubert, Arthur Zimek, "Future trends in Data Mining", Springer Science+Business Media, LLC 2007.
[10] Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari and Miriam A. M. Capretz, "Data Management in Cloud Environments: NoSQL and NewSQL data stores", Journal of Cloud Computing: Advances, Systems and Applications, 2013, pp. 2-22.
[11] Phyo Thandar Thant, "Improving the availability of NoSQL databases for Cloud Storage", available online at http://www.academia.edu/4112230/Improving_the_Availability_of_NoSQL_Databases_for_Cloud_Storage.
[12] A. Pavlo, E. Paulson, A. Rasin, D. Abadi, S. Madden, M. Stonebraker, "A Comparison of approaches to large-scale data analysis", SIGMOD'09, June 29-July 2, 2009, Providence, Rhode Island, USA.
[13] R. Gellman, "Privacy in the clouds: Risks to privacy and confidentiality from cloud computing", prepared for the World Privacy Forum, online at http://www.worldprivacyforum.org/pdf/WPF Cloud Privacy Report.pdf, Feb 2009.
[14] Pawel Jurczyk and Li Xiong, "Dynamic Query Processing for P2P data services in the Cloud", Emory University, Atlanta GA 30322, USA.
[15] Ioannis Konstantinou, Evangelos Angelou, Christina Boumpouka, Dimitrios Tsoumakos, Nectarios Koziris, "On the Elasticity of NoSQL Databases over Cloud Management Platforms (extended version)", CIKM, Oct 2011, Glasgow, UK.
[16] W. Itani, A. Kayssi, A. Chehab, "Privacy as a Service: Privacy-Aware Data Storage and Processing in Cloud Computing Architectures", Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, Dec 2009.
[17] M. Jensen, J. Schwenk, N. Gruschka, L. L. Iacono, "On Technical Security Issues in Cloud Computing", IEEE International Conference on Cloud Computing (CLOUD II 2009), Bangalore, India, September 2009.
[18] S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system", in Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles.
[19] D. Borthakur, "The Hadoop Distributed File System: Architecture and Design", Apache Software Foundation, 2007.
[20] Google Inc., Google App Engine, [Online] 2010, http://code.google.com/intl/de-DE/appengine/
[21] Severance, C., Using Google App Engine, Sebastopol: O'Reilly Media, 2009.
[22] Daniel J. Abadi, "Data Management in the Cloud: Limitations and Opportunities", IEEE 2009.
[23] Jinesh Varia, "Cloud Architectures", Amazon Web Services, June 2008.
[24] C. Curino, E. P. Jones, R. A. Popa, N. Malviya, E. Wu, S. Madden, H. Balakrishnan, N. Zeldovich, "Relational Cloud: A Database-as-a-Service for the Cloud".
[25] R. Chaiken, B. Jenkins, P. A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou, "Easy and efficient parallel processing of massive data sets".
[26] Jimmy Lin, "MapReduce is good enough? If all you have is a hammer, throw away everything that's not a nail!", arXiv:1209.2191v1 [cs.DC], Sep 2012.
[27] Christos Doulkeridis, Kjetil Nørvåg, "A Survey of large-scale Analytical Query processing in MapReduce", VLDB Journal.
[28] K. Lee, Y. Lee, H. Choi, Y. Chung, N. Moon, "Parallel Data Processing with MapReduce: A Survey", SIGMOD Record, Dec 2011 (Vol. 40, No. 4).
[29] Patrick Valduriez, "Parallel database systems: Open Problems and New Issues", Kluwer Academic Publishers, Boston, 1993, pp. 137-165.
[30] D. DeWitt, Jim Gray, "Parallel database systems: The future of high performance database systems", Communications of the ACM, June 1992, Vol. 35, No. 6.
[31] Shyam Kotecha, "Platform-as-a-Service", available online at http://www.ieee.ldrp.ac.in/index.php?option=com_phocadownload&view=category&download=4:pdf&id=1:workshop&Itemid=216
[32] Chun Chen, Gang Chen, Dawei Jiang, Beng Chin Ooi, Hoang Tam Vo, Sai Wu, and Quanqing Xu, "Providing Scalable Database Services on the Cloud".
[33] Y. Cao, C. Chen, F. Guo, D. Jiang, Y. Lin, B. C. Ooi, H. T. Vo, S. Wu, and Q. Xu, "A cloud data storage system for supporting both OLTP and OLAP", Technical Report, National University of Singapore, School of Computing, TRA8/10, 2010.
[34] Meng-Ju Hsieh, Chao-Rui Chang, Li-Yung Ho, Jan-Jan Wu, Pangfeng Liu, "SQLMR: A Scalable Database Management System for Cloud Computing".
[35] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, "Bigtable: A distributed storage system for structured data", ACM Trans. Comput. Syst., vol. 26, June 2008.
[36] Xiao Chen, "Google BigTable", available online at http://www.net.in.tum.de/fileadmin/TUM/NET/NET2010-08-2/NET-2010-08-2_06.pdf.
[37] Minpeng Zhu and Tore Risch, "Querying Combined Cloud-based and Relational Databases", International Conference on Cloud and Service Computing, 2011.
[38] Herald Kllapi, Dimitris Bilidas, Ian Horrocks, Yannis Ioannidis, Ernesto Jimenez-Ruiz, Evgeny Kharlamov, Manolis Koubarakis, Dmitriy Zheleznyakov, "Distributed Query Processing on the Cloud: the Optique Point of View".
[39] R. Kontchakov, C. Lutz, D. Toman, F. Wolter and M. Zakharyaschev, "The Combined Approach to Ontology-Based Data Access".
[40] Mariano Rodriguez-Muro, Roman Kontchakov and Michael Zakharyaschev, "Ontology-Based Data Access: Ontop of Databases", available online at http://www.dcs.bbk.ac.uk/~roman/papers/ISWC13.pdf
[41] D. Campbell, G. Kakivaya and N. Ellis, "Extreme scale with full SQL language support in Microsoft SQL Azure", in SIGMOD, 2010.
[42] Chad DeLoatch and Scott Blindt, "NoSQL Databases: Scalable Cloud and Enterprise Solutions", Aug 2012.
[43] Christos Doulkeridis, Kjetil Nørvåg, "A Survey of Large-Scale Analytical Query Processing in MapReduce".
[44] J. Dittrich and J. A. Quiané-Ruiz, "Efficient Big Data Processing in Hadoop MapReduce", Proceedings of the VLDB Endowment.
[45] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff and Raghotham Murthy, "Hive: A Warehousing Solution Over a Map-Reduce Framework", VLDB'09, August 24-28, 2009, Lyon, France.
[46] "Evaluating Apache Cassandra as a Cloud Database", White Paper, DataStax Corporation, Oct 2013.
[47] Kristóf Kovács, "Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison", available online at http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis.
[48] http://www.sdss.org
[49] Eve S. McCulloch, "Harnessing the Power of Big Data in Biological Research", AIBS Washington Watch, September 2013.
[50] Spotfire Blogging Team, "10 trends shaping Big Data in financial services", January 2014.
[51] Richard Winter, "Big Data: Business Opportunities, Requirements and Oracle's Approach", December 2011.
[52] Lisa Fleisher, "Big Data Enters the Classroom: Technological Advances and Privacy Concerns Clash".
[53] Darrell M. West, "Big Data for Education: Data Mining, Data Analytics, and Web Dashboards", Governance Studies at Brookings.
[54] Taylor Shelton and Mark Graham, "Geography and the Future of Big Data, Big Data and the Future of Geography", December 2013.
[55] Joan Serras, Melanie Bosredon, Ricardo Herranz and Michael Batty, "Urban Planning and Big Data: Taking LUTi Models to the Next Level?", Nordregio News Issue 1, 2014.
[56] IBM Institute for Business Value, Executive Report, "Analytics: The real-world use of Big Data in financial services".
[57] Deloitte Analytics paper, "Big Data: Time for a lean approach in financial services", available online at http://www2.deloitte.com/content/dam/Deloitte/ie/Documents/Technology\ 2012_big_data_deloitte_ireland.pdf
[58] A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz and A. Rasin, "HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads", Proc. VLDB Endow., vol. 2, August 2009.
[59] N. Samatha, K. Vijay Chandu, P. Raja Sekhar Reddy, "Query Optimization Issues for Data Retrieval in Cloud Computing".
[60] M. Tamer Özsu, Patrick Valduriez, "Principles of Distributed Database Systems", Second Edition, Prentice Hall, ISBN 0-13-659707-6, 1999.
[61] W. Itani, A. Kayssi, A. Chehab, "Privacy as a Service: Privacy-Aware Data Storage and Processing in Cloud Computing Architectures", Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, Dec 2009.
[62] L. Haas, D. Kossmann, E. Wimmers, and J. Yang, "Optimizing queries across diverse data sources", in Proc. VLDB 1997, Athens, Greece.
[63] Edd Dumbill, "Big Data in the Cloud", Feb 2011, available online at http://www.oreilly.com
[64] Vishal Jain, Dr. Mayank Singh, "Ontology Development and Query Retrieval using Protégé Tool", International Journal of Intelligent Systems and Applications (IJISA), Hong Kong, Vol. 5, No. 9, August 2013, pp. 67-75, ISSN 2074-9058, DOI: 10.5815/ijisa.2013.09.08, indexed with Thomson Reuters (Web of Science), EBSCO, ProQuest, DOAJ, Index Copernicus.
[65] Vishal Jain, Dr. Mayank Singh, "Ontology Based Information Retrieval in Semantic Web: A Survey", International Journal of Information Technology and Computer Science (IJITCS), Hong Kong, Vol. 5, No. 10, September 2013, pp. 62-69, ISSN 2074-9015, DOI: 10.5815/ijitcs.2013.10.06, indexed with Thomson Reuters (Web of Science), EBSCO, ProQuest, DOAJ, Index Copernicus.
[62] L. Haas, D. Kossmann, E.Wimmers, and J. Yang, ―Optimizing queries across diverse data source,‖ in Proc. VLDB 1997, Athens, Greece. [63] Edd Dumbill, ―Big Data in the Cloud‖, Feb 2011 available online at http://www.o‘reilly.com [64] Vishal Jain, Dr. Mayank Singh, ―Ontology Development and Query Retrieval using Protégé Tool‖, International Journal of Intelligent Systems and Applications (IJISA), Hongkong, Vol. 5, No. 9, August 2013, page no. 6775, having ISSN No. 2074-9058, DOI: 10.5815/ijisa.2013.09.08 and index with Thomson Reuters (Web of Science), EBSCO, Proquest, DOAJ, Index Copernicus. [65] Vishal Jain, Dr. Mayank Singh, ―Ontology Based Information Retrieval in Semantic Web: A Survey‖, International Journal of Information Technology and Computer Science (IJITCS), Hongkong, Vol. 5, No. 10, September 2013, page no. 62-69, having ISSN No. 20749015, DOI: 10.5815/ijitcs.2013.10.06 and index with Thomson Reuters (Web of Science), EBSCO, Proquest, DOAJ, Index Copernicus.
Authors' Profiles
N. K. Seera is working as an Assistant Professor in the Department of Computer Applications and Management, Bharati Vidyapeeth College, New Delhi. She is pursuing her Ph.D on Big Data on the Cloud. Her areas of interest are databases, query processing and cloud computing. She has presented many papers in national and international conferences.
V. Jain is working as an Assistant Professor in the Department of Computer Applications and Management, Bharati Vidyapeeth College, New Delhi. He is pursuing his Ph.D from Lingaya's University, Faridabad. His areas of interest are information retrieval, ontology and the Semantic Web. He is a lifetime member of CSI (Computer Society of India) and ISTE (Indian Society for Technical Education).
How to cite this paper: Narinder K. Seera, Vishal Jain, "Perspective of Database Services for Managing Large-Scale Data on the Cloud: A Comparative Study", IJMECS, vol. 7, no. 6, pp. 50-58, 2015. DOI: 10.5815/ijmecs.2015.06.08
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH, VOLUME 3, ISSUE 11, NOVEMBER 2014, ISSN 2277-8616
Mapping Between RDBMS And Ontology: A Review
Vishal Jain, Dr. S. V. A. V. Prasad

Abstract: Today the Semantic Web is playing a key role in the intelligent retrieval of information. It is the new-generation Web that tries to represent information such that it can be used by machines not just for display purposes, but for automation, integration, and reuse across applications. It allows the representation and exchange of information in a meaningful way. Ontologies form the backbone of the Semantic Web; they allow machine understanding of information through the links between the information resources and the terms in the ontologies. An ontology describes basic concepts in a domain and defines relations among them. An ontology together with a set of individual instances of classes constitutes a knowledge base. An effort has been made by the Semantic Web community to apply its semantic techniques in open, distributed and heterogeneous Web environments and for sharing knowledge on the Semantic Web. For sharing knowledge, ontologies were introduced, and they have grown considerably in number. Building an ontology for a specific domain may start from scratch or proceed by modifying or reusing an existing ontology. The term Semantic Web (SW), coined by Tim Berners-Lee, is a vast concept in itself. The Semantic Web is defined as a collection of information linked in such a way that it can be easily processed by machines; it is information in machine form. It contains Semantic Web Documents (SWDs) that are written in the RDF or OWL languages and contain information relevant to a user's query. Crawlers play a vital role in accessing information from SWDs.

Index Terms: WWW, Semantic Web, Ontology, Ontology Mapping, OWL, RDBMS
1 INTRODUCTION
Information Retrieval (IR) is the retrieval of information or data, either structured or unstructured, in response to a query statement, which may itself be structured or unstructured. An unstructured query is a sentence written in commonly understandable language, while a structured query is an expression built from operators and operands. IR also deals with the fusion of streams of output documents produced by multiple retrieval methods; these are combined into a single ranked stream which is shown to the user. There are two methods for solving queries:
a) By submitting a given query to multiple document collections.
b) By submitting a given query through multiple IR methods.
Traditional text search engines fail to find optimal documents for the following reasons:
- Improper style of natural language: these engines are not capable of understanding complex ways of writing documents.
- High-level unclear concepts: some concepts are included in a document, but present search engines cannot find those words.
- Semantic relations: we cannot find relevant documents for a word specified in part of a document. For example, if we have searched for "Juice", the engine will not find a type or part of juice.
Information Retrieval mainly focuses on the retrieval of unstructured documents (natural-language text documents). These documents may include videos, photos, audio, etc. IR addresses the retrieval of documents from an organized, well-defined, huge collection of documents available on the net, which may be email, maps, news, etc. IR aims to produce a collection of documents relevant to the query entered by the user. An IR engine also arranges documents according to rank, which involves the PageRank algorithm: if a document 'A' has more effective results than document 'B', then 'A' is placed first. This is discussed in further sections.
_______________________________
Vishal Jain, Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad, Haryana, INDIA. E-mail: [email protected]
Dr. S. V. A. V. Prasad, Professor, Electronics and Communication Department, Lingaya's University, Faridabad, Haryana, INDIA. E-mail: [email protected]
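As an illustration of the ranking step, a minimal PageRank power iteration is sketched below. This is illustrative only: the three-page link graph is invented, and this is not the ranking implementation of any engine discussed in this paper.

public class PageRankDemo {
    public static void main(String[] args) {
        // links[i] = pages that page i links to (hypothetical 3-page web)
        int[][] links = { {1, 2}, {2}, {0} };
        int n = links.length;
        double d = 0.85;                                  // damping factor
        double[] rank = new double[n];
        java.util.Arrays.fill(rank, 1.0 / n);
        for (int iter = 0; iter < 50; iter++) {
            double[] next = new double[n];
            java.util.Arrays.fill(next, (1 - d) / n);     // teleportation term
            for (int i = 0; i < n; i++)
                for (int j : links[i])
                    next[j] += d * rank[i] / links[i].length;
            rank = next;
        }
        for (int i = 0; i < n; i++)
            System.out.printf("page %d: %.3f%n", i, rank[i]);
    }
}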
2 SEMANTIC WEB
The Semantic Web (SW) came into existence due to problems with conventional search engines, which dissatisfy users by retrieving inadequate and inconsistent results. The documents retrieved by conventional search engines are, proverbially, horses of different colors. These engines work on predefined standard terms in a centralized environment, thus accessing standard ontologies. With the advent of the SW and ontology, users are able to develop new facts and use their own keywords/terms in different environments. With the use of ontology, a user can perform the following tasks: (a) use Interface Description Languages (IDL) and services for different environments, where IDL means defining new data objects and their relations; (b) communicate with different agents using a shared ontology like FOAF (Friend of a Friend). The Semantic Web is a combination of SWDs expressed in ontology languages (RDF, OWL). Ontology refers to the categorization of concepts and the relationships between terms in a hierarchical fashion. Although SWDs retrieve relevant information because they are characterized by semantic methods and ideas, it is a tedious job to find the URLs of SWDs.
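To make the machine-processable nature of SWDs concrete, the sketch below builds a tiny FOAF-style RDF graph with Apache Jena (Jena is named later in this compilation's technology stack; the resource URIs here are invented for illustration):

import org.apache.jena.rdf.model.*;

public class FoafDemo {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        String foaf = "http://xmlns.com/foaf/0.1/";
        // Invented example resources: Alice knows Bob.
        Resource alice = m.createResource("http://example.org/alice")
                          .addProperty(m.createProperty(foaf, "name"), "Alice")
                          .addProperty(m.createProperty(foaf, "knows"),
                                       m.createResource("http://example.org/bob"));
        m.write(System.out, "TURTLE");   // machine-readable serialization
    }
}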
3 WEB ONTOLOGY LANGUAGE (OWL)
W3C's Web Ontology Working Group defined OWL as three different sublanguages:
1. OWL Lite
2. OWL DL (includes OWL Lite)
3. OWL Full (includes OWL DL)
The W3C-endorsed OWL specification includes the definition of these three variants of OWL, with different levels of expressiveness.
OWL Lite was originally intended to support those users primarily needing a classification hierarchy and simple constraints. For example, while it supports cardinality constraints, it only permits cardinality values of 0 or 1. It was hoped that it would be simpler to provide tool support for OWL Lite than for its more expressive relatives, allowing a quick migration path for systems utilizing thesauri and other taxonomies. In practice, however, most of the expressiveness constraints placed on OWL Lite amount to little more than syntactic inconveniences: most of the constructs available in OWL DL can be built using complex combinations of OWL Lite features. Development of OWL Lite tools has thus proven almost as difficult as development of tools for OWL DL, and OWL Lite is not widely used.
OWL DL was designed to provide the maximum expressiveness possible while retaining computational completeness (all conclusions are guaranteed to be computed), decidability (all computations will finish in finite time), and the availability of practical reasoning algorithms. OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (for example, number restrictions may not be placed upon properties which are declared to be transitive). OWL DL is so named due to its correspondence with description logic, a field of research that has studied the logics that form the formal foundation of OWL.
OWL Full is based on a different semantics from OWL Lite or OWL DL, and was designed to preserve some compatibility with RDF Schema. For example, in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right; this is not permitted in OWL DL. OWL Full allows an ontology to augment the meaning of the pre-defined (RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to support complete reasoning for OWL Full. Each of these sublanguages is an extension of its simpler predecessor, both in what can be legally expressed and in what can be validly concluded.
Usually, ontologies are defined to consist of abstract concepts and relationships (or properties) only. In some rare cases, ontologies are defined to also include instances of concepts and relationships. The following three types of ontologies are common in the literature and are classified on the basis of their generality:
- Domain: they are domain-specific and are used to capture knowledge in a particular domain, e.g., engineering, medicine, e-commerce, etc.
- Generic: they capture general, domain-independent knowledge (e.g., space and time). They are shared by a large number of users across distinct domains. Examples are WordNet and CYC.
- Application: they capture the knowledge necessary for a particular application, e.g., an ontology representing the structure of a particular web site.
In enterprises, Google and Yahoo!, the major web search services, are using ontology-based approaches to find and organize the contents on the Web.
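The cardinality limits that separate OWL Lite from OWL DL can be illustrated with Jena's ontology API. This is a minimal sketch; the class and property names are invented for the example:

import org.apache.jena.ontology.*;
import org.apache.jena.rdf.model.ModelFactory;

public class CardinalityDemo {
    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        String ns = "http://example.org/univ#";
        OntClass university = m.createClass(ns + "University");
        ObjectProperty offers = m.createObjectProperty(ns + "offersCourse");
        // Cardinality 1: expressible in OWL Lite (only 0 or 1 allowed there).
        Restriction liteOk = m.createMaxCardinalityRestriction(null, offers, 1);
        // Cardinality 3: requires OWL DL, which allows arbitrary values.
        Restriction dlOnly = m.createMinCardinalityRestriction(null, offers, 3);
        university.addSuperClass(liteOk);
        university.addSuperClass(dlOnly);
        m.write(System.out, "RDF/XML-ABBREV");
    }
}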
Table 1: Current Web and Semantic Web
Table 2: RDBMS Vs Ontology
4 ONTOLOGY MAPPING
Michael Wick, Khashayar Rohanimanesh, Andrew McCallum and AnHai Doan [11] presented a fully supervised statistical model for ontology mapping based on conditional random fields. This model accounts for uncertainty in both the data and the data's structure; results on two domains show that the supervised model is able to generalize across them. Yuan An, Alex Borgida and John Mylopoulos [12] discussed different mapping methods from databases to ontologies, focusing on a semi-automatic tool called MAPONTO that assists users to discover plausible semantic relationships between a database schema (relational or XML) and an ontology, expressing them as logical formulas/rules. Raji Ghawi and Nadine Cullot [13] focus on a component of their architecture, a tool called DB2OWL, that automatically generates ontologies from database schemas as well as mappings that relate the ontologies to the information sources. The mapping process starts by detecting particular cases for conceptual elements in the database and accordingly converts database components to the corresponding ontology components. A prototype of the DB2OWL tool has been implemented to create OWL ontologies from relational databases. Table 3 depicts a comparative study of various approaches to convert a database management system to an ontology. Mostafa E. Saleh [15] presented an approach for semantic querying of a traditional relational database by establishing an ontological layer, describing the following rules for converting a database to an ontology:
a) If the primary key of more than one relation is the same, then they should be merged into one ontological class, and their attributes should be merged.
b) If the primary key of one relation is unique to that relation, and does not appear as a primary key in another relation, then that relation will be considered as one ontological class.
c) If a foreign key in a relation Ri is a primary key in another relation Rj, then there is an object property (named by its name in Ri) from Ri to Rj, with domain Ri and range Rj.
d) If a relation's primary key consists of two other primary keys, then that relation is a property between two classes (resources), the classes being the two relations denoted by the two primary keys.
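A minimal sketch of how rules (b)-(d) could be applied to a toy schema description follows. The Relation type and the sample schema are hypothetical, not Saleh's implementation:

import java.util.*;

public class Rdb2OntoSketch {
    // Hypothetical, simplified view of a relation: name, primary-key columns,
    // and foreign keys (column -> referenced relation).
    static class Relation {
        String name; List<String> pk; Map<String, String> fk;
        Relation(String n, List<String> p, Map<String, String> f) { name = n; pk = p; fk = f; }
    }

    static void classify(List<Relation> schema) {
        for (Relation r : schema) {
            // Columns that are primary keys of *other* relations.
            Set<String> otherPks = new HashSet<>();
            for (Relation o : schema) if (o != r) otherPks.addAll(o.pk);
            if (r.pk.size() == 2 && otherPks.containsAll(r.pk)) {
                // Rule (d): a join table maps to a property between two classes.
                System.out.println(r.name + " -> object property between " + r.fk.values());
            } else {
                System.out.println(r.name + " -> ontological class");      // rule (b)
                for (Map.Entry<String, String> e : r.fk.entrySet())        // rule (c)
                    System.out.println("  " + e.getKey() + " -> object property to " + e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        classify(Arrays.asList(
            new Relation("University", Arrays.asList("uid"), Map.of()),
            new Relation("Course", Arrays.asList("cid"), Map.of()),
            new Relation("Enrolment", Arrays.asList("uid", "cid"),
                         Map.of("uid", "University", "cid", "Course"))));
    }
}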
Wei Hu and Yuzhong Qu [16] propose a new approach to discovering simple mappings between a relational database schema and an ontology. It constructs simple mappings based on virtual documents and eliminates incorrect mappings by validating mapping consistency. Man Li, Xiao-Yong Du and Shan Wang [17] described learning rules from a relational database to an OWL ontology, proposing an ontology learning approach to construct an OWL ontology
automatically based on data in a relational database. The related learning rules are discussed in detail, and the approach proves practical and helpful for the automation of ontology building. Guntars Bumans [18] demonstrated, on a simple yet completely elaborated example, how mapping information stored in relational tables can be processed using SQL to generate RDF triples for OWL class and property instances. Noreddine Gherabi and Khaoula Addakiri [19] implemented a prototype that migrates an RDB into an OWL structure, demonstrating the practical applicability of the approach by showing how the results of reasoning with this technique can help improve Web systems. The authors presented a new approach for mapping a relational database into a Web ontology: it captures the semantic information contained in the structures of the RDB and eliminates incorrect mappings by validating mapping consistency, and they proposed a new algorithm for constructing contextual mappings that respects the rules of passage and integrity constraints. Fuad Mire Hassan, Imran Ghani, Muhammad Faheem and Abdirahman Ali Hajji [20] reviewed a number of articles on human resource ontology in the e-recruitment domain. The papers describe the human resource ontology used within an ontology matching approach, which provides the means for a semantic matching approach to match job seekers and job advertisements in a recruitment domain. Marc Ehrig and Steffen Staab [21] considered QOM, Quick Ontology Mapping, as a way to trade off between effectiveness (i.e., quality) and efficiency of mapping generation algorithms, and demonstrated that QOM has lower run-time complexity than existing prominent approaches. Jesús Barrasa, Óscar Corcho and Asunción Gómez-Pérez [22] presented R2O, an extensible and declarative language to describe mappings between relational DB schemas and ontologies implemented in RDF(S) or OWL; R2O provides an extensible set of primitives with well-defined semantics. Michal Laclavík [23] presented an approach for creating semantic metadata from relational database data. When building ontology-based information systems, it is often necessary to convert or replicate data from existing information systems such as databases to the ontology-based system if it is to work with real data; RDB2Onto converts selected data from a relational database to an RDF/OWL ontology document based on a defined template. Carlos Eduardo Pires, Damires Souza, Thiago Pachêco and Ana Carolina Salgado [24] presented SemMatcher, a tool for matching ontology-based peer schemas that combines different matching strategies (e.g., linguistic, structural and semantic). SemMatcher allows the identification of semantic correspondences between two peer ontologies using a domain ontology as background knowledge; the tool also determines a global similarity measure between the matched ontologies that can be used for peer clustering. Nikolaos Konstantinou, Dimitrios-Emmanuel Spanos, Michael Chalas, Emmanuel Solidakis and Nikolas Mitrou [25] presented VisAVis, an approach to mapping relational database contents to ontologies. The key idea is, instead of storing instances along with the ontology terminology, to keep them stored in a database and maintain a link to the dataset.
5 NEW PROPOSED FRAMEWORK WHICH CAN BE DEVELOPED
In order to overcome the problems of the existing frameworks and tools, it is critical to design an appropriate framework. This paper proposes a framework similar to DB2OWL but with additional features that address the identified problems and deficiencies. The proposed framework should support as many databases as possible and the most used programming languages. In addition, the new framework has the capacity to output information in different formats, which non-programmers can understand and re-use without the need for an expert. The proposed framework will not depend on particular table cases; it is a general framework applicable to all tables, whatever the case. The proposed mapping process involves converting tables into classes, which have several properties as well as relationships. The conversion process starts when the user uses a well-designed user interface to send queries to the database. The proposed visualization service must be able to present the required queries in a suitable manner. In order to consider the requirements of different users, including those who do not have programming skills, the visualization service should have an interface with options for users to key in commands in a desired language. This should consider all the available programming languages as well as human language, which diverse users can understand. Therefore, it is extremely critical to include a module that translates the input text and instructions into different programming languages. In addition, the new framework ought to incorporate a module that enables users to export information into different formats apart from the default format: users must be able to output information in the form of text files, tables and datasets, among others.
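The module boundaries implied by this framework could be sketched as follows. All interface and method names here are hypothetical; the paper does not fix an API:

import java.io.OutputStream;

// Hypothetical module boundaries for the proposed framework.
interface QueryTranslator {
    /** Translate a user query (natural language or SQL) into SPARQL. */
    String toSparql(String userQuery, String sourceLanguage);
}

interface TableMapper {
    /** Case-independent mapping: every table becomes a class with properties. */
    void mapTableToClass(String tableName);
}

interface ResultExporter {
    /** Export results in a chosen format: "text", "table", "dataset", ... */
    void export(Iterable<String[]> rows, String format, OutputStream out);
}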
Table 3: “Features of different database-to-ontology mapping approaches” [13]
Table 4: Ontologies based on newly-built approaches and their associated matching algorithms [19, 20]
6 COMPARISON OF ALREADY DEVELOPED TOOLS AND FRAMEWORKS

Table 5: Comparative Study of Tools

1. RDB To Onto: (a) Designs and implements an ontology based on a relational database. (b) User-oriented, i.e., the user can modify and access records.
2. Asio Semantic Bridge: (a) Creates an ontology and represents rows of a table as classes and columns as their properties. (b) Allows translating SPARQL queries to SQL and performs their execution.
3. Data Grid Semantic Web Kit: (a) Performs mapping as well as querying of RDF triples. (b) A user-defined tool that uses a GUI (visual interface) to define individual classes. (c) Generates SPARQL queries and translates them into SQL queries.
4. DB2OWL: (a) Automatically creates an ontology by converting each component of the database into classes, properties and relations. (b) Represents the developed ontology in the OWL-DL (Description Logic) language. (c) Supports only MySQL and Oracle databases.
5. SOAP (Simple Object Access Protocol): (a) Predictive in nature. (b) Uses classes to predict the nature of ontologies.
6. R2O: (a) Uses XML for expressing elements of the database and ontology. (b) Detects ambiguities between classes and their properties.
7. Triplify: (a) Represents data that is also present in other databases. (b) Generates SQL queries by linking and converting requests from various databases connected on remote hosts. (c) Does not support SPARQL and is easy to use in various applications.

7 CONCLUSION
This paper emphasizes the concept of ontology mapping and discusses various approaches for converting a relational database to an ontology and vice-versa. It is evident that the conversion of relational databases to ontologies is a diverse process, and the frameworks and tools used are equally diverse, each with its merits and demerits; data presentation, output formats and languages are crucial concerns. The proposed framework will ensure maximum data integrity after conversion. In addition, it offers users the ability to customize queries depending on their literacy level. Automation is also a critical part of the proposed framework.

ACKNOWLEDGMENT
I, Vishal Jain, would like to give my thanks to Mr. Gagandeep Singh Narula for helping me in the comparison of these tools and to Dr. M. N. Hoda, Director, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, for giving me the opportunity to do my Ph.D from Lingaya's University, Faridabad, Haryana.

REFERENCES
[1]. Sakthi Murugan R, P. Shanthi Bala and Dr. G. Aghila, "Ontology Based Information Retrieval - An Analysis", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 10, October 2013, page no. 486-493.
[2]. Swathi Rajasurya, Tamizhamudhu Muralidharan, Sandhiya Devi and Dr. S. Swamynathan, "Semantic Information Retrieval Using Ontology In University Domain", International Journal of Web & Semantic Technology (IJWesT), Taiwan, Vol. 3, No. 4, October 2012.
[3]. Zhanjun Li, Victor Raskin and Karthik Ramani, "A Methodology Of Engineering Ontology Development For Information Retrieval", International Conference On Engineering Design, ICED'07, 28-31 August 2007, Cité des Sciences et de l'Industrie, Paris, France.
[4]. Dongpo Deng, "Building Ontology of Place Name for Spatial Information Retrieval", http://www.gisdevelopment.net/technology/gis/ma07259pf.html
[5]. Christopher S.G. Khoo, Jin-Cheon Na, Vivian Wei Wang, and Syin Chan, "Developing an Ontology for Encoding Disease Treatment Information in Medical Abstracts", DESIDOC Journal of Library & Information Technology, Vol. 31, No. 2, March 2011, pp. 103-115.
[6]. Valentina Cordi, Viviana Mascardi, Maurizio Martelli, Leon Sterling, "Developing an Ontology for the Retrieval of XML Documents: A Comparative Evaluation of Existing Methodologies".
[7]. Naveen Malviya, Nishchol Mishra, Santosh Sahu, "Developing University Ontology using Protégé OWL Tool: Process and Reasoning", International Journal of Scientific & Engineering Research, Volume 2, Issue 9, September 2011.
[8]. Lakshmi Palaniappan, N. Sambasiva Rao, G. V. Uma, "Development of Dining Ontology Based On Image Retrieval", International Journal of Scientific Engineering and Technology (ISSN: 2277-1581), Volume No. 2, Issue No. 6, pp: 560-566.
[9]. Norasykin Mohd Zaid and Sim Kim Lau, "Development of Ontology Information Retrieval System for Novice Researchers in Malaysia", IBIMA Publishing Journal of Software and Systems Development, Vol. 2011 (2011), Article ID 611355, 11 pages, DOI: 10.5171/2011.611355, http://www.ibimapublishing.com/journals/JSSD/jssd.html
[10]. Sanjay K. Dwivedi and Anand Kumar, "Development of University Ontology for a SPOCMS", Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 3, August 2013, page no. 213-221.
[11]. Michael Wick, Khashayar Rohanimanesh, Andrew McCallum, AnHai Doan, "A Discriminative Approach to Ontology Mapping", VLDB '08, August 24-30, 2008, Auckland, New Zealand.
[12]. Yuan An, Alex Borgida and John Mylopoulos, "Building Semantic Mappings from Databases to Ontologies", American Association for Artificial Intelligence, 2006.
[13]. Raji Ghawi and Nadine Cullot, "Database-to-Ontology Mapping Generation for Semantic Interoperability", VLDB '07, September 23-28, 2007, Vienna, Austria.
[14]. Nadine Cullot, Raji Ghawi, and Kokou Yétongnon, "DB2OWL: A Tool for Automatic Database-to-Ontology Mapping", Laboratoire LE2I, Université de Bourgogne, Dijon, France.
[15]. Mostafa E. Saleh, "Semantic-Based Query in Relational Database Using Ontology", Canadian Journal on Data, Information and Knowledge Engineering, Vol. 2, No. 1, January 2011.
[16]. Wei Hu and Yuzhong Qu, "Discovering Simple Mappings Between Relational Database Schemas and Ontologies", School of Computer Science and Engineering, Southeast University, Nanjing 210096, P.R. China.
[17]. Man Li, Xiao-Yong Du, Shan Wang, "Learning Ontology from Relational Database", Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005.
[18]. Guntars Bumans, "Mapping between Relational Databases and OWL Ontologies: an Example", Scientific Papers, University of Latvia, Computer Science and Information Technologies, 2010, Vol. 756.
[19]. Noreddine Gherabi, Khaoula Addakiri, "Mapping Relational Database into OWL Structure with Data Semantic Preservation", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 10, No. 1, January 2012.
[20]. Fuad Mire Hassan, Imran Ghani, Muhammad Faheem, Abdirahman Ali Hajji, "Ontology Matching Approaches for eRecruitment", International Journal of Computer Applications (0975-8887), Volume 51, No. 2, August 2012.
[21]. Marc Ehrig and Steffen Staab, "QOM - Quick Ontology Mapping", Institute AIFB, University of Karlsruhe.
[22]. Jesús Barrasa, Óscar Corcho, Asunción Gómez-Pérez, "R2O, an Extensible and Semantically Based Database-to-Ontology Mapping Language", Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Spain.
[23]. Michal Laclavík, "RDB2Onto: Relational Database Data to Ontology Individuals Mapping", Institute of Informatics, Slovak Academy of Sciences, Dúbravská cesta 9, 845 07 Bratislava, Slovakia.
[24]. Carlos Eduardo Pires, Damires Souza, Thiago Pachêco, Ana Carolina Salgado, "SemMatcher: A Tool for Matching Ontology-based Schemas".
[25]. Nikolaos Konstantinou, Dimitrios-Emmanuel Spanos, Michael Chalas, Emmanuel Solidakis and Nikolas Mitrou, "VisAVis: An Approach to an Intermediate Layer between Ontologies and Relational Database Contents", Web Information Systems Modeling, page no. 1050-1062.
Authors
Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is doing his Ph.D in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.
Dr. S. V. A. V. Prasad has completed his M.Tech and Ph.D. He is presently working as Professor and Dean (Academic Affairs), Dean (R&D, IC) and Dean of the School of Electrical Sciences. He has actively participated in and organized many refresher courses, seminars and workshops on ISO, ROHS, Component Technology, WEEE, organizational methods, time study, productivity enhancement, product feasibility, etc. He has developed various products, such as a 15 MHz dual oscilloscope, high voltage tester, VHF wattmeter, standard signal generator with AM/FM modulator, Wireless Becom, high-power audio amplifier, wireless microphone and many more, in the span of 25 years (1981-2007). He was awarded for excellence in R&D in 1999 and 2004 and received the National Quality Award in 1999, 2000, 2004 and 2006. He is a Fellow member of IEEE and a life member of ISTE, IETE and the Society of Audio & Video Systems. He has published more than 90 research papers in various national and international conferences and journals. His research areas include wireless communication, satellite communication and acoustics, antennas, neural networks, and artificial intelligence.
International Journal of Engineering Sciences & Emerging Technologies, Oct. 2014, ISSN: 2231-6604, Volume 7, Issue 2, pp: 604-621, ©IJESET
ANALYSIS OF RDBMS AND SEMANTIC WEB SEARCH IN UNIVERSITY SYSTEM
Vishal Jain 1 and S. V. A. V. Prasad 2
1 Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad, India
2 Professor, Electronics and Communication Engineering Department, Lingaya's University, Faridabad, India
ABSTRACT
To retrieve information from documents, there are many Information Retrieval (IR) techniques. Current IR techniques are not so advanced that they can exploit semantic knowledge within documents and give precise results. IR technology is a major factor responsible for handling annotations in Semantic Web (SW) languages. With the rate of growth of the web and the huge amount of information available on it, which may be in unstructured, semi-structured or structured form, it has become increasingly difficult to identify the relevant pieces of information on the internet. In this paper, the implementation of a newly proposed model, "Mining in Ontology with Multi Agent Systems", is discussed, and the model is analyzed in a comparative study of search in an RDBMS system and an ontology-based system. In this model, the Semantic Web addresses the first part of this challenge by trying to make the data machine-understandable in the form of an ontology, while the Multi-Agent part addresses the second by semi-automatically extracting the useful knowledge hidden in these data and making it available.
KEYWORDS: Information Retrieval, Semantic Web, Ontology, Multi Agent Systems.
I. INTRODUCTION
The Semantic Web is an evolution of the current Web that represents information in a machine-readable format, while maintaining the human-friendly HTML representation, and it avoids plain keyword searching. Data handling and retrieval are of essential importance where the data size is larger than a certain amount. Storing transactional data in relational form and querying with Structured Query Language (SQL) is preferable because of its tabular structure. Data may also be stored in an ontological form if it includes numerous semantic relations; this method is more suitable for inferring information from relations. When transactional data contain many semantic relations, as in our problem, it is not easy to decide on the method of storing and querying. Improving information retrieval by employing ontologies to overcome the limitations of syntactic search has been one of the inspirations since its emergence. Ontologies are important components of web-based applications. While the Web makes an increasing number of ontologies widely available for applications, how to discover ontologies on the Web becomes a more challenging issue. Existing approaches are mainly based on keywords and metadata information of ontologies, rather than semantic entailments of ontologies. Current search engines perform search based on syntax, not on semantics. Keyword-based search engines fail to understand and analyze the context in which keywords are used. The situation worsens when the search phrase is a combination of keywords: the quality of the results degrades, with irrelevant documents that contain only part of the search phrase, leaving the meaning aside. A semantic search - Tim Berners-Lee's unrealized dream - resolves this issue by analyzing the contexts and the relationships between the keywords, thus producing "quality high" and "quantity less" results. The basic idea of the Semantic Web is to enrich the current Web with machine-cognitive information about the semantics of information
content [17]. The remaining sections of the paper are as follows. Section 2 makes readers aware of related work; in this section, we discuss the importance of the Semantic Web over the current Web. Section 3 defines the proposed solution for making information retrieval more relevant and fast. In Section 4, the proposed model is discussed with the help of a case study on a university system. The further parts of this paper discuss the development of the model and show the query results.
II. LITERATURE SURVEY
Keyword-based search engines are not able to provide relevant search results because they do not know the meaning of the terms and expressions used in web pages or the relationships between them. This paper compares the semantic search performance of both keyword-based and semantic-web-based search engines [3, 4]. Initially, two keyword-based search engines (Google and Yahoo) and three semantic search engines (Hakia, DuckDuckGo and Bing) were selected to compare their search performance on the basis of precision ratio and how they handle natural language queries. Ten queries from various topics were run on each search engine, and the first twenty documents of each retrieval output were classified as "relevant" or "non-relevant". Afterwards, precision ratios were calculated for the first 20 documents retrieved to evaluate the performance of these search engines. In this study it was found that Bing retrieved more relevant documents (145 out of 200) than any other search engine, while Hakia's overall performance in terms of average percentage was lowest (56%). Figure 1 presents the overall graphical representation of the mean precision ratio of each search engine for the first 20 documents; a per-query breakdown (queries 1 to 10) is given in [3, 4].
Figure 1: Precision ratio of search engines for first 20 documents [3, 4]
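The precision ratio used above is simply the fraction of relevant documents among the first k = 20 retrieved; a minimal helper (illustrative only, not the authors' evaluation code) is:

public class PrecisionAtK {
    /** Fraction of the first k retrieved documents judged relevant. */
    public static double precisionAtK(boolean[] relevant, int k) {
        int hits = 0;
        for (int i = 0; i < k && i < relevant.length; i++)
            if (relevant[i]) hits++;
        return hits / (double) k;
    }
}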
A number of ontology reasoning systems have been developed for reasoning over and querying the Semantic Web. Since they implement different reasoning algorithms and optimization techniques, they differ in a number of ways. Previous attempts at comparing the performance of ontology reasoning systems have mainly considered individual query requests. Chulki Lee, Sungchan Park, Dongjoo Lee, Jae-won Lee, Ok-Ran Jeong and Sang-goo Lee presented the results of testing four of the most popular ontology reasoning systems on query sequences that reflect real-world use cases, arguing that using query sequences is a more effective way to evaluate ontology reasoning systems [6]. Vipin Kumar. N, Archana P. Kumar and Kumar Abhishek presented a comparative study of SPARQL and SQL [7], setting out the benefits of SPARQL over SQL. Abrar Ahmad H and Muhammad Ruknuddin Ghalib [8] presented a system framework for building Semantic Web support for intelligent search using RDF, ontology and SPARQL queries. J. Uma Maheswari and G. R. Karpagam proposed a framework for an ontology-based information retrieval model [9]. Venkata Sudhakara Reddy. Ch and Hemavathi. D proposed a new stemming algorithm for semantic web search and presented a comparative study of RDBMS and Semantic Web search. Their experiments show that, upon deployment of a new module, their incremental extraction approach minimizes processing time by 92 percent compared to a traditional pipeline approach; applying their methods to a corpus of 20 million biomedical abstracts, the experiments indicate that query performance is efficient enough for real-time applications and that the approach achieves high-quality extraction results [10]. Chintan Patel, Kaustubh Supekar, Yugyung Lee and E. K. Park worked on OntoKhoj, a Semantic Web portal for ontology searching, ranking and classification [11]. The methodology in developing OntoKhoj is based on algorithms used for searching, aggregating, ranking and classifying ontologies
in the Semantic Web. Jeff Z. Pan, Edward Thomas and Derek Sleeman worked on OntoSearch2: Searching and Querying Web Ontologies [12]. ONTOSEARCH2 is able to reliably query large data sets faster than comparable database-driven knowledge management systems. The recall and precision figures from the tests performed are encouraging, but there are situations in which incomplete results can be returned. They evaluated the ONTOSEARCH2 system using the Lehigh University Benchmark (LUBM) [13] to measure its performance on large data sets, running benchmarks with generated data sets representing 1, 5, 10, 20 and 50 universities, generated using the same seed and index values as used in [14]. Mariano Rodríguez-Muro, Roman Kontchakov and Michael Zakharyaschev [15] worked on OBDA with Ontop. Ontop (ontop.inf.unibz.it) is an ontology-based data access (OBDA) system implemented at the Free University of Bozen-Bolzano. Hyun Hee Kim, Soo Young Rieh, Tae Kyoung Ahn and Woo Kwon Chang implemented an ontology-based knowledge management system which makes knowledge assets intelligently accessible to Korean financial firms [16]. This paper introduced the ontology model by illustrating its four components and reported on the implementation and evaluation of the information ontology for the searching of web resources. Based on a content analysis of eight international bank web sites, a pilot system of information ontology for web resources consisting of Publication, Project, Member, Person and Organization was constructed. A comparative experiment was conducted to evaluate the information ontology for web resources: the performance of the ontology-based system was compared with that of web search engines in terms of relevance and search time. Ten researchers from an economic research institution were recruited in October 2002 to conduct experiments in the researchers' offices, except on two occasions, both of which were conducted in the library. Before the participants conducted their searches, they were given a half-hour presentation about the system by the experimenter. The participants were given a list of twenty tasks and asked to perform each both on a search engine of their choice and on the ontology-based pilot system. The tasks included answering questions about locating scholarly literature, statistical data, conference information, news, people searches and project searches. Sareena Rose [17] presented a conceptual study on new searching techniques for ontology-based search engines. This paper focused on a study and improvisation of searching techniques used in semantic search engines, keeping time complexity as the major factor. For analysis purposes, the keyword "Blood Cancer" was selected; after the processing of the first and second phases, the output was a semantically expanded query combined with OR and AND operators. Li Ma, Yang Yang, Zhaoming Qiu, Guotong Xie, Yue Pan and Shengping Liu [18] built a more complete benchmark for better evaluation of existing ontology systems by extending the well-known Lehigh University Benchmark in terms of inference and scalability testing. The extended benchmark, named University Ontology Benchmark (UOBM), includes both OWL Lite and OWL DL ontologies covering a complete set of OWL Lite and DL constructs, respectively.
Oracle 11g OWL is a scalable, efficient, forward-chaining based reasoner that supports an expressive subset of OWL-DL. Oracle evaluated the inference performance of RDF and OWL Prime on the LUBM dataset [19], on a database installed on the machine "semperf3". M. Rodriguez-Muro, R. Kontchakov and M. Zakharyaschev [20] provided detailed information on the LUBMex20 experimentation. In this case there is no given SQL schema and hence no mappings; Ontop creates a relational schema together with mappings itself, using the "Semantic Index" technique described in the paper. A detailed comparison is given of ontop-DB2, ontop-MySQL, OWLIM and Stardog using the LUBMex20 (Lehigh University Benchmark, Extended) dataset scenario. Anarosa Alves, Franco Brandão, Viviane Torres da Silva and Carlos José Pereira de Lucena [21] presented work on a model-driven approach to developing multi-agent systems, which begins with an ontology based on the TAO conceptual framework. Quynh-Nhu Numi Tran and Graham Low [22] worked on MOBMAS, a methodology for ontology-based multi-agent systems development, proposing a new framework and comparing MOBMAS against sixteen well-known methodologies: MaSE, MASSIVE, SODA, GAIA, MESSAGE, Methodology for BDI Agents, INGENIAS, Methodology with High-Level and Intermediate Levels, Methodology for Enterprise Integration, PROMETHEUS, PASSI, ADELFE, COMOMAS, MAS-CommonKADS, CASSIOPEIA and TROPOS. Pakornpong Pothipruk and Pattarachai Lalitrojwong [23] worked on an ontology-based multi-agent system for matchmaking. Csongor Nyulas, Martin J. O'Connor,
Samson Tu, David L. Buckeridge, Anna Akhmatovskaia and Mark A. Musen [24] presented their work on an ontology-driven framework for deploying JADE agent systems, describing a methodology and suite of tools to support the modeling and deployment of agents on the JADE platform. These models are encoded using the Semantic Web ontology language OWL and provide detailed computer-interpretable specifications of agent behavior in a JADE system. Gajun Ganendran, Quynh-Nhu Tran, Pronab Ganguly, Pradeep Ray and Graham Low [25] proposed an ontology-driven multi-agent approach for healthcare, with a case study in diabetes management. Wongthongtham, P., Chang, E. and Dillon, T.S. [26] worked on an ontology-based multi-agent system for multi-site software development (MSSD), describing how a software agent utilizes ontology as its intelligence in MSSD and finding that it has benefits: ontology gives computers more knowledge that the agent can utilize. Maja Hadzic and Elizabeth Chang [27] worked on the use of ontology-based multi-agent systems in the biomedical domain, showing how ontologies can be used by multi-agent systems in intelligent information retrieval processes. Ontologies can support important processes involved in information retrieval, such as posing queries by the user, problem decomposition and task sharing among different agents, result sharing and analysis, information selection and integration, and structured presentation of the assembled information to the user.
III. PROPOSED SOLUTION
In order to implement semantic search, a new ontology-based information retrieval model, "Mining in Ontology based Multi-Agent System for Information Retrieval", is proposed in this paper. The proposed model has been implemented in a university system.
IV. CASE STUDY
Details of the Proposed System
Responsibilities:
1. Functionalities:
   - To search Universities/Colleges for taking admission in Ph.D, M.Tech, B.Tech, MCA, MBA, BCA, BBA
   - To search a particular Book, Journal, Magazine or Periodical in a College/University Library
   - To search a person (Student, Teaching Staff or Non-Teaching Staff) in any of the mentioned universities
   - To search Learning Material/Study Material of any particular subject
2. Details of Universities:
   - University Name
   - Type of University (Central Govt./State Govt./Deemed/Private)
   - Address
   - Affiliated Colleges
   - Approved Courses
     o Course Duration
     o Mode of Admission (Process)
3. Details of Library:
   - Books
     o Title
     o Author
     o Publisher
     o Pages
     o Year of Publication
   - Journals
     o Title
     o Type (National/International)
     o Topic (Engineering, Humanities, Arts)
     o Issue No.
   - Magazines
   - Periodicals
4. Details of Person:
   - Person Name
   - Category (Student/Staff)
   - College/University Name
   - Contact Details
     o Address
     o Contact Number
     o E-Mail Id
5. Details of Course Material:
   - Branch Name (CSE, ECE, EEE, IC, MAE, Civil)
   - Subject Name
   - Format (PPT/PDF)
6. Working Scenario of the Project:
   - Ontology is used as the database at the backend
   - A JSP/Java based web site provides the GUI, where the user gives a query
   - JADE retrieves the data for the query fired by the user
   - Mining is used to extract patterns from the database

5. TECHNOLOGY STACK
Figure 2: Technology Stack
1. OWL - Web Ontology Language is used here to represent the university domain, including Universities, Courses, People, Library and Materials.
2. Protégé - A free, open-source ontology editor and framework for building intelligent systems.
3. OWLIM - OWLIM is a family of semantic repositories, or RDF database management systems, with the following characteristics:
   - native RDF engines, implemented in Java
   - full performance through both Sesame and Jena
   - robust support for the semantics of RDFS, OWL 2 RL and OWL 2 QL
   - best scalability, loading and query evaluation performance
4. WEKA (Apriori Algorithm) - Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The Apriori algorithm is an influential algorithm for mining frequent itemsets for boolean association rules.
5. Empire - Empire provides Java developers an easy way to integrate and start using SPARQL and RDF via JPA persistence annotations.
6. JADE - Java Agent Development Framework (JADE) is used for the development of agents, implemented in Java. The JADE system supports coordination between several FIPA agents and provides
a standard implementation of the communication language FIPA-ACL, which facilitates communication between agents and allows service detection within the system.
V. DEVELOPMENT
5.1 Ontology Development
Figure 3: Ontology Entities
Figure 4: Ontology Classes
Figure 5: Data Properties
Figure 6: Individuals
Figure 7: Ontology Graph
5.2 Agents Development
Figure 8: JADE Screen
Figure 9: Add Agents
Figure 10: Load Agents
Figure 11: University Agent Details
5.3 Screenshots
1. Login
Figure 12: Proposed University System
Figure 13: Search in University
Figure 14: Search Person in University
VI. ANALYSIS
The proposed system was analyzed using the following queries:
Query 1: Universities within Chennai
Solution: University(x) ^ state(x, y) ^ address(y, "chennai")
Query 2: Universities offering the MCA course
Solution: University(x) ^ offeredCourses(x, y) ^ courseDec(y, "MCA")
Query 3: Professors handling the MBA course
Solution: Person(x) ^ type(x, Staff) ^ associatedCourse(x, MBA)
Query 4: Students taking the BE course
Solution: Person(x) ^ type(x, Student) ^ takingCourse(x, BE)
Query 5: Books published in the year 2014
Solution: LibraryItem(x) ^ type(x, Book) ^ publishedYear(x, 2014)

Table 1: Comparison between RDBMS and Semantic Web Search (query time in seconds)

Data    System   Q1        Q2      Q3       Q4     Q5
103K    SW       22.820    2.604   7.105    0.287  3.879
103K    RDBMS    28.410    3.014   9.215    0.216  4.109
2.8M    SW       176.972   8.312   27.789   1.531  16.889
2.8M    RDBMS    260.350   9.300   34.021   0.816  18.313
13.9M   SW       308.198   65.713  5.105    3.968  111.368
13.9M   RDBMS    1382.824  91.452  176.516  5.368  133.887
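For concreteness, Query 1 can be expressed as SPARQL and run with Jena against the developed ontology. This is a sketch: the inst: namespace follows the http://localhost/Institute# IRI that appears in Appendix A, while the state and address property names are taken from the solution above and the file name is a placeholder.

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class Query1Demo {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.read("university.owl");   // assumed export of the Protégé ontology
        String q = "PREFIX inst: <http://localhost/Institute#> "
                 + "SELECT ?x WHERE { "
                 + "  ?x a inst:University . "
                 + "  ?x inst:state ?y . "
                 + "  ?y inst:address \"chennai\" }";
        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(q), model)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                System.out.println(rs.next().get("x"));   // each matching university
            }
        }
    }
}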
Figure 15: Comparison between RDBMS and Semantic Web Search (103K Data)
Figure 16: Comparison between RDBMS and Semantic Web Search (2.8M Data)
Figure 17: Comparison between RDBMS and Semantic Web Search (13.9M Data)
ACKNOWLEDGEMENTS
I, Vishal Jain, would like to give my thanks to Mr. Anandam Raj for helping me with the programming for the model, and sincere thanks to Dr. M. N. Hoda, Director, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, for giving me the opportunity to do my Ph.D from Lingaya's University, Faridabad.
VII. CONCLUSIONS
A new model has been defined using mining in ontology with a multi-agent system for information retrieval, where the ontology is used as a repository, mining for data extraction, and the multi-agent system for data representation. This paper will also be helpful to other researchers who would like to work in this area. The analysis of the proposed model shows that Semantic Web search may be better than RDBMS search.
REFERENCES
[1]. Vishal Jain, Dr. Mayank Singh, "Ontology Based Information Retrieval in Semantic Web: A Survey", International Journal of Information Technology and Computer Science (IJITCS), China, Vol. 5, No. 10, September 2013, page no. 62-69.
[2]. Vishal Jain, Dr. Mayank Singh, "Ontology Development and Query Retrieval using Protégé Tool", International Journal of Intelligent Systems and Applications (IJISA), China, Vol. 5, No. 9, August 2013, page no. 67-75.
[3]. Yogender Singh Negi, Suresh Kumar, "A Comparative Analysis of Keyword Based and Semantic Based Search Engines", Ambedkar Institute of Advanced Communication Technologies & Research, Delhi, India.
[4]. Jagendra Singh, Dr. Aditi Sharan, "A Comparative Study between Keyword and Semantic Based Search Engines", International Conference on Cloud, Big Data and Trust 2013, Nov 13-15, RGPV.
[5]. Cios, Krzysztof. Data Mining: A Knowledge Discovery Approach. Berlin: Springer, 2010.
[6]. Chulki Lee, Sungchan Park, Dongjoo Lee, Jae-won Lee, Ok-Ran Jeong, Sang-goo Lee, "A Comparison of Ontology Reasoning Systems Using Query Sequences", School of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.
[7]. Vipin Kumar. N, Archana P. Kumar, Kumar Abhishek, "A Comprehensive Comparative Study of SPARQL and SQL", International Journal of Computer Science and Information Technologies, Vol. 2 (4), 2011, 1706-1710.
[8]. Abrar Ahmad H, Muhammad Ruknuddin Ghalib, "A Novel Framework of Semantic Web Based Search Engine", International Journal of Engineering and Innovative Technology (IJEIT), Volume 1, Issue 5, May 2012.
[9]. J. Uma Maheswari, Dr. G. R. Karpagam, "A Conceptual Framework for Ontology Based Information Retrieval", International Journal of Engineering Science and Technology, Vol. 2(10), 2010, 5679-5688.
[10]. Venkata Sudhakara Reddy. Ch, Hemavathi. D, "Information Extraction Using RDBMS and Stemming Algorithm", International Journal of Science and Research (IJSR), ISSN (Online): 2319-7064, Volume 3, Issue 4, April 2014.
[11]. Chintan Patel, Kaustubh Supekar, Yugyung Lee, E. K. Park, "OntoKhoj: A Semantic Web Portal for Ontology Searching, Ranking and Classification".
[12]. Jeff Z. Pan, Edward Thomas and Derek Sleeman, "OntoSearch2: Searching and Querying Web Ontologies", University of Aberdeen, Aberdeen, UK.
[13]. Guo Y, Pan Z, and Heflin J, 2005. "LUBM: A Benchmark for OWL Knowledge Base Systems", Journal of Web Semantics, 3(2), pp. 158-182.
[14]. Guo Y, Pan Z, Heflin J, 2004. "An Evaluation of Knowledge Base Systems for Large OWL Datasets", Third International Semantic Web Conference, Hiroshima, Japan, LNCS 3298, 2004, pp. 274-288.
[15]. Mariano Rodríguez-Muro, Roman Kontchakov and Michael Zakharyaschev, "OBDA with Ontop", Faculty of Computer Science, Free University of Bozen-Bolzano, Italy.
[16]. Hyun Hee Kim, Soo Young Rieh, Tae Kyoung Ahn, Woo Kwon Chang, "Implementing an Ontology-Based Knowledge Management System in the Korean Financial Firm Environment", Korea Research Foundation Grant (KRF-2002-005-B20006).
[17]. Sareena Rose, "New Searching Techniques for Ontology Based Search Engines - A Conceptual Study", International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February 2014.
[18]. Li Ma, Yang Yang, Zhaoming Qiu, Guotong Xie, Yue Pan, Shengping Liu, "Towards A Complete OWL Ontology Benchmark", IBM China Research Laboratory, Building 19, Zhongguancun Software Park, ShangDi, Beijing, P.R. China.
[19]. Zhe Wu, Oracle, "A Scalable RDBMS-Based Inference Engine for RDFS/OWL", Oracle New England Development Center, http://search.oracle.com.
[20]. M. Rodriguez-Muro, R. Kontchakov, M. Zakharyaschev, "Ontology-Based Data Access: Ontop of Databases", LUBMex20, ontopISWC13, Department of Computer Science and Information Systems, Birkbeck, University of London, Malet Street, London, https://sites.google.com/site/ontopiswc13/home/lubmex20.
[21]. Anarosa Alves, Franco Brandão, Viviane Torres da Silva, Carlos José Pereira de Lucena, "A Model Driven Approach to Develop Multi-Agent Systems", Monografias em Ciência da Computação, No. 09/05.
[22]. Quynh-Nhu Numi Tran, Graham Low, "MOBMAS: A Methodology for Ontology-Based Multi-Agent Systems Development", Information and Software Technology 50 (2008) 697-722.
[23]. Pakornpong Pothipruk and Pattarachai Lalitrojwong, "An Ontology-based Multi-agent System for Matchmaking".
[24]. Nida Jawre, Kiran Bhandari, "User Authentication using Temporal Information", International Journal of Advances in Engineering & Technology (IJAET), Volume 6, Issue 4, pp. 1733-1739, Sept. 2013.
[25]. Csongor Nyulas, Martin J. O'Connor, Samson Tu, David L. Buckeridge, Anna Akhmatovskaia, Mark A. Musen, "An Ontology-Driven Framework for Deploying JADE Agent Systems", Stanford University, Stanford Center for Biomedical Informatics Research.
[26]. Gajun Ganendran, Quynh-Nhu Tran, Pronab Ganguly, Pradeep Ray and Graham Low, "An Ontology-driven Multi-agent Approach for Healthcare", HIC 2002, 0958537097.
[27]. Wongthongtham, P., Chang, E., Dillon, T.S., "Ontology-based Multi-agent System to Multi-site Software Development", Workshop QUTE-SWAP@ACM/SIGSOFT-FSE12, November 5, 2004.
[28]. Maja Hadzic, Elizabeth Chang, "Use of Ontology-Based Multi-Agent Systems in the Biomedical Domain", Curtin University of Technology, School of Information Systems, Perth, Western Australia, 6845, Australia.
AUTHORS

Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is pursuing a Ph.D. in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.

Dr. S. V. A. V. Prasad has completed his M.Tech. and Ph.D. He is presently working as Professor and Dean (Academic Affairs), Dean (R&D, IC) and Dean of the School of Electrical Sciences. He has actively participated in and organized many refresher courses, seminars and workshops on ISO, RoHS, component technology, WEEE, organizational methods, time study, productivity enhancement, product feasibility, etc. He has developed various products, such as a 15 MHz dual oscilloscope, high-voltage tester, VHF wattmeter, standard signal generator with AM/FM modulator, wireless beacon, high-power audio amplifier and wireless microphone, over a span of 25 years (1981-2007). He was awarded for excellence in R&D in 1999 and 2004, and received the National Quality Award in 1999, 2000, 2004 and 2006. He is a Fellow of the IEEE and a life member of ISTE, IETE and the Society of Audio & Video Systems. He has published more than 90 research papers in various national and international conferences and journals. His research areas include wireless communication, satellite communication and acoustics, antennas, neural networks and artificial intelligence.
APPENDIX - A

Code Snippet 1. University Ontology

package Vishal;

import java.util.Collection;
import org.protege.owl.codegeneration.WrappedIndividual;
import org.semanticweb.owlapi.model.OWLNamedIndividual;
import org.semanticweb.owlapi.model.OWLOntology;

/**
 * Generated by Protege (http://protege.stanford.edu).
 * Source Class: University
 * @version generated on Thu Aug 28 20:39:13 IST 2014 by Charvi
 */
public interface University extends WrappedIndividual {

    /* ***************************************************
     * Property http://localhost/Institute#centretHead
     */

    /**
     * Gets all property values for the centretHead property.
     *
     * @returns a collection of values for the centretHead property.
     */
    // Signature completed here following Protege's standard code-generation
    // pattern; the remainder of the generated interface is not reproduced.
    Collection<? extends WrappedIndividual> getCentretHead();
}
4. WEKA Configuration

4.1 Connect

import weka.core.*;

/**
 * Generates code based on the provided arguments, which consist of the
 * classname of a scheme and its options (outputs it to stdout).
 * The generated code instantiates the scheme and sets its options.
 * The classname of the generated code is OptionsTest.
 *
 * @author FracPete (fracpete at waikato dot ac dot nz)
 */
public class OptionsToCode {

  /**
   * Generates the code and outputs it on stdout. E.g.:
   *   java OptionsToCode weka.classifiers.functions.SMO -K "weka.classifiers.functions.supportVector.RBFKernel" > OptionsTest.java
   */
  public static void main(String[] args) throws Exception {
    // output usage
    if (args.length == 0) {
      System.err.println("\nUsage: java OptionsToCode <classname> [options] > OptionsTest.java\n");
      System.exit(1);
    }

    // instantiate scheme
    String classname = args[0];
    args[0] = "";
    Object scheme = Class.forName(classname).newInstance();
    if (scheme instanceof OptionHandler)
      ((OptionHandler) scheme).setOptions(args);
    // generate Java code
    StringBuffer buf = new StringBuffer();
    buf.append("public class OptionsTest {\n");
    buf.append("\n");
    buf.append("  public static void main(String[] args) throws Exception {\n");
    buf.append("    // create new instance of scheme\n");
    buf.append("    " + classname + " scheme = new " + classname + "();\n");
    if (scheme instanceof OptionHandler) {
      OptionHandler handler = (OptionHandler) scheme;
      buf.append("    \n");
      buf.append("    // set options\n");
      buf.append("    scheme.setOptions(weka.core.Utils.splitOptions(\"" + Utils.backQuoteChars(Utils.joinOptions(handler.getOptions())) + "\"));\n");
    }
    buf.append("  }\n");
    buf.append("}\n");

    // output Java code
    System.out.println(buf.toString());
  }
}

4.2 Add arbitrary weights

import weka.core.converters.ConverterUtils.DataSource;
import weka.core.converters.XRFFSaver;
import weka.core.Instances;
import java.io.File;

/**
 * Loads file "args[0]", sets class if necessary (in that case the last
 * attribute), adds some test weights and saves it as XRFF file
 * under "args[1]". E.g.:
 *   AddWeights anneal.arff anneal.xrff.gz
 *
 * @author FracPete (fracpete at waikato dot ac dot nz)
 */
public class AddWeights {

  public static void main(String[] args) throws Exception {
    // load data
    DataSource source = new DataSource(args[0]);
    Instances data = source.getDataSet();
    if (data.classIndex() == -1)
      data.setClassIndex(data.numAttributes() - 1);

    // set weights
    double factor = 0.5 / (double) data.numInstances();
    for (int i = 0; i < data.numInstances(); i++) {
      data.instance(i).setWeight(0.5 + factor * i);
    }

    // save data
    XRFFSaver saver = new XRFFSaver();
    saver.setFile(new File(args[1]));
    saver.setInstances(data);
    saver.writeBatch();
  }
}
The International Journal of Multimedia & Its Applications (IJMA) Vol.6, No.5, October 2014
MINING IN ONTOLOGY WITH MULTI AGENT SYSTEM IN SEMANTIC WEB: A NOVEL APPROACH

Vishal Jain 1 and Dr. S. V. A. V. Prasad 2

1 Research Scholar, Lingaya's University, Faridabad, Haryana, INDIA
2 Professor, Electronics and Communication Engineering Department, Lingaya's University, Faridabad, Haryana, INDIA
ABSTRACT

A large amount of data is present on the web. It contains a huge number of web pages, and finding suitable information from them is a very cumbersome task. There is a need to organize data in a formal manner so that users can easily access and use it. Many Information Retrieval (IR) techniques exist for retrieving information from documents, but current IR techniques are not advanced enough to exploit the semantic knowledge within documents and give precise results. IR technology is a major factor responsible for handling annotations in Semantic Web (SW) languages. With the rate of growth of the web and the huge amount of information available on it, which may be in unstructured, semi-structured or structured form, it has become increasingly difficult to identify the relevant pieces of information on the Internet. Knowledge representation languages are used for retrieving information, so there is a need to build an ontology using a well-defined methodology; the process of developing an ontology is called Ontology Development. Secondly, cloud computing and data mining have become prominent phenomena in the current application of information technology. With changing trends and the emergence of new concepts in the information technology sector, data mining and knowledge discovery have proved to be of significant importance. Data mining can be defined as the process of extracting data or information from a database that is not explicitly defined by the database, which can be used to draw generalized conclusions based on the trends obtained from the data. A database may be described as a collection of formerly structured data. Multi-agent data mining may be defined as the use of various agents that cooperatively interact with the environment to achieve a specified objective. Multi-agents always act on behalf of users and coordinate, cooperate, negotiate and exchange data with each other; an agent may be a software agent, a robot or a human being. Knowledge discovery can be defined as the process of critically searching large collections of data with the aim of finding patterns that can be used to make generalized conclusions; these patterns are sometimes referred to as knowledge about the data. Cloud computing can be defined as the delivery of computing services in which shared resources, information and software are provided over a network, for example the information superhighway, and it is normally provided as a web-based service which hosts all the required resources. Knowledge mining is used in many fields of study, such as science and medicine, finance, education, manufacturing and commerce. In this paper, the Semantic Web addresses the first part of this challenge by trying to make the data machine-understandable in the form of an ontology, while the Multi-Agent system addresses the second part by semi-automatically extracting the useful knowledge hidden in these data and making it available.
DOI : 10.5121/ijma.2014.6504
KEYWORDS

Information Retrieval, Semantic Web, Ontology, Ontology Mapping, Indexing with Ontology, JADE, Multi-Agent Systems.
1. INTRODUCTION

A large amount of data is present on the web. It contains a huge number of web pages, and finding suitable information from them is a very cumbersome task. There is a need to organize data in a formal manner so that users can easily access and use it. Many Information Retrieval (IR) techniques exist for retrieving information from documents, but current IR techniques are not advanced enough to exploit the semantic knowledge within documents and give precise results. IR technology is a major factor responsible for handling annotations in Semantic Web (SW) languages [2]. With the rate of growth of the web and the huge amount of information available on it, which may be in unstructured, semi-structured or structured form, it has become increasingly difficult to identify the relevant pieces of information on the Internet. The need is to organize this data in a formal system which yields more relevant, useful and structured information. It has therefore become necessary for users to utilize automated tools to find the desired information resources and to track and analyze their navigation patterns. In this context, the role of user modelling and personalized information access is increasing. Ontology may be a mechanism for obtaining information on the web in a more structured way through the Semantic Web. This work focuses on the problem of choosing a representation of documents that is suitable both for inducing concept-based user profiles and for supporting a content-based retrieval process. It is at this juncture that the science of Agent Mining comes to the rescue. The Semantic Web addresses the first part of this challenge by trying to make the data machine-understandable in the form of an ontology, while the Multi-Agent system addresses the second part by semi-automatically extracting the useful knowledge hidden in these data and making it available. As the current Web 2.0 is shifting towards Web 3.0, technologies need to be upgraded accordingly.
Figure 1: Current Web to Semantic Web
2. LITERATURE SURVEY

Anarosa Alves, Franco Brandão, Viviane Torres da Silva and Carlos José Pereira de Lucena [10] presented work on a model-driven approach to developing multi-agent systems. In this paper, the authors described a model-driven approach to develop multi-agent systems that begins with an ontology based on the TAO conceptual framework. Quynh-Nhu Numi Tran and Graham Low [11] worked on MOBMAS, a methodology for ontology-based multi-agent systems development. The authors proposed a new framework and compared MOBMAS against sixteen well-known methodologies: MaSE, MASSIVE, SODA, GAIA, MESSAGE, Methodology for BDI Agents, INGENIAS, Methodology with High-Level and Intermediate Levels, Methodology for Enterprise Integration, PROMETHEUS, PASSI, ADELFE, COMOMAS, MAS-CommonKADS, CASSIOPEIA and TROPOS. Pakornpong Pothipruk and Pattarachai Lalitrojwong [12] worked on an ontology-based multi-agent system for matchmaking. Csongor Nyulas, Martin J. O'Connor, Samson Tu, David L. Buckeridge, Anna Akhmatovskaia and Mark A. Musen [13] presented their work on an ontology-driven framework for deploying JADE agent systems. The authors described a methodology and suite of tools to support the modeling and deployment of agents on the JADE platform. These models are encoded using the Semantic Web ontology language OWL and provide detailed computer-interpretable specifications of agent behavior in a JADE system. Gajun Ganendran, Quynh-Nhu Tran, Pronab Ganguly, Pradeep Ray and Graham Low [14] proposed an ontology-driven multi-agent approach for healthcare. In this paper, the authors described an ontology-driven multi-agent approach to the development of healthcare systems, with a case study in diabetes management. Wongthongtham, P., Chang, E. and Dillon, T.S. [15] worked on an ontology-based multi-agent system for multi-site software development (MSSD). The authors described how a software agent utilizes ontology as its intelligence in MSSD and found this beneficial: ontology gives computers more knowledge that the agent can utilize. Maja Hadzic and Elizabeth Chang [16] worked on the use of ontology-based multi-agent systems in the biomedical domain. The authors showed how ontologies can be used by multi-agent systems in intelligent information retrieval processes. Ontologies can support some important processes involved in information retrieval, such as posing queries by the user, problem decomposition and task sharing among different agents, result sharing and analysis, information selection and integration, and structured presentation of the assembled information to the user.
3. PROBLEM DEFINITION

The Internet can be considered a repository of web documents. The plethora of documents available on the web makes it very difficult to access them and extract relevant information from them. The documents may be unstructured, structured or semi-structured, and it is known that most of them are unstructured. Although a large number of documents reside in the web's storage house, users are still unable to find
relevant information about a given domain. The reason is the uncertainty of documents, which leaves users in a dilemma by providing hundreds of results and keywords in response to a given query. When a user wants to find some information, he or she enters a query, and results are produced via hyperlinks linked to various documents available on the web. However, the information retrieved may or may not be relevant; this irrelevance is caused by the huge collection of documents available on the web. Traditional search engines are based on keyword-based searching, which is unable to transform raw data into knowledgeable representation data, and it is a cumbersome task to extract relevant information from a large collection of web documents. These shortcomings have brought the concepts of the Semantic Web and Ontology into existence.
4. INFORMATION RETRIEVAL

Information Retrieval involves identifying and extracting relevant pages containing specific information according to predefined guidelines. There are many IR techniques for extracting keywords, such as NLP-based extraction techniques, which are used to search for simple keywords, and the AeroText system for extracting key phrases from text documents. Information Retrieval provides many types of information access, such as analysis, organization, storage, searching and retrieval of information and structured data. According to Salton's classic textbook: "Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information". Carlo Meghini et al. define Information Retrieval as "the task of identifying documents in a collection on the basis of properties ascribed to the documents by the user requesting the retrieval". In short, information retrieval is a process of retrieving structured data from unstructured data.
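Since the sections that follow contrast this keyword-based searching with semantic retrieval, a minimal sketch of the keyword approach may help fix ideas. The documents and query below are invented for illustration; this is not the retrieval model of any system cited in this paper.

import java.util.*;

// Minimal keyword-based ranking: score each document by how many times
// the query terms occur in it, then print documents by descending score.
public class KeywordSearch {
    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "ontology development with the protege tool",
                "keyword based search engines on the web",
                "semantic web uses ontology for information retrieval");
        String[] query = "ontology retrieval".split("\\s+");

        Map<String, Integer> scores = new LinkedHashMap<>();
        for (String doc : docs) {
            int score = 0;
            for (String term : query)
                for (String word : doc.split("\\s+"))
                    if (word.equals(term)) score++;   // exact term match only
            scores.put(doc, score);
        }
        scores.entrySet().stream()
              .sorted((a, b) -> b.getValue() - a.getValue())
              .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}

Note how a purely lexical match misses documents that express the same concept with different words; this is exactly the vocabulary gap that the Semantic Web and ontologies, discussed next, are meant to close.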
5. SEMANTIC WEB

The idea of the Semantic Web (SW), as envisioned by Tim Berners-Lee, came into existence in 1996 with the aim of translating given information into a machine-understandable form [7]. The Semantic Web is an extension of the current WWW in which documents are enriched with annotations in a machine-understandable markup language. It is defined as a framework for expressing information, used to develop various languages and approaches for increasing IR effectiveness. The Semantic Web uses documents called Semantic Web Documents (SWDs) that are written in SW languages like OWL and DAML+OIL. The SW is an XML (Extensible Markup Language) application. Its aim is to maintain coordination between users and software agents so that they can find answers to their queries clearly.
6. ONTOLOGY

The term ontology can be defined in many different ways. Genesereth and Nilsson defined an ontology as an explicit specification of a set of objects, concepts, and other entities that are presumed to exist in some area of interest, together with the relationships that hold among them [8]. An ontology defines a common vocabulary for researchers who need to share common information in a domain. It enables the Web to be used by software components, which can be ideally supported through the use of Semantic Web technologies. This helps in understanding the concepts of the domain, and it helps the machine to interpret the definitions of the concepts in the domain and the relations between them.
Ontologies can be broadly divided into two main types: lightweight and heavyweight. Lightweight ontologies involve a taxonomy (or class hierarchy) that contains classes, subclasses, attributes and values. Heavyweight ontologies model domains in a deeper way and include axioms and constraints. The ontology layer consists of a hierarchical distribution of important concepts in the domain, describing the ontology's concepts, relationships and constraints. The figure below displays an ontology and its constituent parts.
Figure 1: “Ontology and its Constituents [3]”
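To make the lightweight case concrete, the sketch below builds a tiny class taxonomy with the OWL API, the same library imported by the generated code in Appendix A. The namespace and class names are illustrative assumptions, not the paper's actual university ontology.

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

// A lightweight ontology in the sense above: only a class hierarchy,
// with no axioms or constraints beyond subclass relations.
public class LightweightOntologyDemo {
    public static void main(String[] args) throws OWLOntologyCreationException {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        OWLDataFactory factory = manager.getOWLDataFactory();

        // Hypothetical namespace, echoing the property IRIs in Appendix A
        String ns = "http://localhost/Institute#";
        OWLOntology ontology = manager.createOntology(IRI.create("http://localhost/Institute"));

        OWLClass person = factory.getOWLClass(IRI.create(ns + "Person"));
        OWLClass professor = factory.getOWLClass(IRI.create(ns + "Professor"));

        // "Every Professor is a Person" -- a purely taxonomic axiom
        manager.addAxiom(ontology, factory.getOWLSubClassOfAxiom(professor, person));

        System.out.println("Axiom count: " + ontology.getAxiomCount());
    }
}

A heavyweight ontology would add further axioms (disjointness, cardinality restrictions and the like) on top of this hierarchy.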
7. MULTI AGENT SYSTEM

Multi-agent systems have been successful in the distributed implementation of knowledge management (KM) processes. Knowledge acquisition agents have been one of the most successful applications of software agents, specifically on the Internet, where knowledge-collector agents operate within available information resources and validate them in accordance with users' interests. Knowledge transfer, on the other hand, lies in end-to-end routing of knowledge generated by some actor, and it is another typical task that has been realized by software agents. It is therefore reasonable to apply the multi-agent paradigm to knowledge production. Knowledge-producing agents need to make formulations in keeping with a validation scheme that supports knowledge construction. Multi-agent systems can support the coordinated interaction needed to reach agreement on the knowledge that is eventually generated, and they can also support the validation scheme. Advances in multi-agent systems as an alternative way to build distributed systems have made agents a facilitator of human-computer interaction. Agents have proven to be helpful tools for coordinating people who are performing a given task. Agent interaction protocols govern the exchange of a series of messages among agents, in accordance with the interaction style among actors. The interaction styles of individual agents can be competitive, cooperative or negotiating. However, both group behavior and the authoring of knowledge objects are social, group-level issues more than individual ones, and it is there that multi-agent systems can make their
contribution. The architecture presented in this work is a bottom-up, multi-agent approach to knowledge production. Our working hypothesis is that a group of agents can help in the collaborative production of knowledge by coordinating their creation activities. Different agents can therefore act as representatives of knowledge-producing actors, according to the following principles:

• Agents can be structured into separable knowledge domains of interaction. This structuring reflects the knowledge differences between developers.

• A dynamic rethinking of the structure of interactions in different domains can help to reduce conflicts.
8. PROPOSED SOLUTION

The Semantic Web is a well-defined portal that helps in extracting relevant information using many Information Retrieval techniques. Current Information Retrieval techniques are not advanced enough to exploit the semantic knowledge within documents and give precise results. The terms Information Retrieval, Semantic Web and Ontology are used differently, but they are interconnected. Information Retrieval technology and web-based indexing contribute to the existence of the Semantic Web, and the use of ontology also contributes to building the new generation of the web, the Semantic Web. With the help of ontologies, the content of the web can be marked up in Semantic Web documents. Ontology is considered the backbone of the software system; it improves understanding between the concepts used in the Semantic Web. So there is a need to build an ontology that uses a well-defined methodology; the process of developing an ontology is called Ontology Development. Ontology can be used to build structured data, with a multi-agent system used for information extraction from the ontology database. In order to work with multi-agent systems and interact with their agents, a framework called JADE (Java Agent DEvelopment Framework) is used. It is considered middleware that provides an agent platform and a development framework. The study of related works and the literature review shows that, in order to retrieve meaningful information from the data warehouse through a multi-agent system and data mining techniques in a cloud computing environment, the following architecture is designed [20]. Infrastructure-as-a-Service (IaaS) provides a virtual environment with storage and network without physical hardware; infrastructure cloud computing therefore provides a data warehouse for the storage of data for further analysis. The user submits a flat-text request to the IaaS for information retrieval from its integrated data warehouse, which has gathered data from numerous areas of business to present the user with a wide variety to choose from. The IaaS forwards this request to the MAS to find the requested information. However, the MAS does not by itself have the ability to find information within the large amounts of data in the data warehouse; it therefore uses a data mining algorithm to analyze the large amount of data residing in the IaaS. As a pre-processing stage, the MAS first develops a target data set, large enough to contain all the possible data patterns, and sends it into the system. Then processing begins, in which the data is analyzed through anomaly detection, clustering, classification, regression and summarization. The result of the process is shown on the screen to the user.
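The agent side of this proposal can be sketched in a few lines of JADE. The snippet below is a minimal, hypothetical retrieval agent showing only the receive-and-reply messaging pattern on which the coordination described above would rest; the class name and message content are illustrative assumptions, not the paper's implementation.

import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.lang.acl.ACLMessage;

// Hypothetical agent: waits for an ACL request and answers with an INFORM
// reply, the basic FIPA-ACL exchange that the JADE middleware provides.
public class RetrievalAgent extends Agent {
    protected void setup() {
        System.out.println("Agent " + getLocalName() + " ready.");
        addBehaviour(new CyclicBehaviour(this) {
            public void action() {
                ACLMessage msg = myAgent.receive();   // non-blocking receive
                if (msg != null) {
                    ACLMessage reply = msg.createReply();
                    reply.setPerformative(ACLMessage.INFORM);
                    // In the full architecture this would carry mined results
                    reply.setContent("results for: " + msg.getContent());
                    myAgent.send(reply);
                } else {
                    block();   // sleep until the next message arrives
                }
            }
        });
    }
}

Such an agent would be launched on the JADE platform and would delegate the actual analysis to the data mining stage described above.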
Figure 2: “Information Retrieval through Multi-Agent System with Data Mining in Cloud Computing” [20]
9. TECHNOLOGY STACK

1. OWL: The Web Ontology Language is used here to represent the university domain, including universities, courses, people, library and materials.
2. Protégé: A free, open-source ontology editor and framework for building intelligent systems.
3. OWLIM: OWLIM is a family of semantic repositories, or RDF database management systems, with the following characteristics:
   a. native RDF engines, implemented in Java
   b. delivering full performance through both Sesame and Jena
   c. robust support for the semantics of RDFS, OWL 2 RL and OWL 2 QL
   d. best scalability, loading and query evaluation performance
4. WEKA - Apriori Algorithm: Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The Apriori algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules (a short usage sketch follows this list).
5. Empire: Empire provides Java developers an easy way to integrate and start using SPARQL & RDF via JPA persistence annotations.
6. JADE: Java Agent DEvelopment Framework, or JADE, is a software framework for the development of agents, implemented in Java. The JADE system supports coordination between several FIPA agents and provides a standard implementation of the FIPA-ACL communication language, which facilitates communication between agents and allows detection of the services offered by the system.
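As a concrete illustration of item 4, the sketch below runs Weka's Apriori implementation over a dataset file. The file name is a placeholder assumption; any ARFF dataset with nominal attributes would do.

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Mines Boolean association rules from a nominal dataset with Weka's Apriori.
public class AprioriDemo {
    public static void main(String[] args) throws Exception {
        // "university.arff" is a hypothetical dataset exported from the ontology
        Instances data = DataSource.read("university.arff");
        Apriori apriori = new Apriori();
        apriori.setNumRules(10);           // report the 10 best rules found
        apriori.buildAssociations(data);
        System.out.println(apriori);       // frequent itemsets and rules
    }
}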
Figure 3: “Proposed Framework”
10. CONCLUSION

A practical model for information retrieval through a multi-agent system with data mining in a cloud computing environment has been proposed. It is recommended, however, that users ensure the request made to the IaaS is within the scope of the integrated data warehouse and is clear and simple, making the work of the multi-agent system easier when applying the data mining algorithms to retrieve meaningful information from the data warehouse. In the proposed research model/architecture, the use of cloud computing allows users to retrieve meaningful information from a virtually integrated data warehouse, which reduces the costs of infrastructure and storage. In this paper, a new model, "Mining in Ontology with Multi Agent Systems", has been proposed and discussed. In this model, the Semantic Web addresses the first part of the challenge by trying to make the data machine-understandable in the form of an ontology, while the Multi-Agent system addresses the second part by semi-automatically extracting the useful knowledge hidden in these data and making it available. This model can be developed further and may give better results compared to an RDBMS. This paper will also be helpful to other researchers who would like to work in this area.
ACKNOWLEDGEMENTS

I, Vishal Jain, would like to give my thanks to Dr. M. N. Hoda, Director, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, for giving me the opportunity to do my Ph.D. from Lingaya's University, Faridabad, Haryana.
REFERENCES
[1] Vishal Jain, Gagandeep Singh, Dr. Mayank Singh, "Ontology Development Using Hozo and Semantic Analysis for Information Retrieval in Semantic Web", 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013), December 9-11, 2013, organized by Jaypee University of Information Technology, Waknaghat, Shimla, Himachal Pradesh, INDIA; proceedings published by the IEEE.
[2] Gagandeep Singh, Vishal Jain, "Information Retrieval through Semantic Web: An Overview", Confluence 2012, 27th-28th September 2012, pp. 114-118, Amity School of Engineering & Technology, Amity University, Noida.
[3] Vishal Jain, Dr. Mayank Singh, "Ontology Development and Query Retrieval using Protégé Tool", International Journal of Intelligent Systems and Applications (IJISA), Vol. 5, No. 9, August 2013, pp. 67-75, ISSN 2074-9058, DOI: 10.5815/ijisa.2013.09.08.
[4] Vishal Jain, Dr. Mayank Singh, "Ontology Based Information Retrieval in Semantic Web: A Survey", International Journal of Information Technology and Computer Science (IJITCS), Vol. 5, No. 10, September 2013, pp. 62-69, ISSN 2074-9015, DOI: 10.5815/ijitcs.2013.10.06.
[5] Anarosa Alves, Franco Brandão, Viviane Torres da Silva, Carlos José Pereira de Lucena, "A Model Driven Approach to Develop Multi-Agent Systems", Monografias em Ciência da Computação, No. 09/05.
[6] Quynh-Nhu Numi Tran, Graham Low, "MOBMAS: A Methodology for Ontology-Based Multi-Agent Systems Development", Information and Software Technology 50 (2008), pp. 697-722.
[7] Pakornpong Pothipruk, Pattarachai Lalitrojwong, "An Ontology-based Multi-agent System for Matchmaking".
[8] Csongor Nyulas, Martin J. O'Connor, Samson Tu, David L. Buckeridge, Anna Akhmatovskaia, Mark A. Musen, "An Ontology-Driven Framework for Deploying JADE Agent Systems", Stanford Center for Biomedical Informatics Research, Stanford University.
[9] Gajun Ganendran, Quynh-Nhu Tran, Pronab Ganguly, Pradeep Ray, Graham Low, "An Ontology-driven Multi-agent Approach for Healthcare", HIC 2002.
[10] Wongthongtham, P., Chang, E., Dillon, T.S., "Ontology-based Multi-agent System to Multi-site Software Development", Workshop QUTE-SWAP@ACM/SIGSOFT-FSE12, November 5, 2004.
[11] Maja Hadzic, Elizabeth Chang, "Use of Ontology-Based Multi-Agent Systems in the Biomedical Domain", School of Information Systems, Curtin University of Technology, Perth, Western Australia.
[12] Vishal Jain, Dr. Mayank Singh, "Architecture Model for Communication between Multi Agent Systems with Ontology", International Journal of Advanced Research in Computer Science (IJARCS), Vol. 4, No. 8, May-June 2013, pp. 86-91, ISSN 0976-5697.
[13] Gagandeep Singh, Vishal Jain, Dr. Mayank Singh, "An Approach for Information Extraction using Jade: A Case Study", Journal of Global Research in Computer Science (JGRCS), Vol. 4, No. 4, April 2013, pp. 186-191, ISSN 2229-371X.
[14] Vishal Jain, Gagandeep Singh, Dr. Mayank Singh, "Implementation of Multi Agent Systems with Ontology in Data Mining", International Journal of Research in Computer Application and Management (IJRCM), May 2013, pp. 108-114, ISSN 2231-1009.
[15] Bouchiha Djelloul, Malki Mimoun, Mostefai Abd El Kader, "Towards Reengineering Web Applications to Web Services", The International Arab Journal of Information Technology (IAJIT), Vol. 6, No. 4, October 2009.
[16] Sidi Benslimane, Mimoun Malki, Djelloul Bouchiha, "Deriving Conceptual Schema from Domain Ontology: A Web Application Reverse Engineering Approach", The International Arab Journal of Information Technology (IAJIT), Vol. 7, No. 2, April 2010.
[17] Lobo L.M.R.J., Sunita B. Aher, "Data Mining in Educational System using WEKA", International Conference on Emerging Technology Trends (ICETT), 2011.
[18] Sonali Agarwal, Neera Singh, Dr. G.N. Pandey, "Implementation of Data Mining and Data Warehouse in E-Governance", International Journal of Computer Applications (IJCA) (0975-8887), Vol. 9, No. 4, November 2010.
[19] Fensel, Dieter, "Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce", Heidelberg: Springer, 2003.
[20] Vishal Jain, Mahesh Kumar Madan, "Information Retrieval through Multi-Agent System with Data Mining in Cloud Computing", International Journal of Computer Technology and Applications (IJCTA), Vol. 3, No. 1, January-February 2012, pp. 62-66.
[21] N. Sivaram, K. Ramar, "Applicability of Clustering and Classification Algorithms for Recruitment Data Mining", International Journal of Computer Applications (IJCA), Vol. 4, No. 5, March 2010.
[22] Ferenc Bodon, "A Fast Apriori Implementation", Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.
[23] Tan Pang-Ning, "An Introduction to Data Mining", Pearson Education, 2007.
[24] U.K. Pandey, S. Pal, "Data Mining: A Prediction of Performer using Classification", International Journal of Computer Science and Information Technology (IJCSIT), Vol. 2(2), ISSN 0975-9646, pp. 686-690, 2011.
[25] K. Saravana Kumar, R. Manicka Chezian, "A Survey on Association Rule Mining using Apriori Algorithm", International Journal of Computer Applications (IJCA), Vol. 45, No. 5, 2012.
[26] Amirmahadi Mohammadighavam, Neda Rajabpour, Ali Naserasadi, "A Survey on Data Mining Approaches", International Journal of Computer Applications (IJCA) (0975-8887), Vol. 36, No. 6, 2011.
[27] Dhanashree S. Deshpande, "A Survey on Web Data Mining Applications", International Journal of Computer Applications (IJCA), ETCSIT, No. 3, 2012.
[28] Antonopoulos, Nick, Lee Gillam, "Cloud Computing: Principles, Systems and Applications", New York: Springer, 2010.
AUTHORS

Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is pursuing a Ph.D. in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.

Dr. S. V. A. V. Prasad has completed his M.Tech. and Ph.D. He is presently working as Professor and Dean (Academic Affairs), Dean (R&D, IC) and Dean of the School of Electrical Sciences. He has actively participated in and organized many refresher courses, seminars and workshops on ISO, RoHS, component technology, WEEE, organizational methods, time study, productivity enhancement, product feasibility, etc. He has developed various products, such as a 15 MHz dual oscilloscope, high-voltage tester, VHF wattmeter, standard signal generator with AM/FM modulator, wireless beacon, high-power audio amplifier and wireless microphone, over a span of 25 years (1981-2007). He was awarded for excellence in R&D in 1999 and 2004, and received the National Quality Award in 1999, 2000, 2004 and 2006. He is a Fellow of the IEEE and a life member of ISTE, IETE and the Society of Audio & Video Systems. He has published more than 90 research papers in various national and international conferences and journals. His research areas include wireless communication, satellite communication and acoustics, antennas, neural networks and artificial intelligence.
International Journal of Advanced Research in Computer Science and Software Engineering
Volume 4, Issue 8, August 2014, ISSN: 2277 128X
Research Paper. Available online at: www.ijarcsse.com

Ontology Based Information Retrieval Model in Semantic Web: A Review

Dr. S. V. A. V. Prasad
Professor, Electronics and Communication Department, Lingaya's University, Faridabad, India

Vishal Jain
Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad, India
Abstract — The World Wide Web is the largest database in the universe, but it is mostly understandable by human users and not by machines. It lacks a semantic structure that maintains the interdependency of its components. Presently, search on the web is keyword-based, i.e., information is retrieved on the basis of a text search over all available matching URLs/hyperlinks. This may result in the presentation of irrelevant information to the user. In the current web, resources are accessible through hyperlinks to web content spread throughout the world. These links make the physical connections but are not understood by machines, so there is a lack of relationships capturing the meaning of the links for machines to understand. The explosion of unstructured data on the World Wide Web has generated significantly more interest in the extraction problem and helped position it as a central research goal in the Database, Artificial Intelligence, Data Mining, Information Extraction, Natural Language Processing and Web communities. Hence, information extraction is a logical step to retrieve structured data and the extracted information. Information retrieval is synonymous with "determination of relevance": it is described as the task of identifying documents in a collection on the basis of properties ascribed to the documents by the user requesting the retrieval. This paper presents a literature review of Semantic Web Mining, our vision combining the three research areas Semantic Web, Mining and Multi-Agent Systems. The basic idea is to improve the results of mining by exploiting semantic structures in the web, and to make use of mining techniques with multi-agent systems for building the Semantic Web. Techniques and tools supporting these tasks are presented. The research described focuses on using semantics to understand web navigation data, on using this knowledge for evaluating and improving web sites and services, and on using mining to help distributed content generation.

Keywords — Semantic Web, Ontology, Multi Agent Systems, Mining, Information Retrieval

I. ONTOLOGY BASED INFORMATION RETRIEVAL MODEL

Latifur Khan, Dennis McLeod and Eduard Hovy [11] worked on the key problem in achieving efficient and user-friendly retrieval: the development of a search mechanism that guarantees the delivery of minimal irrelevant information (high precision) while ensuring that relevant information is not overlooked (high recall). To achieve this, they proposed a potentially powerful and novel approach for the retrieval of audio information. In their research they explained the development of an ontology-based model for the generation of metadata for audio, and for the selection of audio information in a user-customized manner. They also concluded how the proposed ontology can be used to generate information selection requests in database queries. Vaclav Snasel, Pavel Moravec and Jaroslav Pokorny [12] presented a basic method of mapping LSI concepts onto a given ontology (WordNet), used both for improving retrieval recall and for dimension reduction. They offered experimental results for this method on a subset of the TREC collection consisting of Los Angeles Times articles. Their research showed that mapping terms onto WordNet hypernyms improves recall, bringing more relevant documents, and that LSI filtration enhances recall even more while producing a smaller index.
The open question is whether such an expensive method as LSI should be used just for term filtration; a third approach, using LSI on a generated hypernym-by-document matrix, has yet to be tested. Sofia Stamou [13] discussed how keyword-based searching does not always result in the retrieval of qualitative data, basically due to the variety in the vocabulary used to convey alike information. This paper introduces a concept-based retrieval model which tackles vocabulary mismatches through the use of domain-dependent ontologies. In particular, the model explores the information encoded in domain ontologies to index documents according to their semantics rather than their word forms. To demonstrate the potential of the proposed model, an experimental prototype was built which employs topical ontologies for indexing web documents in terms of their semantics. Zeng Dan [14] worked on semantic information retrieval based on ontology to resolve the accuracy problem of traditional information retrieval. The author utilized the method of establishing a domain semantic model with ontology technology, added the membership of concepts to the process of semantic modeling, and provided semantic annotation to facilitate computer processing. Qin Zhan, Xia Zhang and Deren Li [15] proposed an approach to overcome the problems of semantic heterogeneity: the explication of knowledge by means of ontology, which can be used for the identification and association of semantically corresponding concepts
because ontology can explicitly and formally represent concepts and relationships between concepts, and can support semantic reasoning according to its axioms. Ontology has been developed in the context of Artificial Intelligence (AI) to facilitate knowledge sharing and reuse. In this paper, an ontology-based semantic description model is put forward to explicitly represent geographic information semantics at an abstract level and a concrete level by introducing ontologies. Sylvie Ranwez, Vincent Ranwez, Mohameth-François Sy, Jacky Montmain and Michel Crampes [16] described a request method and an environment based on aggregating models to assess the relevance of documents annotated by concepts of an ontology. The selection of documents is then displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive exploration of the data corpus. The RSV decomposition described in this paper is a good example of the benefit of simultaneously considering two related problems: (i) how to rate documents with respect to a query, and (ii) how to provide users feedback concerning the rating of the documents. Yan Shen, Yuefeng Li, Yue Xu, Renato Iannella and Abdulmohsen Algarni [17] introduced a novel ontology-based approach, in terms of a world knowledge base, to construct personalized ontologies for identifying adequate concept levels that match user search intents. An iterative mining algorithm is designed to evaluate potential intents level by level until the best result is met. David Vallet, Miriam Fernández and Pablo Castells [18] gave an approach that can be seen as an evolution of the classic vector-space model, where keyword-based indices are replaced by an ontology-based knowledge base (KB), and a semi-automatic document annotation and weighting procedure is the equivalent of the keyword extraction and indexing process. The model shows that it is possible to develop a consistent ranking algorithm on this basis, yielding measurable improvements with respect to keyword-based search, subject to the quality and critical mass of metadata. Axel Reymonet, Jerome Thomas and Nathalie Aussenac-Gilles [19] presented a semantic search engine designed to handle, within two separate tools, both aspects of semantic IR: semantic indexing and semantic search. Since the search engine only exploits knowledge explicitly mentioned in each request/document, the ability to express causal information in OWL could be taken into account in order to bring closer two symptoms that appear different but share one or more faults as the potential origin of a given breakdown. Gaihua Fu, Christopher B. Jones and Alia I. Abdelmoty [20] presented query expansion techniques based on both a domain and a geographical ontology. Differently from term-based query expansion techniques, the proposed techniques expand a query by trying to derive its geographical query footprint, and they are specially designed to resolve spatial queries.
Various factors, such as the types of spatial terms encoded in the geographical ontology, the types of non-spatial terms encoded in the domain ontology, the semantics of the spatial relationships, their context of use, and the satisfiability of the initial search result, are taken into account to support the expansion of a spatial query. The proposed techniques support the intelligent, flexible treatment of a spatial query when a fuzzy spatial relationship is involved, and experiments have been carried out to evaluate their performance using sample realistic ontologies. Jan Paralic and Ivan Kostial [21] presented a new, ontology-based approach to information retrieval (IR). The system is based on a domain knowledge representation schema in the form of an ontology; new resources registered within the system are linked to concepts from this ontology. In this way resources may be retrieved based on associations, and not only on partial or exact term matching as the use of the vector model presumes. The ontology-based retrieval mechanism has been compared with traditional full-text search based on the vector IR model, as well as with the Latent Semantic Indexing method. Zhanjun Li and Karthik Ramani [22] described a framework for design information extraction and retrieval that aims at being effective with respect to the content-bearing phrases encountered in unstructured, textual design documents. The centerpiece of the method is a layered design ontology model, where the application ontology is automatically acquired using a shallow natural language processing technique as well as the taxonomies defined in the domain ontology. Sylvie Ranwez, Benjamin Duthil, Mohameth François Sy, Jacky Montmain, Patrick Augereau and Vincent Ranwez [23] showed how ontology-based information retrieval systems may benefit from lexical text analysis and proposed the CoLexIR (Conceptual and Lexical Information Retrieval) approach. In the CoLexIR visualization interface, retrieved documents are displayed in a semantic map and placed according to their relevance score with respect to the query, represented as a probe (symbolized by a question mark). The result explanation focuses on both the conceptual and passage levels; the higher the score, the closer the document is to the query probe in the semantic map. Stuart Aitken and Sandy Reid [24] evaluated the use of an explicit domain ontology in an information retrieval tool. The evaluation compares the performance of ontology-enhanced retrieval with keyword retrieval for a fixed set of queries across several data sets, and the robustness of the IR approach is assessed by comparing the performance of the tool on the original data set with that on previously unseen data. The empirical evaluation of ontology-based retrieval in CB-IR has broadly confirmed the hypotheses about the relative and absolute performance of the system and about the adequacy and robustness of the ontology. Asunción Gómez-Pérez, Fernando Ortiz-Rodríguez and Boris Villazón-Terrazas [25] worked on "Ontology-Based Legal Information Retrieval to Improve the Information Access in e-Government". The paper presents an approach to ontology-based legal IR which aims to retrieve government documents in a timely and accurate way. Pablo Castells, Miriam Fernández and David Vallet [26] proposed an adaptation of the vector-space model for ontology-based information retrieval.
The approach can be seen as an evolution of the classic vector-space model, where keyword-based indices are replaced by an ontology-based KB, and a semi-automatic document annotation and weighting procedure is the equivalent of the keyword extraction and indexing process. The model shows that it is possible to develop a consistent ranking algorithm on this basis, yielding measurable improvements with respect to keyword-based search, subject to the quality and critical mass of metadata. Jouni Tuominen, Tomi Kauppinen, Kim Viljanen and Eero Hyvonen [27] proposed an ontology-based query
Jain et al., International Journal of Advanced Research in Computer Science and Software Engineering 4(8), August - 2014, pp. 837-842 Expansion Widget for Information Retrieval. In this paper, they have implemented a web widget providing query expansion functionality to web-based systems as an easily integrable service with no need to change the underlying system. The widget uses ontologies to expand the query terms with semantically related concepts. The widget extends the previously developed ONKI Selector widget, which is used for selecting concepts especially for annotation purposes. There are various Ontology based information retrieval methods [28] to search information with enhanced semantics from the user query input to retrieve high relevant information: Vector Space Based Information Retrieval, Probabilistic Information Retrieval, Context Based Information Retrieval, Semantic Based Information Retrieval, Semantic Similarity Based Information Retrieval, Semantic Association Based Information Retrieval and Semantic Annotation Based Information Retrieval. Swathi Rajasurya , Tamizhamudhu Muralidharan , Sandhiya Devi and Dr. S. Swamynathan [29] proposed system Semantic Information Extraction in University Domain(SIEU) is designed. SIEU retrieves the semantically relevant results for the user query by considering the semantics and context of the query. In this thesis, proposed two models, first is on “Ontology Based Information Retrieval through Multi agent System” for content retrieval from the web .Stages involved in development of first model further divided into four parts: Ontology Development, Ontology Mapping, Ontology Mining and Ontology with Multi-Agent system. Literature of each stage classified and explained in next section. II. ONTOLOGY DEVELOPMENT Dongpo Deng [31], in this study, author reported the experience of creating the ontology of place name serving as a specification of domain knowledge, as well as used the ontology of place-name to information retrieval. The results show the geographic ontology can to rid of ambiguous of geospatial data. It is a common situation that a place name refers to different places and a place has different names. The ontology of place name might be a useful solution to provide exact result in the Web application. However, the ontology of place name built by feature type might solve the terminology problem of place name, but doesn’t figure out the spatial nature of place name. Christopher S.G. Khoo, Jin-Cheon Na, Vivian Wei Wang, and Syin Chan [32], in this a disease-treatment ontology developed to model and represent treatment information found in the abstracts of medical articles. In this paper, described the preliminary version of the disease treatment ontology that can be developed to encode treatment information reported in medical abstracts in the medicine database. The ontology was developed from an analysis of 40 abstract in the domain of colon cancer therapy. Valentina Cordi, Viviana Mascardi, Maurizio Martelli, Leon Sterling [33], discussed a framework for evaluating and comparing methodologies for ontology development and its application to the evaluation of three existing methodologies. The framework is characterised by a domain-independent step and by an application-driven step. 
It has been adopted to analyse and compare three methodologies, the "Ontology Development 101" methodology, the "Unified Methodology" and EXPLODE, with respect to the analysis, design, verification and implementation of an ontology for content-based retrieval of XML documents. Naveen Malviya, Nishchol Mishra and Santosh Sahu [34] explained the terms of a university through a university ontology, focusing on creating the university ontology with the help of the Protégé tool. Rajiv Gandhi Technical University, Bhopal, India was taken as the example for the ontology development, and various aspects, such as the superclass and subclass hierarchy, creating subclass instances for class illustration, the query retrieval process, the visualization view and the graph view, were demonstrated. Lakshmi Palaniappan, N. Sambasiva Rao and G. V. Uma [35] described how ontologies can help the user in formulating the information need, the query, and the answers. As a proof of concept, photos of restaurants were shown. In this system, images are annotated according to ontologies, and when generating answers to queries, the ontology combined with the image data also facilitates retrieval. This paper showed that ontologies can be used not only for annotation and precise information retrieval, but also for helping the user formulate the information need and the corresponding query. This is important in applications such as promotion exhibitions, where the domain semantics are complicated and not necessarily known to the user. Norasykin Mohd Zaid and Sim Kim Lau [36] described the development of an ontology information retrieval system for novice researchers in Malaysia. The authors proposed a framework showing that an ontology approach can help novice researchers apply semantic search techniques to improve current search capabilities; a preliminary user interface with a simple ontology was designed. Sanjay K. Dwivedi and Anand Kumar [37] revealed the conceptualization of university knowledge through the construction of a university ontology. The generalized structure of Indian universities and their workflow processes was taken for ontology development by describing the class hierarchy and demonstrating a graphical view of the ontology; the authors also demonstrated the ability of the university ontology to execute intelligent queries to retrieve information.

III. ONTOLOGY MINING

Amel Grissa Touzi, Hela Ben Massoud and Alaya Ayadi [38] presented work on automatic ontology generation for data mining using FCA and clustering. The authors combined clustering, FCA and ontology, and proposed a new approach for the automatic generation of a fuzzy data mining ontology, called FODM. C. Antunes [39] discussed an ontology-based framework for mining patterns in the presence of background knowledge, presenting a framework that incorporates background knowledge into the core of the mining process by using domain ontologies and by defining a set of constraints above them, which can guide the discovery process. Sachin Singh, Pravin Vajirkar and Yugyung Lee [40] discussed an approach to context-aware data mining using ontologies. The authors introduced a context-aware data mining framework which provides accuracy and efficacy to data mining outcomes; context factors were modeled using an ontological representation.
Majigsuren Enkhsaikhan, Wilson Wong, Wei Liu and Mark Reynolds [41] worked on measuring data-driven ontology changes using
text mining. The authors presented an approach for semi-automatically generating ontology concept clusters at different time periods, and for measuring and visualising the changes in sensible ways to help understand the overall concept changes as well as the individual terms that contributed to the change. Edmar Augusto Yokome and Flávia Linhalis Arantes [42] worked on Meta-DM, an ontology for the data mining domain. The authors presented the development of a domain ontology for data mining; the main result is the Meta-DM ontology, its conceptualization and implementation. Meta-DM intends to supply a common terminology that can be shared and understood by data mining tools.

IV. ONTOLOGY WITH MULTI AGENT SYSTEMS

Anarosa Alves, Franco Brandão, Viviane Torres da Silva and Carlos José Pereira de Lucena [43] presented work on a model-driven approach to developing multi-agent systems that begins with an ontology based on the TAO conceptual framework. Quynh-Nhu Numi Tran and Graham Low [44] worked on MOBMAS, a methodology for ontology-based multi-agent systems development. The authors proposed a new framework and compared MOBMAS against sixteen well-known methodologies: MaSE, MASSIVE, SODA, GAIA, MESSAGE, Methodology for BDI Agents, INGENIAS, Methodology with High-Level and Intermediate Levels, Methodology for Enterprise Integration, PROMETHEUS, PASSI, ADELFE, COMOMAS, MAS-CommonKADS, CASSIOPEIA and TROPOS. Pakornpong Pothipruk and Pattarachai Lalitrojwong [45] worked on an ontology-based multi-agent system for matchmaking. Csongor Nyulas, Martin J. O'Connor, Samson Tu, David L. Buckeridge, Anna Akhmatovskaia and Mark A. Musen [46] presented their work on an ontology-driven framework for deploying JADE agent systems, describing a methodology and suite of tools to support the modeling and deployment of agents on the JADE platform. These models are encoded using the Semantic Web ontology language OWL and provide detailed computer-interpretable specifications of agent behavior in a JADE system. Gajun Ganendran, Quynh-Nhu Tran, Pronab Ganguly, Pradeep Ray and Graham Low [47] proposed an ontology-driven multi-agent approach for healthcare, describing its application to the development of healthcare systems with a case study in diabetes management. Wongthongtham, P., Chang, E. and Dillon, T.S. [48] worked on an ontology-based multi-agent system for multi-site software development, describing how a software agent utilizes ontology as its intelligence in MSSD; ontology gives computers more knowledge that the agent can utilize. Maja Hadzic and Elizabeth Chang [49] worked on the use of ontology-based multi-agent systems in the biomedical domain, showing how ontologies can be used by multi-agent systems in intelligent information retrieval processes to support processes such as posing queries by the user, problem decomposition and task sharing among different agents, result sharing and analysis, information selection and integration, and structured presentation of the assembled information to the user.
V. CONCLUSIONS
As per the literature studied, Semantic Web and ontology development have many benefits in the information retrieval area. Many researchers have worked on different technologies of the Semantic Web and implemented them on particular domains. In this paper, various approaches to the Ontology Based Information Retrieval Model have been discussed. A new model can be defined with the use of mining in ontology with a multi-agent system for information retrieval, where the ontology can be used as a repository, mining for data extraction, and the multi-agent system for data representation. This paper will also be helpful to other researchers who would like to work in this area.

ACKNOWLEDGMENT
I, Vishal Jain, would like to give my sincere thanks to Dr. M. N. Hoda, Director, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, for giving me the opportunity to pursue my Ph.D. from Lingaya's University, Faridabad.

REFERENCES
[1] Gagandeep Singh, Vishal Jain, "Information Retrieval through Semantic Web: An Overview", Confluence 2012, held on 27th and 28th September, 2012, page no. 114-118.
[2] Vishal Jain, Dr. Mayank Singh, "Ontology Based Information Retrieval in Semantic Web: A Survey", International Journal of Information Technology and Computer Science (IJITCS), Hongkong, Vol. 5, No. 10, September 2013, page no. 62-69.
[3] Vishal Jain, Dr. Mayank Singh, "Ontology Development and Query Retrieval using Protégé Tool", International Journal of Intelligent Systems and Applications (IJISA), Hongkong, Vol. 5, No. 9, August 2013, page no. 67-75.
[4] T. Berners-Lee, "The Semantic Web", Scientific American, May 2007.
[5] Urvi Shah, James Mayfield, "Information Retrieval on the Semantic Web", ACM CIKM International Conference on Information Management, Nov 2002.
[6] http://www.mpiinf.mpg.de/departments/d5/teaching/ss03/xmlseminar/talks/CaiEskeWang.pdf.
[7] Berners-Lee, J. Lassila, "Ontologies in Semantic Web", Scientific American, May (2001) 34-43.
[8] David Vallet, M. Fernandes, "An Ontology-Based Information Retrieval Model", European Semantic Web Symposium (ESWS), 2006.
[9] http://www.daml.org/ontologies.
[10] Tim Finin, Anupam Joshi, Vishal Doshi, "Swoogle: A Semantic Web Search and Metadata Engine", in Proceedings of the 13th International Conference on Information and Knowledge Management, pages 461-468, 2004.
[11] Latifur Khan, Dennis McLeod, Eduard Hovy, "Retrieval Effectiveness of an Ontology-Based Model for Information Selection".
[12] Vaclav Snasel, Pavel Moravec, Jaroslav Pokorny, "WordNet Ontology Based Model for Web Retrieval".
[13] Sofia Stamou, "Retrieval Effectiveness Of An Ontology-Based Model For Conceptual Indexing".
[14] Zeng Dan, "Research on Semantic Information Retrieval Based on Ontology", Proceedings of the 7th International Conference on Innovation & Management, page no. 1582-1586.
[15] Qin Zhan, Xia Zhang, Deren Li, "Ontology-Based Semantic Description Model For Discovery And Retrieval Of Geo-Spatial Information", The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXVII, Part B4, Beijing, 2008, page 141-146.
[16] Sylvie Ranwez, Vincent Ranwez, Mohameth-François Sy, Jacky Montmain, Michel Crampes, "User Centered and Ontology Based Information Retrieval System for Life Sciences", BMC Bioinformatics 2012, 13(Suppl 1):S4, http://www.biomedcentral.com/1471-2105/13/S1/S4.
[17] Yan Shen, Yuefeng Li, Yue Xu, Renato Iannella, Abdulmohsen Algarni, "An Ontology-based Mining Approach for User Search Intent Discovery".
[18] David Vallet, Miriam Fernández, and Pablo Castells, "An Ontology-Based Information Retrieval Model".
[19] Axel Reymonet, Jerome Thomas, Nathalie Aussenac-Gilles, "Ontology Based Information Retrieval: an application to automotive diagnosis".
[20] Gaihua Fu, Christopher B. Jones and Alia I. Abdelmoty, "Ontology-based Spatial Query Expansion in Information Retrieval".
[21] Jan Paralic, Ivan Kostial, "Ontology-based Information Retrieval", EC funded project IST-1999-20364, Webocracy (Web Technologies Supporting Direct Participation in Democratic Processes).
[22] Zhanjun Li and Karthik Ramani, "Ontology-Based Design Information Extraction and Retrieval", Artificial Intelligence for Engineering Design, Analysis and Manufacturing (2007), 21, 137-154, Cambridge University Press, DOI: 10.1017/S0890060407070199.
[23] Sylvie Ranwez, Benjamin Duthil, Mohameth François Sy, Jacky Montmain, Patrick Augereau and Vincent Ranwez, "How ontology based information retrieval systems may benefit from lexical text analysis", in New Trends of Research in Ontologies and Lexical Resources, Oltramari, Alessandro; Vossen, Piek; Qin, Lu; Hovy, Eduard (Eds.) (2013) 209-230.
[24] Stuart Aitken and Sandy Reid, "Evaluation of an Ontology-Based Information Retrieval Tool".
[25] Asunción Gómez-Pérez, Fernando Ortiz-Rodríguez, Boris Villazón-Terrazas, "Ontology-Based Legal Information Retrieval to Improve the Information Access in e-Government", WWW 2006, May 23-26, 2006, Edinburgh, Scotland, ACM 1-59593-323-9/06/0005.
[26] Pablo Castells, Miriam Fernández, and David Vallet, "An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval", TKDE-0456-1005.
[27] Jouni Tuominen, Tomi Kauppinen, Kim Viljanen, and Eero Hyvonen, "Ontology-Based Query Expansion Widget for Information Retrieval".
[28] Sakthi Murugan R, P. Shanthi Bala and Dr. G. Aghila, "Ontology Based Information Retrieval - An Analysis", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 10, October 2013, page no. 486-493.
[29] Swathi Rajasurya, Tamizhamudhu Muralidharan, Sandhiya Devi and Dr. S. Swamynathan, "Semantic Information Retrieval Using Ontology In University Domain", International Journal of Web & Semantic Technology (IJWesT), Taiwan, Vol. 3, No. 4, October 2012.
[30] Zhanjun Li, Victor Raskin and Karthik Ramani, "A Methodology Of Engineering Ontology Development For Information Retrieval", International Conference On Engineering Design, ICED'07, 28-31 August 2007, Cite Des Sciences Et De L'industrie, Paris, France.
[31] Dongpo Deng, "Building Ontology of Place Name for Spatial Information Retrieval", http://www.gisdevelopment.net/technology/gis/ma07259pf.html.
[32] Christopher S.G. Khoo, Jin-Cheon Na, Vivian Wei Wang, and Syin Chan, "Developing an Ontology for Encoding Disease Treatment Information in Medical Abstracts", DESIDOC Journal of Library & Information Technology, Vol. 31, No. 2, March 2011, pp. 103-115.
[33] Valentina Cordi, Viviana Mascardi, Maurizio Martelli, Leon Sterling, "Developing an Ontology for the Retrieval of XML Documents: A Comparative Evaluation of Existing Methodologies".
[34] Naveen Malviya, Nishchol Mishra, Santosh Sahu, "Developing University Ontology using Protégé OWL Tool: Process and Reasoning", International Journal of Scientific & Engineering Research, Volume 2, Issue 9, September 2011.
[35] Lakshmi Palaniappan, N. Sambasiva Rao, G. V. Uma, "Development of Dining Ontology Based On Image Retrieval", International Journal of Scientific Engineering and Technology, ISSN: 2277-1581, Volume No. 2, Issue No. 6, pp 560-566.
[36] Norasykin Mohd Zaid and Sim Kim Lau, "Development of Ontology Information Retrieval System for Novice Researchers in Malaysia", IBIMA Publishing Journal of Software and Systems Development, Vol. 2011 (2011), Article ID 611355, 11 pages, DOI: 10.5171/2011.611355, http://www.ibimapublishing.com/journals/JSSD/jssd.html.
[37] Sanjay K. Dwivedi and Anand Kumar, "Development of University Ontology for a SPOCMS", Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 3, August 2013, page no. 213-221.
[38] Amel Grissa Touzi, Hela Ben Massoud and Alaya Ayadi, "Automatic Ontology Generation For Data Mining Using FCA And Clustering", Ecole Nationale d'Ingénieurs de Tunis, Tunisia.
[39] C. Antunes, "An Ontology-Based Framework For Mining Patterns In the Presence of Background Knowledge", Instituto Superior Técnico / Technical University of Lisbon, Portugal.
[40] Sachin Singh, Pravin Vajirkar, and Yugyung Lee, "An Approach On Context-Aware Data Mining Using Ontologies", School of Computing and Engineering, University of Missouri-Kansas City.
[41] Majigsuren Enkhsaikhan, Wilson Wong, Wei Liu, Mark Reynolds, "Measuring Data-Driven Ontology Changes using Text Mining", Sixth Australasian Data Mining Conference (AusDM 2007), Gold Coast, Australia.
[42] Edmar Augusto Yokome and Flávia Linhalis Arantes, "Meta-DM: An ontology for the data mining domain", Revista de Sistemas de Informacao da FSMA, n. 8 (2011), pp. 36-45.
[43] Anarosa Alves Franco Brandão, Viviane Torres da Silva, Carlos José Pereira de Lucena, "A Model Driven Approach to Develop Multi-Agent Systems", Monografias em Ciência da Computação, No. 09/05.
[44] Quynh-Nhu Numi Tran, Graham Low, "MOBMAS: A Methodology for Ontology-Based Multi-Agent Systems Development", Information and Software Technology 50 (2008) 697-722.
[45] Pakornpong Pothipruk and Pattarachai Lalitrojwong, "An Ontology-based Multi-agent System for Matchmaking".
[46] Csongor Nyulas, Martin J. O'Connor, Samson Tu, David L. Buckeridge, Anna Akhmatovskaia, Mark A. Musen, "An Ontology-Driven Framework for Deploying JADE Agent Systems", Stanford University, Stanford Center for Biomedical Informatics Research.
[47] Gajun Ganendran, Quynh-Nhu Tran, Pronab Ganguly, Pradeep Ray and Graham Low, "An Ontology-driven Multi-agent approach for Healthcare", HIC 2002, 0958537097.
[48] Wongthongtham, P., Chang, E., Dillon, T.S., "Ontology-based Multi-agent system to Multi-site Software Development", Workshop QUTE-SWAP@ACM/SIGSOFT-FSE12, November 5, 2004.
[49] Maja Hadzic, Elizabeth Chang, "Use Of Ontology-Based Multi-Agent Systems in the Biomedical Domain", Curtin University of Technology, School of Information Systems, Perth, Western Australia, 6845, Australia.

AUTHORS
Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is pursuing his Ph.D. from the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.

Dr. S. V. A. V. Prasad has completed M.Tech. and Ph.D.
He is working as Professor and Dean (Academic Affairs), Dean (R&D, IC), and Dean, School of Electrical Sciences, Lingaya's University, Faridabad. He has actively participated in and organized many refresher courses, seminars and workshops on ISO, ROHS, Component Technology, WEEE, organizational methods, time study, productivity enhancement, product feasibility, etc. He has developed various products, such as a 15 MHz dual oscilloscope, a high-voltage tester, a VHF wattmeter, a standard signal generator with AM/FM modulator, a wireless becom, a high-power audio amplifier, a wireless microphone and many more, over a span of 25 years (1981-2007). He was awarded for excellence in R&D in the years 1999 and 2004, and received the National Quality Award in the years 1999, 2000, 2004 and 2006. He is a Fellow member of IEEE and a life member of ISTE, IETE, and the Society of Audio & Video Systems. He has published more than 90 research papers in various national and international conferences and journals. His research areas include Wireless Communication, Satellite Communication & Acoustics, Antennas, Neural Networks, and Artificial Intelligence.
International Journal of Sciences: Basic and Applied Research (IJSBAR) ISSN 2307-4531 (Print & Online) http://gssrr.org/index.php?journal=JournalOfBasicAndApplied
---------------------------------------------------------------------------------------------------------------------------
Role of Ontology with Multi-Agent System in Cloud Computing

Vishal Jain (a*), Dr. S. V. A. V. Prasad (b)
(a) Research Scholar, Lingaya's University, Faridabad, Haryana. Email: [email protected]
(b) Professor, Electronics and Communication Department, Lingaya's University, Faridabad, Haryana. Email: [email protected]
Abstract
Information technology is playing a major role in revolutionizing how organizations operate, manage, and automate their processes. However, most systems today are not reusable because the knowledge of the society is mixed with that of the processes, and the knowledge of one society differs from that of another application; hence, it is not reusable. This paper will address how dependent the applications are on societies, and it will separately define the process ontology, the knowledge of the agent, the ontology of the society, and the knowledge of the society [1]. This is an introduction of an ontology-based, process-oriented agent system that is independent of society, which allows most if not all organizations to make use of it, by defining and importing the ontology of the society and some process patterns, which may be instantiated from the process ontology into the system. This proposed system can be used on the platform of cloud computing. The evaluation is from two different perspectives. The first is quality, making use of the cohesion and coupling measures; cohesion measures the degree to which the system focuses on solving a particular problem. The second is applicability, which is determined by evaluating how manageable and automatable the seven processes from three different societies are [2].
Keywords: Semantic Web; Ontology; Multi-Agent System; Cloud Computing.
-----------------------------------------------------------------------* Corresponding author. E-mail address:
[email protected].
1. Introduction
According to WhatIs.com, an ontology is the working model of entities and interactions in some particular domain of knowledge or practice, such as electronic commerce or "the activity of planning." Simply defined, an ontology is a set of concepts, for example things, relations or even events, specified in a particular way, such as in a particular natural language, so that an agreed-upon vocabulary for exchanging information is created. A multi-agent system can be defined either as "a computerized system composed of multiple interacting intelligent agents within an environment" or as "a loosely coupled network of software agents that interact to solve problems that are beyond the individual capacities or knowledge of each problem solver." Cloud computing is defined as storing and accessing data and programs over the Internet instead of on one's own computer's hard drive. There are three types of cloud services: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). When it comes to cloud computing, security is very important, and it is becoming harder to achieve due to the increasing number of users. The present approaches to controlling the cloud have not scaled well to the requirements of multi-tenancy, because they are based on the user IDs of individuals at different levels of granularity [3]. However, the number of users can be large, and this leads to significant overhead in managing security. In order to provide an environment that supports automatic searching of services and resources, ontology is used. There are a number of areas where ontology is used in cloud computing. They include the following:
• Intelligent ontology-based registries are utilized for dynamic discovery of cloud computing resources across various platforms in the cloud computing world (a sketch follows this list).
• Ontology can be used for the benefit of SaaS to provide intelligent customization.
• Role-based access control using ontology eases the design of the security system.
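As a minimal sketch of the first use listed above, the following Python snippet shows an ontology-backed registry that software agents could query for dynamic discovery of cloud resources. It uses the rdflib library; the namespace, the class names (ComputeResource, StorageResource) and the two advertised resources are invented for illustration and are not taken from any cited system.

    # Sketch of an ontology-backed cloud resource registry (assumed vocabulary).
    from rdflib import Graph, Literal, Namespace, RDF

    CLOUD = Namespace("http://example.org/cloud#")  # hypothetical namespace
    g = Graph()

    # Providers advertise resources with typed capabilities.
    g.add((CLOUD.vm1, RDF.type, CLOUD.ComputeResource))
    g.add((CLOUD.vm1, CLOUD.provider, Literal("ProviderA")))
    g.add((CLOUD.store1, RDF.type, CLOUD.StorageResource))
    g.add((CLOUD.store1, CLOUD.provider, Literal("ProviderB")))

    # Dynamic discovery: an agent asks for every advertised compute resource.
    for resource in g.subjects(RDF.type, CLOUD.ComputeResource):
        print(resource)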
2. Inter-cloud directories and exchanges that are ontology based
Inter-cloud directories and exchanges are usually used to provide connectivity and collaboration among different cloud providers. This mechanism has a catalogue for the cloud that makes use of ontology in an effort to automate an environment in which software agents will discover and consume services. When this happens, the n^2 complexity of pairwise integration is reduced, resulting in one-to-many, as well as many-to-many, models [4].

3. Ontology-based cloud computing resource catalogue
Many cloud providers advertise the capabilities of their resources in the cloud-computing catalogue. Hence, management needs careful planning in order to achieve the objectives of the business while avoiding errors. To achieve this, semantic web technologies are implemented in service registries such as UDDI. The tModel is a taxonomy that was used in UDDI, and it played the role of a proxy in providing a technical solution outside the registry. However, a taxonomy just describes a class/subclass type of relationship, while an ontology, on the other hand, describes the domain entirely. Ontology aids in providing an accurate description of services using their
abilities to define properties for the class. Hence, the catalogue aids in capturing the computing resources across all clouds in terms of "capabilities," "policies," and "structural relationships" [5]. There is a need for a declarative semantic model which will capture the requirements, as well as the constraints, of the computing resources. The main reason for this need is to make sure that the requirements provided by an inter-cloud-enabled cloud provider match the capabilities of the infrastructure in a schematic fashion which is automated. The ontology-based semantic model makes use of RDF/OWL to show the features, as well as the capabilities, of the cloud provider's infrastructure. The captured capabilities are usually arranged logically, grouped, and displayed as a standardized unit to provision and configure, so that they can be consumed by another cloud provider. After this is done, they are associated with policies to make sure that there is compliance in accessing the computing resources [6].

4. Ontology-based intelligent customization framework for SaaS
SaaS with a multi-tenancy architecture (MTA) is a model used for software delivery in which the software provider publishes a copy of their software on the Internet in order to support multiple consumers in a cloud environment. MTA allows more than two tenants to share a software service that has been customized so that each tenant may have their own graphical user interface, data, and user interaction. The consequence of this is that it may appear to each tenant as if they were the only ones using the software; it keeps what is confidential, but it allows multiple tenants to use the same software [7]. The SaaS provider should be able to make customizations to meet a number of goals. For example, the provider needs to support the tenants with multiple options and variations using a single code base, in such a way that every user is able to have a unique configuration of the software. Additionally, the provider has to make sure that the configuration is simple and easy, in order to meet the needs of the tenant without having to incur extra costs for development or operation. This customization relates not only to functionality but also to the Quality of Service (QoS). A fully customizable SaaS has a layered architecture.

5. Case study on clinical trials
Clinical trials are the main way in which medical research is conducted to investigate new medications, devices and other products for medical purposes. Data collected during these clinical trials is very important to the organizations carrying out the study. Hence, there has to be careful collection, handling and storage of data while observing and obeying both national and international regulations. The main way to deal with lingual and domain-specific issues is the use of ontologies. In the medical field, there has been extensive use of taxonomies and ontologies in structuring and organizing knowledge. In order to overcome the problems experienced, an ontology-based data integration system is required. The framework described below should be able to cater to these needs.
6. Proposed framework
The proposed framework will have four layers:
6.1. Data layer
In a domain, different communities define ontology systems over multiple data sources. Ontology integration is developed in order to solve heterogeneity issues; this refers to building a larger and more complete ontology at a higher level using already existing ontology systems. The ontology is represented as a tree, and keywords which bear a similar meaning are assigned the value one (a toy sketch follows at the end of this section). The customization of the data layer is guided by the ontology information: after searching for an ontology in a domain, the template is located and customized using the ontology information.

6.2. Service layer
The services provided can be categorized into two kinds: atomic and composite. An atomic service is a service that carries out fundamental operations, whereas a composite service is one that is composed of related atomic services to carry out complex tasks. Every service operates under terms and conditions, and complicated tasks are achieved through ontology [8].

6.3. Business process layer
This layer organizes its services, as well as its participants, in such a way that they are able to achieve very complex business tasks; workflows, which consist of activities, represent the business processes. Tenants are able to search a workflow repository with the help of keywords and get access to the relevant ones depending on their interests. The process of customization is centred on the business domain knowledge in a multi-layered workflow, using a series of steps or transformations obtained from template objects [9].

6.4. User interface layer
The user interface ontology can be built in such a way that it provides the concepts, relationships, reasoning and searching for elements that are user-interface related. The ontology should be made to accommodate user interface classification information. This information includes data collection and representation, command and control, monitoring, and finally hybrid, which is a combination of two or more types. The easiest user interface customization is to change and configure the appearance of the UI available to the users, and this includes adding or even editing and deleting the icons, fonts and other items [10].
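A toy sketch of the data-layer idea from 6.1: the ontology is held as a tree, and keywords that bear a similar meaning are assigned the value one. The terms and the synonym table below are invented for illustration; this is a sketch of the idea, not the framework's implementation.

    # Ontology held as a tree of invented clinical-trial terms.
    ontology_tree = {
        "ClinicalTrial": ["Medication", "Device"],
        "Medication": ["Dosage"],
        "Device": [],
        "Dosage": [],
    }

    # Keyword pairs treated as bearing a similar meaning get the value one.
    synonyms = {("drug", "medication"), ("dose", "dosage")}

    def similarity(a, b):
        a, b = a.lower(), b.lower()
        return 1 if a == b or (a, b) in synonyms or (b, a) in synonyms else 0

    for concept, children in ontology_tree.items():
        print(concept, "->", children)
    print(similarity("Drug", "Medication"))  # prints 1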
7. Conclusion
The most important characteristic that distinguishes Multi-Agent Systems (MAS) from traditional distributed systems is that a MAS, paired with its components, is intelligent. As Multi-Agent Systems become increasingly popular for solving larger and more complex issues, so grows the need for technology adequate to fit into the MAS paradigm. Additionally, MASs are heterogeneous, and this is influenced by the fact that agent interaction, as well as organization, is flexible and complex at the same time. A MAS provides the added advantage of a group of agents. It offers an influential and regular analogy for conceptualizing and outlining numerous programming applications, as represented by the security-area situation. Likewise, Multi-Agent Systems encourage the interoperability of heterogeneous frameworks. The objective is to "agentify" the heterogeneous segments, that is, to wrap these segments with an agent layer that empowers them to interoperate with one another by means of a uniform agent communication language.
Figure 1: Ontology with Multi-Agent System in Cloud Computing
References
[1] Thomas Erl, Ricardo Puttini, and Zaigham Mahmood. Cloud Computing: Concepts, Technology & Architecture. New Jersey: Prentice Hall, 2013. Print.
[2] Wooldridge, Michael. An Introduction to MultiAgent Systems. New Jersey: John Wiley & Sons, 2009. Print.
[3] Pornpit Wongthongtham, Elizabeth Chang, and Tharam Dillon. Ontology-Based Multi-Agent Systems. New York: Springer Science & Business Media, 2009. Print.
[4] De Oliviera et al. "Classifying Cloud Computing Environments Using Taxonomies." In Nikos Antonopoulos, Lee Gillam (eds.), Cloud Computing: Principles, Systems and Applications. New York: Springer Science & Business Media, 2010. Print.
[5] Shroff, Gautam. Enterprise Cloud Computing: Technology, Architecture, Applications. Cambridge: Cambridge University Press, 2010. Print.
[6] Rodríguez, I. Lopez and Hernandez-Tejera, M. "Software Agents as Cloud Computing Services." In Yves Demazeau, Michal Pechoucek, Juan Manuel Corchado Rodríguez, Javier Bajo Pérez (eds.), Advances on Practical Applications of Agents and Multiagent Systems: 9th International Conference on Practical Applications of Agents and Multiagent Systems. New York: Springer Science & Business Media, 2011. Print.
[7] Guarino, Nicola. "Formal Ontology and Information Systems." National Research Council (1998): 3-15. Print.
[8] Tekinerdogan, B. and Ozturk, K. "Feature-Driven Design of SaaS Architecture." In Zaigham Mahmood, Saqib Saeed (eds.), Software Engineering Frameworks for the Cloud Computing Paradigm. New York: Springer Science & Business Media, 2013. Print.
[9] Chandrasekaran, B. and John R. Josephson. "What Are Ontologies and Why do we Need Them?" IEEE Intelligent Systems (1999): 20-26. Web. 5 Aug. 2014. http://www.csee.umbc.edu/courses/771/papers/chandrasekaranetal99.pdf
[10] Alexa Huth, and James Cebula. "United States Computer Emergency Readiness Team." US-CERT, A Government Organization Website. 2011. Web. 05 Aug 2014. http://journaleconomica.rhcloud.com/us-cert-united-states-computer-emergency-readiness-team/
I.J. Computer Network and Information Security, 2014, 9, 51-57 Published Online August 2014 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijcnis.2014.09.07
Artificial Intrusion Detection Techniques: A Survey Ashutosh Gupta Department of computer science, Ambedkar Institute of Advanced Communication Technology and Research (AIACTR), New Delhi, INDIA E-mail:
[email protected]
Bhoopesh Singh Bhati Assistant Professor, Ambedkar Institute of Advanced Communication Technology and Research (AIACTR), New Delhi, INDIA E-mail:
[email protected]
Vishal Jain Assistant Professor, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, INDIA E-Mail:
[email protected]
Abstract—Networking has become the most integral part of our cyber society. Everyone wants to connect with each other. With the advancement of network technology, we find networks most vulnerable to breaches that take information, and once information reaches the wrong hands it can do terrible things. During recent years, the number of attacks on networks has increased, which drew the attention of many researchers to this field. There have been many researches on intrusion detection lately. Many methods have been devised which are really very useful, but they can only detect the attacks which already took place. These methods will always fail whenever there is a foreign attack which is not well known or which is new to the networking world. In order to detect new intrusions in the network, researchers have devised artificial intelligence techniques for intrusion detection and prevention systems. In this paper we are going to cover what types of evolutionary techniques have been devised, along with their significance and modifications.

Index Terms—Artificial Neural Network, Genetic Algorithm, Immunity, Intrusion Detection, False Alarm.

I. INTRODUCTION
Internet is the global system of interconnected computer networks that use the standard protocol suite (TCP/IP) to serve several users worldwide. The Internet has become one of the most integral parts of an individual's life. With the recent advancement in network-based technology and the dependability of our everyday life on this technology, assuring reliable operation of network-based systems is very important. There has been a major increase in attacks on networks. This became the most prominent reason for the development of the Intrusion Detection System (IDS) [1]. There have been many researches in this field, and many techniques have evolved over a period of time. The main aim of this paper is to review the current trends in Intrusion Detection Systems (IDS) and to analyze some current problems. An attack is not just a probability; it is an accepted fact. An intrusion occurs when an attacker attempts to gain entry into or disrupt the normal operations of an information system, almost always with the intent to do harm. Even when such attacks are self-propagating, as in the case of viruses and distributed denial-of-service attacks, they are almost always instigated by someone whose purpose is to harm an organization [2]. One of the main aims of an IDS is to make sure that, in the case of a new attack, it is able to detect the attack and report it. Once the attack is reported, the administrator will be aware and will try to avoid these attacks in the future. In this way the IDS will be upgraded, and it will protect the network from the known attack. Monitor, detect and respond are the three basic functions of an Intrusion Detection System. An IDS is a very good tool to detect attacks on the network, but it still cannot be trusted all alone; it also requires a human expert in order to assess the alarms. There are two main types of approaches in Intrusion Detection: host based and network based. A Host-Based Intrusion Detection System (HBIDS) is present on a particular computer and operates on those systems which are a potential threat to the network. It generates alarms as soon as it detects any malicious activity or threat. These alarms are sent to the administrator of the network so that actions can be taken as soon as possible. There are different types of HIDS, like Tripwire, Cisco HIDS, and Symantec ESM. A Network-Based Intrusion Detection System resides on a computer or application connected to a segment of an organization's network and monitors network traffic on that segment, looking for indications of ongoing or successful attacks [3][1]. Different types of NIDS are
Snort and Netprowler. The rest of the paper is organized as follows. Section 2 describes related work on intrusion detection approaches, with comparisons also made in this section. In Section 3, the genetic algorithm and its use in intrusion detection are explained. Section 4 presents the artificial neural network and its application in many IDS. Section 5 shows how the artificial immune system is effective in detecting and defending against intrusions; a comparison is also made between the three evolutionary intrusion detection systems. The conclusion is given in Section 6.
A. Intrusion Detection Approaches
There are many approaches which have been proposed by many researchers. We classify these approaches into two groups: traditional approaches and evolutionary approaches. Those approaches which do not involve any type of Artificial Intelligence (AI) are placed under traditional approaches, and those which involve AI are termed evolutionary approaches. Traditional approaches include statistical-anomaly approaches, rule-based approaches, expert-system approaches, pattern-recognition approaches, agent-based approaches, etc. Evolutionary approaches include the artificial neural network approach, the genetic algorithm approach, the artificial immune system, the fuzzy logic approach, etc.

II. RELATED WORK: TRADITIONAL APPROACHES FOR INTRUSION DETECTION

A. Statistical Anomaly Intrusion Detection System
It collects statistical summaries by observing traffic that is known to be normal. This normal period of evaluation establishes a performance baseline. Once the baseline is established, the statistical IDPS periodically samples network activity and, using statistical methods, compares the sampled network activity to this baseline. When the measured activity is outside the baseline parameters, exceeding what is called the clipping level, the IDPS sends an alert to the administrator. The advantage of a statistical IDPS is that it can easily detect new attacks, as it is always looking for anomalous activities. But these systems require a lot of CPU usage and memory, as they are processing the data packets all the time. Another disadvantage of this IDS is that it may not detect some small changes and can generate false alarms; sorting these false alarms again requires a human expert, and this leads to the consumption of time and labor [4].

B. Signature Based Intrusion Detection System
A signature-based IDPS (sometimes called a knowledge-based IDPS or a misuse-detection IDPS) examines network traffic in search of patterns that match known signatures, that is, preconfigured, predetermined attack patterns. Signature-based IDPS technology is widely used because many attacks have clear and distinct signatures, for example: (1) footprinting and fingerprinting activities use ICMP, DNS querying, and email routing analysis; (2) exploits use a specific attack sequence designed to take advantage of a vulnerability to gain access to a system [5]. It has many disadvantages too. It has to be updated from time to time, otherwise new attacks might get their way through the IDPS. It is considerably slow and also requires memory space for processing and comparing the patterns.

C. Rule based Intrusion Detection System
This IDS is totally based on some predefined rules that are provided by the administrator. Each rule has a specific operation in the system. These rules are updated from time to time so that known attacks can be detected very easily. Besides the fact that false alarms in this IDS are very rare, there are many drawbacks: it cannot detect many new attacks, and it requires a human expert to constantly upgrade the system [6].

Table 1: Comparison between different approaches
Presence: NIDS is present on the computer/application connected to part of an organization's network; HIDS is present on a particular server or system, denoted as host, and controls its activity.
Types of software: NIDS: Snort, Cisco NIDS and Netprowler; HIDS: Tripwire, Cisco HIDS and Symantec HIDS.
Basis: NIDS: signature comparing; HIDS: configuration and alteration.
Advantages of IDS: NIDS: network operations may not get disturbed by this IDS; HIDS: detects the irregularity in the attack.
Limitations of IDS: NIDS: not capable of analyzing encrypted data; requires huge memory, so it is not suitable for large-traffic networks.
Attacks being detected: NIDS: DOS, CGI, Port Scans, SHB probes (Layer 3); HIDS: file integrity check, shell attack, FTP scans, SQL injections.
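As a toy illustration of the statistical-anomaly approach from Section II.A, the sketch below learns a baseline from traffic known to be normal and raises an alert when a sample exceeds a clipping level; the traffic numbers and the three-standard-deviation threshold are assumptions made for the example.

    import statistics

    # Baseline learned from a period of traffic known to be normal (made-up numbers).
    baseline = [120, 115, 130, 125, 118, 122]   # e.g. packets per second
    mean = statistics.mean(baseline)
    clipping_level = 3 * statistics.stdev(baseline)

    def is_anomalous(sample):
        # Alert when activity falls outside the baseline parameters.
        return abs(sample - mean) > clipping_level

    for sample in (124, 310):
        print(sample, "ALERT" if is_anomalous(sample) else "ok")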
III. EVOLUTIONARY APPROACHES FOR INTRUSION DETECTION
This type of intrusion detection involves many Artificial Intelligence (AI) techniques. Here are some of the evolutionary approaches using those techniques. These intrusion detection techniques are being devised because they can deal with uncertain and partially true data. The main idea of involving evolutionary algorithms is to increase the efficiency of the IDS. Many techniques have been devised for the detection and prevention of attacks.

IV. INTRUSION DETECTION BASED ON GENETIC ALGORITHM
Genetic Algorithms can also be very helpful in IDS. Genetic Algorithms are computerized search and optimization algorithms based on the mechanics of natural genetics and natural selection. In order to understand the working of GA, the biological background is very important. All organisms consist of cells as their building blocks. In each cell, there are chromosomes, which consist of strings of DNA. A chromosome consists of genes, blocks of DNA. Every gene encodes a particular pattern, and these patterns decide the traits. During the creation of an offspring, recombination occurs, and in that process genes from the parents form a whole new chromosome. The newly created offspring can then be mutated; mutation means that an element of the DNA is modified [7]. The fitness of an organism is measured by the success of the organism in its life. Genetic Algorithms are inspired by the Darwinian theory of survival of the fittest. The algorithm is started with a set of solutions called a population. Solutions from one population are taken and used to form a new population, motivated by the hope that the new population will be better than the old one. The Genetic Algorithm uses the processes of natural selection and crossover, in which chromosome-like data structures are evolved with the help of mutation and recombination. An evaluation function is used to calculate the validity of each chromosome, which should be fitter than the previous-generation chromosomes [8]. Survival and combination of chromosomes is biased towards the fittest. A fitness function is applied to get the fitness score, and based on this score chromosomes are selected for crossover to create new rules or hypotheses. So each time there is a new attack, the algorithm will automatically update itself in order to detect new malicious activities. This makes the approach far better than many approaches present in the field of Intrusion Detection. The Genetic Algorithm is capable of evolving rules that match only attacks on the network. For more clarity, Figure 1 is also included, which demonstrates the basic Genetic Algorithm process [8]. In the paper of Wei Li et al. [9], the rules are in the basic if-then form:

If {condition} then {act};

The condition here refers to the criteria which identify the attack. The act describes the security policies which should be applied whenever these attacks are encountered. The basic idea behind applying GA is to detect any malicious activity as soon as possible so that actions can be taken against that attack. Whenever there is a ping attack in the network, the attack can be detected quite soon, but actions on it are often taken really late; using this technique can reduce response time and increase the efficiency of the IDS. In the paper of Chetan Kumar et al. [10], the GALIB C++ library is used to develop the GA. Their fitness function is given by the formula F = a/A + b/B, where a denotes the quantity of attacks correctly detected out of A attacks, and similarly b denotes the quantity of normal connections correctly identified out of B normal connections. There are some advantages of a GA-based IDS. It is easy to modularize and separate from other applications. It gets better with time, as it is based on experience. It is more flexible and provides a friendly environment for the development and alteration of the network. It is also used when there is a need to combine it with an existing solution [11]. Genetic algorithms are also being used in training neural networks for intrusion detection [12].
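A minimal sketch of the scheme just described, assuming a toy encoding: each chromosome is a bit mask over connection features, selection keeps the fittest half, one-point crossover and mutation create offspring, and fitness follows F = a/A + b/B as in [10]. The patterns and GA parameters below are invented for illustration.

    import random

    random.seed(1)
    N_BITS = 8
    attacks = [0b11011010, 0b11011011, 0b11011000]   # A = 3 attack patterns
    normals = [0b00100101, 0b01100100]               # B = 2 normal patterns

    def matches(rule, record):
        return rule & record == rule                 # all rule bits present

    def fitness(rule):
        a = sum(matches(rule, p) for p in attacks)      # attacks detected
        b = sum(not matches(rule, p) for p in normals)  # normals left alone
        return a / len(attacks) + b / len(normals)      # F = a/A + b/B

    pop = [random.getrandbits(N_BITS) for _ in range(20)]
    for _ in range(50):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:10]
        children = []
        for _ in range(10):
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, N_BITS)        # one-point crossover
            child = (p1 >> cut << cut) | (p2 & ((1 << cut) - 1))
            if random.random() < 0.1:                # mutation flips one bit
                child ^= 1 << random.randrange(N_BITS)
            children.append(child)
        pop = survivors + children

    best = max(pop, key=fitness)
    print(format(best, "08b"), fitness(best))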
V. ARTIFICIAL NEURAL NETWORK BASED INTRUSION DETECTION SYSTEM

A. Introduction
The idea behind the application of soft computing techniques, and particularly ANNs, in implementing IDSs is to include an intelligent agent in the system that is capable of disclosing the latent patterns in abnormal and normal connection audit records, and of generalizing the patterns to new (and slightly different) connection records of the same class. The results show that even a multi-layer perceptron (MLP) with a single layer of hidden neurons can generate satisfactory classification results. Because the generalization capability of the IDS is critically important, the training procedure of the neural networks is carried out using a validation method that increases the generalization capability of the final neural network.

B. Origination of Neural Network
The artificial neural network is inspired by the functioning and structure of the constituents of the human brain, especially the neuron. In [12], the structure of a neuron is discussed. A neuron is composed of a nucleus called the soma. There are long, irregularly shaped filaments attached to the neuron called dendrites. Dendrites behave as input channels; all inputs from other neurons arrive through the dendrites. Another type of link attached to the soma, known as the axon, is electrically active and serves as an output channel. The axon terminates in specialized contacts called synapses or synaptic junctions. The synaptic junction, a very minute gap at the end of the dendritic link, contains a neurotransmitter fluid responsible for accelerating or retarding the electric charges to the soma. The size of the soma is likely to be related to learning: synapses with a larger area are thought to be excitatory, while those with a smaller area are believed to be inhibitory. Donald Hebb suggested that "when an axon of cell A is near enough to excite a cell B and repeatedly takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A's efficiency as one of the cells firing B is increased". The modern era of neural network research is credited to the work done by neurophysiologist Warren McCulloch and mathematician Walter Pitts in 1943. Both of them wrote a paper on how neurons might work. The next major breakthrough in the field of neural networks was made by Donald Hebb in 1949, who wrote a book about the neurons in NN named "The Organization of Behavior". But this research did not help many researchers to solve the arising problems in this field. Fifteen years after McCulloch and Pitts' work, a new approach to neural networks was introduced. This approach, the perceptron, not only became famous but solved many big problems of pattern recognition in NN. The perceptron was the first "practical" Artificial Neural Network, and the idea of the perceptron originated from the functioning of the fly's eye. The perceptron model comprises three layers: the sensory unit, the associative unit and the response unit. The S unit, comprising 400 photodetectors, receives input and gives binary output [13].

C. Research on Neural Network Intrusion Detection System
There have been many researches in this field, and many models have been devised using the concept of Neural Networks (NN), including a new approach to intrusion detection using ANN and fuzzy clustering. In this paper [14], the researchers used the hybridization of fuzzy clustering and an existing neural network. In this model, they used FCM to generate training data sets, and in this way they increased the performance of the ANN. They also compared FCM-ANN with BPNN and other famous proposed models. In the paper [15], researchers proposed an IDS technique based on an Evolutionary Neural Network (ENN). ENN uses an evolutionary algorithm which not only sets the internal parameters but also designs the architecture of the neural network simultaneously; the design can be prepared by knowing how many weights are required and how many hidden nodes can be used. Ghosh and other researchers have used some basic properties of neural networks, like feed-forward back propagation, in order to find intrusions [16].

D. Advantages of Artificial Neural-Network based Intrusion Detection System
There is a great amount of flexibility that allows room for growth and general learning. If an element of a neural network fails, it can continue without any problem because of its parallel nature. In an Intrusion Detection System, it gives the potential to tackle multiple types of attacks. A neural network will make a quick analysis of the types of attacks which are regular, recognize the pattern and learn it for the future. This not only increases the accuracy but also reduces manual labour and false alarms to some extent. It also makes sure that an attack, once recognized, cannot take place in the future. Its generalization property enables it to detect unknown attacks and variations of known attacks. Speed is another advantage of the neural network. An Artificial Neural Network has the ability to assemble patterns which have common features and to classify attacks.
VI. ARTIFICIAL IMMUNE SYSTEM BASED INTRUSION DETECTION SYSTEM
The major problem in the field of Intrusion Detection is how to differentiate between normal activity and a real threat. There are many models which are capable of detecting malicious activities, but they have their own shortcomings and limitations. The most common approach is to define the rules and recognize the data sets, and whenever behavior deviates from the normal, an alarm is generated for the administrator. This method requires a lot of human attention, labor and time, and this IDS generates many false alarms. As we all know, many organisms, especially humans, have survived for a very long time. These organisms are capable of adapting to any circumstance. This ability to survive in harsh environments is possible only because of the immune system. The immune system is a system of biological structures and processes within an organism that protects against disease. To function properly, an immune system must detect a wide variety of agents, from viruses to parasitic worms, and distinguish them from the organism's own healthy tissue [17]. In computer science, artificial immune systems are a class of computationally intelligent systems inspired by the principles and processes of the vertebrate immune system [18]. The algorithm typically exploits the immune system's characteristics of learning and memory to solve a problem. Before moving towards the artificial immune system, we would like to discuss the human immune system. An immune system has mainly four properties: detection, diversity, learning and tolerance. For the human immune system, there are basically four layers: skin, pH temperature, the innate immune system and the adaptive immune system. There are some basic terminologies that we should know. An epitope is a recognizable characteristic of a molecule, as seen by an immune system. One of the greatest advantages of the immune system is its ability to distinguish insiders from outsiders. Antigens are epitopes that are recognized by the immune system as outsiders. Antibodies are the part of the immune system which is responsible for detecting and binding to the antigens. The first layer, the skin, is a physical barrier which prevents any antigen from crossing and entering the human body. If a pathogen capable of entering the system can penetrate through the skin, it will meet a layer of pH temperature. Some pathogens can cross these two layers, and they then encounter the first immunity system, the innate immune system. This system has a non-specific response and has no memory. If the pathogen crosses all these layers, it will be taken care of by the second part of the immune system, the adaptive immune system, which demonstrates responses to individual antigens [19]. Inspired by the working of the immune system, there have been many models of intrusion detection based on the artificial immune system. Clonal selection, negative selection and immune networks are some famous models of IDS. The mechanism by which self-reactive lymphocytes are removed is known as negative selection, and this process is also known as clonal deletion. The T-cells that remain alive in this process should not react with self-antigens. Immunological tolerance is the ability of lymphocytes not to react with the self-antigens. This whole mechanism described above inspires many negative selection algorithms in the field of artificial immune systems [20], and many negative selection algorithms have been proposed by researchers. The first algorithm was given by Forrest in the year 1994 [21]. The algorithm starts with the production of a set of self strings that state the normal conditions of the system. After this, the main aim is to produce a group of detectors that bind the complement of the self-antigens. These detectors are then applied to new data to check whether it has been manipulated or not. Many researchers have worked to increase the efficiency of this approach [23][24]. A comparison between a few evolutionary approaches used in recent IDS is given in Table 2.
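A minimal sketch of Forrest-style negative selection as summarized above, assuming toy binary strings and an r-contiguous-bits matching rule; the self set, string length and parameters are invented for illustration.

    import random

    random.seed(3)
    L, R = 6, 4                                   # string length, match window
    self_set = {"010101", "111000", "000111"}     # normal conditions of the system

    def matches(detector, s):
        # r-contiguous-bits rule: some aligned window of length R is identical.
        return any(detector[i:i + R] == s[i:i + R] for i in range(L - R + 1))

    # Censoring phase: keep only candidates that match no self string.
    detectors = []
    while len(detectors) < 5:
        cand = "".join(random.choice("01") for _ in range(L))
        if not any(matches(cand, s) for s in self_set):
            detectors.append(cand)

    # Monitoring phase: anything a detector binds to is flagged as non-self.
    for sample in ("010101", "110011"):
        flagged = any(matches(d, sample) for d in detectors)
        print(sample, "non-self" if flagged else "self")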
VII. CONCLUSION
A lot of research work has been going on in Intrusion Detection Systems. There are many types of software available for intrusion prevention and intrusion detection based on the traditional approaches, but their performance is not very efficient, as there is a greater chance of false alarms and updating is mandatory after a fixed period of time. These limitations can be avoided up to a certain level by the application of Evolutionary-Based IDS (EBIDS). Whenever there is a new attack on the network, it detects the anomaly and trains itself for this threat. In this way, it updates itself against new attacks.
VIII. FUTURE WORK
Though EBIDS has a great advantage over traditional IDS, there is no foolproof system. From the above comparative analysis, we have seen that the neural-network-based IDS is one of the most efficient IDS in the field of intrusion detection. There are some limitations of ANN that can be overcome with a little bit of modification and manipulation. There can be hybridization of Genetic Algorithms and Neural Networks in order to overcome the limitations of the IDS. There have been many researches on the hybridization of different models. The future of IDS lies in evolutionary techniques.
REFERENCES
[1] http://en.wikipedia.org/wiki/Internet.
[2] Michael E. Whitman, Herbert J. Mattord, "Principles of Information Security".
[3] Karthikeyan K.R and A. Indra, "Intrusion Detection Tools and Techniques - A Survey", International Journal of Computer Theory and Engineering, Vol. 2, No. 6, December 2010, pp 901-906.
[4] Michael E. Whitman, Herbert J. Mattord, "Principles of Information Security".
[5] http://www.sans.org/security-resources/idfaq/statistic_ids.php.
[6] Iftikar Ahmad, Azween B Abdullah, Abdullah S Alghamadi, "Comparative Analysis of Intrusion Detection Approaches", 2012 21st International Conference on Computer Modelling and Simulation, pp 586-591.
[7] S. Rajasekaran, G. A. Vijayalakshmi Pai, "Neural Networks, Fuzzy Logic, and Genetic Algorithms: Synthesis and Applications".
[8] Mohammad Sazzadul Hoque, Md. Abdul Mukit and Md. Abu Naser Bikas, "An Implementation of Intrusion Detection System Using Genetic Algorithm", International Journal of Network Security & Its Applications (IJNSA), Vol. 4, No. 2, March 2012, pp 110-120.
[9] Wei Li, "Using Genetic Algorithm for Network Intrusion Detection".
[10] Anup Goyal, Chetan Kumar (2008), "GA-NIDS: A Genetic Algorithm based Network Intrusion Detection System".
[11] S. Rajasekaran, G. A. Vijayalakshmi Pai, "Neural Networks, Fuzzy Logic, and Genetic Algorithms: Synthesis and Applications".
[12] http://library.thinkquest.org/C007395/tqweb/history.html.
[13] Gang Wang, Jinxing Hao, Jian Ma, Lihua Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering".
[14] Sang-Jun Han and Sung-Bae Cho, "Evolutionary Neural Network for Anomaly Detection Based on the Behaviour of a Program", IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, Vol. 36, No. 3, June 2006.
[15] Deqiang Zhou, "Optimization Modeling for GM(1,1) Model Based on BP Neural Network", I. J. Computer Network and Information Security, 2012, 1, 24-30, DOI: 10.5815/ijcnis.2012.01.03.
[16] http://en.wikipedia.org/wiki/Immune_system.
[17] http://en.wikipedia.org/wiki/Artificial_immune_system.
[18] Aref Eshghi Shargh, "Using artificial immune system on Implementation of Intrusion Detection System", 2009 Third UKSim European Symposium on Computer Modelling and Simulation.
[19] http://www.artificial-immune-systems.org/algorithms.shtml.
[20] Susan C. Lee, David V. Heinbuch, "Training a Neural-Network Based Intrusion Detector to Recognize Novel Attacks", IEEE Transactions on Systems, Man and Cybernetics - Part A: Systems and Humans, Vol. 31, No. 4, July 2001.
[21] Chung-Ming Ou, C. R. Ou, "Multi-Agent Artificial Immune Systems (MAAIS) for Intrusion Detection: Abstraction from Danger Theory", Agent and Multi-Agent Systems: Technologies and Applications, Lecture Notes in Computer Science Volume 5559, 2009, pp 11-19.
[22] Patricia Mostardinha, Bruno Filipe Faria, André Zúquete, Fernão Vistulo de Abreu, "A Negative Selection Approach to Intrusion Detection", Artificial Immune Systems, Lecture Notes in Computer Science Volume 7597, 2012, pp 178-190.
[23] Mahdi Mohammadi, Ahmad Akbari, Bijan Raahemi, Babak Nassersharif, "A Real Time Anomaly Detection System Based on Probabilistic Artificial Immune Based Algorithm", Artificial Immune Systems, Lecture Notes in Computer Science Volume 7597, 2012, pp 205-217.
Fig. 1: Basic Genetic Algorithm Process
Table 2: Comparisons between Evolutionary Intrusion Detection Approaches
Intrusion detection technique: Artificial Neural Network: anomaly; Artificial Immune System: anomaly; Genetic Algorithm: misuse.
Input data: Artificial Neural Network: data sets, perceptron; Artificial Immune System: data sets, sequence of algorithms; Genetic Algorithm: data sets, fitness function.
Detecting known threats: Artificial Neural Network: yes; Artificial Immune System: yes; Genetic Algorithm: yes.
Detecting unknown threats: Artificial Neural Network: yes; Artificial Immune System: yes; Genetic Algorithm: no.
Related paper work included in this survey: Artificial Neural Network: Gang Wang [13], Sang-Jun Han [14], Susan C. Lee [19]; Artificial Immune System: Chung-Ming Ou [20], Patricia Mostardinha [21], Mahdi Mohammadi [22]; Genetic Algorithm: Md. Sazzadul Hoque [8], Wei Li [9], Chetan Kumar [10].
Performance of IDS: Artificial Neural Network: high; Artificial Immune System: moderate; Genetic Algorithm: low.
Author’s Profile
Ashutosh Gupta is pursuing a B.Tech. degree (Computer Science and Engineering) from G.G.S.I.P. University, Delhi. He published a research paper on information retrieval and the semantic web in a national journal. His current research area is Multimedia Information Retrieval.

Bhoopesh Singh Bhati is pursuing a Ph.D. degree from G.G.S.I.P. University, Delhi. He has obtained an M.Tech. degree in Information Security and a B.Tech. (Computer Science and Engineering) from G.G.S.I.P. University, Delhi. He is working as an Assistant Professor in the Department of Computer Science and Engineering of Ambedkar Institute of Advanced Communication Technologies & Research, Govt. of NCT, Delhi-110031. He has published various research papers in international journals and conferences. His current research area is Information Security.

Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is pursuing a Ph.D. from the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.
How to cite this paper: Ashutosh Gupta, Bhoopesh Singh Bhati, Vishal Jain, "Artificial Intrusion Detection Techniques: A Survey", IJCNIS, vol. 6, no. 9, pp. 51-57, 2014. DOI: 10.5815/ijcnis.2014.09.07
International Journal of Sciences: Basic and Applied Research (IJSBAR) ISSN 2307-4531 (Print & Online) http://gssrr.org/index.php?journal=JournalOfBasicAndApplied
--------------------------------------------------------------------------------------------------------------------------------------
Incremental Learning Approach for Enhancing the Performance of Multi-Layer Perceptron for Determining the Stock Trend

Basant Ali Sayed Ali (a)*, Abeer Badr El Din Ahmed (b), Alaa El Din Muhammad El Ghazali (c), Vishal Jain (d)

(a) Teaching staff, Department of Management Information Systems, Higher Institute of Qualitative Studies, Cairo, Egypt
(b) Lecturer, Department of Computer Science, Sadat Academy for Management Sciences, Cairo, Egypt
(c) Professor, Department of Computer and Information Systems, Sadat Academy for Management Sciences, Cairo, Egypt
(d) Assistant Professor, Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi, INDIA
(a) [email protected], (b) [email protected], (d) [email protected]
Abstract

This paper introduces a new technique for achieving minimum risk in predicting the stock trend using a multi-layer perceptron. The proposed technique presents a method of classifying the stock trend. The paper shows a comparison between the multi-layer perceptron and gene learning theory. The achieved results show the superior performance of the multi-layer perceptron, which is based on a mathematical background.

Keywords: Prediction; Multi-layer Perceptron; Grid search; classification accuracy.
-----------------------------------------------------------------------
* Corresponding author. E-mail address: [email protected]
1. Introduction
The main challenge in discovering a technique for stock market prediction is finding an applicable and not overly complicated application of machine learning algorithms. Many efforts have been made in price prediction using methods such as Neural Networks, Linear Regression (LR), Multiple Linear Regression (MLR), Auto-Regressive Moving Average (ARMA) models and Genetic Algorithms (GA). In this report we also consider the Twin Gaussian Process (TGP) method for predicting stock prices. Because of the dynamic behaviour of stock market prices, prediction is very difficult. The rise and fall of stock market prices depend on various factors such as the amount of demand, the exchange rate, the price of gold, the price of oil, and political and economic events. From another point of view, however, we can treat stock market price variation as a time series and, without reference to the factors mentioned, make price predictions simply by finding the sequence rules of the price series. There has been much research on predicting price-variation time series with different methods such as Neural Networks, LR, MLR, ARMA and GA. The Neural Network is one of the popular methods for predicting the stock market. Most studies that use Neural Networks for prediction use a Multi-Layer Perceptron and back-propagation to train the network on historical stock data for predicting future prices.

2. Limitations of the research
• The paper focuses on the binary classification process performed by the neural network.
• The domain of the application is the Egyptian stock market.
• The selected sector of the Egyptian stock market is the textile sector.
• The results were achieved under the normal economic conditions of the Egyptian stock market.

3. Adaptive Activation Functions
There is a method for improving the performance of a neural network based on letting the activation functions adapt to the characteristics of the training data. Using adaptive activation functions is one of the earlier such methods, developed by Zurada: the slope of the selected sigmoid activation function is learned simultaneously with the weights, with a slope parameter λ maintained for every hidden and output unit. A further development of the lambda-learning algorithm is due to Engelbrecht et al. Here this sigmoid function can be written, in its standard slope-range form, as

f(net) = γ / (1 + e^(−λ·net))

where λ is the slope of the function and γ is the maximum range. Engelbrecht et al. also developed learning equations that learn the maximum ranges of the sigmoid functions, thereby performing an automatic scaling process based on gamma-learning; this development removes the need to scale target values to the range (0, 1). Modifying both the slope and the range of the sigmoid function results in what is called delta learning, alongside the lambda- and gamma-learning variations.
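A minimal NumPy sketch of such an adaptive sigmoid, assuming the standard slope-range form above and plain gradient-descent updates for λ and γ (the data, learning rate and squared-error objective are illustrative assumptions):

```python
import numpy as np

def adaptive_sigmoid(net, lam, gamma):
    # f(net) = gamma / (1 + exp(-lam * net)): lam = slope, gamma = maximum range.
    return gamma / (1.0 + np.exp(-lam * net))

def update_activation_params(net, target, lam, gamma, lr=0.01):
    """One gradient-descent step on lambda and gamma for the squared error
    E = 0.5 * (f(net) - target)^2, trained alongside the ordinary weights."""
    f = adaptive_sigmoid(net, lam, gamma)
    err = f - target
    df_dlam = f * (1.0 - f / gamma) * net   # derivative of f w.r.t. lambda
    df_dgamma = f / gamma                   # derivative of f w.r.t. gamma
    lam -= lr * np.mean(err * df_dlam)      # lambda-learning step
    gamma -= lr * np.mean(err * df_dgamma)  # gamma-learning step
    return lam, gamma

# Illustrative usage on dummy pre-activations and targets:
net = np.array([0.2, -1.0, 0.7])
target = np.array([0.9, 0.1, 2.0])
lam, gamma = 1.0, 2.5
for _ in range(100):
    lam, gamma = update_activation_params(net, target, lam, gamma)
```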
Fig 1: The multi-layer perceptron structure

4. Challenges for Training Multilayer Perceptron Networks
The main aim of the training process is to discover the group of weight values that will make the output from the neural network match the actual target values as closely as possible [2]. No doubt there are many critical points involved in designing and training a multilayer perceptron network:
• Selecting how many hidden layers to use in the network.
• Deciding how many neurons to use in each hidden layer.
• Finding a globally optimal solution that avoids local minima.
• Converging to an optimal solution in a reasonable period of time.
• Validating the neural network to test for over-fitting.

5. Neural Network Model Experiments
5.1. Selective Learning
Not much research has been done in selective learning. Hunt and Deller developed Selective Updating, where training starts on an initial candidate training set. Patterns that exhibit a high influence on the weights, i.e. patterns that cause the largest changes in weight values [3], are selected from the candidate set and added to the training set. Patterns that have a high influence on the weights are selected at each epoch by calculating the effect that each pattern has on the weight estimates. These calculations are based on matrix perturbation theory [5], where an input pattern is viewed as a perturbation of previous patterns. If the perturbation is expected to cause large changes to the weights, the corresponding pattern is included in the training set. The learning algorithm does use current knowledge to select the next training subset, and training subsets may differ from epoch to epoch [9]. Selective Updating has the drawback of assuming uncorrelated input units, which is often not the case in practical applications.

5.2. Incremental Learning
Research on incremental learning is more abundant than on selective learning. Most current incremental learning techniques have their roots in information theory, adapting Fedorov's optimal experiment design to neural network learning [12]. The different information-theoretic incremental learning algorithms are very similar, differing only in whether their selection criteria consider only bias, only variance, or both bias and variance terms. One drawback of the incremental learning algorithms summarized above is that they rely on the inversion of an information matrix [7]. Fukumizu showed that, in relation to pattern selection to minimize the expected MSE, the Fisher information matrix may be singular.

5.3. Converging to the Optimal Solution – Conjugate Gradient
Training starts from a set of randomly selected initial weights. The suggested approach uses the benefits of the conjugate gradient technique. In practice, the majority of training algorithms follow a similar cycle to select the best weight values: (1) run the predicted values for a case through the network using a tentative set of weights [15]; (2) calculate the difference between the predicted and the actual target for the case, called the prediction error; (3) calculate the mean error over the set of training cases; (4) propagate the error backward through the network and calculate the gradient (vector of derivatives) of the error with respect to changes in the weight values [6]; (5) adjust the weights to reduce the error. Each such cycle is called an epoch. Because the error information is propagated backward through the network, this sort of training is called backward propagation, or "backprop". The fundamental technique uses the gradient descent algorithm to modify the weights so as to converge along the gradient [8]. This approach can provide the classical conjugate gradient algorithm with line search, but it also offers a newer algorithm, Scaled Conjugate Gradient (see Moller, 1993). This technique uses a numerical approximation to the second derivatives (the Hessian matrix), and at the same time it avoids instability by combining the model-trust-region technique from the Levenberg-Marquardt algorithm with the conjugate gradient approach. This allows scaled conjugate gradient to compute the optimal step size in the search direction without the more expensive computations of the line search used by the classical conjugate gradient algorithm [3]. There is, of course, a cost involved in estimating the second derivatives [5]. Tests performed by Moller show the scaled conjugate gradient algorithm converging up to approximately twice as fast as classical conjugate gradient and up to twenty times as fast as back-propagation based on gradient descent. Moller's tests also showed that scaled conjugate gradient failed to converge less often than traditional conjugate gradient or back-propagation using gradient descent [4].

5.4. The Approach for Avoiding Over-fitting
Over-fitting happens when the parameters of a model are trained so tightly that the model fits the training data well but has poor accuracy on separate data not used for training. Multilayer perceptrons in particular are subject to over-fitting, as are most other types of models.
The paper has a method for dealing with over-fitting: (1) selecting the optimal number of neurons, as described above, and (2) evaluating the model as the parameters are being tuned and stopping the tuning when over-fitting is detected. This is known as "early stopping".

6. The Proposed Algorithm
The best neural network structure was chosen from Table 1 below. The selected network has 5 input neurons, 11 hidden neurons and 1 output. Each MLP in Table 1 was trained and tested using different learning rates and epochs. The best network (marked in bold font) was chosen using the least difference between the training and testing data. The motivation for this criterion is to ensure that the chosen network can predict the stock price as accurately as possible. The proposed algorithm:
1. Prepare the collected data.
2. Clean the data.
3. Run the neural network model optimized by grid search (a minimal sketch is given after the input-data listing below).
4. Compute the relative importance of the dimensions.
5. Rank the relative importance of the dimensions.
6. Select the best dimensions achieving minimum error.
7. If the minimum error satisfies the acceptable error, stop; otherwise go to step 3.

7. Experiments and Results
This subsection shows a comparison between the plain neural network and the hybrid neural network with grid search.

Input Data
Input data file: \csv\bolivar.csv
Number of variables (data columns): 47
Data subsetting: use all data rows
Number of data rows: 800
Total weight for all rows: 800
Rows with missing target or weight values: 0
Rows with missing predictor values: 1
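A minimal scikit-learn sketch of step 3: an MLP tuned by grid search over the hidden-layer sizes swept in Table 4, with early stopping enabled and the 4-fold cross-validation used below. The file path, the target column name "trend" and the parameter grid are assumptions for illustration, not the paper's exact configuration:

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Illustrative loading: 47 columns, a categorical trend target
# (Down / Up / Stable) as in the category-weights table below.
df = pd.read_csv("bolivar.csv").dropna()
X, y = df.drop(columns=["trend"]), df["trend"]   # "trend" is an assumed column name

grid = GridSearchCV(
    MLPClassifier(max_iter=500,
                  early_stopping=True,           # stop tuning when over-fitting appears
                  validation_fraction=0.1),
    param_grid={
        "hidden_layer_sizes": [(n,) for n in range(2, 14)],  # 2..13 hidden neurons
        "learning_rate_init": [0.001, 0.01, 0.1],
    },
    cv=4,                                        # 4-fold cross-validation
    scoring="accuracy",
)
grid.fit(X, y)
# Best network size and its misclassification rate.
print(grid.best_params_, 1.0 - grid.best_score_)
```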
7.1. Neural Network Parameters
Number of rows excluded because of missing predictor values = 1

Table 1: Neural network architecture

| Layer   | Neurons | Activation | Min. weight | Max. weight |
|---------|---------|------------|-------------|-------------|
| Input   | 5       | Passthru   |             |             |
| Hidden1 | 13      | Linear     | -1.483e+000 | 1.706e+000  |
| Output  | 3       | Linear     | -1.787e+000 | 1.516e+000  |
Table 2: Category weights (prior probabilities)

| Category | Probability |
|----------|-------------|
| Down     | 0.5225564   |
| Up       | 0.4398496   |
| Stable   | 0.0375940   |
7.2. Training Statistics

Table 3: Error generated by conjugate gradient

| Process            | Time       | Evaluations | Error       |
|--------------------|------------|-------------|-------------|
| Conjugate gradient | 00:00:02.5 | 806,145     | 9.5403e-002 |
7.3. Model Size Summary Report
Network size evaluation was performed using 4-fold cross-validation.

Table 4: Network size evaluation using 4-fold cross-validation

| Hidden layer 1 neurons | MLP % misclassifications | Incremental MLP % misclassifications |
|------------------------|--------------------------|--------------------------------------|
| 2  | 42.765433221 | 37.09273 |
| 3  | 38.67765442  | 32.20551 |
| 4  | 35.675576    | 32.83208 |
| 5  | 34.87656578  | 32.08020 |
| 6  | 34.9876543   | 32.08020 |
| 7  | 33.86664532  | 32.58145 |
| 8  | 34.87675445  | 33.33333 |
| 9  | 34.97776654  | 31.95489 |
| 10 | 38.677788    | 34.58647 |
| 11 | 38.8766554   | 33.83459 |
| 12 | 34.678909087 | 32.08020 |
| 13 | 33.78769     |          |
International Journal of Computer Applications (0975 – 8887), Volume 94 – No 2, May 2014

Pk and Uk are related to prior and posterior probabilities. Prior means finding the probability as early as possible, without knowing the features of the document; posterior means finding the probability after examining them. The model concludes that a term which occurs many times in a single document is relevant, but that if the same term occurs in a large number of documents, then it is not relevant. So a weight function is developed that varies from idf to the Wk formula.

Limitation of this model: it is not able to distinguish between low-frequency terms and high-frequency terms in the context of weights; it gives low-frequency terms the same weight as high-frequency terms. It is also not able to extract terms from multiple queries. To overcome these problems, we have used the Inference Network Model.

(b) Bayesian Inference Network Model
This is a statistical approach for the extraction of terms from multimedia documents with the help of a graph called an Inference Network Graph. Besides computing probabilities for different nodes, this model also determines the concepts between the various retrieved terms. It provides assurance that user needs are fulfilled, because it combines multiple sources of evidence regarding the relevance of a document to a user query.

Graph Structure: an Inference Network is a graph that has nodes connected by edges. Nodes represent True/False statements describing whether a term is relevant or not. The graph has the following elements:
Document Nodes (Dn): they are called root nodes.
Text Nodes (Tn): they are child nodes of the document nodes and may include audio nodes, video nodes, text and image nodes, etc.; thus the child nodes hold multiple representations of a document.
Concept Representation Nodes (CRn): they are children of the text nodes. The concepts used in the terms contained in the text nodes are represented by CR nodes. These nodes are index terms or keywords that are matched in a document to retrieve relevant terms.
Document Network: the network consisting of document nodes, text nodes and CR nodes. It is not a tree, as it has multiple roots and nodes; it is a Directed Acyclic Graph (DAG), since it has no loops. The representation of the document network for documents D1 to Dn is shown in Figure 4.
Figure 4: Document Network (it describes the concepts used in multiple terms from different documents)
Query Network: since we have extracted concepts in the Document Network, it is possible that different concepts are used in the same query nodes, or different concepts in different nodes. The concepts that describe relevant terms are shown in the form of results and presented to the user. The representation of the query network for query nodes Q1 to Qn is shown in Figure 5.
Figure 5: Query Network (it describes the generation of results (leaf nodes))

When we combine the document network and the query network, we get the inference graph. This graph computes the probabilities of the terms contained in the child nodes of the document nodes, and so on, by using a LINK MATRIX. Each node is assigned its weight in each row of the matrix; the columns represent the number of possible combinations a node can have.
In a link matrix the number of columns is 2^n, where n is the number of parents; if a node has 3 parents, there are 8 columns. Probabilities and weight functions are then computed for all 8 columns of the matrix; each probability is multiplied by its weight, and the eight products are added to get the total probability given the states of the respective parent nodes. Consider the combination 110 (1 stands for True, 0 for False). The probability for this combination is calculated as P1 * P2 * (1 - P3); the weight function for such a combination is (W1 + W2) / (W1 + W2 + W3); so its contribution to the total probability is P1 * P2 * (1 - P3) * (W1 + W2) / (W1 + W2 + W3).
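A minimal Python sketch of this link-matrix combination for a node with n parents; the parent probabilities and weights in the usage line are illustrative:

```python
from itertools import product

def link_matrix_probability(parent_probs, weights):
    """Total probability of a node given its parents: sum over all 2^n
    parent true/false combinations of (product of parent probabilities)
    times (sum of weights of the 'true' parents over the total weight)."""
    total_w = sum(weights)
    total = 0.0
    for combo in product([1, 0], repeat=len(parent_probs)):   # the 2^n columns
        p = 1.0
        for bit, prob in zip(combo, parent_probs):
            p *= prob if bit else (1.0 - prob)                # e.g. P1 * P2 * (1 - P3)
        w = sum(wi for bit, wi in zip(combo, weights) if bit) / total_w
        total += p * w
    return total

# 3-parent example matching the text: the combination 110 contributes
# P1 * P2 * (1 - P3) * (W1 + W2) / (W1 + W2 + W3).
print(link_matrix_probability([0.8, 0.6, 0.3], [2.0, 1.0, 1.0]))
```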
3.4 Ontology Module
This module represents the concepts and conceptual relationships among the nodes described by the inference network graph of the previous module, using an ontology. Ontology is defined as a formal, explicit and shared
conceptualization of concepts, thus organizing them in a hierarchical fashion [10]. The various phases of the ontology module are described below:

(a) Creation of Ontology (Ontology Representation): the inference graph consists of document nodes (root nodes). Each document node Di has concept nodes CRi, and edges represent the relationships between them (Di → Ti → CRi). Each document node has concept nodes that are treated as vertices; an edge from one node to another represents a relationship among concepts.

(b) Ontology Building: an algorithm is used for developing the ontology for the inference graph. It requires the use of OWL (Ontology Web Language), which is used for writing the ontology and for creating objects of each class:
BEGIN
For each vertex V of inference graph G
    Class C = new (owl: class)
    C.Id = C.label    // each concept has its unique identification and name
    DatatypeProperty DP = new (owl: DatatypeProperty)    // states what the type of the DatatypeProperty of the parent node should be
    DP.Id = DP.Name, DP.Value;
    DP.AddDomain (C);    // adds the values of child nodes to the given concept node C
End for
For each edge E of graph G
    DP.AddDomain (B.getClass ())    // getClass is used to show the relationship between concepts
End for
End begin

(c) Generation of OWL classes
Class Result = new (owl: class)    // Result represents leaf nodes
Result.Id = Result.Name
DatatypeProperty ResultDP = new (owl: DatatypeProperty)    // to show the value of leaf nodes
ResultDP.Id = Result.Name, Result.Value;    // leaf nodes have a name and a value
Result.AddDomain (Result)
For each edge E of graph G
    Class Relationship = new (owl: class)
    Relationship.Id = ""
    For each vertex of the graph
        Relationship.Id = Relationship.Id + C.label;
    End for
    ResultDP.AddDomain (Relationship)
End for

3.5 Query Processing Module
A query is called an information need; it is the final result, with optimal and effective terms. This module deals with the expansion and refinement of the query, either automatically or manually with user interaction. It analyses the query according to the query language, extracts information symbols from it and passes them to the Retrieval Module for searching the index terms.

Query expansion through manual methods includes:
Sketch Retrieval: one of the methods for querying a multimedia database. Here the user query is a visual sketch given by the user; the system then processes this drawing to extract its features and searches the index for similar images.
Search by Example: the user gives as a query an example of the image he intends to find; the query then extracts its low-level features.
Search by Keyword: the most popular method. The user describes the information need with a set of relevant terms, and the system searches for them in the documents.

Query expansion through an automatic method includes the Local Context Analysis (LCA) approach, one of the best methods for automatic query expansion. It expands the terms of a query, then ranks and weights them by using a fixed formula:

LCA = Local Feedback Analysis + Global Analysis

It is local because concept-relevant terms are retrieved only from the globally retrieved documents. It is global because documents related to the given query topic are selected from the huge collection of documents present on the web (as when we selected three documents related to the semantic web from the web). When we put a query into Google and press ENTER, the query is executed and retrieves some documents; that is the global activity. LCA is a concept-based, fixed-length scheme: it expands the user query and retrieves the top n relevant terms that most closely satisfy the query, returning only a fixed number of terms.

The retrieved terms are ranked accordingly as:

Belief (Q, C) = [δ + log (af(c, ta)) · idfc / log (n)] · idfa

where:
Belief (Q, C) = ranking function
C = concepts related to query Q
ta = query term
af(c, ta) = fta,d1 · fc,d1 + fta,d2 · fc,d2 + ... + fta,dn · fc,dn, i.e.

af(c, ta) = Σ (d = 1 to n) ftad · fcd

where d ranges over the documents 1 to n, ftad is the frequency of occurrence of query term ta in document d, and fcd is the frequency of the concept c (terms related to the query) in document d;
idfc measures the importance of the concepts related to the query terms, i.e. how many times the same concept is used in a document;
idfa measures the importance of query term ta;
δ is a constant used for distinguishing between relevant and non-relevant terms (it stores the non-relevant terms, which are treated as constant).

3.5.1 Query Refinement
A term can have different weights in each relevant document, so there is a need to refine the query. Query refinement means recalculating the old weights of the expanded query terms in order to produce the new weights of the same query terms. These query terms are transformed into a dummy document that is used for indexing. The formula that calculates the new weights of query terms and produces optimal results by discarding non-relevant terms is called the Rocchio formula.

Aim: the aim of this formula is to increase the weights of terms that occur in relevant documents and decrease the weights of terms occurring in non-relevant documents.

Equation:

Qa(new) = x · Qa(old) + y · (1/RD) · Σ wta,RD − z · (1/NRD) · Σ wta,NRD

where:
Qa(new) = new weight of query term a
Qa(old) = old weight of term a
RD = relevant documents judged by the user
NRD = non-relevant documents judged by the user
wta,RD = weights of the term in the relevant documents
wta,NRD = weights of the term in the non-relevant documents
Σ wta,RD = all the weights over RD added together
Σ wta,NRD = all the weights over NRD added together
y = a constant that gives the average of the weights of terms in RD
z = a constant that gives the average of the weights of terms in NRD
The result is that non-relevant terms get negative weights and are discarded automatically.

3.6 Retrieval Module
This is the module that retrieves the final results/optimal queries that have been extracted after going through the various phases. It ranks documents according to similar queries and maintains an index according to the information symbols contained in each query.

3.6.1 Re-Use of Queries
Need for re-use of queries: the queries that were already expanded and refined according to the user's requirements are optimized and stored. If the user needs the same information in future, what is the way to retrieve the documents that satisfy the query?
Solution: re-use of queries.
Analysis: the expanded and refined queries are stored in a database called the query database. The query base contains queries related to previously retrieved documents; these queries are called persistent queries.

How can persistent queries be used with a new query?
(a) If a new query is somewhat similar to a persistent query, then the result of the new query is related to that persistent query.
(b) If the user's new query is not similar to any persistent query, then the system has to find a persistent query from the database that satisfies the new query to some extent.

How are similar queries checked? Using the concept of a solution region: when the search for an optimal query begins, the system retrieves a number of queries instead of only one. All those queries are described in a query space, and the region containing that query space is called the solution region. We can check the similarity between queries by comparing the new queries with the queries in the solution region; if they match, the two queries are said to be similar.

4. EXPERIMENTAL ANALYSIS AND CALCULATIONS
Consider the given set of data. We have to compute the probabilities of relevant and non-relevant terms and hence calculate the weight function for each term.

Given data:
Total number of relevant documents (R) = 10
Relevant documents with term tk (r) = 4
Total number of non-relevant documents (N − R) = 15
Relevant documents without term tk (R − r) = 6
Non-relevant documents with term tk (n − r) = 5
Non-relevant documents without term tk: (N − R) − (n − r) = 10
Total number of documents without term tk: (N − R) − (n − r) + (R − r) = 16
Total number of documents having term tk: (n − r) + r = 9, so n = 9
According to the BI model:
Total number of documents N = 25
Total number of documents with term tk (n) = 9
Total number of relevant documents (R) = 10
Total number of relevant documents with term tk (r) = 4

From the above data:
Pk = probability of term tk occurring in relevant documents = 4/10 = 2/5
Uk = probability of term tk occurring in non-relevant documents = 5/15 = 1/3
X = Pk / (1 − Pk) = (2/5) / (3/5) = 2/3
Y = Uk / (1 − Uk) = (1/3) / (2/3) = 1/2
Odds ratio or weighting function Wk = X / Y = 4/3
Ranking function W = log (X / Y) = log (4/3) ≈ 0.125
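A short Python sketch that reproduces the computation above and the Rocchio refinement applied to it in the next step (the x, y, z constants follow the values used there; a base-10 logarithm is assumed for the ranking function):

```python
from math import log10

# Bayesian Independence model weights from the given counts.
R, r = 10, 4          # relevant documents; relevant documents with term tk
NR, nr = 15, 5        # non-relevant documents; non-relevant documents with tk
Pk = r / R            # 2/5
Uk = nr / NR          # 1/3
X, Y = Pk / (1 - Pk), Uk / (1 - Uk)
Wk = X / Y            # odds ratio / weighting function: 4/3
W = log10(X / Y)      # ranking function: ~0.125
print(Wk, W)

# Rocchio refinement of the old weight Wk.
x, y, z = 1.0, (4 + 6) / 2, (5 + 10) / 2
sum_w_rd, sum_w_nrd = 4 + 6, 5 + 10
Qa_new = x * Wk + y * (1 / R) * sum_w_rd - z * (1 / NR) * sum_w_nrd
print(Qa_new)         # ~ -1.17: negative, so the new weight is discarded
```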
Figure 6: Computation of the probabilities of terms, graphically (axes: total number of relevant documents against total number of non-relevant documents)

On the basis of the above graph and the probability values, we can find the new weight function for terms from the old weight function by using the Rocchio formula:

Qa(new) = x · Qa(old) + y · (1/RD) · Σ wta,RD − z · (1/NRD) · Σ wta,NRD

Here:
Qa(old) = 4/3
Relevant documents (RD) = 10
Non-relevant documents (NRD) = 15
Σ wta,RD = 4 + 6 = 10
Σ wta,NRD = 5 + 10 = 15
x = 1, y = (4 + 6)/2 = 5, z = (5 + 10)/2 = 7.5

So Qa(new) = 1 · (4/3) + 5 · (1/10) · 10 − 7.5 · (1/15) · 15 = 4/3 + 5 − 7.5 ≈ −1.17.

Since the new weight function is negative, it is discarded and the old function is considered as the relevance function.

Catchy Concept: the proposed High-Level Statistical Multimedia IR Model deals with queries that have been expanded and refined according to the user's requirements; in this way the queries can be reused. This is a good idea if the given query is short. How can this model become suitable for long queries as well?

Catchy Answer: by the use of random variables, which may be continuous as well as discrete. The terms found in multimedia text documents can be treated as variables. If the terms are short or finite, the problem is solved using the concept of discrete random variables: we simply add the products of the probabilities of the various terms/queries used in the document. In this way the expected value E of the term function is calculated and we can determine its relevance. For long queries, the concept of continuous random variables can be used. Long queries may have some limit, or they may be infinite. For queries having a limit, approximation is used: the terms are integrated over a particular interval, producing results close to the user's requirements. For infinitely long queries, various methods of calculating the expected value E, such as the Poisson distribution and the Binomial distribution, are employed. In this way both short queries and long queries can be reused and expanded.

5. CONCLUSION
The paper illustrates the working of the proposed high-level multimedia IR model, consisting of various modules, each of which is described separately. The model provides the extraction of relevant terms from a huge collection of multimedia documents. Since multimedia documents produce information tokens that are different from text tokens, the paper presents statistical approaches that analyse multimedia documents and retrieve multimedia terms (text, images and videos) from them.

The new model can replace the ambiguities of the traditional multimedia IR model, which deals with information symbols only instead of also maintaining the relationships between them. It is beneficial in various respects: a module is introduced for maintaining the conceptual relationships between extracted terms and representing them using an ontology, and the model uses probabilistic approaches for calculating the ranking of documents and retrieving optimal queries. The results are then presented to the user.
6. REFERENCES
[1] International Press Telecommunications Council: "IPTC Core" Schema for XMP, Version 1.0 Specification document (2005).
[2] Technical Standardization Committee on AV & IT Storage Systems and Equipment: Exchangeable image file format for digital still cameras: Exif Version 2.2. Technical Report JEITA CP-3451 (April 2002).
[3] Borgo, S., Masolo, C.: Foundational choices in DOLCE. In: Handbook on Ontologies, 2nd edn., Springer (2009).
[4] Joao Miguel Costa Magalhaes: "Statistical Models for Semantic-Multimedia Information Retrieval", September 2008.
[5] Meghini, C., Sebastiani, F., and Straccia, U.: "A model of multimedia information retrieval", Journal of the ACM (JACM), 48(5), pages 909-970, 2001.
[6] Grosky, W.I., Zhao, R.: "Negotiating the semantic gap: From feature maps to semantic landscape", Lecture Notes in Computer Science 2234 (2001).
[7] Adams, W. H., Iyengart, G., Lin, C. Y., Naphade, M. R., Neti, C., Nock, H. J., and Smith, J.: "Semantic indexing of multimedia content using visual, audio and text cues", EURASIP Journal on Applied Signal Processing 2003(2), pages 170-185.
[8] Datta, R., Joshi, D., Li, J., and Wang, J. Z.: "Image retrieval: ideas, influences, and trends of the new age", ACM Computing Surveys, 2008.
[9] Hofmann, T., and Puzicha: "Statistical models for co-occurrence data", Technical Report, Massachusetts Institute of Technology, 1998.
[10] M. Preethi, Dr. J. Akilandeswari: "Combining Retrieval with Ontology Browsing", International Journal of Internet Computing, Vol. 1, Issue 1, 2011.
[11] Croft, W. B., Turtle, H. R., and Lewis, D. D.: "The use of phrases and structured queries in information retrieval", ACM SIGIR Conference on Research and Development in Information Retrieval, Chicago, Illinois, United States, 2004.
[12] Rifat Ozcan, Y. Alp: "Concept Based Information Access using Ontologies and Latent Semantic Analysis", Technical Report, 2004-08.
[13] F. Crestani, M. Lalmas, C.J. van Rijsbergen, and I. Campbell: "Is this document relevant? . . . Probably: A survey of probabilistic models in information retrieval", ACM Computing Surveys, 30(4), pages 528-552, December 1998.
[14] Manning, C.D., Raghavan, P., and Schütze, H.: "An Introduction to Information Retrieval", Cambridge University Press, Cambridge, 2007.
[15] Cai, D., Yu, S., Wen, J.-R., and Ma, W.-Y.: "Extracting content structure for Web pages based on visual representation", Asia Pacific Web Conference, 2003.
[16] Metzler, D., Manmatha, R.: "An inference network approach to image retrieval", in: CIVR, Volume 3115 of Lecture Notes in Computer Science, Springer (2004), pages 42-50.
[17] Faloutsos, C., Barber, R., Flickner, M., Hafner, J., and Niblack, W.: "Efficient and effective querying by image content", Journal of Intelligent Information Systems, 3:231-262, 1994.
[18] Ed Greengrass: "Information Retrieval: A Survey", November 2000.
[19] O.S. Al-Kadi: "Combined statistical and model-based texture features for improved image classification", 4th IET International Conference on Advances in Medical, Signal and Information Processing (MEDSIP 2008), January 2008, page 314.
[20] S. Vigneshwari, M. Aramudhan: "An Ontological Approach for effective knowledge engineering", International Conference on Software Engineering and Mobile Application Modelling and Development (ICSEMA 2012), January 2012, page 5.
[21] M.A. Moraga, C. Calero, and M.F. Bertoa: "Improving interpretation of component-based systems quality through visualization techniques", IET Software, Volume 4, Issue 1, February 2010, pages 79-90, DOI: 10.1049/iet-sen.2008.0056.
[22] Michael S. Lew, Nicu Sebe, Chabane Djeraba and Ramesh Jain: "Content-based Multimedia Information Retrieval: State of the Art and Challenges", ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), February 2006.
[23] Alberto Del Bimbo, Pietro Pala: "Content-based retrieval of 3D models", ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), Vol. 2, Issue 1, February 2006, pages 20-43.
[24] Carlo Meghini, Fabrizio Sebastiani and Umberto Straccia: "A model of multimedia information retrieval", Journal of the ACM (JACM), Vol. 48, Issue 5, September 2001, pages 909-970.
[25] Simone Santini: "Efficient computation of queries on feature streams", ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), Vol. 7, Issue 4, November 2011, Article No. 38.
[26] Graham Bennett, Falk Scholer and Alexandra: "A comparative study of probabilistic and language models for information retrieval", in Proceedings of the Nineteenth Australasian Database Conference (ADC '08), Vol. 75, ISBN 978-1-920682-56-9, pages 65-74.

ABOUT THE AUTHORS
Gagandeep Singh has completed his B.Tech (CSE) from GTBIT, affiliated to Guru Gobind Singh Indraprastha University, Delhi. His research areas include Semantic Web, Information Retrieval, Data Mining, Remote Sensing (GIS) and Knowledge Engineering.

Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is pursuing a Ph.D. in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as an Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.
International Journal of Computer Application Available online on http://www.rspublication.com/ijca/ijca_index.htm
Issue 4, Volume 2 (March - April 2014) ISSN: 2250-1797
A BRIEF OVERVIEW ON INFORMATION RETRIEVAL IN SEMANTIC WEB

Vishal Jain
Research Scholar, Computer Science and Engineering Department, Lingaya’s University, Faridabad
ABSTRACT
Information retrieval technology has been central to the success of the Web. For semantic web documents or annotations to have an impact, they will have to be compatible with Web-based indexing and retrieval technology. We discuss some of the underlying problems and issues central to extending information retrieval systems to handle annotations in semantic web languages. There is a need to organize that data in a formal manner so that users can easily exploit it. To retrieve information from documents, we have many information retrieval techniques. Current Information Retrieval (IR) techniques are not so advanced that they are able to exploit semantic knowledge within documents and give precise results. IR technology is the major factor responsible for handling annotations in Semantic Web (SW) languages. Knowledge representation languages are used for retrieving information.
Keywords: Semantic Web, Information Retrieval, Ontology, HIR
1. INTRODUCTION
We view the future web as a combination of text documents and semantic markup. The Semantic Web (SW) uses Semantic Web documents (SWDs) that must be combined with web-based indexing. Current IR techniques are not so intelligent that they are able to produce semantic relations between documents. Extracting information manually with the help of XML and string-matching techniques such as the Rabin-Karp matcher has not proven successful; to use these techniques a normal user has to be aware of all the tools involved. Information retrieval technology has been central to the success of the Web: web-based indexing and search systems such as Google and Yahoo have profoundly changed the way we access information. For semantic web technologies [2][3] to have an impact, they will have to be compatible with Web search engines and information retrieval technology in general. We discuss several approaches to using information retrieval systems both with semantic web documents and with text documents that have semantic web annotations. So, keeping this in mind, we have moved to the concept of ontology in the Semantic Web.
2. SEMANTIC WEB
In spite of many efforts by researchers and developers, the Semantic Web has remained a future concept or technology; it is not practiced at present. There are a few reasons for this:
(a) The complete Semantic Web has not been developed yet, and the parts that have been developed are so poor that they cannot be used in the real world.
(b) No optimal software or hardware is provided.
"SW is not technology, it is philosophy" [1]. "The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines, on a global scale." "The Semantic Web approach develops languages for expressing information in a machine-processable form." These two sentences define the essence of the SW:
it is information in machine-processable form; at the same time, the first sentence defines the SW as a global-scale information mesh, while the second defines it as a framework for expressing information.

2.1 WHY SHOULD WE USE THE SW?
We use the Web as a global database, first of all for search. Today's search engines cannot search more precisely than they do now; perhaps the main reason is that the structure and size of the current Web do not allow search to be made more precise and efficient. The second reason cannot be eliminated: the Web now contains a huge number of documents, and this number has a strong tendency to double every one or two years. The structure of documents, and of the Web itself, can probably be changed in a better, more machine-processable way.

2.2 SEMANTIC WEB ARCHITECTURE
The architecture consists of the following parts:
URI and UNICODE: the Semantic Web is generally built on syntaxes which use URIs to represent data, usually in triple-based structures: i.e. many triples of URI data that can be held in databases, or interchanged on the World Wide Web using a set of particular syntaxes developed especially for the task. These syntaxes are called "Resource Description Framework" syntaxes. Unicode allows support of the international text-style standard.
RDF and RDF Schema: RDF is the Resource Description Framework. It processes metadata and provides interoperation between applications that exchange machine-understandable information on the web. RDF Schema is the RDF vocabulary description language and represents relationships between groups of resources; the RDF model is designed for representing properties and their values.
Ontology: ontology is abbreviated as FESC, which means Formal, Explicit, Specification of a Shared Conceptualization [4]. Formal specifies that it should be machine-understandable; explicit defines the type of constraints used in the model; shared defines that an ontology is not for an individual but for a group; conceptualization means a model of some phenomenon that identifies the relevant concepts of that phenomenon.
Inference: inference is defined as producing new data from existing data, or reaching some conclusion. E.g.: "Adios" is a Spanish word which is replaced by "Goodbye" so that it is understandable by the user.
Figure 1: "SW Architecture" [5] (layers, bottom to top: URI and Unicode; RDF and RDF Schema; Ontology; Inference)

3. INFORMATION RETRIEVAL (IR)
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing.
Automated information retrieval systems are used to reduce what has been called “Information Overload”. Many universities and public libraries use IR systems to provide access to books, journals and other documents. Web search engines are the most visible IR applications.
3.1 PROCESS OF IR
A schematic description of a usual IR process is shown in Figure 2. Background knowledge stored in the form of ontologies can be used at practically every step of the process. For performance reasons, however, it does not seem feasible to use background information in the similarity measure used during matching and ranking, as it would be prohibitively expensive. Although, e.g., case-based reasoning systems apply domain-specific heuristics in their similarity measure [6], they operate on document collections which contain only several hundred or a few thousand cases. We rather believe that it is possible to extend the query (and/or the document representation) syntactically, based on the information stored in ontologies, so that a simple, syntax-based similarity measure will yield semantically correct results.

Figure 2: "IR Process" (the admin solves the user query against the text documents and returns a ranked list of documents as the result)
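A minimal sketch of the syntactic query extension described here, assuming a toy term ontology and a plain overlap similarity; both are illustrative assumptions, not the system's actual measure:

```python
# Toy background ontology: term -> semantically related terms (assumed data).
ONTOLOGY = {
    "car": ["automobile", "vehicle"],
    "price": ["cost", "value"],
}

def expand_query(terms):
    # Extend the query syntactically with ontology neighbours, so that a
    # plain syntax-based similarity measure still finds semantic matches.
    expanded = list(terms)
    for t in terms:
        expanded += ONTOLOGY.get(t, [])
    return expanded

def similarity(query_terms, doc_terms):
    # Simple syntax-based overlap measure applied to the expanded query.
    q, d = set(query_terms), set(doc_terms)
    return len(q & d) / len(q | d) if q | d else 0.0

docs = [["automobile", "cost", "report"], ["weather", "today"]]
q = expand_query(["car", "price"])
print(sorted(docs, key=lambda d: similarity(q, d), reverse=True)[0])
```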
4. HIR (Hybrid Information Retrieval)
The Semantic Web, with all its features such as semantic markups and ontologies, allows additional search techniques to be used. At the same time, the standard IR approaches are still useful for handling documents that are only "semantically similar" (similar only by additional features) to the query. A SW IR system can therefore use so-called Hybrid Information Retrieval (HIR).
4.1 COMPONENTS OF HIR

Table 1: "Components of HIR"

| Standard Text IR | Semantic IR |
|---|---|
| Vector Space Model, indexing and markup/text relations | Inference, ontology mapping, markup similarity |

Markup/text relationship (statistics of the pair markup/term occurring in the data collection) allows, in the first step, converting the query, which is usually posted as a simple text string, into a semantic (markup) form. Tags (markups) are associated with a query term if the tag/term frequency exceeds some threshold.

Markup Similarity: it allows ranking markup results together with text documents; markup similarity is ranked together with text-term similarity. After the query has been processed into semantic form, it is non-deterministic (more than one markup can be associated with one text term). In this case the query can be evaluated in two ways:
• The first is essentially the SW method: the structure of the query is sifted using the ontology, and the concept most relevant to the query is used to reduce the search space.
• The second approach includes the markup component in the ranking function. In this case the similarity between the query markup and a concept taken from the data collection has a numerical equivalent.
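A minimal sketch of the second approach, in which the markup component enters the ranking function as a numerical term alongside ordinary text similarity. The similarity stubs and the mixing weight alpha are illustrative assumptions:

```python
def hybrid_rank(docs, query, text_sim, markup_sim, alpha=0.6):
    """Rank documents by a HIR score: alpha * text-term similarity plus
    (1 - alpha) * markup (concept) similarity."""
    return sorted(
        docs,
        key=lambda d: alpha * text_sim(query, d) + (1 - alpha) * markup_sim(query, d),
        reverse=True,
    )

def text_sim(q, d):
    # Plain word-overlap text similarity (illustrative stand-in for a VSM score).
    qs, ds = set(q["text"].split()), set(d["text"].split())
    return len(qs & ds) / max(len(qs), 1)

def markup_sim(q, d):
    # Numerical similarity between query markup and document concepts.
    return len(set(q["concepts"]) & set(d["concepts"])) / max(len(q["concepts"]), 1)

query = {"text": "jaguar speed", "concepts": ["Animal"]}
docs = [{"text": "jaguar car top speed", "concepts": ["Vehicle"]},
        {"text": "jaguar running speed", "concepts": ["Animal"]}]
print(hybrid_rank(docs, query, text_sim, markup_sim)[0]["concepts"])  # ['Animal']
```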
5. PROTOTYPE SYSTEMS
Following the several approaches to retrieving information from documents, three prototype systems have been developed that make use of knowledge representation languages for solving queries: OWLIR, SWANGLER and SWOOGLE. We will discuss each one in detail.
5.1 OWLIR
OWLIR is an example of a Semantic Web IR system. It is an implemented system for the retrieval of documents that contain both free text and semantic markup. OWLIR is only a framework, designed to work with almost any local information retrieval system. OWLIR (Ontology Web Language and Information Retrieval) focuses on addressing three scenarios that involve semantically marked-up web pages and text documents:
IR: gathering information about documents for a query.
Q & A: asking simple questions and getting answers.
Complex Q & A.
OWLIR works with two retrieval engines, HAIRCUT and WONDIR.
HAIRCUT: Hopkins Automated Information Retrieval for Combining Unstructured Text. It is used for specifying required query terms and takes a language-modelling approach to finding similarity between documents.
WONDIR: Word Or N-gram based Dynamic Information Retrieval engine. It is written in Java and provides basic indexing, retrieval and storage facilities for documents.
5.1.1 OWLIR ARCHITECTURE
(a) IE (Information Extraction): OWLIR is not only a semantic web retrieval system. Unfortunately, the SW is still under construction, and the problem of marking up the documents in a data collection (to make them really SW documents) can be solved either by human correction (which requires much time and many resources) or by using information extraction tools. OWLIR uses the AeroText system for text extraction of key phrases and elements from free-text documents. Document structure analysis supports the exploitation of tables, lists and other elements, and complex event extraction provides more effective analysis. AeroText is used together with an event ontology and outputs the result into a corresponding RDF triple model that uses the DAML+OIL syntax.
(b) Inference Engine: OWLIR uses the metadata information added during the text extraction process to infer additional semantic relations. The inference engine exploits two information sources for deriving an answer: the event ontology and the facts in the knowledge base. DAMLJessKB facilitates reading DAML+OIL pages, interpreting the information as per the DAML+OIL language, and allowing the user to reason over that information. DAMLJessKB is the inference system included in OWLIR; it provides basic facts and rules that facilitate drawing inferences on relationships such as subclasses and subproperties.
5.2 SWANGLER
Currently the semantic web, in the form of RDF and OWL documents, is essentially a web universe parallel to the web of HTML documents. There is as yet no standard way for HTML (even XHTML) documents to embed RDF and OWL markup, or to reference them in a standard way that carries meaning. Semantic web documents reference one another, as well as HTML documents, in meaningful ways. Some Internet search engines, such as Google, do in fact discover and index RDF documents. There are several problems with the current situation that stem from the fact that systems like Google treat semantic web documents (SWDs) as simple text files. One simple problem is that the XML namespace mechanism is opaque to these engines. A second problem is that the tokenization rules are designed for natural languages and do not always work well with XML documents. Finally, we would like to take advantage of the semantic nature of the markup. The swangling technique is applied to SWDs to enrich them with additional RDF statements that add terms as additional properties of the documents.
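One plausible reading of the swangling idea, sketched in Python: each RDF triple is mapped to text-like tokens, including wildcard variants, so that a conventional engine can index them and partial triple queries hit the same tokens. The token format and hashing are assumptions, not the exact OWLIR/Swoogle encoding:

```python
import hashlib
from itertools import product

def swangle(triple):
    """Map an RDF triple to text-like tokens a conventional search engine
    can index: the full triple plus variants with members wildcarded."""
    tokens = set()
    for mask in product([True, False], repeat=3):
        if not any(mask):
            continue  # skip the all-wildcard variant
        parts = [part if keep else "*" for part, keep in zip(triple, mask)]
        digest = hashlib.md5("|".join(parts).encode()).hexdigest()[:16]
        tokens.add("SWANGLE" + digest.upper())
    return tokens

triple = ("http://example.org/doc1",
          "http://purl.org/dc/terms/creator",
          "http://example.org/VishalJain")   # illustrative triple
print(sorted(swangle(triple)))
```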
5.3 SWOOGLE
SWOOGLE [6] has been designed to facilitate the development of the Semantic Web. With the help of Swoogle we can AEQ RDF and OWL documents, where AEQ means Access, Explore and Query. Swoogle is a crawler-based indexing and retrieval system for the Semantic Web. It extracts metadata for each discovered document and records relationships between documents. Documents are indexed by an IR system which can use character N-grams as keywords to find relevant documents.
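A minimal sketch of character N-gram keyword indexing of the kind mentioned here (n = 4 and the toy documents are illustrative choices):

```python
from collections import defaultdict

def char_ngrams(text, n=4):
    # Character N-grams used as index keywords.
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def build_index(documents):
    index = defaultdict(set)          # n-gram -> set of document ids
    for doc_id, text in documents.items():
        for gram in char_ngrams(text):
            index[gram].add(doc_id)
    return index

docs = {1: "owl:Class rdf:about", 2: "rdfs:subClassOf"}
index = build_index(docs)
print(index["clas"])                  # ids of documents containing the n-gram "clas"
```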
Figure 3: "Swoogle window"

Swoogle is a prototype internet indexing and retrieval engine for semantic web documents encoded in RDF and OWL. The system is intended to support human users as well as software agents and services. Human users are expected to be semantic web researchers and developers who are interested in accessing, exploring and querying a collection of metadata for a collection of RDF documents automatically discovered on the web. Software APIs will support programs that need to find SWDs matching certain descriptions, e.g. those containing certain terms, similar to other SWDs, using certain classes or properties, etc.
5.3.1 SWOOGLE ARCHITECTURE
Its architecture includes four components: (a) SWD discovery, (b) metadata creation, (c) analysis of data and (d) the interface. All four components work independently and interact with each other through a database.
SWD discovery: discovers Semantic Web Documents and keeps up-to-date information about the objects.
Metadata creation: maintains the SWD cache and generates metadata at both the semantic and the syntactic level.
Data analysis: uses the cached SWDs and metadata to produce analyses with the help of the IR analyzer and the SWD analyzer.
Interface: provides data services to the SW community.
Figure 4: "Swoogle Architecture" (web URLs feed the cache; data analysis over basic, relational and analytical metadata is performed by the IR analyzer)
6. CONCLUSION
The Semantic Web will contain two kinds of documents. Some will be conventional text documents enriched by annotations that provide metadata as well as machine-interpretable statements capturing some of the meaning of the documents' content. Information retrieval over collections of these documents offers new challenges and new opportunities. Here we presented a framework for integrating search that supports an inference engine. We can use the swangling technique to enrich SWDs into text documents. While many challenges must be resolved to bring this vision to fruition, the benefits of pursuing it are clear. The Semantic Web is also likely to contain documents whose content is entirely encoded in an RDF-based markup language such as OWL. We can use the swangling technique to enrich these documents with terms that capture some of their meaning in a form that can be indexed by conventional search engines. Finally, there is also a role for specialized search engines that are designed to work over collections of RDF documents.
REFERENCES
[1] T. Berners-Lee, "The Semantic Web", Scientific American, May 2007.
[2] Urvi Shah, James Mayfield, "Information Retrieval on the Semantic Web", ACM CIKM International Conference on Information Management, November 2002.
[3] http://www.mpiinf.mpg.de/departments/d5/teaching/ss03/xmlseminar/talks/CaiEskeWang.pdf
[4] Berners-Lee, J. Lassila, "Ontologies in Semantic Web", Scientific American, May 2001, pp. 34-43.
[5] David Vallet, M. Fernandes, "An Ontology-Based Information Retrieval Model", European Semantic Web Symposium (ESWS), 2006.
[6] Tim Finin, Anupam Joshi, Vishal Doshi, "Swoogle: A Semantic Web Search and Metadata Engine", in Proceedings of the 13th International Conference on Information and Knowledge Management, pages 461-468, 2004.
[7] http://www.daml.org/ontologies
[8] U. Shah, T. Finin and A. Joshi, "Information Retrieval on the Semantic Web", Scientific American, pages 34-43, 2003.
[9] T. Finin, J. Mayfield, A. Joshi, "Information Retrieval and the Semantic Web", IEEE/WIC International Conference on Web Intelligence, October 2003.
[10] http://www.semanticwebsearch.com
[11] http://www.semanticweb.info/schemaweb
[12] Berners-Lee and Fischetti, "Weaving the Web: The Original Design of the World Wide Web by its Inventor", Scientific American, 2005.
[13] Kiryakov, A. Popov, L. Manov, "Semantic Annotation, Indexing and Retrieval", Journal of Web Semantics, 2005-2006.
[14] N. Shadbolt, T. Berners-Lee and W. Hall, "The Semantic Web Revisited", IEEE Intelligent Systems, 2006.
[15] J. Mayfield, "Ontologies and Text Retrieval", Knowledge Engineering Review, 2007.
[16] J. Carroll, J. Roo, "OWL Web Ontology Language", W3C Recommendation, 2004.
[17] J. Kopena, A. Joshi, "DAMLJessKB: A Tool for Reasoning with Semantic Web", IEEE Intelligent Systems, 2006.
[18] Jeremy, Lan, Dollin, "Implementing the Semantic Web Recommendations", in Proceedings of the 13th International Conference on World Wide Web, 2004.
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 2, ISSUE 10, OCTOBER 2013
ISSN 2277-8616
A Framework To Convert Relational Database To Ontology For Knowledge Database In Semantic Web

Vishal Jain, Dr. Mayank Singh

ABSTRACT: Nowadays, a large amount of data is available on the internet in the form of structured, unstructured and semi-structured data. Today, lots of data is available offline and online as collections of tables, rows and columns. But now, in the era of Web 2.0, we have another mechanism for storing data: the ontology, which is very useful for a community as a way of structuring and defining the meaning of the metadata that is currently collected and standardized. In this paper we discuss various techniques for mapping between the database and ontology.

Keywords: Semantic Web, Ontology Mapping, OWL, RDF, Relational Database
1. INTRODUCTION
Applications using ontologies become more intelligent, since they can deal better with human background knowledge. More generally, ontologies are critical for applications that want to merge information from diverse sources [1]. A large amount of data is present on the web; it contains a huge number of web pages, and finding suitable information in them is a very cumbersome task. There is a need to organize data in a formal manner so that users can easily access and use it. The advance of the Web has significantly and rapidly changed the way information is organized, shared and distributed [15]. The majority of data underpinning the Web, and in domains such as the life sciences [NCBI resources] and spatial data management [Green et al., 2008], is stored in relational databases (RDB), with their proven track record of scalability, efficient storage, optimized query execution and reliability [11].

2. INFORMATION RETRIEVAL
We view the future web as a combination of text documents and semantic markup. The Semantic Web (SW) uses Semantic Web documents (SWDs) that must be combined with web-based indexing. Current IR techniques are not so intelligent that they are able to produce semantic relations between documents. Extracting information manually with the help of XML and string-matching techniques such as the Rabin-Karp matcher has not proven successful; to use these techniques a normal user has to be aware of all the tools involved. So, keeping this in mind, we have moved to the concept of ontology in the Semantic Web. It represents the various languages that are used for building the SW and increasing accuracy.

3. SEMANTIC WEB
The semantic approach comes to solve the polysemy problem: the same word may have different meanings according to the context of a sentence [19]. In spite of many efforts by researchers and developers, the Semantic Web has remained a future concept or technology; it is not practiced at present. There are a few reasons for this:
(a) The complete Semantic Web has not been developed yet, and the parts that have been developed are so poor that they cannot be used in the real world.
(b) No optimal software or hardware is provided.
"SW is not technology, it is philosophy" [5]. It is defined as a collection of information linked in such a way that it can be easily processed by machines. From this statement we conclude that the SW is information in machine form. It is also called a Global Information Mesh [6], and is also known as a framework for expressing information.

4. ONTOLOGY
Ontologies are nowadays ubiquitous; their number, their complexity and the domains they model are increasing considerably [2]. An ontology is defined as a formal specification of a shared conceptualization of some domain knowledge [9]. Building one involves identifying and extracting the relevant pages containing that specific information according to predefined guidelines. There are many IR techniques for extracting keywords: NLP-based extraction techniques are used to search for simple keywords, and the AeroText system is used for text extraction of key phrases from text documents.
5. MAPPING BETWEEN DATABASE AND ONTOLOGY
The conversion of relational databases to ontology is one of the key areas of research. Many researchers have come up with numerous techniques and tools for converting relational databases to ontology. Some of the proposed approaches include RDBOnto, Data Semantic Preservation, DB2OWL, R2O, D2RQ, Semantic Bridge, DartGrid Semantic and Semantic Interoperability, among others. This paper presents a comparison of the already developed frameworks and tools. In addition, this paper identifies the
problems and deficiencies of these tools. Finally, this paper proposes a framework that can be developed to overcome the deficiencies and problems posed by other tools and frameworks.
5.1 COMPARISON OF ALREADY DEVELOPED TOOLS AND FRAMEWORKS
This section of the paper compares some of the well-known frameworks used in the conversion of relational databases to ontology. RDBToOnto is one of the renowned frameworks for converting relational databases to ontology. It is a highly configurable technique that eases the process of designing and implementing an ontology based on a relational database [3]. RDBToOnto is a user-oriented tool, which supports the access and input processes. The Asio Semantic Bridge is another renowned framework that uses the table-and-class approach [3]. The resulting ontology after the conversion consists of a class corresponding to each table in the database. The columns of tables appear as properties of the respective classes. The cardinality of a primary key is set to 1, while the cardinality of a nullable column is set to a maximum of 1. The developed ontology has rules that equate individuals depending on several primary-key columns. The Semantic Bridge approach helps to rewrite SPARQL queries to SQL and executes the SQL. Another renowned tool for converting relational databases to ontology is the DartGrid Semantic Web toolkit. This tool has the capacity to map and query RDF generated from relational databases. The mapping process involves the manual conversion of tables into classes. This tool stands out because it has a visual tool that enables users to define mappings and helps in the creation of SPARQL queries. The DartGrid Semantic tool then translates SPARQL queries into SQL queries depending on the defined mappings. Another powerful tool in the conversion of RDB to ontology is the DB2OWL tool. This tool is powerful because of its ability to automate the creation of a new ontology from an existing relational database. DB2OWL creates ontology from an RDB by looking for particular cases of database tables; this helps to determine the ontology component that will be created from a given database component. DB2OWL expresses the created ontology in the OWL-DL language, which uses Description Logics. The mapping process begins by detecting particular cases for relations (tables) in the database schema. This tool converts each component of the database (table, column and relation) into the relevant ontology component (class, property and relation). SOAM is a framework that represents tables as classes to predicate an approach for creating an ontology schema. This conversion process involves mapping the constraints of the relational model to those of the ontology schema. R2O is another well-known tool used in the conversion of relational databases to ontology. R2O is an XML-based language that helps to express mappings between the elements of relational databases and ontologies. R2O mappings are useful in detecting any existing ambiguities and inconsistencies. Triplify is another useful tool in the conversion of relational databases to ontology. It helps to publish Linked Data and RDF from a relational database. Triplify accomplishes its purpose by mapping HTTP-URI requests onto RDB queries expressed in the Structured Query Language. Triplify is a light-weight application, which is easy to deploy and integrate in a wide range of web applications. Unlike tools such as Semantic Bridge and the DartGrid Semantic toolkit, however, Triplify does not support SPARQL.
5.2 PROBLEMS AND DEFICIENCIES OF EXISTING TOOLS
From the above comparison, it is clear that there are numerous tools and frameworks for converting relational databases to ontology, and new frameworks and tools are yet to be developed each new day. Each framework comes with its merits and demerits. In spite of the merits of each framework or tool, these frameworks and tools have some problems and deficiencies that need to be fixed. One of the key problems of these tools and frameworks is the difficulty of standardization. In spite of a few similarities, such as the use of structured query languages, these frameworks and tools use different approaches, which are difficult to integrate into a standard tool or framework. Most of the tools and frameworks explicated in this paper use diverse representation formats, as well as tool-specific languages. In this case, only experts are able to re-use the crucial artifacts used in the mapping process. Some of the domain-specific problems are evident in the DB2OWL framework, which uses the OWL-DL language; this cannot be implemented in other domain-specific concept frameworks, tools and applications. Therefore, existing frameworks lack the ability to help users re-use the artifacts. They do not use languages that the user can easily understand or learn. For instance, the use of SPARQL creates a hard task for users who have little knowledge of programming language semantics. This creates a need to integrate a language module that enables users to select and use the most convenient language for converting relational databases to ontology. There is also a lack of a module that converts programming languages into a simple language that non-programmers can re-use in case the need to re-use code arises. Some relational database conversion tools, such as R2O and D2R Map, use a manual mapping definition process, which is less preferable compared to an automatic process. Manual mapping is not reliable compared to automatic mapping, so these tools do not consider conversion time as a key factor in the conversion of RDB to ontology. It is extremely vital to address this deficiency by using a software module that automates the mapping process. This is an excellent mechanism for ensuring that the conversion process does not take a lot of time because of manual steps, and that the user does not waste time that could be allocated to other complicated processes such as querying and searching a database. Another problem experienced in the current frameworks is support for only a few databases. For instance, DB2OWL supports MySQL and Oracle databases, leaving out other databases such as Microsoft Access. Therefore, there is a need for a framework that can convert databases from different database programs into ontology.

5.3 NEW PROPOSED FRAMEWORK WHICH CAN BE DEVELOPED
In order to overcome the problems of the existing frameworks and tools, it is critical to design an appropriate
framework. This paper proposes a framework similar to DB2OWL but with additional features that address the identified problems and deficiencies. This implies that the proposed framework can support as many databases as possible and the most used programming languages. In addition, the new framework has the capacity to output information in different formats, which non-programmers can re-use without the need for an expert. However, the proposed framework will not depend on particular table cases; it is a general framework that is applicable to all tables, whatever the case. The proposed mapping process involves converting tables into classes, which have several properties as well as relationships. The conversion process starts when the user uses a well-designed user interface to send queries to the database. The proposed visualization service must be able to present the required queries in a suitable manner. In order to consider the requirements of different users, including those who do not have programming skills, the visualization service should have an interface with a select option for users to key in commands in a desired language. This should cover all the available programming languages as well as human language, which diverse users can understand. Therefore, it is extremely critical to include a module that translates the input text and instructions into different programming languages. In addition, the new framework ought to incorporate a module that enables users to export information into different formats apart from the default format. Users must be able to output information in the form of text files, tables and datasets, among others. The table name (Ti) must map to the class name (Ci.name), and each property (Yi) must match each column (Li) of the table. In addition, the primary key of each table (P(Ti)) must map to a class id (C.id) in the generated ontology. The same applies for the foreign keys, in order to ensure maximum referential integrity. The subscript "i" refers to the element number, and therefore the subscript for each element depends on the total number of elements. For instance, if there are five tables in the relational database, each table (T) will have a subscript ranging from 0 to 4, and each table must have a unique subscript. Therefore, the number of classes created in the ontology will be the same as the number of tables in the relational database. In addition, the number of class properties will be the same as the number of columns in the database. The same applies for the primary keys and the foreign keys. In order to overcome the problems and deficiencies of existing tools, there will be a dynamic mapping mediator, which enables the user to convert data from multiple database sources and store it in easily readable text files. The new system ought to support heterogeneous data sources such as preformatted text files, MS Excel, MS Access, MySQL and Oracle, among others. The new framework will have the capability to automate the translation of SPARQL queries using the mappings of a mediator class.
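As a rough illustration of the mapping rules just stated (table Ti to class Ci.name, column Li to property Yi, primary key P(Ti) to class id C.id, and foreign keys to relations), the following Python sketch applies them to a toy schema. The metadata structures, names and sample tables are illustrative assumptions, not code from the proposed framework.

# Sketch of the proposed mapping rules: each table T_i becomes a class C_i,
# each column L_i becomes a property Y_i, and the primary key P(T_i) becomes
# the class id. Table metadata here is a hypothetical structure.
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    columns: list                                       # column names L_i
    primary_key: str                                    # P(T_i)
    foreign_keys: dict = field(default_factory=dict)    # column -> referenced table

@dataclass
class OntologyClass:
    name: str                                           # C_i.name (= T_i name)
    id_property: str                                    # C.id (= P(T_i))
    properties: list = field(default_factory=list)      # Y_i (= L_i)
    relations: dict = field(default_factory=dict)       # object properties from FKs

def map_table_to_class(table: Table) -> OntologyClass:
    cls = OntologyClass(name=table.name, id_property=table.primary_key)
    for col in table.columns:
        if col in table.foreign_keys:
            # A foreign key becomes a relation to the referenced class,
            # preserving referential integrity in the generated ontology.
            cls.relations[col] = table.foreign_keys[col]
        elif col != table.primary_key:
            cls.properties.append(col)
    return cls

tables = [
    Table("Student", ["roll_no", "name", "dept_id"], "roll_no",
          {"dept_id": "Department"}),
    Table("Department", ["dept_id", "title"], "dept_id"),
]
ontology = [map_table_to_class(t) for t in tables]      # one class per table
for c in ontology:
    print(c.name, c.id_property, c.properties, c.relations)

Running the sketch yields exactly one class per table, with property and key counts equal to those of the relational schema, which is the invariant the proposed framework describes.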
CONCLUSION
This paper emphasizes the concept of ontology mapping and discusses various approaches for converting relational databases to ontology and vice-versa. It is evident that the conversion of relational databases to ontology is a diverse process, and the frameworks and tools used are equally diverse. These frameworks and tools have their merits and demerits. Data presentation, output formats and languages are crucial concerns. The proposed framework will ensure that there is maximum data integrity after conversion. In addition, it offers users the ability to customize queries depending on their literacy level. Automation is also a critical part of the proposed framework.
ACKNOWLEDGEMENT
I, Vishal Jain, would like to give my sincere thanks to Prof. M. N. Hoda, Director, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, for giving me the opportunity to do my Ph.D from Lingaya's University, Faridabad.
REFERENCES
[1]. http://conferences.telecom-bretagne.eu/data/odbis2008/cfp_ODBIS2008.pdf
[2]. Moussa Benaissa and Yahia Lebbah, "A Constraint Programming Based Approach to Detect Ontology Inconsistencies", The International Arab Journal of Information Technology (IAJIT), Vol. 8, No. 1, January 2011.
[3]. Satya S. Sahoo, Wolfgang Halb, Sebastian Hellmann, Kingsley Idehen, Ted Thibodeau Jr, Sören Auer, Juan Sequeda, Ahmed Ezzat, "A Survey of Current Approaches for Mapping of Relational Databases to RDF", W3C RDB2RDF Incubator Group, January 08, 2009.
[4]. Hadjila Fethallah and Chikh Mohammed Amine, "Automated Retrieval of Semantic Web Services: A Matching Based on Conceptual Indexation", The International Arab Journal of Information Technology (IAJIT), Vol. 10, No. 1, January 2013.
[5]. T. Berners-Lee, "The Semantic Web", Scientific American, May 2007.
[6]. Urvi Shah, James Mayfield, "Information Retrieval on the Semantic Web", ACM CIKM International Conference on Information Management, November 2002.
[7]. Michael Wick, Khashayar Rohanimanesh, Andrew McCallum, AnHai Doan, "A Discriminative Approach to Ontology Mapping", VLDB '08, August 24-30, 2008, Auckland, New Zealand.
[8]. Zhuoming Xu, Shichao Zhang, and Yisheng Dong, "Mapping between Relational Database Schema and OWL Ontology for Deep Annotation", WI'06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, 2006, pp. 548-552.
[9]. Zakaria Elberrichi, Abdelattif Rahmoun, and Mohamed Amine Bentaalah, "Using WordNet for Text Categorization", The International Arab Journal of Information Technology (IAJIT), Vol. 5, No. 1, January 2008.
[10]. R. Ghawi and N. Cullot, "Database-to-Ontology Mapping Generation for Semantic Interoperability", 2007.
[11]. Abdelkader Dekdouk, "Ontology-Based Intelligent Mobile Search Oriented to Global e-Commerce", The International Arab Journal of Information Technology, Vol. 7, No. 1, January 2010.
[12]. Nadine Cullot, Raji Ghawi, and Kokou Yétongnon, "DB2OWL: A Tool for Automatic Database-to-Ontology Mapping", SEBD, 2007, pp. 491-494.
[13]. Green, J., Dolbear, C., Hart, G., Engelbrecht, P., Goodwin, J., "Creating a Semantic Integration System using Spatial Data", International Semantic Web Conference 2008, Karlsruhe, Germany.
[14]. Hu, W., Qu, Y., "Discovering Simple Mappings Between Relational Database Schemas and Ontologies", In Proc. of the 6th International Semantic Web Conference (ISWC 2007), 2nd Asian Semantic Web Conference (ASWC 2007), LNCS 4825, pages 225-238, Busan, Korea, 11-15 November 2007.
[15]. Eduard Dragut, Ramon Lawrence, "Composing Mappings Between Schemas Using a Reference Ontology", International Conference on Ontologies, Databases and Application of Semantics (ODBASE), Springer, 2004, pp. 783-800.
[16]. Yassaman Zand Moghaddam and Joe D. Horton, "Relational Database Schema to Ontology Mapping Approaches", 10th November 2010.
[17]. Sidi Benslimane, Mimoun Malki, Mustapha Rahmouni, and Adellatif Rahmoun, "Towards Ontology Extraction from Data-Intensive Web Sites: An HTML Forms-Based Reverse Engineering Approach", The International Arab Journal of Information Technology (IAJIT), Vol. 5, No. 1, January 2008.
[18]. Petros Papapanagiotou, Polyxeni Katsiouli, Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades, "RONTO: Relational to Ontology Schema Matching", AIS SIGSEMIS, 2005.
[19]. Guntars Bumans, "Mapping between Relational Databases and OWL Ontologies: an Example", Scientific Papers, University of Latvia, 2010, Vol. 756, Computer Science and Information Technologies.
[20]. C. Kavitha, G. Sudha Sadasivam, Sangeetha N. Shenoy, "Ontology Based Semantic Integration of Heterogeneous Databases", European Journal of Scientific Research, ISSN 1450-216X, Vol. 64, No. 1 (2011), pp. 115-122.
[21]. Fuad Mire Hassan, Imran Ghani, Muhammad Faheem, Abdirahman Ali Hajji, "Ontology Matching Approaches for eRecruitment", International Journal of Computer Applications (0975-8887), Volume 51, No. 2, August 2012.
[22]. Mohd Amin Mohd Yunus, Roziati Zainuddin, and Noorhidawati Abdullah, "Semantic Method for Query Translation", The International Arab Journal of Information Technology (IAJIT), accepted May 24, 2011.
[23]. Mahamat Hassan and Azween Abdullah, "A New Grid Resource Discovery Framework", The International Arab Journal of Information Technology (IAJIT), Vol. 8, No. 1, January 2011.
[24]. Bouchiha Djelloul, Malki Mimoun, and Mostefai Abd El Kader, "Towards Reengineering Web Applications to Web Services", The International Arab Journal of Information Technology (IAJIT), Vol. 6, No. 4, October 2009.
[25]. Sidi Benslimane, Mimoun Malki, and Djelloul Bouchiha, "Deriving Conceptual Schema from Domain Ontology: A Web Application Reverse Engineering Approach", The International Arab Journal of Information Technology (IAJIT), Vol. 7, No. 2, April 2010.

About the Authors
Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is doing a PhD in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.

Dr. Mayank Singh has completed his M.E in Software Engineering from Thapar University and his PhD from Uttarakhand Technical University. His research areas include Software Engineering, Software Testing, Wireless Sensor Networks and Data Mining. Presently he is working as Associate Professor in Krishna Engineering College, Ghaziabad. He is associated with CSI, IE(I), IEEE Computer Society India and ACM.
I.J. Intelligent Systems and Applications, 2013, 09, 67-75 Published Online August 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2013.09.08
Ontology Development and Query Retrieval using Protégé Tool
Vishal Jain, Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad, India. E-mail:
[email protected]
Dr. Mayank Singh, Associate Professor, Krishna Engineering College, Ghaziabad, India. E-mail:
[email protected]

Abstract— This paper gives an explicit description of the concept of ontology, concerned with the development and methodology involved in building an ontology. The concept of ontologies has contributed to the development of the Semantic Web, where the Semantic Web is an extension of the current World Wide Web in which information is given well-defined meaning that translates the given unstructured data into a knowledgeable representation, thus enabling computers and people to work in cooperation. Thus, we can say that the Semantic Web is information in machine-understandable form. It is also called the Global Information Mesh (GIM). Semantic Web technology can be used to deal with challenges facing traditional search engines and retrieval techniques within given organizations, or for e-commerce applications whose initial focus is on professional users. Ontology represents information in a manner such that this information can also be used by machines, not only for displaying, but also for automating, integrating, and reusing the same information across various applications, which may include Artificial Intelligence, Information Retrieval (IR) and many more. Ontology is defined as a collection of concepts, their definitions and the relationships among them, represented in a hierarchical manner that is termed a Taxonomy. There are various tools available for developing ontologies, like Hozo, DOML, and Altova SemanticWorks. We have used Protégé, which is one of the most widely used ontology development editors; it defines ontology concepts (classes), properties, taxonomies, various restrictions and class instances. It also supports several ontology representation languages, including OWL. There are various versions of Protégé available, like WebProtege 2.0 beta, Protégé 3.4.8, Protégé 4.1 etc. In this paper, we have illustrated ontology development using Protégé 3.1 by giving an example of the Computer Science Department of a University System. It may be useful for future researchers making ontologies with Protégé version 3.1.

Index Terms— Semantic Web, Ontology Development, OWL, Protégé 3.1
I. Introduction
The World Wide Web is the largest database in the universe, mostly understandable by human users and not by machines. The WWW is a human-focused web: it discovers documents for people. It lacks a semantic structure that maintains the interdependency and scalability of its components. It returns results for a given query with the help of hyperlinks between resources, producing a large number of results that may or may not satisfy the user's query, which results in the presentation of irrelevant information to the user. In the current web, resources are accessible through hyperlinks to web content spread throughout the world. The content of information is machine readable but not machine understandable. The current WWW does not support the concept of ontologies, and users cannot make inferences due to the unavailability of complete data. An enormous collection of unstructured data present on the web leads to problems in extracting information about a particular domain. Hence information extraction is a logical step to retrieve relevant data and the extracted information. Information Retrieval is explicitly defined as the process of extracting relevant results in the context of a given query. It is described as the task of identifying documents on the basis of properties assigned to the documents by various users requesting retrieval. There are many Information Retrieval techniques for extracting keywords, like NLP-based extraction techniques. Content-based image retrieval systems require users to adopt new and challenging search strategies based on the visual properties of images [1]. Multimedia information retrieval provides retrieval capabilities over text and images along different dimensions like form, content and structure. When text annotation is nonexistent or incomplete, content-based methods must be used; retrieval accuracy can be improved by content-based methods [2].
The remaining sections of the paper are as follows. Section 2 makes readers aware of the Semantic Web, including its architecture and its importance as a future web technology; in this section we also discuss Ontology and its components, and show a list of differences between Relational Databases and Ontology. Section 3 describes the development of an ontology on "Computer Science Department" using the Protégé tool via a case study.
II. Semantic Web
2.1 Importance
The futuristic concept of the Semantic Web is needed to make our present web more precise and effective by increasing the structure and size of the current web. The Semantic Web (SW) uses Semantic Web documents (SWDs) that must be combined with Web-based indexing. The idea of the Semantic Web as envisioned by Tim Berners-Lee came into existence in 1996 with the aim of translating given information into machine-understandable form.

2.2 Definition
The Semantic Web is the new-generation Web that tries to represent information such that it can be used by machines not just for display purposes, but for automation, integration, and reuse across applications [3]. The emerging Semantic Web technology has revolutionized the way we use the Web to find and organize information. It is defined as a framework for expressing information because we can develop various languages and approaches for increasing IR effectiveness. The SW uses Semantic Web documents written in SW languages like OWL and DAML+OIL; we can say that Semantic Web documents are the means of information exchange in the SW. The Semantic Web is an extension of the current WWW in which documents are enriched with annotations in a machine-understandable markup language. Semantic Web technology can be used first to address efficiency, productivity and scalability challenges within enterprises or for e-commerce applications, where the initial focus is on professional users [4].

Tim Berners-Lee (inventor of the Web, HTTP and HTML) says that the Semantic Web will be the next generation of the current Web and the next IT revolution [6, 7, 8]. It is treated as a future concept or technology. In Fig. 1, at the bottom of the architecture we find XML, a language that enables us to write structured documents according to predefined guidelines or syntax. XML is particularly suitable for sending documents across the Web [9]. RDF is a basic data model for writing simple statements about Web objects (resources). The RDF model has three components: Resource, Property and Statement. Both XML and RDF follow the same syntax in writing properties; therefore, RDF is located on top of the XML layer [10]. RDF Schema (RDFS) provides modeling primitives for organizing Web objects into hierarchies. Its key primitives are classes and properties, subclass and subproperty relationships, and domain and range restrictions [11]. RDF Schema is based on RDF; it is the RDF vocabulary description language and represents relationships between groups of resources. The Logic layer is used in the development of ontology, producing a knowledgeable representation document written in either XML or RDF. The Proof layer involves the actual deductive process as well as the representation of proofs in Web languages (from lower levels) and proof validation [12]. Finally, the Trust layer will emerge through the use of digital signatures and other kinds of knowledge, based on recommendations. The Semantic Web is envisioned as a collection of information linked in a way that can be easily processed by machines. This whole vision depends on agreeing upon common standards - something that is used and extended everywhere [13, 14].

Fig. 1: "Semantic Web layered Architecture [5]"

Berners-Lee outlined the architecture of the Semantic Web in the following three layers [15]:
The metadata layer: It contains the concepts of resource and properties; RDF (Resource Description Framework) is the most popular data model for the metadata layer.
The schema layer: Web ontology languages (OWL) are introduced here to define a hierarchical description of concepts (is-a hierarchy) and properties; RDFS (RDF Schema) is a popular schema layer language.
The logical layer: A set of web ontology languages is introduced at this layer to provide a richer set of modeling primitives, in which the Semantic Web plays a very important role in replacing slow, ineffective, inefficient and non-intelligent web processes with fast, effective and inexpensive automatic processes.

We can make our web more precise and increase retrieval capacity by adding annotations to documents. The Semantic Web will allow both humans and machines to find and make use of data in modern ways that previously haven't been possible on the WWW.
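To make the three-component RDF model just described concrete (Resource, Property, Statement), here is a minimal sketch using the Python rdflib library; the URIs are invented for the example.

# One RDF statement (triple): resource - property - value.
# The example.org URIs are made up for illustration.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# Statement: the resource ex:SemanticWeb has the property ex:extends
# whose value is the resource ex:WWW.
g.add((EX.SemanticWeb, EX.extends, EX.WWW))
g.add((EX.SemanticWeb, EX.label, Literal("Semantic Web")))

print(g.serialize(format="turtle"))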
Both the Semantic Web (SW) and the World Wide Web (WWW) differ from each other in various aspects, which are described in the table below.
Table 1: "Comparison between Web and Semantic Web" [16]

Feature                 | WWW                                  | Semantic Web
Fundamental Component   | Unstructured Content                 | Structured Content
Primary Audience        | Humans                               | Applications
Links                   | Indicate Location                    | Indicate Location and Meaning
Primary Vocabulary      | Formatting Instructions              | Semantics and Logics
Logic                   | Informal/Nonstandard                 | Descriptive Logic
Resources               | Web Pages, Photos, Videos etc.       | Web Pages, Photos, People
Search Engines          | Google, Yahoo etc.                   | Hakia, Sindice etc.
Use of Ontology         | Not Applicable                       | Applicable
Inference               | WWW users cannot reach conclusions.  | SW users can.
The WWW consists primarily of content for human consumption. Content links to other content on the WWW via the Uniform Resource Locator (URL). The URL relies on surrounding context (if any) to communicate the purpose of the link that it represents; usually the user infers the semantics. Web content typically contains formatting instructions for a nice presentation, again for human consumption [17]. WWW content does not have any formal logical constructs. Correspondingly, the Semantic Web consists primarily of statements for application consumption. The statements link together via constructs that can form semantics, the meaning of the link. Thus, link semantics provide a defined meaningful path rather than a user-interpreted one. The statements may also contain logic that allows further interpretation and inference.

The term ontology can be defined in many different ways. Genesereth and Nilsson defined ontology as an explicit specification of a set of objects, concepts, and other entities that are presumed to exist in some area of interest, together with the relationships that hold among them. It enables the Web for software components, which can be ideally supported through the use of Semantic Web technologies [18]. This helps in understanding the concepts of the domain, and helps the machine to interpret the definitions of concepts in the domain and the relations between them. Ontologies can be broadly divided into two main types: lightweight and heavyweight. Lightweight ontologies involve a taxonomy (or class hierarchy) that contains classes, subclasses, attributes and values. Heavyweight ontologies model domains in a deeper way and include axioms and constraints [19]. The ontology layer consists of a hierarchical distribution of important concepts in the domain, describing the ontology concepts, relationships and constraints. Fig. 2 displays the ontology and its constituent parts.

Fig. 2: "Ontology and its components [20]"

2.3 Ontology Advantages
There are many advantages of using ontology in Semantic Web technology. Some of them are as follows [21, 22]:
- Sharing a common understanding of the structure of information among people or software agents is one of the more common goals in developing ontologies [23].
- Ontology enables reusability of domain knowledge in representing concepts and their relationships.
- Making explicit domain assumptions underlying an implementation makes it possible to change these assumptions easily if our knowledge about the domain changes [24].
- Separating the domain knowledge from the operational knowledge is another common use of ontologies. We can describe a task of configuring a product from its components according to a required specification and implement a program that does this configuration independent of the products and components themselves [25].
- Use of ontology enables analysis of domain knowledge on the basis of declared terms in a document. Each user has defined attributes and relationships to other users.
- Ontology is considered the backbone of software, since the SW translates the given data into machine-understandable language using the concept of ontologies [26].
- Ontology development is a cooperative process; it allows different people to express their views on a given domain.
- Ontology language editors help to build the SW.

2.4 Ontology Languages and Editors
An ontology language is defined as a formal language used to encode an ontology. Various languages are listed below:
DAML+OIL: DAML stands for DARPA Agent Markup Language, where DARPA stands for Defense Advanced Research Projects Agency; OIL stands for Ontology Interchange Language. This language uses Description Logic (DL) for its expression.
SWRL: It stands for Semantic Web Rule Language. It adds rules to OWL DL.
OWL: It stands for Web Ontology Language. It is used to represent relations between entities using formal semantics and vocabulary.

Ontology editors are applications designed to assist the creation and modification of ontologies. Various editors are listed below:
Protégé: A free, open-source knowledge-acquisition system. It is written in Java and uses Swing to create its user interface.
DOME: It stands for DERI Ontology Management Environment. It is designed for effective management of ontologies.
OntoLingua: An ontology environment developed by the OnTo Knowledge Project. It implements the ontology construction process.
Altova SemanticWorks: An RDF document editor and ontology development IDE. It creates and edits RDF documents, RDF Schema and OWL ontologies.
Table 2: "Comparison between RDBMS and Ontology"

Feature                 | Relational Database            | Knowledgebase
Structure               | Schema                         | Ontology Statements
Data                    | Rows                           | Instance Statements
Administration Language | DDL                            | Ontology Statements
Query Language          | SQL                            | SPARQL
Relationships           | Foreign Keys                   | Multidimensional
Logic                   | External of Database/Triggers  | Formal Logic Statements
Uniqueness              | Keys of Table                  | URI
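To make the Query Language row of Table 2 concrete, the sketch below asks the same question once of rows (SQL) and once of instance statements (SPARQL, executed here with the Python rdflib library). The schema, prefix and data are invented for the example.

# The same query over rows (SQL string, shown only) and over instance
# statements (SPARQL, actually executed). All names are illustrative.
from rdflib import Graph, Literal, Namespace, RDF

sql = "SELECT name FROM Professor WHERE dept = 'CSE';"   # relational form

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.p1, RDF.type, EX.Professor))
g.add((EX.p1, EX.dept, Literal("CSE")))
g.add((EX.p1, EX.name, Literal("A. Kumar")))

sparql = """
PREFIX ex: <http://example.org/>
SELECT ?name WHERE {
    ?p a ex:Professor ;
       ex:dept "CSE" ;
       ex:name ?name .
}"""
for row in g.query(sparql):                              # knowledgebase form
    print(row.name)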
III. Case Study
The Computer Science Department Ontology describes various terms used in a computer science department. It shows the terms and their inheritance, but not the relationships. For example, a Professor inherits from Teaching, which inherits from Staff, which is a generalization of Person. Similarly, Assistant inherits from Non-Teaching, which in turn inherits from Staff, which in turn inherits from Person. The screenshot of the Computer Science Department ontology is shown in Fig. 3.

Computer Science Department Ontology
  Computer Science
    Person
      Staff
        Teaching (faculty): Professor, Reader, Lecturer
        Non-Teaching: Assistant, Technician
      Student: Post Graduate, Graduate
    Publication: Books, Journals

3.1 Ontology Development Tool
Protégé is an open-source tool for editing and managing ontologies. It is the most widely used domain-independent, freely available, platform-independent technology for developing and managing terminologies, ontologies, and knowledge bases in a broad range of application domains. There are various versions of Protégé available, of which the frequently used ones are Protégé 2000, Protégé 3.1, Protégé 3.4 beta, Protégé 3.4 (released recently) and Protégé 4.0 beta. It provides a rich set of knowledge modeling structures. We have used Protégé version 3.1 to develop our ontology on the Computer Science Department. It provides support for multi-user systems; class trees on different tabs are synchronized by default; the standard maximum memory allocation is 100 MB; the RDF backend validates frame names; handling of subslots is improved; and the database backend correctly identifies MS SQL Server and optimizes table creation accordingly.

Fig. 3: "Computer Science Department Ontology"

Fig. 3 shows the ontology on "Computer Science Department" built with the help of the Protégé tool.
3.2 Code Snippets
Following are various code snippets of the Computer Science Department Ontology, developed in Protégé 3.1.

XML Code Snippet
(Frame definitions for :THING, :SYSTEM-CLASS and :STANDARD-CLASS, with the Person and Teaching classes carrying the role Abstract or Concrete and slots such as ID and Sal.)

RDF Code Snippet
(RDF descriptions of the Teaching and Professor classes as :STANDARD-CLASS instances with the role Concrete.)

OWL Code Snippet
(OWL class declarations for Teaching and its subclasses.)
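A minimal rdflib sketch of the same class hierarchy the snippets describe (Person, Staff, Teaching, Professor, with ID and Sal slots) follows; the URIs and property ranges are illustrative assumptions and do not reproduce Protégé 3.1's exact frame output.

# Rebuilding the core Computer Science Department hierarchy in RDFS/OWL.
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL, XSD

CS = Namespace("http://example.org/csdept#")
g = Graph()

# Class taxonomy: Professor -> Teaching -> Staff -> Person, and so on.
for sub, sup in [(CS.Staff, CS.Person), (CS.Teaching, CS.Staff),
                 (CS.NonTeaching, CS.Staff), (CS.Professor, CS.Teaching),
                 (CS.Assistant, CS.NonTeaching)]:
    g.add((sub, RDF.type, OWL.Class))
    g.add((sub, RDFS.subClassOf, sup))
g.add((CS.Person, RDF.type, OWL.Class))

# The ID and Sal slots as datatype properties on the Teaching class.
for prop, rng in [(CS.ID, XSD.integer), (CS.Sal, XSD.decimal)]:
    g.add((prop, RDF.type, OWL.DatatypeProperty))
    g.add((prop, RDFS.domain, CS.Teaching))
    g.add((prop, RDFS.range, rng))

print(g.serialize(format="xml"))    # RDF/XML, comparable to the OWL snippet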
Vishal Jain, Research Scholar, Computer Science & Engineering Department, Lingaya's University, Faridabad; Gagandeep Singh, Student, Guru Tegh Bahadur Institute of Technology, GGS Indraprastha University, Delhi; Dr. Mayank Singh

2. MULTI AGENT SYSTEMS
With the advent of the concepts of Ontology and the Semantic Web, we are able to use languages and services provided for different environments, where an Interface Language refers to the defining of data objects and their location. Agents are able to access distributed models to enable interaction between processes, like CORBA (Common Object Request Broker Architecture) and RMI (Remote Method Invocation).

2.1 MULTI AGENT SYSTEMS (MAS)
Problem: While using knowledge processes, we have used P2P architecture for enabling distributed control of knowledge. A lattice is a structure that is improved with the addition of objects and features. Concepts in the lattice are generated automatically and under the guidance of a supervisor, and are clearly labeled by the supervisor.

FIGURE-5: FORMATION OF CONCEPTS
(Figure 5: a lattice is formed by combining objects with features: Objects + Features = Lattice.)
3. INTRODUCTION TO DATA MINING
Data mining is a technology that is used for identifying patterns and ways from large quantities of data or other information repositories. This technology works in a way that it adopts a data integration method to generate a Data Warehouse. With the help of algorithms, it extracts agent information. Data Mining involves the use of several agents and data sources, in which agents are configured to work on the basis of mining algorithms to solve multiple tasks. Data Mining originated from [...].
(Figure: CMAS architecture - agents publish to a special ontology and a common ontology.)
The concept of Data Mining suffered from many challenges, as shown below:
- There is a huge amount of data located at different sites; identifying patterns from it can easily exceed limits.
- Since Data Mining involves large datasets, it is preferred to distribute data into fragments in order to achieve parallel processing.
- In many cases, the data to be mined is produced at either a higher rate or comes in small packets. This may lead to wastage of data and less accuracy.
- Data extracted after the mining process is not secure to transfer on all networks.

3.1 DISTRIBUTED DATA MINING (DDM)
Problem: Huge amounts of data are located at different sites, and data is not transferred to other sites due to privacy issues. Single Data Mining techniques deal with a centralized structure; to achieve faster processing, we have to distribute data sources into fragments.
Solution: DDM.
Analysis: The objective of DDM is to perform data mining operations based on the type and availability of data sources. It has algorithms to perform mining operations at a centralized location and leads to distributed data. The concept of DDM arises from its decentralized architecture that reaches every network. It increases the security of data when it is transferred to other networks. DDM is a complex system that focuses on the distribution of resources over a wide network as data mining processes. It extracts useful patterns from distributed heterogeneous databases. It has a framework designed as an integrated GUI-based environment that constructs the design process of a multi agent system [7]. It makes use of agents for updating these systems from time to time. It is shown below:
FIGURE-8: DDM FRAMEWORK
DCMAS: It is composed of components that handle metadata, and it operates in any host. The Distributed Classification Multi Agent System (DCMAS) is categorized into two parts, as shown in the table below.

TABLE 1: DCMAS
Meta level part - Learning Agent: performs training and testing of metadata; it manages the design of the meta-model used by meta-level agents, and the testing and training methods.
Data sources part - Data Source Management Function: participates in the distributed design of ontology and collaborates in decision making.

SECURITY
Since decentralization of data is the main requirement for DDM systems, data is distributed at each data site. Autonomous agents perform multiple tasks based on the configuration of systems. The agent then interprets the configuration and generates an execution plan to complete multiple tasks. Agents can be transferred from one data site to other data sites, which leads to dynamic organization of the DDM system. Each agent can view only a limited part of the system. This limitation leads to better security, as it prevents unwanted tasks from different hosts.
FIGURE 11: CENTRALIZATION PROCESS
(Figure 11: agents receive multiple tasks, interpret each task, and generate an execution plan.)
3.4 MULTI AGENTS DISTRIBUTED DATA MINING SYSTEM (MADM)
Problem: In DDM systems, we cannot obtain an exact result from various processes, as the knowledge obtained from one model differs from that of another model even when the data is the same. This makes DDM systems incompatible. What, then, is the way to obtain accurate knowledge among distributed systems?
Solution: MADM systems.
Analysis: A MADM system involves the use of various agents to complete its task. It uses a standard language that facilitates interaction among agents [8]. It has various agents, which are described in the table below along with their functions.
TABLE 3: VARIOUS MADM AGENTS AND THEIR FUNCTIONS
(a) Interface Agent (User Agent): Interacts with the user to collect requirements and display results. It has an interface module that contains methods for inter-agent communication.
(b) Facilitator Agent (Management Agent): Activates different agents. It receives questions from the interface agent and may take the help of a group of agents to solve those questions.
(c) Resource Agent (Data Agent): Maintains metadata information about data sources. It generates queries based on user requests and sends their results to the user agent.
(d) Mining Agent: Implements data mining techniques and algorithms.
(e) Result Agent: Observes mining agents and collects results from them. After obtaining results, these agents show results to the user agent by integrating with the manager agent.
(f) Broker Agent: An advisor agent that can reply to the query of an agent with the name and ontology of the respective agent.
(g) Ontology Agent: Maintains and provides knowledge of ontology to solve queries related to ontology.
3.5 FLOW OF SYSTEM OPERATIONS
A pre-decision made in order to mine given data sources is called Data Mining Task Planning. Data mining task planning requires compensation between Facilitator Agents and Mining Agents through message passing. Consider that the User Agent is denoted by U, the Facilitator Agent by X, the Broker Agent by Y, and the Mining Agent by Z. U sends a request to X to ask for data mining with other agents in the system. Then X tries to negotiate with Y to determine which agents are suitable for performing the task. The Mining Agent is responsible for completion of the task, while X is used for planning. When Z has completed, it shows results to X, and X passes them to U.
FIGURE-12: FLOW OF SYSTEM OPERATIONS
(Figure 12: the User Agent (U) sends a request; message passing runs between the Facilitator Agent (X) and the Broker Agent (Y), and the task is executed by the Mining Agent (Z).)
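A toy Python sketch of this U-X-Y-Z flow follows; the paper gives no code, so every class and method name below is invented for illustration.

# Toy message-passing flow: User agent (U) -> Facilitator (X), which consults
# the Broker (Y) to pick a Mining agent (Z); Z's result returns via X to U.

class MiningAgent:                       # Z: executes the mining task
    def run(self, task):
        return f"patterns mined for '{task}'"

class BrokerAgent:                       # Y: advises X which agent suits the task
    def __init__(self, registry):
        self.registry = registry         # task type -> mining agent
    def select(self, task):
        return self.registry[task]

class FacilitatorAgent:                  # X: plans the task and delegates to Z
    def __init__(self, broker):
        self.broker = broker
    def handle(self, task):
        z = self.broker.select(task)
        return z.run(task)

class UserAgent:                         # U: issues the request, displays results
    def __init__(self, facilitator):
        self.facilitator = facilitator
    def request(self, task):
        print(self.facilitator.handle(task))

y = BrokerAgent({"classification": MiningAgent()})
x = FacilitatorAgent(y)
UserAgent(x).request("classification")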
4. CLASSIFYING AND ANALYSING WEB DOCUMENTS
Many information extraction methods and techniques have been used, but they were all in vain, so we need a more intelligent system to gather useful information from huge amounts of data.
Problem: To find meaningful and informative documents with the help of Data Mining algorithms and then interpret the mining results in an expressive way.
Solution: Ontology Based Web Content Mining Methodology.
Approach involved: The proposed methodology uses the concept of Domain Ontology [9]. A Domain Ontology organizes concepts, relations and instances into a given domain. This approach is used because it resolves synonyms, reducing confusion among agents.
Analysis: Ontology Based Web Content Mining represents conceptual information about a given domain. It covers document representation, extraction of relevant information from text documents, and the creation of classification models. The methodology uses the ideas and principles of Data Mining to analyze web data.
FIGURE-13: STAGES OF ONTOLOGY WEB CONTENT MINING
(Figure 13: gathering information from documents, creation of ontology, building the classification model and algorithm, and classification of new documents.)
In this section, we have evaluated the proposed methodology in two specific domains - the weather domain (web pages containing information about weather forecasting and analysis) and the GoogleTM collection (web pages containing news).

4.1 ABOUT WORDnet
WORDnet was invented by George A. Miller [10]. It is a large lexical database of English. Nouns, verbs, adverbs and adjectives combine into sets of synonyms. Each set of synonyms represents a concept and solves queries through search and lexical results. In this paper, we have used WORDnet version 3.1, which contains 155287 synonyms. WORDnet is one of the best examples of an ontology used in experiments.
RESULTS
The above experiment shows that classification Data Mining algorithms like C4.5, BayesNet and SVM are improved by using WORDnet. We can classify simple words and expressions of different datasets. Naïve Bayes is an algorithm that shows accurate results before abstraction and less accurate results after abstraction.

5. CONCLUSION
From this paper, we conclude that we have described the role of
ϱ͘KE>h^/KE &ƌŽŵƚŚŝƐƉĂƉĞƌ͕ǁĞĐŽŶĐůƵĚĞƚŚĂƚǁĞŚĂǀĞĚĞƐĐƌŝďĞĚ ƌŽůĞŽĨ@1RGHVVHQVHWKHGDWDDQGVHQGLWWRWKHFOXVWHUKHDGVZKLFK DJJUHJDWHV WKH GDWD DQG VHQG WR WKH EDVH VWDWLRQ'XH WR ZLGHVSUHDG JURZWK LQ WHFKQRORJ\ DQG QXPHURXV DSSOLFDWLRQV :61V DUH EHLQJ XVHG HQRUPRXVO\ :LWK LQFUHDVLQJ HDVH WKLV QHWZRUN LV SURQH WR DWWDFNV DOVR 6HFXULW\ LV PDMRU FRQFHUQ LQ :61V0DQ\W\SHVRIDWWDFNVEDGO\DIIHFWQHWZRUNRSHUDWLRQ %ODFN KROH DWWDFNV RFFXU ZKHQ DQ LQWUXGHU RU VD\ RXWVLGHU FDSWLYDWHWKHQRGHVDQGUHSURJUDPVWKHPWKDWOHDGVWREORFNLQJ RI SDFNHWV UDWKHU WKDQ WUDQVPLWWLQJ WKHP WR VWDWLRQ >@,Q ZLUHOHVV VHQVRU QHWZRUNV WKH UHVHDUFKHUV IRFXV RQ HVWDEOLVKLQJWKHVKRUWHVWURXWHWRWKHGHVWLQDWLRQIRUURXWLQJWKH GDWD SDFNHWV ZLWK PLQLPXP FRVW RI EDQGZLGWK DQG EDWWHU\ SRZHU>@6HYHUDOVWXGLHVKDYHEHHQFRQGXFWHGLQ>@LQOLHXRI KDQGOLQJSUREOHPVRIVHQVRUVHOHFWLYHQRGHV 9DULRXVLVVXHVDUHWKHUHLQGLIIHUHQWLQWUXVLRQGHWHFWLRQV\VWHP WHFKQLTXHV6RPHLVVXHVDUH x ,Q KLHUDUFKLFDO FOXVWHULQJ EDVHG ,'6V FOXVWHULQJ DOJRULWKPV FRQVXPHV KXJH DPRXQW RI HQHUJ\ LQ FOXVWHU IRUPDWLRQ:KHQFOXVWHUIRUPDWLRQLVGRQHDQGWKH&+V DUHHOHFWHG&+VPXVWEHVHFXUHDVWKH\DUHPRUHSURQH WR DWWDFNV E\ DQ DGYHUVDU\ 0RUHRYHU LI FOXVWHU KHDGV DUHOHVVHIILFLHQWLQWHUPVRIHQHUJ\WKHQWKH\FDQIDLOLQ VHUYLQJWKHQHWZRUNEHLQJDKHDG x ,QDJHQWEDVHGLQWUXVLRQGHWHFWLRQV\VWHPQHWZRUNORDG DQGODWHQF\LVUHGXFHG%XWQRGH¶VHQHUJ\FRQVXPSWLRQ LV KLJK +RZHYHU DJHQW WR DJHQW FRPPXQLFDWLRQ DQG
II. LITERATURE SURVEY
In [?] the authors proposed two approaches for improving the security of cluster based sensor networks. The first approach is authentication based intrusion prevention and the second is energy saving intrusion detection. Authentication mechanisms are discussed in the first approach; these are implemented in generic sensor networks to save the energy of nodes. In the second approach, monitoring mechanisms are discussed which monitor the cluster heads and nodes. When a cluster head is monitored, a sensor node monitors that cluster head. Due to this, monitoring time is reduced, thereby saving energy. Cluster heads monitor the sensor nodes and detect malicious member nodes. Instead of using sensor nodes to monitor nodes, cluster heads are used for monitoring purposes. In [?] the author proposed an approach for black hole and gray hole detection using a hash function and the second optimal route for data transmission. In the proposal, the very first optimum reply is discarded and the second shortest Route Reply (RREP) message is chosen to establish the route from source to destination. This approach has good results, as it reduces the attack frequency; it would be difficult for malicious nodes to monitor the whole network. A hash function is used in case of many malicious nodes in the network. While sending data packets to the destination, the source also sends the hash value of the message. On receiving all the data packets, the destination computes the hash value, and if both values are found equal, there is no black hole / gray hole attack. In case of attack, the destination node broadcasts a data packet error message, and the source saves this route in the table so as to avoid it in future and rebroadcasts the route request message. In [?] the author proposed a method called Enhanced Route Discovery for AODV (ERDA). This approach improves the security of AODV during the route discovery process and helps to protect the network from attacks. Three new elements are introduced to improve the existing AODV. In [?] the author implemented a simulation based model as a solution to prevent the network from black hole attack. Results are shown by comparing the communication overhead and cost in the network with and without multiple sinks. The mobile agents were developed using Aglets. A method of prime product number (PPN) is used in [?] for detection and removal of malicious nodes. Each node should possess a unique prime number. The source node (SN) broadcasts an RREQ to the destination, and in response an intermediate node (IN) wishing to send an RREP has to provide the product of all prime numbers (PPN) from destination to source and also information about its cluster head. On receiving RREP messages, the PPN is divided by the node IDs. If the PPN is divisible, the node is termed a reliable node, else a malicious node. Although the complexity of finding a prime number is O(n), assigning appropriate and unique unused prime numbers in a MANET is complex, as MANETs are dynamic in nature. The proposal in [?] utilizes the promiscuous mode of the node. In this mode, a node is allowed to intercept and read each packet that arrives in its range. The source node broadcasts an RREQ message in the network. On receiving an RREP message from the destination, a route is established; if the RREP message is received from an intermediate node, the node just beside the one which sent the RREP switches to promiscuous mode and sends a hello message to the destination node through it. If that hello packet is forwarded by this node to the destination, the node is safe, else it is a malicious node. The preceding node then informs the network about the malicious node. This can increase the hello messages in the network in case of more senders and receivers, thus creating congestion, and the Network Routing Load will increase. In [?] the author proposed two solutions to detect the black hole attack. In the first solution, a path is selected among all received routes in terms of shared hops. The source node will be able to recognize the safer route to the destination through shared hops. The main drawback of this approach is that it introduces more delay in the network. In the second solution, each node stores the last-packet-sequence-numbers for the last packet sent to each node and for the last packet received from each node. The received RREP contains the last-packet-sequence-numbers received from the source node. According to the sequence number, the source node can detect the malicious RREP. In [?] the authors proposed an enhanced version of AODV called DPRAODV (detection, prevention and reaction AODV) to improve security. AODV receives an RREP and identifies the value of the sequence number in its routing table; packets are accepted if their value is higher than the predefined sequence number, else discarded.

III. PROPOSED APPROACH
The architecture is designed to defend against black hole attack using two cluster heads in each cluster. A hierarchical cluster based network consists of a base station and clusters, each of which consists of two cluster heads and sensor nodes. The base station is placed in the center of the network. Cluster formation is done and two cluster heads are elected based on the energy and position of the nodes. Initially, the nodes having the highest energy are chosen as cluster heads. The position of nodes also comes into consideration; nodes which are nearest to the base station are chosen as cluster heads. After some rounds, the cluster head is selected based on residual energy. One is the primary cluster head and the other is the secondary cluster head. Nodes are connected with two cluster heads, out of which only one is active at a time. Nodes send data to the primary cluster head. The cluster head processes the data and sends it to the base station.

Fig.: Cluster based sensor network
Fig.: Malicious node in a cluster

A. CLUSTERING
Clustering refers to grouping sensor nodes into clusters. A cluster consists of nodes, out of which one is selected as cluster head. The cluster head performs special tasks. The sensor nodes periodically transmit the data packets to the cluster head. The cluster head collects the data packets and forwards them to the base station, also called the sink.

B. ADVANTAGES OF CLUSTERING
- Clustering increases the energy efficiency of the sensor network.
- It facilitates lower energy consumption.
- It supports network scalability.
- Clustering helps in localizing the route setup, thereby reducing the size of the routing table stored at each sensor node.
- In a cluster based network, there is no inter-cluster communication, thereby conserving communication bandwidth.

C. CLUSTERING ALGORITHM CLASSIFICATION
According to [?], clustering algorithms are classified as follows:
1. Clustering algorithms for homogeneous or heterogeneous networks: This classification is based on the fact that in heterogeneous sensor networks there are two types of sensor nodes: ones with high computation power to create a backbone inside the network, and others with low computation power to sense the attributes and forward the data to the cluster head. In a homogeneous network, all the nodes have the same processing capabilities.
2. Centralized or distributed clustering algorithms: In a centralized network, the base station controls all the activities of the network. In a distributed network, the cluster heads collect all the data packets and send them to the base station; hierarchy is maintained in the network.
3. Static and dynamic clustering algorithms: In the static approach, the cluster head is elected once and remains cluster head for the entire time, whereas in the dynamic approach the cluster heads are re-elected based on the energy level. The dynamic architecture is energy efficient and has low power consumption.
4. Probabilistic clustering algorithms: In the probabilistic approach, priority is initially assigned to all the nodes for the determination of the cluster head. Probabilistic clustering approaches include LEACH, EEHC and HEED.
5. Non-probabilistic clustering algorithms: In these algorithms, specific criteria for cluster head election are considered. Factors such as node degree and connectivity are considered, along with information received from other closely located nodes.
6. Linked clustering algorithm: The linked clustering algorithm (LCA) is a distributed, ID based, one-hop static clustering algorithm that aims to maximize network connectivity. It has a limitation: it does not consider the problem of limited energy. The drawback of LCA is that it leads to an excessive number of clusters.

Fig.: Wireless sensor network [?]

IV. PROPOSED ALGORITHM FOR DETECTION AND PREVENTION OF BLACK HOLE ATTACK
Need for two cluster heads: To detect a malicious node, we have selected two cluster heads in a cluster. One cluster head is activated at a time, therefore the complexity of the system does not increase. Cluster heads distribute the energy load in the network.

Algorithm: The cluster head maintains a table containing the IDs of the nodes. When nodes start sending data to the cluster head, the cluster head starts a timer; within a particular time, the nodes must send all the data. If any node is malicious, it will not forward the packets. After a threshold, the cluster head will come to know which node is malicious and will remove that node. If the cluster head itself is malicious, the base station will detect that node and remove the cluster head node. After removing that node, the secondary cluster head is activated; the base station informs all the nodes about this, and the nodes then send data to the secondary cluster head.

Notations:
MN: Malicious Node
CH: Cluster Head

Begin
  Create network topology
  For all nodes {
    Assign energy levels
    Create clusters
    For each cluster {
      Select two cluster heads
      Primary CH is active at a time
      Nodes send data to the active cluster head
      If primary CH node is MN {
        It drops all the packets coming to it
        Base station detects and removes the MN
        Base station informs the nodes and activates CH2
        CH2 is now the primary CH
      }
      Else {
        Continue normal operation
      }
    }
  }
End

The cluster head maintains a table which contains node IDs. The table below depicts the node IDs of the corresponding nodes belonging to the cluster.

TABLE I: NODE ID STORED AT CLUSTER HEAD
Cluster | Node | ID
        | N    | ID
        | N    | ID
        | N    | ID
        | N    | ID

Fig.: Taxonomy of black hole detection approaches

The cluster formation is done based on the sensor nodes' locations. The sensor nodes which are neighboring nodes are grouped under one cluster. After cluster formation, the cluster head is elected based on the energy of the sensor nodes.
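A compact Python sketch of the detection and removal phases described above (timeout-based detection of a silent member node, and fail-over from the primary to the secondary cluster head) follows; the threshold value and data structures are illustrative assumptions, not part of the paper's specification.

# Two-cluster-head scheme: the active (primary) CH expects data from every
# member each round; members silent for too long are flagged as black holes.
# If the primary CH itself drops packets, the base station activates CH2.

MISS_THRESHOLD = 3            # rounds a node may stay silent before removal

class Cluster:
    def __init__(self, members, ch1, ch2):
        self.members = set(members)
        self.primary, self.secondary = ch1, ch2
        self.misses = {n: 0 for n in self.members}

    def round(self, reports):
        """reports: ids of member nodes the primary CH heard this round."""
        for n in list(self.members):
            self.misses[n] = 0 if n in reports else self.misses[n] + 1
            if self.misses[n] >= MISS_THRESHOLD:    # black hole suspect
                self.members.discard(n)             # remove malicious node

def base_station_check(cluster, forwarded, expected):
    # The base station compares what the primary CH forwarded against what
    # the members actually sent; a shortfall marks the CH itself malicious.
    if forwarded < expected:
        cluster.primary, cluster.secondary = cluster.secondary, None
        print("primary CH removed; secondary CH activated")

c = Cluster(members=[1, 2, 3, 4], ch1="CH1", ch2="CH2")
for _ in range(3):
    c.round(reports={1, 2, 4})          # node 3 never reports
print("remaining members:", sorted(c.members))
base_station_check(c, forwarded=2, expected=3)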
V. CONCLUSION AND FUTURE SCOPE
The proposed approach is based on two cluster heads in a cluster. In the detection phase, the base station detects the malicious node; in the removal phase, the malicious node is removed by the base station. Various techniques for black hole detection and prevention are shown in the taxonomy figure. The work can be extended by implementing the proposed algorithm and introducing a new technique for cluster head selection, so as to optimize the energy consumption of the sensor network.

REFERENCES
[1] Stankovic, J., "When sensor and actuator networks cover the world", ETRI Journal.
[2] Mohammad Wazid, Avita Katal, Roshan Singh Sachan, R. H. Goudar and D. P. Singh, "Detection and prevention mechanism for blackhole attack in wireless sensor network", IEEE International Conference on Communication and Signal Processing, India.
[3] K. Osathanunkul and N. Zhang, "A countermeasure to black hole attacks in mobile ad hoc networks", IEEE International Conference on Networking, Sensing and Control (ICNSC).
[4] C. Ramakristanaiah, A. L. Sreenivasulu, "Identification of Misbehaving Nodes that Can Drop or Modify the Packets in Wireless Sensor Networks", International Journal of Science and Research (IJSR), India.
[5] Chien-Chung Su, Ko-Ming Chang, ... wireless sensor networks, Wireless Communications and Networking Conference, IEEE.
[6] Khattak, H., Nizamuddin, N., Khurshid, F., ul Amin, N., "Preventing black and gray hole attacks in AODV using optimal path routing and hash", Networking, Sensing and Control (ICNSC), IEEE International Conference on.
[7] Kamarularifin Abd Jalil, Zaid Ahmad, Jamalul-Lail Ab Manan, "Securing Routing Table Update in AODV Routing Protocol", IEEE Conference on Open Systems (ICOS), Langkawi, Malaysia.
[8] Sheela, D., Srividhya, V. R., Asma Begam, Anjali and Chidanand, G. M., "Detecting Black Hole Attacks in Wireless Sensor Networks using Mobile Agent", International Conference on Artificial Intelligence and Embedded Systems (ICAIES), Singapore.
[9] Gambhir, S., Sharma, S., "PPN: Prime product number based malicious node detection scheme for MANETs", IEEE International Advance Computing Conference (IACC).
[10] Singh, P. K., Sharma, G., "An Efficient Prevention of Black Hole Problem in AODV Routing Protocol in MANET", Trust, Security and Privacy in Computing and Communications, IEEE International Conference on.
[11] Al-Shurman, M., ...
[12] Payal N. Raj and Prashant B. Swadas, "DPRAODV: A dynamic learning system against black hole attack in AODV based MANET", International Journal of Computer Science Issues (IJCSI).
[13] Basilis Mamalis, Damianos Gavalas, Charalampos Konstantopoulos and Grammati Pantziou, "Clustering in wireless sensor networks", in Zhang, RFID and Sensor Networks.
[14] I. Krontiris, Z. Benenson, T. Giannetsos, F. Freiling and T. Dimitriou, "Cooperative intrusion detection in wireless sensor networks", Springer, Wireless Sensor Networks.
[15] Doumit, S. S., Agrawal, D. P., "Self-organized criticality and stochastic learning based intrusion detection system for wireless sensor networks", Military Communications Conference, IEEE.
[16] Chris Karlof, David Wagner, "Secure routing in wireless sensor networks: attacks and countermeasures", Ad Hoc Networks, Elsevier B.V.
[17] Anli, ...
[18] Ismail Butun, Salvatore D. Morgera and Ravi Sankar, "A Survey of Intrusion Detection Systems in Wireless Sensor Networks", IEEE Communications Surveys.
[19] K. Gill, Shuang-Hua, ...
[20] Mukesh Tiwari, Karm Veer Arya, Rahul Choudhari, Kumar Sidharth Choudhary, "Designing Intrusion Detection to Detect Black hole and Selective Forwarding Attack in WSN based on local Information", IEEE International Conference on Computer Sciences and Convergence Information Technology (ICCIT).
[21] Hero Modares, Rosli Salleh, Amir Hossein Moravejosharieh, "Overview of Security Issues in Wireless Sensor Networks", ACM International Conference on Computational Intelligence, Modelling and Simulation.
[22] http://www.google.co.in/imgres?imgurl=http://vlssit.iitkgp.ernet.in/ant/...
$87+25¶6352),/( 3UDFKL'HZDOKDVFRPSOHWHGKHU%7HFKLQ &RPSXWHU 6FLHQFH DQG (QJLQHHULQJ IURP 5DMLY *DQGKL 3URXG\RJLNL9LVKZDYLG\DOD\D 5*39 8QLYHUVLW\%KRSDO1RZVKHLVSXUVXLQJ 07HFK LQ &RPSXWHU 6FLHQFH DQG (QJLQHHULQJIURP&'$&1RLGDDIILOLDWHG WR **6,38 +HU UHVHDUFK DUHDV LQFOXGH 1HWZRUNHG V\VWHPV DQG DOJRULWKPVPRELOHQHWZRUNLQJDQG:LUHOHVV6HQVRUQHWZRUNV *DJDQGHHS 6LQJK 1DUXODUHFHLYHG KLV %7HFKLQ &RPSXWHU 6FLHQFH DQG (QJLQHHULQJ IURP *XUX 7HJK%DKDGXU ,QVWLWXWH RI 7HFKQRORJ\ *7%,7 DIILOLDWHG WR *XUX *RELQG 6LQJK ,QGUDSUDVWKD 8QLYHUVLW\**6,38 1HZ'HOKL1RZKH LV SXUVXLQJ 07HFKLQ &RPSXWHU 6FLHQFH IURP &'$& 1RLGD DIILOLDWHG WR **6,38 +H KDV SXEOLVKHG YDULRXV UHVHDUFK SDSHUV LQ YDULRXV QDWLRQDO LQWHUQDWLRQDO MRXUQDOV DQG FRQIHUHQFHV +LV UHVHDUFK DUHDV LQFOXGH 6HPDQWLF :HE ,QIRUPDWLRQ 5HWULHYDO 'DWD 0LQLQJ &ORXG &RPSXWLQJ DQG .QRZOHGJH 0DQDJHPHQW +H LV DOVR D PHPEHU RI ,(((6SHFWUXPDQG&6, 9LVKDO -DLQ KDV FRPSOHWHG KLV 07HFK &6( IURP 86,7 *XUX *RELQG 6LQJK ,QGUDSUDVWKD 8QLYHUVLW\ 'HOKL DQG GRLQJ 3K'LQ&RPSXWHU6FLHQFHDQG(QJLQHHULQJ 'HSDUWPHQW /LQJD\D¶V 8QLYHUVLW\ )DULGDEDG 3UHVHQWO\ +H LV ZRUNLQJ DV $VVLVWDQW 3URIHVVRU LQ %KDUDWL 9LG\DSHHWK¶V ,QVWLWXWH RI &RPSXWHU $SSOLFDWLRQV DQG 0DQDJHPHQW %9,&$0 1HZ'HOKL+LVUHVHDUFKDUHD LQFOXGHV:HE7HFKQRORJ\6HPDQWLF:HEDQG,QIRUPDWLRQ5HWULHYDO +HLVDOVRDVVRFLDWHGZLWK&6,,67(
2016 International Conference on Computing for Sustainable Global Development (INDIACom) 258
3403
2016 3 International Conference on Computing for Sustainable Global Development , 16 18 March, 2016 Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA)
A Novel Approach for Precise Search Results Retrieval Based on Semantic Web Technologies Usha Yadav
Gagandeep Singh Narula
CDAC NOIDA Email ID:
[email protected]
CDAC Noida, INDIA Email ID:
[email protected]
Neelam Duhan
Vishal Jain
CSE Department, YMCA University of Science and Technology, Faridabad, INDIA Email ID:
[email protected]
Bharati Vidyapeeth’s Institute of Computer Applications (BVICAM), New Delhi, INDIA Email ID:
[email protected]
$EVWUDFW :HE GRFXPHQWV DUH JURZLQJ H[SRQHQWLDOO\ DQG WKH UHVXOWV UHWULHYHG E\ WUDGLWLRQDO VHDUFK HQJLQHV DUH PDUNHG E\ LUUHOHYDQWDQGLQFRQVLVWHQWUHVXOWV,QRUGHUWRDOOHYLDWHWKLVSUREOHP DQG LQFUHDVH GHJUHH RI UHOHYDQFH WKHUH LV QHHG WR PRYH WRZDUGV GHYHORSPHQW RI VHPDQWLF VHDUFK HQJLQHV 6XFK HQJLQHV SURGXFH UHVXOWV E\ IRFXVLQJ RQ PHDQLQJ RI FRQWH[W UDWKHU WKDQ VWUXFWXUH RI FRQWHQW 7KH SDSHU GHVFULEHV SURSRVHG DUFKLWHFWXUH RI VHPDQWLF VHDUFK HQJLQHWKDWWDNHVZHEVHDUFKUHVXOWVDVLQSXWDQGUHILQHVWKHPZLWK QRWLRQ RI VHPDQWLFV 7KH RXWSXW SURGXFHG ZRXOG EH SUHFLVH DQG UHOHYDQWUHVXOWVLQFRQWH[WRIXVHU¶VTXHU\ .H\ZRUGV .QRZOHGJH 5HWULHYDO 2QWRORJ\ 6HDUFK HQJLQHV 6HPDQWLFZHEDQG6HPDQWLFVLPLODULW\
It produces relevant as well as irrelevant results in relation with terms-mobile phones, red lotus, flower and cover. The search experience does not consider stopping words, auxiliary verbs that reflects the meaning of given statement. Likewise in above query, the term “with” has lost its significance due to which results are being produced in context of lotus and red flower. These engines have no inference mechanism to deduce relationship between semantically similar words. E.g.: If a user has entered query on heart disease then it would not return pages that consist of cardiac term in it. In order to reduce this ambiguity and perform intelligent search, concept of SW came into existence in 1996 as envisioned by Tim Berners Lee [1]. SW is defined as global mesh of information in machine interpretable format [2]. It is practically not feasible to annotate the entire web content into semantic tags so that current search engines could behave like SSE. So, there is need to develop search engine that analyses user query and produces meaningful results with higher precision and low recall. The paper is divided into following sections: 6HFWLRQ ,, describes survey of researches conducted in context of evolution of SSE’s and their methodologies. 6HFWLRQ ,,, describes the aspects of SW and need for development of SSE’s. It also throws some light on SW technologies. In 6HFWLRQ,9, a novel architecture of SSE has been proposed that can enhance Google search results with the help of SW technologies followed by conclusion and references.
NOMENCLATURE KB
Knowledge Base
SW
Semantic Web
SSE
Semantic Search Engine
RDF
Resource Description Framework
XML
Extensive Markup Language
OWL
Ontology Web Language
JAR
Java Archive
GUI
Graphical User Interface
WWW World Wide Web CPI
IIRELATED WORKS
Classes, Properties, Instances IINTRODUCTION
Traditional search engines are massive source of retrieving information from web. The results are being produced by performing keyword based search. The main drawback of search engines is lack of relevance. Consider a query “Mobile phones with red cover” entered in traditional search engines.
c 978-9-3805-4421-2/16/$31.00 2016 IEEE
Several studies that have been conducted with an aim to build SSE and ranking of results as follows: Ilyas et.al [3] proposed conceptual architecture of SSE that uses KB for deriving inferences. This base is being created from mapping of RDBMS. But the architecture does not include any method to retrieve data from KB. Mukhopadhyay et.al [4] devised a framework for domain specific ontology based search engine that focuses on agricultural information about state of
1357 259
2016 3 Interrnational Confe ference on Com mputing for Suustainable Globbal Developmeent , 16 West Bengal. It prroduces relevant results by mapping m of classses ork is unable too retrieve relevvant andd instances. But this framewo resuults from heteerogeneous forrmats of dataa. Dong et.al [5] propposed domainn specific onto ology based seearch engine that t focuuses on transpport services. It includes use of Case Based Reaasoning algorithhm to design KB K and then appplies the conccept of threshold t to rannk the given results. r This appproach could not worrk well due to lack of annotaated data in context of transpport servvice. Qu. et.al [6] presented framework f of SSE S with the help h of JENA framew work. The sy ystem uses manual m ontoloogy creaation and JENA’s in-built seervice functionns. Sinha et.al [7] desiigned prototyppe for multip ple domains including boooks, meddicine, mobilees that stores their RDF coontent from web w pagges into reposittory. But the model m failed too create semanntic annnotations of RD DF content into o OWL format. OWL is stronnger langguage than RDF R and conv veys inherent meaning bettter. Mallik et.al [8] proposed p frameework for SPA ARQL and RDF R queery processing,, optimization and evaluatioon. Kara et.al [9] propposed ontologgy based search engine by taking t conceptt of rankking and semanntic web index xing into considderation. IIICONCEP PT OF SEMAN NTIC WEB AN ND ONTOLOG GY Futuuristic web (w web 3.0) or seemantic web (S SW) is enhancced verssion of web 2.0 that consistss of annotated documents d whhich are understood byy machines. $&KDOOHQJHVRI6: In spite s of variouss efforts led by y researchers, SW S has remainned a fuuture concept or o technology due d to followinng reasons: x SW technoologies and co omponents are not well definned in such ways w that app plying its techhnologies requuire extensive knowledge k by users. u x Higher scaalability and searching cost. %2QWRORJ\ Fig 1 defines seveen definitions of ontology thhat can be usedd in mal, diffferent discipliines viz phillosophical disscipline, form speccified, concepttualization of system s using logical theory and a manny more. Theyy stated that log gical theory heelps developerss to builld ontology ass “$ ORJLFDO WKHRU\ W ZKLFK JLYHV J DQ H[SOLLFLW SDUUWLDODFFRXQWRIIDFRQFHSWXDOL]]DWLRQ´>@
18 March, 2016
&1HHGGWRGHYHORS666( Followiing parameterss needs to be considered c for development of SSE E that would produce releevant results with higher precisioon to recall ratio. x Efficient E rankiing of web pages: p - Page ranking is a m method of graading web paages such thatt pages with h highest ranking and popularrity appear at top t of search r results [11]. The T syntax bassed search enggines employ v various algoriithms for sortting and arrannging results e either relevantt or irrelevant. It is therefore essential to i improve synttax based allgorithms to rank pages s syntactically ass well as semanntically. x User U experiencce: - The searrch experiencee gained over t traditional searrch engines is not satisfactorry. Users face i issues like poor representattion of query, grammatical m mistakes and innconsistent ressults with respeect to entered q query. So, there is need of substantial devvelopment of S that can be embedded with SSE w personalizaation features. x Accuracy A x Time T retrieval '6:7 7HFKQRORJLHV They arre listed below:: x XML: X - XML L is an acronyym for extenssible markup l language that uses u self descriiptive documennts. Users can d develop tags reelated to these documents usiing Data type d definition (DTD D). x XML X Schema: - Vocabulary to define XML L documents. x RDF: R - An acronym for fo Resource Description F Framework thaat maintains reelationships bettween objects u used in RDF model. m 5HVRXUFFH 3URSHUW\and 6WDWHPHQWWare componennts of RDF model. Understand it with examplee that Fertilizeer required to w is IS789. Here, fertilizer required is property, grow wheat wheat is i resource andd IS789 is vallue. Whole coombination of these is termed as Stattement.
Fig. 2. RDF Moddel [12]
IVPROPOS SED SSE OGLE search The maain aim of propposed SSE is to analyze GOO results and refines thhem with the notion of sem mantics. The w: layout is shown below $&RPSSRQHQWVRISURSSRVHG66( The prooposed framew work consists off three phases: x Generation G of user query grraph with the help of SW t technologies.
Fig. 1. The seven deffinitionsof ontolog gy
x G Generation off document reelation graph by b analyzing G GOOGLE searrch results.
1358
2016 International Conference on Computing for Sustainable Global Development (INDIACom) 260
r by acchieving semanntic x Perform reelation based ranking similarity between b two grraphs. Firsst phase involves following components: a) *8, h search is perrformed is treaated The interrface on which as main component c of any a search enggine. In contextt of traditionaal search eng gines, queries are written by developeers and results are matched with pre-definned keywords stored in dattabases. But inn proposed woork, ontologyy is used as bacckend in interfaace. In givenn framework, input query is being passed through user interfacee as well as GOOGLE G seaarch engine. It is passed to search enggine in order to ults with thee help of SW S enhance search resu technologgies
me=”namespace” Syntax: whoose namespacee is metadata name n and uri of namesp pace is contentt description. H SearchingIRUUVHPDQWLFDOO\UHODWHGWHUPV This task adds lexical effect to thhe proposed framework. It finds synoonyms related to terms in w the help of o software callled WordNet given query with 3.0.
Fig. 5. The term “bookss” has various senses like it may refe fer to volumes, scripts and ledgers. l
I
ImportingH[LVWLQJRQWRORJ\\ ÉGÉ 3.4 beta The proposeed framework uses *PROTÉ for importinng existing ontology o related to given domain.
. Figg. 4. User designeed GUI with query “List the books frrom bookshop whoose price is less l than 500”
Query([[WUDFWLRQ The querry entered by user u needs to be b extracted. This T process involves sub tasks like query q processiing, U nameespaces, findding identificaation of URL’s, semanticcally similar words w and extraacting knowleddge from givven query. These sub taskks are discussed below. kenization/Stem mming) c) Query Prrocessing (Tok This taskk involves rem moval of stop words, w parsingg of given quuery, syntax an nd validation of query. It may m involve use u of Stanford d Parser *. G IdentificaationRI85,¶VDQGQDPHVSDFFHV Since UR RL’s and nameespaces are in HTML H format,, so this tassk involves conversion of o HTML/XM ML informatiion and links into semantic languages like l RDF/OW WL. This identtification leadss to searchingg of meta-tags within keywords that will not n leave any unu accessed data. E
Fig. 6. Quuery on domain onntology BookShopp
N IDE 8.0. It is done This onntology is then connected to NetBeans by addinng PROTÉGÉ libraries and JAR J files in javva class.
2016 International Conference on Computing for Sustainable Global Development (INDIACom) 261
1359
2016 3 Interrnational Confe ference on Com mputing for Suustainable Globbal Developmeent , 16
18 March, 2016
Fig. 7. Importing ontolo ogy in NetBeans ID DE 8.0 Fig. 8. Coonnecting JENA with w NetBeans IDE E
61,,33(7VKRZLQJJ3527e*e2 2:/OLEUDULHV: packkagegaggs; importedu.stanford.smi.protegex.ow wl.model.OWLM Model; importedu.stanford.smi.protegex.ow wl.model.OWLN NamedClass; wl.ProtegeOWL;; importedu.stanford.smi.protegex.ow publlic class java { public staticc void main(Strin ng[] args) { OWLModdelowlmodel= (O OWLModel) PrrotegeOWL.creaateJenaOWLModdel(); OWLModdel.getNamespacceManager().setD DefaultNamespaace ("http://hello.com#"); OWLNam medClassworldcllass= ow wlModel.createO OWLNamedClasss("World"); System.ouut.println("Classs URI:" + worldcclass.getURI()); } }
JENA API A consists off in-built reasooners that are used u to derive inferencces from givenn ontology andd presents them m in form of graph.
Tilll this step, onttology has beeen imported inn NetBeans ID DE. Now w, it’s time to extract e knowleedge from that ontology. g) ExtractinngNQRZOHGJHIIURPJLYHQRQWRRORJ\ *Apachee JENA framew work can be used u to repressent relationshhip between CPI C from givenn ontology. It will w lead to foormation of KB B.
Fig. 9. 9 JENA Inferencce API layers
KNOWLE EDGE BASE
1. Is (Author, Furozan)Æ imp plies Furozan is an author. Data communications, Computeer Science)2. Includes (D Æ implies thhat computer sccience books inncludes data communicatioons
1360
61,33( (7 WR LPSRUW 0RGHO)DFFWRU\ FODVV IURP I IROGHU RUJDSDDFKHMHQDJUDSKK packageegui; import static s org.apachhe.jena.assembbler.JA.OntMoodelSpec; importoorg.apache.jenaa.rdf.model.MoodelFactory; public class c Gui { public static s void mainn(String[] argss) { Ont m; ModelFactory.createOntologyyModel m=M (OnntModelSpec.O OWL_MEM, nuull); } } d Phase of SSE E Second This phhase involves saame pre-processsing steps till generation of documeents. Then a relationship among a those documents d is extracteed with the use of Knowleddge Graph [13]]. The output produceed in this phasee is page graphh or document graph. g Third Phase P of SSE (Relation ( Baseed Ranking)
2016 International Conference on Computing for Sustainable Global Development (INDIACom) 262
This phase requires matching of query graph and page graph. In both graphs, concepts are represented by nodes while relations are represented by edges. It involves use of relation based ranking algorithm that follows certain steps: x Access all page databases and build unordered result set including all pages consisting keywords in user query. x Analyze user query and construct sub graph. Number of concepts and relations are noted down from this graph. x For each page in the result set, generate page sub graph and compute all page spanning forests. From this graph, note number of concepts and relations too. x Then compare both graphs and merge them using probabilistic formula explained below, (TXDWLRQ P (AŀB) = 1 – (n(G(AŀB)) + r(G(AŀB)) / n(G(A)) + n(G(B)) + r(G(A)) + r(G(B)) [14] where(n(G(AŀB))is number of nodes common in both graphs. r(G(AŀB))is number of relations common in both graphs n(G(A))is number of nodes in graph A n(G(B))is number of nodes in graph B r(G(A))is number of relations in graph A r(G(B))is number of relations in graph B After ranking of improved results, the user is provided with enhanced search results that are visible in GUI. VCONCLUSION AND FUTURE SCOPE The paper addresses the problems of traditional keyword based search engines that processes query syntactically rather than semantically. In order to increase degree of relevance and higher precision to recall ratio, the paper describes proposed architecture of SSE which incorporates GOOGLE search results as input and enhances them with the help of SW technologies. Modules like query processing, import existing ontology, extraction of knowledge and many more have been introduced in proposed framework. At last, relation based ranking algorithm is being applied to compare query and document graphs. It leads to enhanced ranked results that are presented to user. As a future work, the work can be extended by developing agent based search engine that utilizes JADE ontology with user interface. APPENDIX Stanford Parser It is syntactic natural language parser that considers structure of sentences into consideration. It has components like crawler, URL server, store server and Indexer. WordNet It is an electronic lexical network developed since 1985 at Princeton University. It consists of diversity of vocabulary information. Total number of words in its database is 155287 (nouns, verbs, adverbs, adjectives) PROTÉGÉ
Protégé is one of the most widely used domain independent ontology editing tools. It exists in various versions like protégé beta, 3.4, web protégé which can be accessed freely through web [15]. JENA The first version of JENA was released in 2000. JENA2 was released in 2002. It is java framework for building semantic web applications that provides programmatic environment for RDF, RDFS, and OWL and consists of rule based inference engine. REFERENCES [1]
Berners-Lee, Tim, James Hendler, and Ora Lassila"The semantic web.", 6FLHQWLILFDPHULFDQ 284.5 (2001): 28-37. [2] Tim Berners-Lee, The Semantic Web Revisited, IEEE Intelligent Systems, 2006 [3] Qazi Mudassar Ilyas, Yang Zong Kai and Muhammad Adeel Talib, “A Conceptual Architecture for Semantic Search Engine”, IEEE, 2004 [4] Debajyoti Mukhopadhyay, Jhilik Bhattacharya, Sreemoyee Mukherjee and Aritra Banik, "A Domain Specific Ontology Based Semantic Web Search Engine", 7th International Workshop on MSPT Proceedings, 2007 [5] Hai Dong, Farookh Khadeer Hussain, Elizabeth Chang “Transport Service Ontology and Its Application in the Field of Semantic Search.” IEEE, 2008, Digital Ecosystems and Business Intelligence Institute, Australia [6] Junhua Qu, Chao Wei, Wenjuan Wang and Fei Liu, “Research On a Retrieval System Based On Semantic Web”, International Conference on Internet Computing and Information Services, 2011. [7] Sukanta Sinha,Rana Dattagupta and Debajyoti Mukhopadhyay, “Designing an Ontology based Domain Specific Web Search Engine for Commonly used Products using RDF”, CUBE 2012, September 3–5, 2012, Pune, Maharashtra, India. ACM 978-1-4503-1185-/12/09 [8] M. Arenas and J. Perez, ʊQuerying the Semantic Web Data with SPARQLۅ, Proc. of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS 11), ACM, New York, NY, USA, 2011 pp. 305-316, doi:10.1145/1989284.198931 [9] Soner kara, Ozgur Alan, Orkunt Sabuncu, Samet Akpnar, Nihan K. Ciceki and Ferda N. Alpaslan, “An Ontology Based Retrieval System Using Semantic indexing” Information systems Volume 37 Issue 4, june 2012, PP:-294-305 [10] Sowa, J.F., “Ontology, metadata, and semiotics”, Lecture Notes in AI #1867, Springer-Verlag, Berlin, 2000, pp. 55-81 [11] J. Beal, ʊGeographical research and the problem of variant place names in digitized books and other full-text resources ۅLibrary Collections, Acquisitions, and Technical Services, vol. 34, Issues 2–3, 2010, pp. 7482. doi: http://dx.doi.org/10.1016/j.lcats.2010.05.002 [12] Gagandeep Singh, Vishal Jain , Dr.Mayank Singh, “Ontology Development Using Hozo and Semantic Analysis for Information Retrieval in Semantic Web” in ‘ICIIP-2013 IEEE Second International Conference on Image Information Processing ‘, Jaypee Univ. Shimla, 911 Dec 2013 [13] http://searchengineland.com/library/google/google-knowledge-
graph [14] Poonam Chahal, Manjeet Singh, “An Ontology Based approach for finding semantic similarity between web documents”, ,QWHUQDWLRQDO -RXUQDO RI &XUUHQW (QJLQHHULQJ DQG 7HFKQRORJ\´ Vol. 3, No.5 (December 2013), ISSN 2277-4106 Daniel L. Rubin,Natalya F. Noy and Mark A. Musen, “Protégé: A Tool for Managing and Using Terminology in Radiology Applications”, Journal of Digital Imaging. 2007 Nov; 20(Suppl 1)34-46
2016 International Conference on Computing for Sustainable Global Development (INDIACom) 263
1361
Fig. 3. Proposed SSE framework
1362
2016 International Conference on Computing for Sustainable Global Development (INDIACom) 264
Effective and Efficient Digital Advertisement Algorithms Vishal Assija, Anupam Baliyan and Vishal Jain
Abstract In today’s world when everyone wants the best deal when it comes to online shopping, it becomes tedious job for companies to target their audience with appropriate content and advertising media. This paper talks about such issues faced by various companies nowadays and tries to resolve those issues by implementing some algorithms or by introducing a new platform to showcase advertisements online. In this we also talk about efficient way of tracking users online and analyze their data to showcase content based on their preferences. Our focus is also on the usage of “AdBlock” plug-in, that is, how to stop it or prevent it to block advertisements online. At last we try to deliver advertisements in such a way that they should take as less as bandwidth they can and increase the page load time.
Keywords PPC ISP PPV CPM CPC CPA RTB DMP CPL ASCII Bots DSP ROI
1 Introduction The rise of the Internet has substantially changed the advertising industry’s business models as well as their client basis. One of the most striking innovation is how to sell advertisement space online. Advertisements in the form of banners, are sold through the traditional model, i.e., cost per impression, but also with methods based on a visitor taking some specifically defined action in response to an advertisement. This research examines the way in which advertiser can showcase the ads online as well as pricing strategy of web publishers operating in a market where they are V. Assija (&) A. Baliyan V. Jain Bharati Vidyapeeth’s Institute of Computer Application and Management, Delhi, India e-mail:
[email protected] A. Baliyan e-mail:
[email protected] V. Jain e-mail:
[email protected] © Springer Nature Singapore Pte Ltd. 2018 A.K. Saini et al. (eds.), ICT Based Innovations, Advances in Intelligent Systems and Computing 653, https://doi.org/10.1007/978-981-10-6602-3_9 265
83
84
V. Assija et al.
unable to influence the price of the advertisement (i.e., whether cost per impression is better or cost per action) and effectiveness and efficiency of algorithms that are used to display advertisements online. Traditionally, advertisers sell online space to marketing companies where they can showcase their banner and in lieu of that need to pay for the space provided. But now this has been revolutionized by the technologies like “Google AdWords & AdSense”, where one can monitor how much revenue he/she is able to make via particular Ad campaign. So, to increase user interaction we need to understand his behavior and activities well so that the content which user likes will be shown at appropriate place, so that the chances of user engagement increases. Advertiser will be charged only if user clicks or sees the advertisements, i.e., pay-per-click (PPC) or pay-per-view (PPV). After the introduction of these type models, advertiser starts investing more in online space, since no user means no charges, while on the other hand, more users engagement means more revenue for the advertiser as well as successful advertisement campaign. While some of the users already have “ADBLOCK” plug-in for their browser, it is difficult to show advertisement on webpages. But as you know, Internet is full of ideas to overcome this problem we have “Pop-Ups” (they also can be blocked but up to some extent). My idea is to integrate advertisements deeply into webpage that none of the plug-in can block it and on the other side the bandwidth used to deliver the ads will be minimized. Main focus is on how to increase the revenue of online advertiser by displaying relevant ads to particular user and make user engagement, so that the hit ratio will increase tremendously. But to keep in mind that the users’ concerns like privacy and what data related to particular user, where, and how to be used still needs some answers from advertisers. In long run, it seems to be great model to increase user engagement by keeping all the user interests safe.
2 Literature Review If you look at late 90s when Internet was at its boom, someone introduced an innovative idea of displaying ads, i.e., “Banner” and the inventor is HotWired. At that time, browsers are not smart enough to block pop-ups that are displaying ads, so at that time this model is very successful as it gives good ROI to advertisers. But it has its own consequences like slow page loads because of loading the extra images to be shown as advertisement and script related to these ads (it seems relevant because displaying ads over a dial up connection takes too long to load page and relatively slow speeds that are provided by ISPs). It was just the beginning of new era of online marketing and advertisement which no one could ever imagined of and its fueled the portal war, whether reaching to customer in a traditional way is better or online where advertiser will only be charged if customer clicks on their banner ads. At that time banner ads powered the web first explosion of advertisements and generating revenue for websites owners. So, in association with HotWired, Wired Magazine introduced a new
266
Effective and Efficient Digital Advertisement Algorithms
85
search engine called HotBot which basically targets audience with specific ads and it was generating more than $25 million a year in revenue. That amazed everyone and attracts the attention of venture capitalist to start investing in it and secure the elusive “early mover advantage”.
And then four students from Harvard invented a search engine called “Google” (currently). Once it gained popularity, Google started finding ways to generate revenue by its millions of users to visit its site and search anything they wanted to know or explore. They came with an idea of showing relevant ads to users corresponding to their search queries or frequent behavior on website. By implementing such model, Google earned a lot of money (to be specific billions of dollars from its search advertising revenues). It is time for Google to evolve as a big online advertisement aggregator in the United States of America to surpass the Yahoo. First, most concern of every advertisers is to reach brand safe environments and well-lit areas also known as publishers without porn. And publishers are expecting some good amount of revenue from advertisers that do not harm their brand either. So, it is the start of complication, because there are so many places for advertisers to buy and for publisher to sell. To avoid such complications and to increase the process transparency in publishing of advertisements online, there is a need of intermediary companies which will overlook at all these things and provide high-quality ads to be published via publishers. The main business of these is to provide online space via buy/sell model and distinguish themselves as either DSPs (Demand Side Platform) or SSPs (Seller Side Platform). To understand how diverse the advertisement space online is, please take a look at picture given below
267
86
V. Assija et al.
If you try to figure out how these companies are making money on the Internet, then you will land in a pool which seems to be filled with water but when you jump on it, you realize that it is empty. Since generating revenue by displaying ads online is not as easy as it sounds. It requires a lot of skills and energy to get some good amount of revenue. You can take example of any big money making business say, legal, financial, medical, gaming, etc., that are going to be focused on online advertisements rather than traditional marketing. You can say that this is the Internet version of Wall Street jive. So make sure you understand all these acronyms listed below before exploring online advertisements world. CPM (Cost Per Impression): It simply means cost to be incurred for an online advertisement per thousand views. Here M stand for 1,000 (i.e., Roman Numeral). CPC (Cost Per Click): It is also known as PPC and it is called so because, as user hits on the banner ads only then advertiser will be charged. This is an online ads model used to direct traffic to websites (traffic refers to human being), where advertiser will be charged by publisher only if the user clicks on the ad displayed in banner. Some of the biggest players in this category are: Google AdWords, Yahoo! Search Marketing, and Microsoft adCenter. All of these operate under the model which is called “Bid-Based”. That simply means advertiser is on safe side, they can pay only if user clicks on their ads, else they will not be charged and it creates an effective way of displaying ads online. CPL (Cost Per Lead): In this model of online advertisement, advertiser pays only for a sign-up from (which is explicit) a consumer interested in ads offer. If you compare this to the aforementioned models, advertisers gain advantage of legitimate qualified sign-ups by only interested users and making Cost-Per-Lead models the holy grail of the Internet advertising ROI hierarchy. CPA (Cost Per Action): As the name suggests, in this model advertiser needs to pay only if specific action is performed by the user. (Actions like: click on the banner, filling of sign-up form, user engagement on a page for more than 5 min, purchase from some specific merchant, etc.). Action-based model attracts advertiser who prefers direct response from users (as in brand marketing it is not required). RTB (Real Time Bidding): In this type of model advertiser and publisher meet in real time in the online marketplace for the buying and selling of ad impressions. DMP (Data Management Platform): It forms the backbone required by the advertiser to collect user data for different advertisement operations. It works a cross bridge between servers that collect user data and the servers that displays advertisements to the users (it is used to build customer relationship management tools).
268
Effective and Efficient Digital Advertisement Algorithms
87
3 Related Work To understand the user behavior online we have used “Google Analytics Tool” on a website. After using it for a while we have seen that online marketing and advertisements companies can track users rigorously and they have all the information about a particular user that they need to display ads. Some of the screenshots of Analytics are shown below
The above picture shows the total number of sessions, users, and page views. It also tracks the session duration to know which page has high user engagement. Using Google Analytics we can also track which browser is used by users to open our website, so that we can optimize our website according to the users’ browsers usage history. And if you have enabled specific publisher account conversion ratio within Google analytics tool, then it will also shows how much revenue is generated by the browsers listed below.
We can also track users’ geographic locations to cater the needs of users better. For example, currently our website is in English language only but we are having huge traffic from Asia (India and China), in that case we can translate out website content to the specific languages to make user happy and in the end it will increase the traffic as well as user engagement.
269
88
V. Assija et al.
It shows user engagement on the basis of sessions. Dark blue color shows high user engagement from that particular geographic location and as the color fades, the user will also decrease. It also shows all this information in tabular form. (shown below)
270
Effective and Efficient Digital Advertisement Algorithms
89
4 Problem Statement While the number of Internet users increases tremendously, digital advertisement industry also grows rapidly. Although there are some catches to it. Like users are more aware of their data privacy that advertisements companies try to steal, and they know how prevent these things. For marketing companies, users data is very important to plan their campaigns and future advertisements plans but private mode and dynamic IP are those problems that need to be taken care of; otherwise it is impossible to find out whether the request made by a new user or previous user. Privacy is one concern and other than that is how to target specified media in the form of advertisement that would make a great impact on user. Simply means how to design such ads that would cater the need of user as well as marketing company. Interactive ads require “Flash Player” which most of the users do not have, so it means your ad will not displayed. Once you have user data the new problem that arises is how to use it to deliver ads that will suite user preferences. User data may contain—DOB, Name, Gender, Site visited, Product Watched Page Liked, etc. So how to use this data as an effective tool to deliver ads that will suite user requirements is a big question and need to be answered. AdBlock plug-in is other problem for digital advertisers, it will simply block advertisements on a webpage and remove any third-party link associated to it. Pop-Up blockers are already there to stop showing ads in a pop-up window. So all these things need to be resolved, otherwise all these things will be making a great loss to online advertisers.
5 Proposed Solution When we talk about online advertisements, one thing came to our mind—these advertisements were for users who are online. So all of the advertisement algorithms treat user preferences at utmost priority and deliver only those ads through which user will be benefited. Another thing that needs to be transparent is user data and user privacy. Which data advertiser is using while users visiting any site and how they are going to utilize it must be clear, so that user can understand the privacy policy easily. Advertisements take a lot of bandwidth while transferring and page load takes longer time than usual non-advertisement page. This needs to be resolved; otherwise ads will take a large chunk of useful bandwidth that will use to deliver some important information. This can be done via compressing ad banners and send them a string of characters. At user side it will be processed by browsers and will show some animated ASCII characters Ads which attracts users and helps in increasing revenue for online advertisers.
271
90
V. Assija et al.
Another thing to keep in mind is how to differentiate legitimate user and fake bots online. Fake bots are built to submit spurious information that will affect the advertisers’ policy of showcasing a particular ad to a group of users. So identifying fake bots is also a big challenge and need to be implemented as soon as possible. Advertisements blockers and plug-in which facilitates the blocking of advertisements are need to be taken care of. Every year, billion dollars revenue loss are done by these kinds of plug-ins. In respect of user terms, they serve the purpose of blocking advertisements, but on the other hand, users will miss the great opportunity of finding a great deal online according to their preferences.
6 Conclusion/Future Research A lot of research is required to deal with all these problems related to online advertisements; and proposed solutions may provide temporary solution to the problems stated, but in long term there may arise new problems which would affect the revenue of advertisement companies in many ways. To overcome this, we need a standard policy for online advertisements which should be followed by all the companies and all the Internet users must be aware of how their data is being utilized by these companies to display advertisements.
References Website References 1. 2. 3. 4. 5. 6. 7.
http://goo.gl/HGPD21—10 Aug. 2015 14:00 Hrs https://goo.gl/IiKcv5—10 Aug. 2015 15:30 Hrs http://goo.gl/PsvJyj—9 Aug. 2015 11:00 Hrs http://goo.gl/dPQ5W2—11 Aug. 2015 13:30 Hrs http://goo.gl/JQKDu—12 Aug. 2015 10:15 Hrs http://goo.gl/Lxm16Z—11 Aug. 2015 20:00 Hrs Allenby, G.M., Leone, R.P., Jen, L.: A dynamic model of purchase timing with application to direct marketing. J. American Stat. Assoc. 94(446), 365–374 (1999) 8. Black, J.: Online advertising: it’s just the beginning. BusinessWeek Online (July 12). Available at http://www.businessweek.com/technology/content/jul2001/tc20010712_790.htm (2001). Accessed 10 Oct 2005 9. Pechmann, C., Stewart, D.W.: Advertising repetition: a critical review of wearin and wearout. In: Leigh, J.H., Martin, Jr., C.R. (eds.) Current Issues and Research in Advertising, vol. 11 (1–2), pp. 285–329. Ross School of Business, University of Michigan, Ann Arbor (1988) 10. Hoffman, D., Novak, T.: When Exposure-Based Advertising Stops Making Sense (and What CDNOW Did About It). Working paper, Owen Graduate School of Management, Vanderbilt University (2000)
272
Effective and Efficient Digital Advertisement Algorithms
91
11. Zufryden, F.: A model for relating advertising media exposures to purchase incidence behavior patterns. Manage. Sci. 33(10), 1253–1266 (1987) 12. Sherman, L., Deighton, J.: Banner advertising: measuring effectiveness and optimizing placement. J. Interact. Mark. 15(2), 60–64 (2001) 13. Song, Y.: Proof That On-line Advertising Works. Atlas Institute. Available at https://iab.net/ resources/pdf/OnlineAdvertisingWorks.pdf. Accessed 12 Oct 2005 (2001) 14. Dahlen, M.: Banner advertisements through a new lens. J. Advertising Res. 41(4), 23–30 (2001)
273
Evolution of FOAF and SIOC in Semantic Web: A Survey Gagandeep Singh Narula, Usha Yadav, Neelam Duhan and Vishal Jain
Abstract The era of social web has been growing tremendously over the web. Users are getting allured towards new paradigms tools and services of social web. The amount of information available on social web is produced by sharing of beliefs, reviews and knowledge by various online communities. Interoperability and portability of social data are one of the major bottlenecks of social network applications like Facebook, Twitter, Flicker and many more. In order to represent and integrate social information explicitly and efficiently, it is mandatory to enrich social information with the power of semantics. The paper is categorized into following sections. Section 2 describes various studies conducted in context of social semantic web. Section 3 makes readers aware of concept of social web and various issues associated with it. Section 4 describes use of ontologies in achieving interoperability between social and semantic web. Section 5 concludes the giver paper. Keywords Semantic web (SW) and SIOC
Social web Ontology RDF FOAF XFN
G.S. Narula (&) C-DAC, Noida, India e-mail:
[email protected] U. Yadav N. Duhan Department of Computer Engineering, YMCA University of Science and Technology, Faridabad, India e-mail:
[email protected] N. Duhan e-mail:
[email protected] V. Jain Bharati Vidyapeeth’s Institute of Computer Applications (BVICAM), New Delhi, India e-mail:
[email protected] © Springer Nature Singapore Pte Ltd. 2018 V.B. Aggarwal et al. (eds.), Big Data Analytics, Advances in Intelligent Systems and Computing 654, https://doi.org/10.1007/978-981-10-6620-7_25 274
253
254
G.S. Narula et al.
1 Introduction Combining social and semantic web is a very challenging task. Various studies have been conducted in describing structure of information and maintaining relationship among plethora of web documents. Most of social networking sites are walled websites that means they only provides limited means for users to publish and access social data rather than integration of social data [1]. The era of social websites has gained tremendous height during the past 3–4 years. They have become the medium of marketing and promotions in an innovative way [2]. As the number of social websites increases, the need to achieve secure social data also increases [3]. It has led to an idea of aggregating social data with the help of semantic web technologies. Semantic web consists of various vocabularies/ontologies to deal with social datasets available on different sites in order to allow interoperability through machines rather than XML-based approaches [4]. The paper deal with various constraints associated with social web and describes the capabilities of semantic web technologies which results in evolution of social semantic web. It also examines how semantic web technologies can be used in achieving interoperability among social web applications.
2 State of Art Halpin and Tuffield [5] presented various issues regarding interoperability of data on social and semantic web, in their paper tilled social network and data portability using semantic web technologies. They also used FOAF and SIOC ontologies to identify persons and deduce their relationship respectively. SIOC is used to represent data for multiple users from different social sites. John G. Bresley et al. briefly described privacy issues, role of semantic web and social web in their article titled “Social Semantic Web”. They presented various suggestions for reducing gap between social web and semantic web. The article also includes survey of semantic web applications like semantic wiki that can be used to analyze social data. Specia and Motta [6] performed study on extracting ontologies from collaborative-tagged systems. Xu et al. [7] led to development of ontology as a unified model of social network and semantics.
3 Social Web Social web is a collection of social data published and shared by millions of users, which is being spread over various publishing media, networking media, discussion media and sharing media. The term social web was given by Howard Rheingold in 1996 [8]. The use of latest ICT has changed the social web by introducing various
275
Evolution of FOAF and SIOC in Semantic Web: A Survey
255
second-generation web applications like Wikipedia, blogs, social networking sites (Facebook, twitter), content sharing sites and community sites. The social networking sites are used for maintaining social relations among people all over the world. User accounts are created by maintaining user profiles based on different preferences. Similarly, community driver websites include quora digest, accdemia.edu that connects group of people to share their common interests and answers via query session either by entering comments or live chat assistance. Content sharing sites enables user to post their informative content on social web. The content may be images text, videos etc. It has led to knowledge sharing through social web (Figs. 1 and 2).
3.1
Issues in Social Web
(a) Indexing: Indexing means generation of metadata. It is known that folksonomies/collaborative tagging is the main source of creation of metadata on social web. But due to its ambiguities and inconsistency, it has not expanded much. With the use of semantic web, it is possible to convert folksonomies to ontologies and generate metadata in Resource Description Framework (RDF) having triples- subject, property and object (Fig. 3). Example: Ram likes Fosters. In this statement, Ram is object, likes is property and Fosters is resource. (b) Difficulty in extending and reusing data due to non standards: Social networks focuses on publishing contents of website like songs, images via read only interface rather than using dynamic APIS and interfaces. It has led to portability issues and lack of knowledge representation. To overcome this, there must be usage of semantic applications and browsers to publish content in RDF which eventually leads to interconnection of social networks with multiple data sources.
Fig. 1 Ontology on social web [Ontology is abbreviated as FESC (Formal, Explicit, Shared, and Conceptualization) [17]. Formal means that ontology must be understandable by machines. Explicit defines type of constraints associated with it. Shared means that ontology must be visible to group rather than individual. Conceptualization means it should represent some structured from that holds relevant concepts of given domain]
276
256
G.S. Narula et al.
Fig. 2 Defining classes and attributes of social web
Fig. 3 RDF model
(c) Fake Identity Management: Users make use of different names, aliases and nicknames to create their profiles in different groups. Each group has its terms and conditions that constitutes public privacy as well as closed profiles. (d) Transformation of social knowledge and facts: It is a cumbersome task to mange and transforms facts associated with social entities into semantic web databases. It might be possible that social and semantic interfaces differ in processing capabilities. Although there are well-defined ontologies related to social networks but there are no mechanisms and heuristics defined for integration and utilization of social results into RDF/OWL format.
3.2
Semantic Web
The idea of semantic web as envisioned by Lee [9] focuses on making data machine understandable as much as it is friendly to humans. Semantic Web aims to abridge knowledge gap between humans and machines. It is characterized by open source, flexible commercial modelling technologies and tools that are being used on various community projects to link public data sets from World Wide Web [10]. Semantic web has an initiative to work on knowledge representation and reasoning of given information [11]. The information may be either in structured, semi-structured and
277
Evolution of FOAF and SIOC in Semantic Web: A Survey
257
unstructured form. It consists of semantic web documents (SWDS) that are encoded in semantic web languages like OWL (Ontology Web language), RDF, DAML + OIL and many more. Some of the technologies are listed in Table 1.
4 Achieving Interoperability in Social Web Applications Semantic web consists of various social ontologies to maintain interoperability and portability across various applications. Ontologies/Vocabularies include FOAF (Friend of a Friend), SIOC (Semantically Interlinked Online Communities) and XFN (Friends Network). The brief description of these ontologies is discussed in following subsections.
4.1
FOAF
FOAF is a simple RDF ontology that helps in identifying “who is who” and links them with other persons by using XML/RDF format [12]. It includes mainly three components—ontological definition, ontological properties and empirical properties [13]. One of the most common classes in FOAF ontology is foaf: person. It holds various properties like foaf: name, foaf: email, foaf: gender, foaf: knows and many more (Fig. 4; Table 2). FOAF profile of above ontology can be created using FOAF-a-Matic application [14]. The result is being generated in RDF (Fig. 5).
4.2
XFN
It is an acronym for friend network. It is one of the social ontologies that describe category wise social relationships among different persons [15]. The category may Table 1 Semantic web technologies Technology
Description
1. XML 1.1. XMLs (XML schema)
It is an extensible language that helps users in creating tags to their documents. It provides well-defined syntax for writing content of document It is language for defining XML documents It is an acronym for Resource Description Framework that is used to express data models consisting of objects, property and relationship It is an acronym for RDF schema. It is a description language for defining vocabularies and represents relationship among objects
2. RDF 2.1. RDFs
278
258
G.S. Narula et al.
Fig. 4 FOAF ontology with class foaf: person
Table 2 Components of foaf: person class in FOAF ontology Ontological definition
Ontological properties
Empirical properties (instances)
Agent
foaf: foaf: foaf: foaf:
foaf: foaf: foaf: foaf:
knows name email gender
nick first last title
include friend, co-worker, neighbour, family, etc. This ontology holds commutative and transitive dependencies. If A is a friend of B then B is also friend of A. A!B,B!A If A is friend of B and B is neighbour of C, then A is neighbour of C but may/may not be friend of C. A ! B;
4.3
B!C,A!C
SIOC
Evolution of SIOC: High usage of social sites has led to birth of various constraints like privacy, lack of interoperability, digital signatures and security threats. It becomes difficult to query and interlink social data. So, there must be some unified models/vocabularies to handle these issues. The best way to do this is to combine semantic web technologies with social paradigms in order to propose a new social semantic prototype system. Semantic web technologies include use of RDF, OWL, queries to encode social data into machine understandable format. For combining social and semantic paradigms, there is need to provide some framework for
279
Evolution of FOAF and SIOC in Semantic Web: A Survey
259
Gagandeep Singh Narula Mr Gagandeep Singh Narula Smily 30b2f5faee64d084a0780f724390bff8c0486bf6 VishalJain 295863d5ad5b5ea880bde725f16cb21e2d1848fe UshaYadav b97f8105b2c7820ac793cda0b1865ab2d45a9c0c
Fig. 5 RDF code snippet
Fig. 6 Functions of SIOC
280
260
G.S. Narula et al.
Fig. 7 SIOC ontology model on group “GATE 2016 CSE”
modelling activities and integration of online community information. This framework can be achieved by using SIOC [16]. The functions of SIOC are given in Figs. 6 and 7. SIOC ontology model deals with various classes and their properties related to user groups created on websites. Consider there is group created by a user named as “GATE 2016 CSE”. The creator of this group is marked as Admin and other users are termed as group members. A single ontology class is represented as: SIOC: name of class. Inverse additional properties like reply, update are represented using sioct: reply and sioct: update respectively. Sioct is an acronym for SIOC types of module.
4.4
FOAF + SIOC
Both ontologies can be used in collaboration to enable a model for interoperability and portability of social data onto semantics. Different user profiles of same person on multiple sites can be combined in single RDF group data. In Fig. 8, Gagan is a person whose friends are Vishal and Usha. It is represented by foaf: knows property. Gagan has different profile account on sites like Facebook and LinkedIn with different names. Assume NarulaG is admin of group GATE 2016 CSE and Gaggs is one of the members in this group. But both belong to same person Gagan. So these accounts must be merged in single RDF data instead of having multiple accounts.
281
Evolution of FOAF and SIOC in Semantic Web: A Survey
261
Fig. 8 FOAF + SIOC
5 Conclusion and Future Scope Although online services of social sites for publishing and accessing data has facilitated the process of information sharing among different groups but it has faced same challenges like predefined API’s for each application, lack of knowledge management, fake identity management and many more. So, the paper discusses the need to achieve interoperability in social web applications, access relevant social data and satisfying users to a great extent. Semantic web aims to abridge knowledge gap between humans and machines. Various issues of social sites have been addressed in the following paper. It also provides solutions to cope with these issues by connecting with social ontologies (FOAF, XFN) and interlinking data with online communities (SIOC). The work can be extended to develop generic interface that is compatible with semantic web technologies and standards. The interface can be used to publish and access data in RDF format which eventually leads to interconnection of social network with different data sources.
References 1. Bojars, U., Breslin, J.G., Peristeras, V. Tummarello, G., Decker, S.: Interlinking the social web with semantics. IEEE Intell. Syst. 23, 29–40 (2008) 2. Lee, F.: PRISM forum SIG on Semantic web. 12 May 2009 3. Mäkeläinen, S.I.: “Tiedonhallinta Semanttisessa Webissä”-seminar, University of Helsinki (2005) 4. Sloni, D.K.: Safe Semantic web and security aspect implication for social networking. IJCAES, June 2012
282
262
G.S. Narula et al.
5. Halpin, H., Tuffield, M.: A standards-based, open and privacy-aware social web. In: W3C Social Web Incubator Group Report, 6 Dec 2010. W3C Incubator Group Report. Retrieved 6 Aug 2011 6. Specia, L., Motta, E. Integrating Folksonomies with the Semantic web. In: Lect. Notes Comput. Sci. 624–639 (2007) 7. Xu, Z., Fu, Y., Mao, J., Su, D.: Towards the semantic web: collaborative tag suggestions. In: Collaborative Web Tagging Workshop at WWW 2006, Edinburgh, UK (2006) 8. Howard, R.: The virtual community: homesteading on the electronic frontier, p. 334. The MIT Press (2000) 9. Lee, T.B.: The Semantic web. Sci. Am. (2007) 10. Lee, T.B., Hendler, J., Lassila, O.: The Semantic web. Sci. Am. 34–43 (2001) 11. Mika, P.: Social networks and the Semantic web. SIKS Dissertation Series No. 2007-03, 18 Dec 2006 12. Lee, T.B.: The 1st World Wide Web Conference, Geneva, May 1994 13. W3C Semantic web activity. In: World Wide Web Consortium (W3C), 7 Nov 2011. Retrieved 26 Nov 2011 14. http://www.ldodds.com/foaf/foaf-a-matic.html 15. Manola, F., Miller, E.: RDF primer. In: W3C Recommendation, 10 Feb 2004 (2004) 16. Bresley, J.G., Dekkar, S., et al.: SIOC: content exchange and semantic interoperability between social networks. In: W3C Workshop on Social Networking, Barcelona, 15–16 Jan 2009 17. Singh, G., Jain, V.: Information retrieval (IR) through Semantic web (SW): an overview. In: Proceedings of CONFLUENCE 2012—The Next Generation Information Technology Summit at Amity School of Engineering and Technology, pp. 23–27, Sept 2012 18. Giri, K.: Role of ontology in Semantic web. DESIDOC J. Libr. Inf. Technol. 31(2), 116–120 (2011) 19. Jeremic, Z., et al.: Personal learning environments on the social semantic web. Semantic web-linked data for science and education. ACM DL 4(1), 23–51 (2013) 20. Singh, G., et al.: Ontology development using Hozo and Semantic analysis for information retrieval in Semantic web. In: IEEE Second International Conference on Image Information Processing (ICIIP) (2013)
Author Biographies
Gagandeep Singh Narula received his B.Tech. in Computer Science and Engineering from Guru Tegh Bahadur Institute of Technology (GTBIT) affiliated to Guru Gobind Singh Indraprastha University (GGSIPU), New Delhi. He is now pursuing an M.Tech. in Computer Science from CDAC Noida, affiliated to GGSIPU. He has published research papers in various national and international journals and conferences. His research areas include Semantic Web, Information Retrieval, Data Mining, Cloud Computing and Knowledge Management. He is also a member of IEEE Spectrum.
Usha Yadav received her B.E. in Information Technology with Honours from Maharshi Dayanand University, Rohtak in 2009 and M.Tech. with Honours in Computer Engineering from YMCA University of Science and Technology, Faridabad in 2011. She is pursuing her Ph.D. in Computer Engineering from YMCA University of Science and Technology, Faridabad. She is currently working as a Project Engineer in CDAC, Noida and has three years of experience. Her areas of interest are search engines, social web and semantic web.
Dr. Neelam Duhan received her B.Tech. in Computer Science and Engineering with Honours from Kurukshetra University, Kurukshetra and M.Tech. with Honours in Computer Engineering from Maharshi Dayanand University, Rohtak in 2002 and 2005, respectively. She completed her Ph.D. in Computer Engineering in 2011 from Maharshi Dayanand University, Rohtak. She is currently working as an Assistant Professor in the Computer Engineering Department at YMCA University of Science and Technology, Faridabad, and has about 12 years of experience. She has published over 30 research papers in reputed international journals and international conferences. Her areas of interest are databases, search engines and web mining.
Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi and is pursuing his Ph.D. in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently, he is working as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.
EasyOnto: A Collaborative Semiformal Ontology Development Platform Usha Yadav, B.K. Murthy, Gagandeep Singh Narula, Neelam Duhan and Vishal Jain
Abstract With the incessant development of information technology, ontology has been widely applied to various fields for knowledge representation. Ontology construction and ontology extension have therefore become a great area of research. Creating ontology should not be confined to the thinking process of a few ontology engineers: to develop common ontologies for information sharing, they should satisfy the requirements of different people for a particular domain. Ontology engineering should also be a collaborative process for faster development. As the Social Web grows, its simplicity proves successful in attracting mass participation. This paper aims to develop a platform, “EasyOnto”, which provides a simple and easy graphical user interface for users to collaboratively contribute to developing a semiformal ontology.
Keywords Ontology development · Social Web · Semantic web
U. Yadav · B.K. Murthy · G.S. Narula, CDAC, Noida, India
e-mail: [email protected]; [email protected]; [email protected]
N. Duhan, Department of Computer Engineering, YMCA University of Science and Technology, Faridabad, India
e-mail: [email protected]
V. Jain, Bharati Vidyapeeth's Institute of Computer Applications (BVICAM), New Delhi, India
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2018. B.K. Panigrahi et al. (eds.), Nature Inspired Computing, Advances in Intelligent Systems and Computing 652, https://doi.org/10.1007/978-981-10-6747-1_1
1 Introduction
Ontology represents information specific to a particular domain. Ontologies comprise logical theories that encode knowledge about a particular domain in a declarative way [1, 2]. Terms representing a specific domain, and the relations between those terms, are well described in ontologies. Ontology has become an advanced technology in artificial intelligence and knowledge engineering, playing an increasingly important role in knowledge representation, knowledge acquisition and ontology application. However, ontology creation is known to be a very time-consuming and difficult process. For ontologies to serve as common vocabularies for information sharing, they should satisfy the requirements of different people; to ensure this, ontology engineering should be a highly collaborative process. It is also difficult for domain experts to spend much time in developing ontology. Social Web applications are easy for ordinary people to understand and use. Social Web applications, like wikis and online communities, enable collaboration among people, and collaboration can help in establishing the consensus or common understanding required for meaningful information sharing. People can socialize and enjoy themselves on the Social Web. The Social Web has therefore proven very successful in drawing mass participation, and it is exploding with user-generated content. This paper presents a platform, named “EasyOnto”, which aims to be as easy and as simple as possible, so that a large number of users who might not have any technical knowledge about ontology can also contribute to creating concepts freely. Section 2 surveys related work on developing ontology collaboratively. Section 3 describes problem identification. Section 4 gives a detailed explanation of the system. Section 5 presents the semiformal ontology model.
2 Related Works
Depending upon different sets of requirements, various approaches and methodologies have been proposed and implemented. The myOntology project [3] uses wikis for community-driven, horizontal, lightweight ontology building by enabling general users to contribute. It proposes to use the infrastructure and culture of wikis to enable collaborative and community-driven ontology building, intending to let general users with little expertise in ontology engineering contribute. It is mainly targeted at building horizontal lightweight ontologies by tapping the wisdom of the community. Semantic wikis [4] assist in the collaborative creation of resources by defining properties as wiki links with well-defined semantics. Collaborative knowledge contributed by various users is presented in a more explicit and formal manner, thus enhancing the capabilities of wikis. Using simple syntax, semantically annotated
navigational links are encoded between resource pages, showing the relations between them. Irrespective of the degree of formalization and semantic capability, semantic wikis share a few common features such as link annotation, context-aware presentation, improved navigation, semantic search and reasoning support. Dall'Agnol et al. [5] presented a methodology in which three modules must be completed for the ontology creation procedure: knowledge gathering, modelling of concepts and ontology evaluation. Web 2.0 allows a social tagging process; Social Web data are annotated and categorized by associating them with tags, thus developing a folksonomy. Creating and managing these tags collaboratively results in knowledge acquisition. After this phase, folksonomy tags are converted into ontology elements by the ontology engineers and validated in a further ontology evaluation procedure. Buffa et al. [6] have expressed their views on recent research developments in semantic wikis. “The use of wikis for ontologies” and “the use of ontologies for wikis” are the most used approaches for semantic wikis; much of the research on semantic wikis uses the first approach, in which the wiki acts as the front end of a collaborative ontology maintenance system. Semantic MediaWiki [7] is an extension to MediaWiki which permits semantic data to be encoded within wiki pages. Extended wiki syntax helps in encoding the semantics into wiki text. Every article corresponds to exactly one ontological element (class or property), and every annotation in the article makes statements about this element. The links between the semantic wiki pages are referred to as “Relations”. All of this is converted into a formal ontology. In paper [8], researchers developed a system to support the primary work of ontology development among ontology developers collaboratively, without the need for a domain expert to be present. A folksonomized ontology (FO) is proposed by Wang et al. [9]; it uses a three-step “3E” technique, namely extraction, enrichment and evolution. A new blended approach is presented which allows the semantic capability of folksonomies to be used by ontologies and vice versa. Visual review and visual enhancement tools help in implementing and testing the completed system.
3 Problem Identification
There are a number of issues in creating collaborative ontologies. Some of them are presented below:
• A common ontology for information sharing should satisfy the requirements of different people. It is challenging to keep the ontology development process easy and simple so as to gain mass participation.
• As each ontology engineer is provided with his own workspace to develop ontology in collaboration, handling concurrency issues among various participants is difficult.
• Creating ontology is a very tedious and time-consuming job, so involving a domain expert for a long time during development is not feasible.
Keeping these issues in mind, our objective is to develop a very simple and easily understandable system which allows a mass of users to contribute fully in the knowledge acquisition phase. Due to the large participation, the knowledge acquisition phase develops faster, and it does not require the presence of a domain expert. It thus proves to be a very time-efficient methodology for ontology development. Once a semiformal ontology belonging to any domain is formed, a domain expert can validate and refine the system.
4 Collaborative Semiformal Ontology Construction System
“EasyOnto” is a Social Web application system which allows any interested user to contribute to the construction of an ontology for any domain of his own choice. Following the stepwise, easy and simple procedure shown in Fig. 1, the user contributes to ontology acquisition; the user can follow the sequence or hop to any of the steps presented. As multiple users work on the system, similar domains, instances, concepts and semantic annotations are automatically merged (a sketch of such merging is given after Fig. 1). To show the complete working of the system, knowledge of a particular domain is chosen: ontology development for the “Vehicle” domain is shown, with instances, categories and relations specifying the vehicle domain. Finally, the conceptual model developed by following the procedure can be used as a knowledge base for any application related to that domain.
Step 1: Choose domain/add new domain
On the main page, the user is shown many domains to choose from. The user can select any domain of his own interest, or add a new domain by entering it in the textbox provided and clicking submit.
Fig. 1 Collaborative semiformal ontology construction system (Step 1: choose domain/add new domain → Step 2: add new instance, category, relation → Step 3: add mappings between the instance and category → Step 4: allocate property to map instance and category → Step 5: formal ontology using Protégé → Step 6: add parent class; add mappings between categories → semiformal ontology model)
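The automatic merging of similar user contributions mentioned above is not specified in detail in the paper; a minimal sketch of one plausible approach, assuming simple string normalization (real systems could also consult synonym lists), is:

# Hypothetical sketch: merge near-duplicate terms contributed by many users.
# Assumption: terms that normalize to the same key (case/whitespace-insensitive)
# are treated as the same concept.
from collections import defaultdict

def normalize(term: str) -> str:
    return " ".join(term.lower().split())

def merge_contributions(contributions):
    """contributions: iterable of (user, term) pairs -> {canonical: set of users}"""
    merged = defaultdict(set)
    for user, term in contributions:
        merged[normalize(term)].add(user)
    return merged

print(merge_contributions([("u1", "Car"), ("u2", " car "), ("u3", "Bike")]))
# e.g. {'car': {'u1', 'u2'}, 'bike': {'u3'}}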
Fig. 2 Choose domain/add new domain
Figure 2 shows the various domains to choose from; the “vehicle” domain is chosen by the user, who then clicks start.
Step 2: Add new instance, category and relation
In this step, the user can add an instance, category or relation. The user chooses instance, category or relationship from the drop-down menu, enters text in the textbox provided and clicks the submit button. Once the user chooses any drop-down option, the corpus related to that option, created by other users, is shown; terms being added are displayed immediately in the corpus. If the user does not want to add anything, he can proceed by clicking the “Continue” button.
(i) An instance is any term or keyword which belongs to the domain and can be classified under some category. Figure 3 shows that the user selected to add new instances related to the vehicle domain, like swift desire, pulsar and i20.
(ii) A category is any term which can classify instances under it. Figure 4 shows that the user entered two categories, “car” and “bike”, which appear in the category corpus; the user then clicks continue.
(iii) A relation refers to a relationship which exists between terms and can be defined. Based on the domain, the user can add any relation to relate an instance and a category.
Fig. 3 Add new instance
Fig. 4 Add new category
Fig. 5 Add new relation
For example, a “HasModelName” relation can be added to relate the “Eon” instance and the “Car” category as “Car HasModelName Eon”. This is shown in Fig. 5.
Step 3: Enrich with semantic annotation
Semantic annotation refers to any meaning which the user wants to associate with the selected term. The user chooses an instance or category from its corpus; once chosen, it can be semantically annotated by entering meaningful information about it. Figure 6 shows that the category vehicle_model has its meaning, “model of a vehicle”, added as a semantic annotation. The user can also choose any relation and provide the range of its subject and object. Figure 7 shows the “HasModelName” relation enriched with “Vehicle_Model” as subject range and “vehicle_type” as object range.
Step 4: Add relevant mappings between the instance and category
Instances and categories can also be mapped easily. In this step, an instance from the instance corpus can be mapped to any category defined in the category corpus. For example, instances such as Eon and i20 can be mapped to the “Car” category, and pulsar and royal Enfield can be mapped to the “Bike” category. The user needs to select the instance and
Fig. 6 Enrich term with semantic annotation
Fig. 7 Enrich relation with semantic annotation
category from the corpus, and both are shown in the space provided below the corpuses. One instance can be mapped to a number of categories; the user can add as many categories for a specific instance as needed by clicking the “Add more” button, as shown in Fig. 8.
Step 5: Allocate relation to map instance and category
In this step, the user can choose a relation from the relation corpus to appropriately map an instance and a category. The user can pick an instance, a category and a relation and drop them in the space provided. As shown in Fig. 9, various mappings are done.
Step 6: Add parent class
Depending upon the semantic annotations added in step 3, the system can automatically assign a parent class to some of the instances and categories. It depends on the subject and object ranges added for each relation.
Fig. 8 Add relevant mappings between the instance and category
Fig. 9 Allocate relation to map instance and category
Fig. 10 Add parent class
Consider the example “Car HasModelName Eon”. The subject range for the relation “HasModelName” is vehicle_type and the object range is vehicle_model, so the system will autogenerate that “Car” has vehicle_type as its parent concept and, similarly, that “Eon” has vehicle_model as its parent class, as shown in Fig. 10 (a small sketch of this rule is given after Fig. 11). If the user wants to add more parent classes, he can do so by choosing the “Add more” button.
Step 7: Add mappings between categories
Relations between categories can also be defined, and hence mappings between them can easily be made using the relation corpus. The user can pick any category, map it to another category depending upon some relation, and drop them in the space provided. Suppose the user adds a new relation in step 1, “isRelatedTo”. Corresponding to this relation, the user can add a mapping between categories. Figure 11 shows how mapping between categories is done, in which the “Car” category is mapped to the “Bike” category with the use of the “isRelatedTo” property. All the data generated up to this step are stored in an organized manner in a database, which can easily be mapped to any formal ontology development tool. The platform is then ready to be validated and refined by domain experts online, going through the same web pages presented in this section. Once the model is validated, it is ready to be converted into a new ontology using an ontology generation tool, which will be presented next.
Fig. 11 Add mappings between categories
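The parent-class rule of Step 6 is stated only in prose; a minimal Python sketch of how it could work, assuming the relation ranges are stored as simple records (the data layout here is an assumption, not the paper's implementation), is:

# Hypothetical sketch of Step 6: infer parent classes from relation ranges.
# Assumption: each relation stores a subject range and an object range,
# and each user mapping is a (subject, relation, object) triple.
relation_ranges = {"HasModelName": {"subject": "vehicle_type", "object": "vehicle_model"}}
mappings = [("Car", "HasModelName", "Eon")]

def infer_parents(mappings, relation_ranges):
    parents = {}
    for subject, relation, obj in mappings:
        ranges = relation_ranges.get(relation)
        if ranges:  # the subject's parent is the relation's subject range, and so on
            parents[subject] = ranges["subject"]
            parents[obj] = ranges["object"]
    return parents

print(infer_parents(mappings, relation_ranges))
# {'Car': 'vehicle_type', 'Eon': 'vehicle_model'}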
5 Semiformal Ontology Construction Model
Following the above procedure, the system is able to produce a semiformal ontology model which can then be validated and refined by domain experts. Once validated, a formal ontology can be generated in RDF or OWL format. All the data entered by users are stored in the database in a structured manner and can easily be mapped to ontology elements such as classes, instances and properties. For generating the formal ontology, the Java-based ontology development tool Protégé can be used; various plug-ins are provided by Protégé to map database values to ontology elements. Converting the semiformal ontology to a formal ontology is beyond the scope of this paper. Figures 12 and 13 show the layout of the formal ontology in the Protégé editor. A representation of the generated ontology can be written in OWL or RDF, as sketched below.
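Only fragments of the original OWL/RDF listing survive (a vehicle class and an rdfs:subClassOf element), so the following Python/rdflib snippet is a reconstruction under assumptions rather than the paper's exact output; the base URI is invented, and the class and instance names follow the vehicle example used above:

# Reconstructed sketch: emit the vehicle example as RDF/OWL with rdflib.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/vehicle#")  # hypothetical base URI

g = Graph()
g.bind("ex", EX)

g.add((EX.vehicle_type, RDF.type, OWL.Class))
g.add((EX.vehicle_model, RDF.type, OWL.Class))
g.add((EX.Car, RDF.type, OWL.Class))
g.add((EX.Car, RDFS.subClassOf, EX.vehicle_type))   # parent class from Step 6
g.add((EX.Eon, RDF.type, EX.vehicle_model))         # instance with its parent class
g.add((EX.HasModelName, RDF.type, OWL.ObjectProperty))
g.add((EX.HasModelName, RDFS.domain, EX.vehicle_type))
g.add((EX.HasModelName, RDFS.range, EX.vehicle_model))
g.add((EX.Car, EX.HasModelName, EX.Eon))            # "Car HasModelName Eon"

print(g.serialize(format="xml"))  # RDF/XML, as loaded into Protégé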
Fig. 12 Ontology design on domain “Motor Vehicle”
Fig. 13 Defining instances of Maruti
6 Conclusion and Future Scope
A Social Web application named “EasyOnto” was presented, which allows mass participation by users who do not know much about ontology, or lack ontology development expertise, in easily creating a semiformal ontology. This informal ontology, after validation and refinement, can be further converted into a formal ontology. The system accelerates the ontology acquisition phase, involves mass participation and removes the need for a domain expert in the initial phases of ontology development. Future work involves reusing existing ontologies, giving credit to users who contribute to developing the ontology so as to increase participation, and improving the scalability of the system.
References
1. Cimiano, P.: Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer, New York (2006)
2. Hitzler, P., Krötzsch, M., Rudolph, S.: Foundations of Semantic Web Technologies. Chapman & Hall/CRC (2009)
3. Siorpaes, K., Hepp, M.: myOntology: the marriage of ontology engineering and collective intelligence. In: Bridging the Gap Between Semantic Web and Web 2.0 (SemNet 2007), pp. 127–138 (2007)
4. Schaffert, S.: Semantic social software: semantically enabled social software or socially enabled semantic web? In: Proceedings of the SEMANTICS 2006 Conference, pp. 99–112, OCG, Vienna, Austria (2006)
5. Dall'Agnol, J.M.H., Tacla, C.A., Freddo, A.R., Molinari, A.H., Paraiso, E.C.: The use of well-founded argumentation on the conceptual modeling of collaborative ontology development. In: CSCWD, pp. 200–207 (2011)
6. Buffa, M., Gandon, F., Ereteo, G., Sander, P., Faron, C.: SweetWiki: a semantic wiki. J. Web Semant. 6(1), 84–97 (2008)
7. Krötzsch, M., Vrandečić, D., Völkel, M.: Semantic MediaWiki. In: Proceedings of the 5th International Semantic Web Conference (ISWC06), pp. 935–942, Springer, Berlin (2006)
8. Zaini, N., Omar, H.: An online system to support collaborative knowledge acquisition for ontology development. In: International Conference on Computer Applications and Industrial Electronics (ICCAIE 2011), IEEE (2011)
9. Wang, S., Zhuang, Y., Hu, Z., Fei, X.: An ontology evolution method based on Folksonomy. In: 2014 International Symposium on Computer, Consumer and Control (IS3C), pp. 336–339, 10–12 June 2014
Proceedings of the 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013)
Ontology Development using Hozo and Semantic Analysis for Information Retrieval in Semantic Web
Gagandeep Singh1, Vishal Jain2 and Dr. Mayank Singh3
1 B.Tech, Guru Tegh Bahadur Institute of Technology, GGS Indraprastha University, Delhi
2 Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad
3 Associate Professor, Computer Science and Engineering Department, Krishna Engineering College, Ghaziabad, U.P
[email protected], [email protected], [email protected]
Abstract- We are living in the world of computers. This modern era deals with a wide network of information present on the web. The huge number of documents present on the web has increased the need for support in the exchange of information and knowledge. It is necessary that the user be provided with relevant information about a given domain. Traditional information extraction techniques, like knowledge management solutions, were not advanced enough to extract precise information from text documents. This leads to the concept of the Semantic Web, which depends on the creation and integration of semantic data. Semantic data in turn depend on the building of ontology. Ontology is considered the backbone of a software system; it improves understanding between the concepts used in the Semantic Web. So, there is a need to build an ontology that uses a well-defined methodology, and the process of developing an ontology is called Ontology Development.
Keywords - Information Retrieval (IR), Ontology, Semantic Web (SW), Software Development Life Cycle (SDLC), Hozo
I. INTRODUCTION
Information Retrieval (IR) technology is a major factor responsible for handling annotations in the Semantic Web (SW) [1]. Traditional text search engines are not optimal for finding relevant documents; better results are produced by various approaches based on ontologies and semantic data. Purely text-based search engines fail for the following reasons:
• Improper style of natural languages: there are chances that the syntax of a language is not appropriate.
• High-level unclear concepts: some concepts are used in a document, but present search engines cannot find those words.
• Timely scenario: keyword matching cannot be used to find time-specific documents.
The ability to translate knowledge between different languages is considered a major factor for building powerful Artificial Intelligence (AI) systems, and matters to various AI research communities such as those working on Natural Language Processing (NLP). Ontology has changed the present web, making it more expressive and full of knowledgeable representation documents. This paper is divided into five sections. Section 2 defines IR technology, describing the IR process and its architecture and the types of documents present on the web. Section 3 gives a brief overview of the Semantic Web (SW), including its challenges, its technologies and its comparison with the World Wide Web (www); it also gives a proposed methodology for building ontology with the help of ontology editors that make use of knowledge representation languages like OWL, RDF, DAML+OIL etc. In Section 4, we describe one of these ontology editors, named Hozo, with which we have developed an ontology on “Computer Appreciation”. Section 5 gives information about semantic analysis in ontology-based information retrieval and search systems; we have also used one of the semantic search engines, named SenseBot, in this paper.
II. INFORMATION RETRIEVAL
Definition: Information Retrieval (IR) is defined as the process of identifying and retrieving unstructured documents containing the specific information stored in them. IR mainly focuses on the retrieval of natural language text.
A. Types of Documents
Documents may be structured, unstructured, semi-structured or a combination of these.
(a) Structured documents: A document is said to be structured if it is written in well-defined syntax and has components. A structured database is a table with multiple attributes of a user's record. It is shown below:
TABLE 1. STRUCTURED DATABASE
S.No | Name | Id | Address
1. | Gagan | 129 | Canada
2. | Vishal | 128 | USA
IR engines can easily find components in a structured document due to its unique components.
(b) Unstructured documents: These documents are written in natural languages. They do not have well-defined syntax or positions where IR engines could find records satisfying user problems. Unstructured documents are randomly generated documents on any topic.
(c) Semi-structured documents: These documents share the common structure and meaning of a collection of textual documents. They differ from structured documents in that they do not have the same column for each row in a table.
B. IR Process and Architecture
The procedure for retrieving information is as follows. Background knowledge is stored in the form of an ontology that can be used at any step. The ranked list of documents is indexed to form documents in represented form. These documents produce ranked results which are given to the admin. The admin solves the user query, which leads to transformation of the user query.

Figure 1: Information Retrieval Process [2] (text documents → ranked list of documents → result → admin → solves user query)

The architecture of the Information Retrieval engine is as follows. It is based on an ontology-based model which represents the content of a resource from the given ontology. It has the following parts:
• OMC (Ontology Manager Component): used by the Indexer, the Search Engine and the GUI.
• INDEXER: indexes documents and creates metadata.
• SEARCH ENGINE.
• GUI: supports the user in query formation.

Figure 2: IR Architecture (INDEXER, SEARCH ENGINE and GUI connected through the OMC)

III. CONCEPT OF SEMANTIC WEB AND ONTOLOGY
The idea of the Semantic Web (SW) [3], as envisioned by Tim Berners-Lee, came into existence in 1996 with the aim of translating given information into machine-understandable form. The Semantic Web is an extension of the current www in which documents are filled with annotations in machine-understandable markup language. The SW uses Semantic Web documents (SWDs) that are written in SW languages like OWL and DAML+OIL.
A. Challenges and Aspects of Semantic Web (SW)
In spite of various efforts led by researchers, the SW has remained a future concept or technology for the following reasons:
• Complete SW parts have not yet been developed, and the developed parts are so poor that they cannot be used in the real world.
• No optimal software or hardware is provided.
Following are aspects of the Semantic Web (SW):
• The SW leads to an environment where information and services can be interpreted semantically and are processed in machine-understandable form.
• The SW relies on ontology as a tool for modeling an abstract view of the real world and for semantic analysis of documents.
• The SW is an XML (Extensible Markup Language) application.
B. Semantic Web (SW) vs. World Wide Web (www)
The Semantic Web (SW) and the World Wide Web (www) differ in various aspects, described in the table below:

TABLE 2. SEMANTIC WEB VS WORLD WIDE WEB
1. SW: It is an extension of the www that will manipulate information content automatically, without human involvement. | www: It is a human-focused web.
2. SW: It discovers documents for gathering relevant information. | www: It discovers documents for people.
3. SW: It deals with resources like pages, images, photos and people. | www: It deals only with media resources like web pages, photos and images.
4. SW: It holds different kinds of relations showing associations among different kinds of resources. | www: It holds only hyperlinks between resources.
5. SW: It makes use of ontology, which allows users to organize information into a science of concepts. | www: It does not use the concept of ontologies.
6. SW: It has formal semantics of context, i.e. it uses web ontology languages for generating data. | www: It does not have formal semantics of context; the contents are machine readable but not machine understandable.
7. SW: Complete information is accessible to semantic search engines like Hakia. | www: Only a few pages of information are accessible to traditional search engines like Google.

C. Semantic Web (SW) Technology
SW technologies are listed below:
• XML: an extensible language that allows users to add their own tags to documents. It provides syntax for content structure within documents.
• XML Schema: a language for defining XML documents. An XML document is a tree.
• RDF: the Resource Description Framework, a simple language to express data models which refer to objects and their
relationships. These models are called RDF models. An RDF model consists of a Resource, a Property and a Statement. A resource may be a web page or an individual element of an XML document. A resource having a name is called a Property. A Statement is the combination of a Resource and a Property along with its value, e.g. “Vishal plays Guitar”.

Figure 3: RDF Model (a Resource linked by a Property to an Object forms a Statement)

D. Ontology
The term Ontology [4] can be defined in different ways. Ontology is abbreviated as FESC (Formal, Explicit, Specification of Shared Conceptualization), where:
• Formal: it should be machine understandable.
• Explicit: it defines the type of constraints used in the model.
• Shared: the ontology is shared by a group; it is not restricted to individuals.
• Conceptualization: it refers to a model of some phenomenon that identifies the relevant concepts of that phenomenon.
Ontology is also defined as a set of concepts and relationships arranged in hierarchical fashion.
E. Ontology Development
Ontology development [5] needs a well-defined methodology that follows certain guidelines: the ontology being developed should follow software engineering standards, and the ontology development strategy should be simple and practical. The phases used in developing an ontology also satisfy software engineering principles and are thus called Software Development Life Cycle (SDLC) phases. They are described below:
(a) Specification Phase: This phase has the following activities.
• Domain vocabulary definition: defines common names and attributes for domain concepts.
• Identifying resources: a resource is anything that has a URI; if some concepts have a number of instances, they can be grouped into a class.
• Identifying axioms: axioms are structures that represent the behavior of concepts.
• Identifying relationships: relations are defined within resources.
• Identifying data characteristics: defines features of types of resources and their relationships.
• Applying constraints: constraints represent named relationships between domain and range classes.
• Verification: after designing the preliminary web ontology model, it is necessary to test it for correctness.
(b) Design Phase: This phase is the backbone of the Semantic Web. The physical structure of the designed ontology is based on the RDF model, which is associated with triples of Subject, Predicate and Object.
• Predicate: all characteristics of resources and relationships are taken as predicates, e.g. each student is assigned a unique RollNo, called “HasRollNo”.
• Subject: all domain classes of characteristics and relationships of resources are taken as subjects, e.g. there are various average students, each having a unique URI, so they are grouped in “AvgStudentsGroup”.
• Objects: refers to range-class relationships, e.g. HasRollNo has the range class “NUMBER”, which is a literal.
(c) Formalization Phase: This phase is the result of the output of the ontology obtained in the design phase, with the help of some tools.

Figure 4: Ontology development phases [6] (Specification Phase: domain vocabulary definition, resource identification, identifying axioms, identifying relationships, identifying data characteristics, verification, applying constraints → Design Phase → Formalization Phase)

IV. HOZO - AN ONTOLOGY EDITOR
Version used: 5.2.36 beta, developed at Mizoguchi Lab, ENEGATE Co. Ltd. Hozo differs from other ontology editors in the following aspects:
• Its user-friendly environment lets users work on it easily.
• Hozo has an API, named HozoAPI ver 1.15, that accesses existing ontologies.
• A slot definition option is available.
• Inheritance information is clear and easily accessible by two options: one is from Super
Classes through the is-a link; the other is from the Class constraint.
• Hozo provides the facility of correcting errors at the time of validating the ontology.
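To make the Subject/Predicate/Object triples of the design phase (Section III.E) concrete before turning to Hozo's case study, here is a small hypothetical Python/rdflib sketch of the student example; the base URI is invented for illustration:

# Hypothetical sketch of the design-phase triple "AvgStudentsGroup HasRollNo NUMBER".
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/school#")  # assumed base URI

g = Graph()
student = EX.student42                            # a member of AvgStudentsGroup
g.add((student, RDF.type, EX.AvgStudentsGroup))   # subject: the domain class
g.add((student, EX.HasRollNo, Literal(42, datatype=XSD.integer)))  # predicate + literal object

for s, p, o in g:
    print(s, p, o)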
Figure 5: Hozo Components [7] (Ontology Editor, Ontology Manager (Dependencies), Ontology Server, Developer)

Figure 6: Ontology editing screen

4.1 Case Study
We present a case study implementing all the phases involved in the ontology development methodology. The case study illustrates ontology building on “Computer Appreciation” with the help of the ontology editor HOZO.
(a) Specification Phase
Domain definition and resource identification: There are a number of computers classified on the basis of speed; each has its unique URI, so they are grouped in a subclass called “CLASSIFICATION” under the superclass “COMPUTER”.

TABLE 3. DEFINING CLASSES AND INSTANCES
Concepts (Nodes) | Instances | Features of Resources | Predicate
Classification | Home computer, PC etc. | Purpose, name, examples | Hasname etc.
Generation | First, Second etc. | Purpose, name, examples | Hasname etc.
Components | H/w system, s/w system | Types, name, examples, purpose | Hastype, Hasname etc.
Input devices | Scanner, CPU, keyboard | Types, purpose | Hastype and purpose
Output devices | Monitor, Printer, speakers | Types, examples, purpose | Hastype and examples

Defining axioms about resources: A computer has a Hardware system and a Software system. The Hardware system is categorized into Input devices and Output devices.
Relationship identification and naming: The relation between the “H/w system” class and the “Input devices” class is named Haskeyboard.
Identifying data characteristics:

TABLE 4. DATA CHARACTERISTICS
Name | Domain class | Range class
HasName | Computer, Generation, Components, Input and Output devices | String
Hastype | Generation, Components | String
Hasyear | Generation | Number

Applying constraints:

TABLE 5. RESOURCE RELATIONS ALONG WITH CONSTRAINTS
Name | Domain class | Range class
Has H/w system | Computer | Components
Has S/w system | Computer | Components
Haskeyboard | H/w system | Input devices
HasPrinter | H/w system | Output devices
HasRing | Computer | Network System
HasBus | Computer | Network System

Validating: Hozo has a feature named Ontology Consistency Check that utilizes the Hozo inference structure to verify whether the ontology is developed properly or not.
(b) Design Phase
In the context of Hozo, the output obtained from the specification phase results in an ontology file that is considered the output of the developed ontology. It is available in different formats: Text/HTML, XML, RDF and OWL.

Figure 7: Sample slice of ontology using RDF

Figure 8: Sample slice of ontology using OWL

(c) Formalization Phase
This phase describes the developed ontology pictorially; its source code was developed using RDF syntax.
The Hozo user interface shows the ontology hierarchy, as in the screenshots below.

Figure 9.1: Hozo user-interface

Figure 9.2: Hozo user-interface

Another interesting feature of Hozo is that it produces a map layout view of the developed ontology using the “Generate Map” function.

Figure 10: Map generation using Hozo

V. SEMANTIC ANALYSIS
Semantic association analysis means discovering complex and meaningful relationships between objects; these relationships are called semantic associations. Aspects of semantic analysis are as follows:
• It leads to the generation of knowledge-driven information from available data resources.
• It uses a semantic query framework for analyzing relationships using semantic query languages like SPARQL, RQL and SERQL.
• There are semantic search engines that analyze relationships and create associations between resources; examples include Swoogle and Weet-IT.
A. Components of Semantic Analysis
(a) Ontology development: The process of developing an ontology has become easier with free, open-source editors like Hozo that use ontology languages like DAML+OIL and RDF. For example, to create an ontology on the travelling process, we can import concepts from an existing ontology; we need not develop it from the root node.
(b) Dataset construction: A dataset, also called a test bed or knowledge base, is the collection of instances for creating the ontology.
(c) Semantic association discovery: It uses a graph traversal algorithm for determining semantic associations, where we have to search all possible paths between any two nodes in the semantic graph. Finding associations between all possible paths in a graph is made possible by the path association algorithm, whose steps are as follows (a sketch follows this list):
• find possible paths between two classes at the schema level;
• compare each path with the other paths;
• if there is an intersection between two nodes, the two paths meet at the same node in the schema;
• the result is used to perform a search at the data level that determines associations between nodes.
(d) Displaying results: This refers to how semantic associations are displayed.
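The paper does not give the path-association algorithm in code; a minimal Python sketch under the steps just listed, assuming the schema is a simple adjacency-list graph (the example graph reuses class names from the case study), is:

# Hypothetical sketch of path association: enumerate simple paths between two
# classes at schema level, then report where pairs of paths intersect.
from itertools import combinations

def all_paths(graph, start, goal, path=None):
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # keep paths simple (no repeated nodes)
            paths.extend(all_paths(graph, nxt, goal, path))
    return paths

schema = {"Computer": ["Components", "NetworkSystem"],
          "Components": ["InputDevices"],
          "NetworkSystem": ["InputDevices"],
          "InputDevices": []}

paths = all_paths(schema, "Computer", "InputDevices")
for p, q in combinations(paths, 2):
    shared = set(p) & set(q)   # nodes where the two paths meet in the schema
    print(p, q, "meet at", shared)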
Figure 11: Components of a semantic-enhanced ontology-based search engine (user interface, ranking algorithm, semantic search engine using languages like SPARQL and RQL, ontology, data conversion, data sources, data sets)

B. Semantic Search Engine Structure
The system uses a search engine called SenseBot that is designed to produce a summary in response to the keywords searched by the user. SenseBot understands the meaning of the search query and uses the relevant results to generate a summary of valid results. The figure below illustrates the results of a query from semantic association analysis.
Figure 12: Results of Query

CONCLUSION
This paper highlights the common problem users face in retrieving relevant information for their queries. It emphasizes the concept of Information Retrieval (IR) and various IR approaches for extracting knowledge-driven documents from the cluster of interlinked web documents.
It introduces the concept of ontology and its role in the Semantic Web. Since ontology is considered the backbone of a software system, it should be well designed, without any ambiguities. This paper also shows a proposed methodology for ontology development using SDLC phases. The concept of the Semantic Web has revolutionized emerging technology by extracting information from various web documents and integrating it in machine form. We have developed an ontology on Computer Appreciation using one of the ontology editors, named Hozo. Research issues in semantic analysis, which plays a vital role in the Semantic Web, have also been described: it enables meaningful relations between sets of entities and finds all possible paths by using a graph traversal algorithm. The paper also describes the architecture of a semantic-enhanced ontology-based search engine.

REFERENCES
[1]. Urvi Shah, James Mayfield, “Information Retrieval on the Semantic Web”, ACM CIKM International Conference on Information Management, Nov 2002.
[2]. Gagandeep Singh, Vishal Jain, “Information Retrieval (IR) through Semantic Web (SW): An Overview”, In proceedings of CONFLUENCE - The Next Generation Information Technology Summit, 27-28 September 2012, pp 23-27.
[3]. T. Berners-Lee, “The Semantic Web”, Scientific American, May 2007.
[4]. Berners-Lee, J. Lassila, “Ontologies in Semantic Web”, Scientific American, May 2001, pp 34-43.
[5]. Helena Sofia Pinto, Joao P. Martins, “Ontologies: How can they be built?”, Knowledge and Information Systems, pp 441-464, 2004.
[6]. Amjad Farooq and Abad Shah, “Ontology Development Methodology for Semantic Web System”, Pakistan Journal of Life and Social Sciences, Vol.6 No.1, May 2008, pp 50-58.
[7]. Kozaki K, “Hozo: An Environment for Building Ontologies”, In Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW), pp 213-218, October 2002.
[8]. Deligiannidis L, Sheth A, “Semantic Analytics Visualization”, Intelligence and Security Informatics, Proc. ISI-2006, pp 48-59, 2006.
[9]. J. Mayfield, “Ontologies and text retrieval”, Knowledge Engineering Review, 2007.
[10]. Cristani, R. Cuel, “A Survey on Ontology Creation Methodologies”, International Journal on Semantic Web and Information Systems, Vol.1 No.2, 2005.
[11]. Uschold, M. and King, “Towards A Methodology for Building Ontologies”, IJCAI-95 Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, Canada, 2006.
[12]. Uschold, M. and Gruninger, “Ontologies: Principles, Methods and Applications”, Knowledge Engineering Review, Vol.11 No.2, pp 93-137.
[13]. Updegrove A, “The Semantic Web: An interview with Tim Berners-Lee”, 2005.
[14]. S. Staab, R. Studer and Y. Sure, “Knowledge Processes and Ontologies”, IEEE Intelligent Systems, Vol. 16, No.1, pp 2-9, 2001.
[15]. Kozaki K, R. Mizoguchi, “An Environment for Distributed Ontology Development Based on Dependency Management”, In proceedings of the 2nd International Semantic Web Conference (ISWC), pp 453-468, 2003.
[16]. L. Stojanovic, “Migrating data intensive web sites into the Semantic web”, In Proceedings of the 17th ACM Symposium on Applied Computing (SAC), ACM Press, pp 1100-1107, 2002.
[17]. Aleman-Meza B, Arpinar I.B, “A Context aware Semantic association Ranking”, Technical Report, LSDIS Lab, Computer Science, Univ. of Georgia, pp 03-010, 2003.
[18]. Dayal U, Kuno H, “Making the Semantic Web Real”, IEEE Data Engineering Bulletin, Vol.26, No.4, pp 4-7, 2003.
[19]. Kaushal Giri, “Role of Ontology in Semantic Web”, DESIDOC Journal of Library & Information Technology, Vol.31 No.2, March 2011, pp 116-120.
[20]. Urvi Shah, Tim Finin and Anupam Joshi, “Information Retrieval on the Semantic Web”, Scientific American, pp 35-45.

About the Authors
Gagandeep Singh Narula has completed B.Tech from Guru Tegh Bahadur Institute of Technology (GTBIT) affiliated to Guru Gobind Singh Indraprastha University (GGSIPU), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval.
Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi and is pursuing his Ph.D. in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently, he is working as Assistant Professor in Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.
Dr. Mayank Singh has done his M.E. in Software Engineering from Thapar University and his Ph.D. from Uttarakhand Technical University. His research areas are Software Engineering, Software Testing, Wireless Sensor Networks and Data Mining. Presently he is working as Associate Professor in Krishna Engineering College, Ghaziabad. He is associated with CSI, IE(I), IEEE Computer Society India and ACM.
Ontology Based Pivoted Normalization using Vector-Based Approach for Information Retrieval
Vishal Jain1 and Dr. Mayank Singh2
1 Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad
2 Associate Professor, Computer Science and Engineering Department, Lingaya's University, Faridabad
[email protected], [email protected]
302
It takes data from given database located at back end server. Data Preparation is included in order to completely understand the meaning of data and lists all tables and attributes that are present in database. Data Mapping states that data is to be represented according to some algorithm. Mapper is used for data mapping. Mapper converts Input data into normal format so that it satisfies user’s requirements in Building Classification algorithm phase.
2. CLASSIFYING AND ANALYSING WEB DOCUMENTS Many information extraction methods and techniques were used but they all are in vain. So we need more intelligent system to gather useful information from huge amount of data. Problem: - To find meaningful and informative documents with help of Data Mining algorithms and then interpreting mining results in expressive way. Solution: - Ontology Based Web Content Mining Methodology Approach involved: - The proposed methodology uses concept of Domain Ontology [3]. Domain Ontology organizes concepts, relations and instances into given domain. This approach is used because it resolves synonyms and reducing confusion among agents
2.3 Gathering of Information about Documents It involves use of Ontology Based Phrase Extractor. Its specification is as follows: • Input = Web documents + Domain Ontology and User Abstraction level (K) • Output = Documents associated with vector terms (ti) and weights (wi). Process: - Extractor prepares XML file containing instances of ontology with their relationships in hierarchy level. In WORDnet, phrase collection means relevant phrases with their associated concepts of ontology. To extract concepts, we use disambigutive function dis (t) that shows semantically concept for terms (ti) based on given topic. Phrase Extractor as name suggests scans the phrases and as it finds some relevant matter, it refers to related concepts. Each web document is represented as vector of < term ti>, pairs which is extracted from Phrase extractor module.
2.1 Ontology Web Based Content Mining Ontology Based Web Content Mining represents conceptual information about given domain. It shows document representation, extraction of relevant information from text documents and creates classification models. This methodology is followed that uses the ideas and principles of Data Mining to analyze web data. Creation of Ontology for given domain
Gathering of information about documents
Includes
Database (Input)
2.4 Classification Algorithm This ontology building algorithm [4] is written on basis of decision tree as follows: INPUT OUTPUT (i) Decision Tree Ontology
Building Classification Algorithm
Data preparation Data Mapping
(ii) Distinct Nodes (iii) Distinct Tree branches (iv) Target Attribute (v) GetBranches ()- It is function to get all branches having given node (vi) GetLeafBranch ()- A function to get branch of leaf node (vii) GetClass ()- To get class that shows tree branch (viii) CreateIndividual ()To create instance of leaf node
Classification of new documents
Figure 2: Stages of Ontology Web Based Content Mining 2.2 Building Ontology for given domain Importance: - Since traditionally domain experts were not so intelligent that they could represent complete knowledge related to query. So, there is need for updating knowledge frequently. It leads to building of ontology.
303
The algorithm is as follows:
Begin
  For each node N of decision nodes
    Class C = new (owl:Class)
    C.Id = N.name
    DatatypeProperty DP = new (owl:DatatypeProperty)
    DP.Id = N.name + “Value”
    DP.AddDomain(C);  // adds the property of class C in its child node
    For each branch B of GetBranches(N)
      DP.AddDomain(B.GetClass(C))
    End for
End
Working: The algorithm generates an ontology written in the OWL language. It creates a class for each distinct node and assigns it a unique ID and name. Each child node of a specific parent node is assigned a name and value by acquiring the property of the parent node using the DatatypeProperty DP class. All the branches including a specific node are returned using the GetBranches() function. For generating the ontology, we traverse the decision tree from the root node until we get all unique nodes.
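As a companion to the pseudocode above, the following Python/rdflib sketch shows one way the same idea could be realized; the decision-tree encoding, node names and base URI are assumptions for illustration, not the authors' implementation:

# Hypothetical sketch: emit an OWL class and datatype property per decision node.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/dt#")  # assumed base URI
g = Graph()

# Assumed tree encoding: node -> list of child nodes (branches).
tree = {"Outlook": ["Sunny", "Rainy"], "Sunny": [], "Rainy": []}

for node, branches in tree.items():
    cls = EX[node]
    g.add((cls, RDF.type, OWL.Class))              # a class per distinct node
    prop = EX[node + "Value"]
    g.add((prop, RDF.type, OWL.DatatypeProperty))  # a "<name>Value" property per node
    g.add((prop, RDFS.domain, cls))
    for child in branches:                         # child classes under the node
        g.add((EX[child], RDFS.subClassOf, cls))

print(g.serialize(format="xml"))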
2.4 Increasing accuracy of classification of web documents using WORDnet
About WORDnet: WORDnet was invented by A. Miller [5]. It is a large lexical database of English: nouns, verbs, adverbs and adjectives are combined into sets of synonyms. Each set of synonyms represents a concept and helps solve queries through search and lexical results. In this paper, we have used WORDnet version 3.1, which contains 155287 synonyms. WORDnet is one of the best examples of an ontology used in experiments.
2.4.1 Experiment
This experiment is done to improve accuracy in the classification of web documents using Domain Ontology. We tested the system in two domains: weather forecasting and GoogleTM. There is a specific abstraction level (k) related to each domain, and our system is tested for both domains before and after abstraction. We have used WORDnet for the experimental evaluations.

[Bar chart: classification accuracy (%) of the algorithms C4.5, Bayes Net, Naïve Bayes and SVM, before and after abstraction]

Results: The above experiment shows that classification Data Mining algorithms like C4.5, Bayes Net and SVM are improved by using WORDnet. We can classify simple words and expressions of different datasets. Naïve Bayes is the algorithm that shows an accurate result before abstraction and a less accurate result after abstraction.

3. REPRESENTATION OF EXTRACTED DOCUMENTS USING VECTOR BASED APPROACH
The vector-based approach is one of the statistical approaches. Such approaches break documents or a given query into TERMS. Terms are words that occur in the given query and are extracted automatically from documents; these words are counted and measured statistically. Our aim is to remove different forms of the same word. E.g. “play” is a word entered by the user; it has various forms like “played”, “playing” etc., and the user may specify only one form of the word in the search query.
3.1 Relation between Term Vectors in Document Space
The terms can be phrases, n-grams etc. We represent a document as a set of terms: taking the OR of these terms gives a set of terms that represents the entire document, called a Space.
T1 OR T2 OR T3 OR … OR Tn = Space
A document consisting of a set of terms (a space) is called a Document Space. Numeric weights are assigned to each term in the document; comparing them with other documents estimates the effectiveness of the document. Each term has a different weight in the same document. The weights assigned to each term in document Di are expressed as coordinates of the document, i.e. Di(x, y); the document is thus a vector from the origin to the point defined by the weights of its terms.

[Figure: a document plotted as the vector from the origin to the point P'(x, y) in the X-Y plane]

Term Space: Each document is represented as a dimension. It has some coordinates (weights), and each point is considered a vector. If a term is not found in a document, it is assigned zero weight.
Representation of terms in matrix form: The combination of document space and term space is represented by a Document-by-Term matrix (Di x Term), in which each row is a document Di (in term space) and each column is a term (in document space).
Representation of a query in document space: A query entered by the user is a set of terms with the same weights assigned to it. The query may also be in natural language; in that case it is processed like a document, which includes removal of redundant words. If the query contains terms that are not in any document, they represent new dimensions in the document space.
3.2 Assignment of weights to terms
The weight of a term means the importance of the term, i.e. how relevant it is. Weights are assigned by a special scheme called Term Frequency * Inverse Document Frequency (tf * idf).
Term Frequency (tf): the number of times a term occurs in a document; it varies from one document to another.
Inverse Document Frequency (idf): how widely the given term is distributed over documents. It gives the probability of the term occurring in a document: idf = ln(N/n), where N = number of documents and n = number of documents containing the term.
Inference: If all documents contain the term, then idf is zero. We can say that, for distinguishing relevant and non-relevant documents, the terms in a document must be different from the given topic so that they can be used for comparison with other documents.
Why is idf multiplied by tf? It is done so that good descriptor terms have more importance than bad terms. Good terms are those that occur in a small number of documents, while bad terms are those that occur in a large number of documents.
3.3 Normalization of Term Vectors
Weights are normalized according to the variable document size. Here we describe normalization of the term frequency (tf). In this, tf is divided by the maximum term frequency tfmax, i.e. tf/tfmax, where tfmax is the frequency of the term that occurs most often in the document. We thus generate a factor that lies between 0 and 1. This kind of normalization is called Maximum Normalization and is given as [0.5 + (0.5 * tf/tfmax)], which varies from 0.5 to 1.
Effect of tf: The importance of a term in a given document depends on its frequency of occurrence compared to the other terms in the same document. Terms are variables; they can change at any time.
Drawback: Since the normalization factor depends only on frequency, terms having higher weights can swamp terms with lower weights. E.g. consider a document about computer design, covering various components, hardware and software. Let hardware be a highly weighted term that occurs six times in the document (it occurs most because it is central to building a computer); the frequency of this term will then outweigh the other terms by a factor of up to 3.
Solution: Logarithmic Term Frequency. In this, we take the natural log plus a constant, i.e. log(tf) + 1. Its normalization factor does not depend on the maximum term frequency (tfmax), and it reduces the effect of terms with high frequency such that, for two term frequencies tf1, tf2 > 0, [log(tf2) + 1] / [log(tf1) + 1] < tf2/tf1.
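The tf*idf weighting and the logarithmic damping just described are easy to state in code; the following short Python sketch (the toy corpus is invented for illustration) computes log-normalized tf*idf weights:

# Sketch: log-normalized tf*idf weights for a toy corpus, following
# idf = ln(N/n) and the log(tf) + 1 term-frequency damping described above.
import math
from collections import Counter

docs = [["ontology", "web", "mining"],        # toy tokenized documents
        ["web", "search", "query", "web"],
        ["ontology", "semantic", "web"]]

N = len(docs)
df = Counter(term for d in docs for term in set(d))  # document frequency n per term

def weights(doc):
    tf = Counter(doc)
    return {t: (math.log(tf[t]) + 1) * math.log(N / df[t]) for t in tf}

for d in docs:
    print(weights(d))

Note that “web”, which occurs in every document, gets idf = ln(N/n) = ln(1) = 0, matching the inference above that a term present in all documents carries no discriminating weight.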
3.3.1 Normalization by Vector Length
In this method, every component of the vector is divided by the Euclidean length of the vector.
Let R = xi + yj. The Euclidean length of R is sqrt(x^2 + y^2). Cosine normalization maps (x, y) to (x / sqrt(x^2 + y^2), y / sqrt(x^2 + y^2)); writing x = r cos(theta) and y = r sin(theta), the denominator equals sqrt(r^2 cos^2(theta) + r^2 sin^2(theta)) = r. It is called cosine normalization because the normalized vector has length sqrt(cos^2(theta) + sin^2(theta)) = 1, written n^ = 1. Cosine normalization reduces the effect of a single term with high frequency by combining it with the other, lower-weighted terms: since the vector length is a function of all vector components (the tf * idf weights), the weight of a high-frequency term is reduced by the idf factor. Cosine normalization takes into account the weights of all terms in a given document. It suits short documents better than long ones, because short documents tend to be about a single topic relevant to the given query.
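A minimal sketch of cosine (vector-length) normalization; the example vector is a hypothetical pair of term weights.

```python
import math

def cosine_normalize(vec):
    """Divide every component by the Euclidean length of the vector,
    so the normalized vector has length 1."""
    length = math.sqrt(sum(w * w for w in vec))
    return [w / length for w in vec]

v = [3.0, 4.0]              # tf*idf weights of a two-term document
print(cosine_normalize(v))  # [0.6, 0.8]; the result has length 1
```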
For every document or query there is a point at which the probability of relevance equals the probability of retrieval. This observation led to the development of a correction factor, which maps the old normalization function (cosine normalization) into a new one. Concept of Pivoted Normalization: the correction factor rotates the old normalization function clockwise around the crossover point (the point where probability of relevance = probability of retrieval), so that normalization values below that point become greater and values above it become smaller. The crossover point is called the PIVOT, and the scheme is called pivoted normalization. Pivoted normalization focuses on correcting the document normalization. Before pivoting, old normalization = new normalization; after rotating clockwise around the pivot, new normalization = slope * old normalization + constant, where the slope is less than 1. Substituting the pivot value into both the old and new normalizations, we get the final result: pivoted normalization = slope * old normalization + (1 - slope) * pivot. Use of pivoted normalization increases the probability of retrieving longer documents, although longer documents contain both relevant and non-relevant terms.
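The pivoted normalization formula derived above can be applied as in the sketch below. The slope and pivot values are arbitrary illustrations, not values from the paper.

```python
def pivoted_norm(old_norm, pivot, slope=0.75):
    """Pivoted normalization = slope * old_norm + (1 - slope) * pivot.
    Below the pivot the new factor is larger than the old one,
    above the pivot it is smaller, as described in the text."""
    return slope * old_norm + (1 - slope) * pivot

pivot = 1.0                   # crossover point where P(relevance) = P(retrieval)
for old in (0.5, 1.0, 2.0):   # short, average and long documents
    print(old, "->", pivoted_norm(old, pivot))
# 0.5 -> 0.625 (raised), 1.0 -> 1.0 (unchanged at the pivot), 2.0 -> 1.75 (lowered)
```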
4. CONCLUSION
The paper presents an Ontology Web Based Content Mining methodology that helps in the classification, identification and extraction of the large number of documents present on the web. It follows a fixed sequence of steps for generating an ontology. We have conducted an experiment using WordNet. The main credit of this work goes to the domain ontology used for representing documents. The use of WordNet improves the classification of web documents with the help of synonyms, as it holds a large collection of similar words related to a particular search. The Ontology Phrase Extractor produces web documents consisting of multiple pages with multiple categories. Each web document is represented as a vector of <term, weight> pairs, using one of the statistical Information Retrieval (IR) approaches, known as the vector-based approach. The paper also represents terms in document space and normalizes them using the concept of pivoted normalization.
ACKNOWLEDGEMENT
I, Vishal Jain, would like to give my sincere thanks to Prof. M. N. Hoda, Director, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi, for giving me the opportunity to do my Ph.D from Lingaya's University, Faridabad.
REFERENCES
[1]. U. Fayyad, R. Uthurusamy, "Data Mining and Knowledge discovery in databases", "Communications of the ACM, 39(11)", 1996, pages 1-15.
[2]. Vishal Jain, Gagandeep Singh, Dr. Mayank Singh, "Implementation of Multi Agent Systems with ontology in Data Mining", "International Journal of Research in Computer Application & Management (IJRCM), Vol. 3, Issue 1, ISSN 2231-1009", January 2013, pp 111-117.
[3]. Litvak, M. Last, Kisilevich, "Improving Classification of Multi-Lingual Web
Documents using Domain Ontologies", "ECML/PKDD-2005", October 2005.
[4]. Abd-Elraham Elsayed, Samhaa Ram, Mahmod Rafea, "Applying data mining for ontology building", "ACM Conference 26 (10)", 2003.
[5]. Miller, G.A., Beckwith, Gross, "Wordnet: An Online Lexical Database", "International Journal of Lexicography", 2004, pages 235-244.
[6]. John McCrae, Mauricio Espinoza, "Combining Statistical and semantic approaches to translation of ontologies and taxonomies", "In Proceedings of SSST-5, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation", 2011, pages 116-125.
[7]. R. Tous and J. Delgado, "A Vector Space Model for Semantic Similarity Calculation and OWL Ontology Alignment", "DEXA 2006", pp. 307-316.
[8]. Zahra Eidoon, Naseer Yazdani, "A Vector Based Method of Ontology Matching", "IEEE Third National Conference on Semantics, Knowledge and Grid", 2007.
[9]. A. Doan, J. Madhavan, A. Halevy, "Ontology Matching: A Machine Learning Approach", "Handbook on Ontologies in Information Systems, S. Staab and R. Studer", May 2004, pages 397-416.
[10]. E. Greengrass, "Information Retrieval: A Survey", 2000.
About the Authors
Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi and is pursuing a PhD in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as Assistant Professor at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.
Dr. Mayank Singh has completed his M.E. in software engineering from Thapar University and his PhD from Uttarakhand Technical University. His research areas include Software Engineering, Software Testing, Wireless Sensor Networks and Data Mining. Presently he is working as Associate Professor at Krishna Engineering College, Ghaziabad. He is associated with CSI, IE(I), IEEE Computer Society India and ACM.
Proceedings of “Wilkes100 - Second International Conference on Computing Sciences”
Ontology Based Web Crawler to Search Documents in the Semantic Web
Vishal Jain1 and Dr. Mayank Singh2
1 Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad
2 Associate Professor, Krishna Engineering College, Ghaziabad
Email:
[email protected],
[email protected]
Abstract. The term Semantic Web (SW), coined by Tim Berners-Lee, covers a broad set of ideas. The Semantic Web is defined as a collection of information linked in such a way that it can easily be processed by machines; it is information in machine form. It contains Semantic Web Documents (SWD's), written in RDF or OWL, that hold information relevant to users' queries. Crawlers play a vital role in accessing information from SWD's: a crawler is software that systematically browses documents and extracts information from them for the purpose of indexing. There is therefore a need to develop prototype systems that perform Information Retrieval (IR) using crawlers; examples include OWLIR, SWANGLER and SWOOGLE. This paper outlines SWOOGLE, a crawler-based indexing and retrieval system for finding SWD's. It describes the relations derived from RDF and OWL documents and lists ontologies, providing a complete description of the given problem. Keywords: Semantic Web (SW), Ontology, SWOOGLE, Semantic Web Documents (SWD's)
1. Literature Survey
The earliest version of SWOOGLE was ver 1.0, which offered advanced database-style search queries. Owing to its capability of retrieving SWD's, SWOOGLE has evolved into a semantic search engine through its later versions, SWOOGLE 2005 (ver 2.1) and SWOOGLE 2007 (ver 3.1). Finding SWD's from input keywords is a very challenging task. Before SWOOGLE existed, documents were retrieved using conventional Information Retrieval (IR) approaches and traditional search engines. These engines are not intelligent enough to retrieve only relevant documents: they retrieve ordinary text documents rather than markup documents, so the result is a large set of documents that may be relevant or irrelevant. Some researchers tried to apply Knowledge Management (KM) solutions in complex environments, a major phase in the emergence of the SW and of ontology. With the existence of the SW came SWD's, which combine text documents with structured documents written in ontology languages. At that time there were no crawlers, and users found ontologies by combining retrieved documents with the help of ontology editors, which represent concepts and the relationships between terms matching a given query. Then web crawlers, such as the GoogleBot crawler and Yahoo's crawler, came into use. They can retrieve relevant documents and satisfy users' queries, but they are unable to deliver ontologies or generate metadata.
2. Introduction
The Semantic Web (SW) came into existence because conventional search engines dissatisfy users by retrieving inadequate and inconsistent results. The documents retrieved by conventional search engines are like horses of different colors. These engines work on predefined standard terms in a centralized environment, thus accessing only standard ontologies. With the advent of the SW and ontology, users can state new facts and use their own keywords and terms in different environments. With the use of ontology, a user can perform the following tasks: (a) use Interface Description Languages (IDL) and services for different environments, where IDL means defining new data objects and their relations.
(b) Users can communicate with different agents using a shared ontology like FOAF (Friend of a Friend).
The Semantic Web (SW) [1] is a combination of SWD's expressed in ontology languages (RDF, OWL). Ontology [2] refers to the categorization of concepts and the relationships between terms in a hierarchical fashion. Although SWD's yield relevant information, because they are characterized by semantic methods and ideas, it is a tedious job to find the URL's of SWD's. So there is a need to develop crawler-based prototype systems that focus on extracting metadata for each SWD. The paper is organized as follows: Section 3 makes readers aware of SWOOGLE and its significance, including its architecture. Section 4 describes how SWOOGLE is better than other prototype systems and ontology repositories; it also gives information about the types of SWD's and the use of crawlers, and shows how to find ontologies using the Ontology Rank algorithm, which identifies whether a given document is a Semantic Web Ontology (SWO) or a Semantic Web Database (SWDB). Section 5 shows the current status of searching through SWOOGLE via pictorial representations.
3. Outline of SWOOGLE
SWOOGLE [3] is a crawler-based indexing prototype system that retrieves documents based on sets of classes, properties and methods, and produces URI's matching the query.
3.1 Why SWOOGLE? What is its significance?
As we know, the SW works alongside HTML documents. HTML documents are different from SWD's: they are served by conventional search engines, which are unable to extract the required information in a short and simple way. Keeping this in mind, a prototype SW search engine called SWOOGLE was developed for extracting SWD's, to be used by users and software agents. With the help of SWOOGLE, we can "AEQ" RDF and OWL documents, where A stands for Access, E for Explore and Q for Querying; querying means we can clear up misconceptions by putting a query.
3.2 Defining and Analyzing SWOOGLE
SWOOGLE is a crawler-based indexing and retrieval system for the SW. Indexing means generating metadata: it extracts metadata for each SWD and gives the relationships between those documents. Documents are indexed by an Information Retrieval (IR) system which uses either character N-grams or URI's (Uniform Resource Identifiers) as keywords to find relevant documents. It provides a web interface where a user can pose a query by directly submitting the URL of either a SWD or a web page.
Analysis: Swoogle is analyzed on three activities, listed below:
- Helping to search for appropriate ontologies
- Searching for data instances
- Characterizing the Semantic Web
We discuss them one by one. (a) Searching for appropriate ontologies: conventional search engines often fail to find the required items for a particular task; Swoogle helps in finding ontologies as it allows the user to query for documents. (b) Finding data instances: Swoogle allows the user to query SWD's with keywords that use classes/properties. (c) Characterizing the Semantic Web: the collection of data by researchers leads to a characterization of the SW; a user can answer any question about an ontology.
3.3 SWOOGLE Architecture
Four components are included in its architecture: (a) SWD's discovery, (b) Metadata creation, (c) Analysis of data, and (d) Interface. All four components work independently and interact with each other through a database. SWD's discovery: it discovers Semantic Web Documents and keeps up-to-date information about objects. Metadata creation: it caches SWD's and generates metadata at both the semantic and syntactic levels. Data analysis: it uses the cached SWD's and metadata to produce analyses with the help of the IR analyzer and the SWD analyzer. Interface: it provides data services to the SW community.
Figure 1: SWOOGLE Architecture [4]
4. How is SWOOGLE Better than Other Prototype Systems and Ontology Repositories?
Many prototype systems have been designed to answer user queries, such as OWLIR (Ontology Web Language and Information Retrieval), SWANGLER and SWOOGLE. OWLIR is a prototype system that takes text documents as input arguments; it does not directly accept RDF or OWL documents as input. It annotates text documents with SW markup, produces results and then indexes them. To find SWD's with the help of OWLIR, a custom indexing system must be built, after which both structured documents and text documents can be passed to it. It is therefore better than conventional engines, but not an optimal system.
SWANGLER directly accepts RDF documents encoded in XML and produces documents suited to the given query. It could become an optimal system, but it fails because of the following problems: (a) XML namespaces are not visible to search engines like Google; (b) tokenization rules are designed for natural languages.
SWOOGLE is an optimal crawler-based prototype system that maintains interoperability between SWD's. As the Semantic Web contains RDF documents, SWOOGLE directly takes RDF documents as input and lists ontologies that match the query. It can use either N-grams or URI refs as keywords to find relevant documents. OWLIR and SWANGLER encode only one triple per term; if there is more than one triple, they are replaced by a single URI. SWOOGLE can analyze many SWD's with many triples, and it captures more metadata on classes and properties to support huge collections of documents. So SWOOGLE is better, and more nearly optimal, than the other prototype systems.
Comparison with ontology systems: there is a difference between SWOOGLE and other SW engines and query systems. Ontology-based annotation systems like SHOE, CREAM and WEBKB focus on creating metadata for online documents without examining the whole document. Their ontology standards differ from the SWD versions, and these systems simply store RDF documents rather than processing and querying them. They are therefore not capable of handling millions of documents, because their own ontologies are not suitable for SWD's.
4.1 Types of Semantic Web Documents (SWD's)
A Semantic Web Document (SWD) is a document written in SW languages like OWL or DAML+OIL that is online and easily accessible to all web users. SWD's are the only means of information exchange in the SW. (a) Semantic Web Ontologies (SWO's): a document is a SWO when the required portion of the given statement defines new classes and properties or inherits the definitions of terms used by other SWD's. (b) Semantic Web Databases (SWDB's): a document is a SWDB when it does not define new terms; it matches the given query against terms stored in the database.
4.2 Use of Crawlers in Finding SWD's
The simplest way to find SWD's is to use conventional search engines, but they will not return relevant results. A set of crawlers, such as the Google crawler and focused crawlers, is used for finding SWD's.
Google Crawler: it searches URL's using the Google search engine, using extensions like rdf, owl and daml. To make the search more expressive, keywords are introduced. Searching URL's depends on the Google crawler (GoogleBot), the Google Indexer and the Google Query Processor. The process is as follows: web pages are downloaded by a web-crawling robot named GoogleBot, which retrieves pages on the web and hands them off to the Google Indexer. GoogleBot has many computers attached to it that request and fetch web pages. Each web page has an associated ID number called a docID; when a given URL is entered, it is assigned a docID. A URL Server sends lists of URL's to be fetched by the crawler. Fetched web pages are sent to the Store Server, which compresses the pages and stores them in a repository. The Google Indexer uncompresses the documents, removes bad links from every web page and stores the important information, ignoring some punctuation marks and converting all letters to lowercase. After the Indexer, the Google Query Processor retrieves stored documents and returns search results with the help of the Doc Server.
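The fetch-store-index pipeline described above can be sketched in a highly simplified form. This is not Google's actual implementation: the in-memory "store server", the docID assignment and the fake fetcher are illustrative stand-ins.

```python
from collections import defaultdict

repository = {}                    # store server: docID -> page text
inverted_index = defaultdict(set)  # indexer output: term -> set of docIDs
doc_ids = {}                       # each URL gets an associated docID

def crawl(url_list, fetch):
    """fetch(url) -> page text; stands in for GoogleBot's downloading."""
    for url in url_list:
        doc_id = doc_ids.setdefault(url, len(doc_ids))
        page = fetch(url)
        repository[doc_id] = page          # the store server keeps the page
        for term in page.lower().split():  # indexer: lowercase and split
            inverted_index[term].add(doc_id)

def query(term):
    """Query processor: return stored documents containing the term."""
    return [repository[d] for d in inverted_index.get(term.lower(), ())]

# Usage with a fake fetcher standing in for HTTP requests:
pages = {"http://a.example": "Semantic Web documents",
         "http://b.example": "Web crawlers"}
crawl(pages, fetch=pages.get)
print(query("web"))  # both documents contain "web"
```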
Figure 2: Illustration of the GoogleBot Crawler [5]
Focused Crawler: it finds documents within a given website, using extensions like jpg and html to reduce complexity. JENA2, which is based on SWOOGLE, first analyzes the content of SWD's and then produces them.
4.3 Finding Ontologies using the Ontology Rank Algorithm
To find ontologies, we should be aware of the language features and RDF statistics of SWD's, which are described below. SWOOGLE Basic Metadata: it contains the symbolic and semantic features of SWD's.
Figure 3: Categories of Basic Metadata
(a) Language Features: lists the features of SWD's and their properties. It includes: Encoding - the three types of encoding used in SWD's, i.e. RDF/XML, N-Triples and N3; Language - the SW languages OWL, RDF, RDFS and DAML; OWL Species - the language species of SWD's written in OWL only, namely OWL-LITE, OWL-DL and OWL-FULL.
(b) RDF Statistics: focuses on how SWD's define new classes, properties and individuals. There are three kinds of node: Class (C), Property (P) and Individual (I). The RDF statistics concern the nodes of the RDF graphs of SWD's. A node is a Class if and only if it is not a blank (anonymous) node and is an instance of some rdfs:Class (RDF Schema). A node is a Property iff it is not a blank node and is an instance of rdf:Property. An Individual is a node which is an instance of some user-defined class.
Ontology Rank Algorithm: it ranks all the ontologies returned by SWOOGLE while finding SWD's; the rank indicates to what extent a particular ontology can be used. Let (gag) be a SWD, and let C(gag), P(gag) and I(gag) be the classes, properties and individuals of that SWD.
Then the ontology ratio for the given SWD is calculated as:
R(gag) = (|C(gag)| + |P(gag)|) / (|C(gag)| + |P(gag)| + |I(gag)|)
If R(gag) = 0, the SWD is a pure SWDB; values close to 1 indicate a pure SWO.
(c) Ontology Annotations: properties that describe the SWD as an ontology, namely label, comment and version info.
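The ontology ratio can be computed directly from counts of classes, properties and individuals, as in this minimal sketch; the counts used below are hypothetical.

```python
def ontology_ratio(num_classes, num_properties, num_individuals):
    """R = (|C| + |P|) / (|C| + |P| + |I|); 0 indicates a pure SWDB,
    values near 1 indicate a Semantic Web Ontology (SWO)."""
    defined = num_classes + num_properties
    total = defined + num_individuals
    return defined / total if total else 0.0

print(ontology_ratio(40, 20, 2))   # ~0.97: ontology-like, as in Figure 5
print(ontology_ratio(0, 0, 150))   # 0.0: a pure SWDB
```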
5. Illustration of SWOOGLE
This section describes the layout of SWOOGLE version 3.1, used in 2007. It allows users to specify an arbitrary string in order to find relevant SWD's in response to that string. SWOOGLE analyzes whole documents and generates only the relevant parts of each document, in ranked order: URL's, terms, descriptions and namespaces.
Figure 4: SWOOGLE Start-Up Page
We searched for the string "Economic Crisis", so SWOOGLE returns SWD's that match these keywords in ranked order. Separate documents are returned for the keyword "economic" and for the keyword "crisis", as shown below:
Figure 5: SWOOGLE query result
From the above screenshot we see that the first SWD is encoded in N3 and its ontology ratio is 0.61, while the second document is encoded in RDF/XML with an ontology ratio of 0.97. The namespaces related to the second SWD are shown below:
Figure 6: Namespaces of the given SWD
The current version of SWOOGLE returns the following statistical information regarding the number of SWD's retrieved, the number of triples generated and other parameters. We can say that SWOOGLE can handle a huge collection of documents.
Figure 7: SWOOGLE statistical information
6. Conclusions
The paper has presented a way of extracting Semantic Web Documents (SWD's) using SWOOGLE, a crawler-based prototype indexing and retrieval system. It generates metadata for the given SWD's and lists ontologies related to the given keywords. It is better than other prototype systems like OWLIR and SWANGLER, which require building a custom indexing module and use their own ontology standards that are not suitable for SWD's.
OWLIR and SWANGLER treat markup as structured information and compute results over it. SWOOGLE stores metadata about RDF documents in its database so that it can retrieve SWD's based on Classes (C), Properties (P) and Individuals (I). SWOOGLE is designed to work with all SWDB's and is better than current web search engines like Google, because Google works with natural languages only.
Acknowledgement
I, Vishal Jain, give my sincere thanks to Prof. M. N. Hoda, Director, BVICAM, New Delhi, for giving me the opportunity to do my Ph.D from Lingaya's University, Faridabad.
References
[1]. Accessible from T. Berners-Lee, "The Semantic Web", "Scientific American", May 2007.
[2]. Berners Lee, J. Lassila, "Ontologies in Semantic Web", "Scientific American", May (2001), 34-43.
[3]. Tim Finin, Anupam Joshi, Vishal Doshi, "Swoogle: A Semantic Web Search and Metadata Engine", "In proceedings of the 13th international conference on Information and knowledge management", pages 461-468, 2004.
[4]. Gagandeep Singh, Vishal Jain, "Information Retrieval (IR) through Semantic Web (SW): An Overview", "In proceedings of CONFLUENCE 2012 - The Next Generation Information Technology Summit at Amity School of Engineering and Technology", September 2012, pp 23-27.
[5]. M. Preethi, Dr. J. Akilandeswari, "Combining Retrieval with Ontology Browsing", "International Journal of Internet Computing, Vol. 1, Issue 1", 2011.
[6]. T. Finin, J. Mayfield, A. Joshi, "Information retrieval and the semantic web", "IEEE/WIC International Conference on Web Intelligence", October 2003.
[7]. U. Shah, T. Finin and A. Joshi, "Information Retrieval on the semantic web", "Scientific American", pages 34-43, 2003.
[8]. Stojanovic, N. Studer, R. Stojanovic, "An approach for ranking of query results in the Semantic Web", "The Semantic Web - ISWC", 2003, pp 500-516.
[9]. Swati Ringe, Nevin Francis, Palanawala, "Ontology Based Web Crawler", "International Journal of Computer Applications in Engineering Sciences, ISSN 2231-4946, Vol. II, Issue III", September 2012.
[10]. Goetz Graefe, "Query Evaluation techniques for large databases", "In Proceedings of ACM COMPUTING SURVEYS", 2003.
About the Authors
Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi and is pursuing a PhD in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently he is working as Assistant Professor at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.
Dr. Mayank Singh has completed his M.E. in software engineering from Thapar University and his PhD from Uttarakhand Technical University. His research areas include Software Engineering, Software Testing, Wireless Sensor Networks and Data Mining. Presently he is working as Associate Professor at Krishna Engineering College, Ghaziabad. He is associated with CSI, IE(I), IEEE Computer Society India and ACM.
Information Retrieval (IR) through Semantic Web (SW): An Overview
Gagandeep Singh1, Vishal Jain2
1 B.Tech (CSE) VI Sem, GuruTegh Bahadur Institute of Technology, GGS Indraprastha University, Delhi
2 Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad
[email protected],
[email protected]
ABSTRACT
A large amount of data is present on the web. It contains a huge number of web pages, and finding suitable information in them is a very cumbersome task. There is a need to organize data in a formal manner so that users can easily access and use it. To retrieve information from documents we have many Information Retrieval (IR) techniques, but current IR techniques are not advanced enough to exploit the semantic knowledge within documents and give precise results. IR technology is the major factor responsible for handling annotations in Semantic Web (SW) languages, and in the present paper the knowledge-representation languages used for retrieving information are discussed.
(a) A complete Semantic Web (CSW) has not been developed yet, and the parts that have been developed are too immature to be used in the real world.
(b) No optimal software or hardware is available.
"SW is not a technology, it is a philosophy" [1]. The SW is defined as a collection of information linked in such a way that it can easily be processed by machines; in other words, it is information in machine form. It is also called the Global Information Mesh (GIM) [2], and is likewise known as a framework for expressing information.
II.1 PRINCIPLE OF SW
The Semantic Web (SW) and the World Wide Web (WWW) are entirely different from each other: the SW is machine-understandable, while the WWW is merely machine-readable. Current SW languages like the Resource Description Framework (RDF) do not operate at the level of the WWW. Fig. 1 below describes the principle of the Semantic Web.
Keywords: Semantic Web (SW), Information Retrieval (IR), Ontology, Hybrid Information Retrieval (HIR).
1. INTRODUCTION
We view the future web as a combination of text documents and semantic markup. The Semantic Web (SW) uses Semantic Web documents (SWD's), which must be combined with web-based indexing. Current IR techniques are not intelligent enough to produce semantic relations between documents. Extracting information manually with the help of the Extensible Markup Language (XML) and string-matching techniques like the Rabin-Karp matcher has not proven successful, since to use these techniques a normal user has to be aware of all the underlying tools.
Mesh of information + Language for expressing that information = Semantic Web
Fig. 1: “Semantic Web principles”
II.2 SEMANTIC WEB ARCHITECTURE
The architecture consists of the following parts:
So, keeping this in mind, we move to the concept of ontology in the Semantic Web. The following sections present the various languages that are used for building the Semantic Web (SW) and increasing accuracy.
2. SEMANTIC WEB (SW)
In spite of many efforts by researchers and developers, the SW has remained a future concept or technology; it is not widely practiced at present. There are a few reasons for this, listed below:
• Uniform Resource Identifier (URI) and UNICODE: the Semantic Web uses URI's to represent data in triple-based structures, with syntaxes designed for particular tasks. UNICODE supports international text in every script and style.
• RDF and RDF Schema: RDF processes metadata and provides interoperation between applications that exchange machine-understandable information on the web, while RDF Schema is the RDF vocabulary description language and represents
relationships between groups of resources. The RDF model (Fig. 2) described below represents properties and their values.
3. INFORMATION RETRIEVAL (IR)
IR involves identifying and extracting the relevant pages containing specific information according to predefined guidelines. There are many IR techniques for extracting keywords, such as NLP-based extraction techniques, which are used to search for simple keywords. There is also the AeroText system for extracting key phrases from text documents.
III.1 IR PROCESS and ARCHITECTURE Fig. 2: “RDF Model”
How do we retrieve information? The answer to this question is explained below.
A resource may be a web page or an individual element of an XML document. A resource together with its name is called a property. A statement is the combination of a resource, a property and its value.
Background knowledge is stored in the form of an ontology that can be used at any step. Once we have a ranked list of documents, they are indexed to form documents in a represented form. These documents produce ranked results, which are given to the admin. The admin solves the user query, which leads to the transformation of the user query.
E.g., "Gagan plays football." In this statement, Gagan is the resource (the subject), plays is the property, and football is its value.
Ontology: ontology is abbreviated FESC, meaning a Formal, Explicit Specification of a Shared Conceptualization [3].
Formal specifies that it should be machine-understandable. Explicit defines the type of constraints used in the model. Shared indicates that the ontology is not for an individual but for a group. Conceptualization means a model of some phenomenon that identifies the relevant concepts of that phenomenon.
Fig. 4: “Retrieval of Information”
Inference: this is defined as producing new data from existing data, or reaching some conclusion; e.g. "Adios" is a Spanish word which is replaced by "Good bye" so that it is understandable by the user.
III.2 ARCHITECTURE
It is based on an ONTOLOGY-BASED MODEL [5] that represents the content of a resource in terms of a given ontology. It has the following parts:
OMC (Ontology Manager Component): this is used by the Indexer, the Search Engine and the GUI. INDEXER: indexes documents and creates metadata. SEARCH ENGINE. GUI: supports the user in query formation.
Fig. 3: "SW Architecture" [4]
Fig. 5: “Architecture”
4. HIR (HYBRID INFORMATION RETRIEVAL)
The standard IR approaches can create differences among documents by adding additional features. So, to avoid these differences and make documents complete, we use the HIR approach; documents developed using HIR are called HYBRID documents.
OWLIR works with two retrieval engines, HAIRCUT and WONDIR. HAIRCUT: abbreviated from Hopkins Automated Information Retrieval (HAIR) for Combining Unstructured Text (CUT). It is used for specifying required query terms, and it is also a language-modelling approach for finding the similarity between documents.
HYBRID DOCUMENTS = HTML Text documents + Semantic markup
WONDIR: abbreviated from Word Or N-gram based Dynamic Information Retrieval engine. It is written in Java and provides basic indexing, retrieval and storage facilities for documents.
Fig. 6: “Hybrid Documents”
A. Components of HIR
Table 1: "Components of HIR"
Standard Text IR: contains the Vector Space Model, Indexing and Markup similarity.
Semantic IR: contains Inference, Ontology mapping and Markup relations.
B. OWLIR Architecture
It is described as follows:
(a) Information Extraction (IE): this part is included in the architecture in order to convert text documents into Semantic Web documents, which can be done using IE tools. Approach involved: OWLIR uses the AeroText system to extract key phrases and elements from free-text documents.
Markup/Text relationship: defines information about how often semantic markup occurs in the data, and converts a text query into semantic markup. Markup similarity: allows markup results to be ranked together with text documents.
(b) Inference System (IS): OWLIR uses the metadata information of text to find semantic relations. These relations decide the scope of the search and provide effective responses. OWLIR's functionality is based on DAMLJessKB, where Jess is the Java Expert System Shell.
5. PROTOTYPE SYSTEMS
After using several approaches for retrieving information from documents, three prototype systems were developed that make use of knowledge-representation languages for solving queries: (i) OWLIR, (ii) SWANGLER and (iii) SWOOGLE. These are discussed below.
DAMLJessKB facilitates reading and interpreting DAML+OIL pages and supplies users with the reasoning behind that information.
A. OWLIR
Problem: when we want to retrieve both text documents and semantic documents, there is no surety of getting relevant ones, since a traditional text search engine uses text only. What, then, is the way to retrieve SW documents? Solution: OWLIR.
DAML Jess KB + Domain-specific rules = Effective Search Engine
Fig. 7: "Search Engine"
C. Flow of Information in OWLIR
Analysis: OWLIR is an acronym for Ontology Web Language and Information Retrieval. It is a system for the retrieval of text as well as semantically marked-up documents in languages like DAML+OIL, RDF and OWL. OWLIR follows three processes, which are used to access both semantic web pages and text documents:
• IR: gathering information about documents for a query.
• Q & A: asking simple questions and getting answers.
• Complex Q & A.
Documents are processed by extraction tools like AeroText, which produces DAML+OIL [6] markup. Then RDF triples are generated from the DAML+OIL pages. Additional RDF triples are extracted from the web and feed the Inference Engine (IE).
Is Swoogle better than ontology repositories? Yes, it is. Ontology repositories like DAML+OIL and SchemaWeb do not automatically discover documents: they require the user to submit URL's, and they store RDF documents rather than processing and querying them.
Fig. 7: “Information flow””
D. Swangler
This is one of the prototype systems and a metadata engine. It annotates RDF documents encoded in XML and produces documents that are compatible with Google and other engines. Google treats SWD's as text files, but this creates the following two main problems: (i) the XML namespace is not visible to search engines like Google; (ii) tokenization rules are designed for natural languages.
Solution: SWANGLING is used to enrich SWD's with extra RDF statements. The RDF files are modified and then put on the web for Google to discover; once discovered, Google indexes the contents produced by the swangler.
E. Swoogle
We have developed a prototype search engine called SWOOGLE [7] to facilitate the development of the Semantic Web. With the help of Swoogle, we can Access, Explore and Query (AEQ) RDF and OWL documents.
Swoogle is a crawler-based indexing and retrieval system for the Semantic Web. It extracts metadata for each discovered document and gives the relationships between documents. Documents are indexed by an IR system which uses character N-grams as keywords to find relevant documents.
E.1 Analysis
After developing Swoogle, we analyze it on three activities, listed below: (i) helping to search for appropriate ontologies; (ii) searching for data instances; (iii) characterizing the Semantic Web.
These are discussed below in detail: (a) Searching appropriate ontologies: conventional search engines often fail to find the required items for a particular task. Swoogle helps in finding ontologies, as it allows the user to query for documents.
(b) Finding data instances: Swoogle allows the user to query SWD's with keywords that use classes/properties. (c) Characterizing the Semantic Web: the collection of data by researchers leads to a characterization of the SW; a user can answer any question about an ontology.
E.2 Swoogle Architecture
Four components are included in its architecture: (i) SWD's discovery, (ii) Metadata creation, (iii) Analysis of data, and (iv) Interface. All four components work independently and interact with each other through a database. These are detailed below: (i) SWD's discovery: discovers Semantic Web Documents and keeps up-to-date information about objects. (ii) Metadata creation: caches SWD's and generates metadata at both the semantic and syntactic levels. (iii) Data analysis: uses the cached SWD's and metadata to produce analyses with the help of the IR analyzer and the SWD analyzer. (iv) Interface: provides data services to the SW community.
Fig. 8: "Swoogle Architecture"
6. CONCLUSIONS
The emphasis of this paper is on the concept of the Semantic Web and the various approaches used for retrieving information from the web. The web contains millions of documents, and to retrieve relevant information from them we have gone through various prototypes which act as search engines. Information retrieval over such collections offers new challenges and opportunities. We have presented a framework for integrated search that supports an inference engine. The Swangling technique can be used to enrich SWD's into text documents. The use of OWLIR shows whether semantic markup within documents can be used to improve retrieval performance or not. Swoogle is designed to work with all SWD's; it is better than current web search engines like Google, which work with natural languages only.
7. REFERENCES
[1] Accessible from T. Berners-Lee, "The Semantic Web", "Scientific American", May 2007.
[2] Urvi Shah, James Mayfield, "Information Retrieval on the Semantic Web", "ACM CIKM International Conference on Information Management", Nov 2002.
[3] http://www.mpiinf.mpg.de/departments/d5/teaching/ss 03/xmlseminar/talks/CaiEskeWang.pdf.
[4] Berners Lee, J. Lassila, "Ontologies in Semantic Web", "Scientific American", May (2001), 34-43.
[5] David Vallet, M. Fernandes, "An Ontology-Based Information Retrieval Model", "European Semantic Web Symposium (ESWS)", 2006.
[6] http://www.daml.org/ontologies.
[7] Tim Finin, Anupam Joshi, Vishal Doshi, "Swoogle: A Semantic Web Search and Metadata Engine", "In proceedings of the 13th international conference on Information and knowledge management", pages 461-468, 2004.
[8] U. Shah, T. Finin and A. Joshi, "Information Retrieval on the semantic web", "Scientific American", pages 34-43, 2003.
[9] T. Finin, J. Mayfield, A. Joshi, "Information retrieval and the semantic web", "IEEE/WIC International Conference on Web Intelligence", October 2003.
[10] http://www.semanticwebsearch.com.
[11] http://www.semanticweb.info/schemaweb.
[12] Berners-Lee and Fischetti, "Weaving the Web: The Original Design of the World Wide Web by its inventor", "Scientific American", 2005.
[13] Kiryakov, A. Popov, L. Manov, "Semantic annotation, indexing and retrieval", "Journal of Web Semantics", 2005-2006.
[14] N. Shadbolt, T. Berners-Lee and W. Hall, "The Semantic Web revisited", "IEEE Intelligent Systems", 2006.
[15] J. Mayfield, "Ontologies and text retrieval", "Knowledge Engineering Review", 2007.
[16] J. Carroll, J. Roo, "OWL Web Ontology Language", "W3C recommendation", 2004.
[17] J. Kopena, A. Joshi, "DAMLJessKB: A tool for reasoning with Semantic Web", "IEEE Intelligent Systems", 2006.
[18] Jeremy, Lan, Dollin, "Implementing the Semantic Web Recommendations", "In proceedings of the 13th international conference World Wide Web", 2004.
Comparative Study of Search Engine and Semantic Search Engine: A Survey
Vishal Jain1, Gagandeep Singh2 and Dr. Mayank Singh3
1 Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad
2 B.Tech (CSE) VII Sem, GuruTegh Bahadur Institute of Technology, GGS Indraprastha University, Delhi
3 Associate Professor, Computer Science and Engineering Department, Lingaya's University, Faridabad
[email protected],
[email protected],
[email protected]

Abstract
We are all aware of the term Information Retrieval (IR), which is nothing but the process of retrieving or gathering information from a given document or file. The concept of Information Retrieval has gained much ground over the years because of the large collection of information available in the form of documents on the Internet; arranging them and retrieving useful words from them is a cumbersome task. The information can be structured, unstructured or semi-structured. This paper consists of four sections. In section 1, we give a brief description of information retrieval and its goals in the world of the Semantic Web. The Semantic Web uses Semantic Web documents (SWD's) for exchanging information and solving queries. Section 2 describes the types of documents involved in the IR process. Section 3 deals with how the concepts of Ontology and the Semantic Web came into existence and explains the importance of prototype systems; it also deals with the need for mapping in the Semantic Web in order to process information between ontologies. In Section 4, we describe the difference between traditional search engines and semantic search engines via a case study.

Keywords: Information Retrieval (IR), Ontology, Semantic Web (SW), Semantic Mapping.

1. Introduction
Information Retrieval is the retrieval of information or data, either structured or unstructured, in response to a query statement, which may itself be structured or unstructured. An unstructured query is a sentence written in commonly understandable language, while a structured query is an expression combining equations and operands. IR deals with the fusion of streams of output documents produced by multiple retrieval methods; they are combined to form a single ranked stream which is shown to the user.
There are two methods for solving queries:
(a) by submitting a given query to multiple document collections;
(b) by submitting a given query through multiple IR methods.
Traditional text search engines fail to find optimal documents for the following reasons:
• Improper style of natural language: these engines are not capable of understanding complex ways of writing documents.
• High-level unclear concepts: some concepts are included in a document, but present search engines cannot find those words.
• Semantic relations: we cannot find relevant documents for a word specified in part of a document. E.g., if we search for "beer", the engine will not find a type or part of beer.
Information Retrieval [1] mainly focuses on the retrieval of unstructured documents (natural-language text documents), which may include videos, photos, audio, etc. IR addresses the retrieval of documents from an organized, well-defined, huge collection of documents available on the net, which may be email, maps, news, etc. The goals of IR are described below:
• IR aims at retrieving unstructured documents.
• An IR engine should produce a collection of documents relevant to the query entered by the user.
• An IR engine also arranges documents according to rank, which involves the Page Rank algorithm: if a document 'A' has more effective results than document 'B', then 'A' is placed first. This is discussed in further sections.

2. Types of documents in the IR Process
Documents may be structured, unstructured, semi-structured, or a combination of these.
Structured documents: a document is structured if it is written in a well-defined syntax and has identifiable components. A structured database is a table where we have multiple attributes of a user's record; all rows have the same columns.
IR engines can find the records and their contents with the help of this well-defined syntax. E.g., in the table below, if we need to find the addresses of all students, every Address column within the Student table has the same meaning. So we can say that the syntax of a structured database has unique components, and it is always possible for an IR engine to find all records of a given component.
Table 1: "Structured Table named Student"
S.No | Name | Address | Phone Number
1    | X    | Rohini  | 88888888
2    | Y    | Dwarka  | 33444444
3. Concept of Semantic Web and Ontology
Evolution of Ontology and Semantic Web: - In past years, there was great demand of Knowledge Management (KM) Solutions for performing various tasks like Dataflow Management, Web Conferencing, Decision support systems etc. But these conventional KM solutions were unable to become part of organizations because of centralized structure which is a storehouse of central knowledge only. After this inconvenience, concept of Semantic Web and Ontology was introduced. In spite of many efforts led by researchers and developers, Semantic Web (SW) has remained a future concept or technology. It is not practiced presently. There are few reasons for this as follows: Complete SW has not been developed yet and the parts that have developed are not advanced enough to act in real world applications. No optimal Software or Hardware is provided. A complete definition of Semantic Web defines that it works like web but in SW, documents are filled by annotations in machine understandable markup language. These annotations will provide metadata about documents as well as machine interpretable statements that specify some particular meaning of context. “SW is not technology, it is philosophy”. It is about how we think about it. [2].
Unstructured documents: - They are written in natural languages. They don’t have well defined syntax and position where search engine could find records with given meaning. It is random collection of documents that do not assume whether they have same topic or not. The query for retrieving may contain characters of both structured and unstructured contents of documents as illustrated by example. The documents are about lists of book shops and that contains topics by Indian authors only. In this sentence, unstructured part is “about” and structured part is “that contains topics by Indian authors only” Semi- structured documents: - They share common structure and meaning of collection of textual documents. It is different from structured documents in a way that they don’t have same columns for each row in a table. Their records are different. Unstructured documents with structured headers: - These type of documents may be email messages, book, sms etc where content of body of document is in natural language text (unstructured) but their header is structured i.e. it contains metadata which is data about document instead of information content of document. E.g. In book, metadata is Author, Title, Publisher etc. These documents have importance in a way that search engines can easily find documents written by given author or from date of publishing.
The Semantic Web has defined its own documents, called Semantic Web documents (SWD's), which are characterized by semantic methods and ideas. They are written in Semantic Web languages like OWL and DAML+OIL, and they are available online and easily accessible to all web users. SWD's differ from HTML documents, since HTML documents are served by conventional search engines, which are not able to extract the required information in a short and simple way. SWD's are the means of exchanging information in the Semantic Web. We can also find the rank of documents on the basis of the Page Rank algorithm. Types of SWD's: there are two types of Semantic Web documents - Semantic Web Ontology (SWO) and Semantic Web Database (SWDB).
Structure of a document as various documents: IR documents are structured in different ways; they can contain a large number of documents and sub-documents. A book, for instance, has a structure of index, contents and preface, containing paragraphs that consist of sentences, which in turn are made of words. Search engines can find a given section of a book easily, but finding the contents of a given topic within that section is quite difficult. So we can say that the structure of IR documents is still metadata.
Table 2: "Comparison between SWO and SWDB" [3]
Semantic Web Ontology (SWO):
• A document is said to be a SWO when the required portion of the given statement defines new classes and properties, or inherits the definitions of terms used by other Semantic Web documents.
• It does not introduce individuals.
Semantic Web Database (SWDB):
• A document is said to be a SWDB when it does not define new terms.
• It introduces individuals and makes statements about them.
Encoding: there are three types of encoding used in SWD's: RDF/XML, N-Triples and N3. Language: there are four Semantic Web languages, with relationships among them: OWL, RDF, RDFS and DAML. OWL species: the language species of SWD's written in OWL only; the three species of OWL are OWL-LITE, OWL-DL and OWL-FULL.
RDF Statistics: this shows properties related to the nodes of the RDF graph of SWD's. Our goal is to focus on how SWD's define new classes, properties and individuals; there are three kinds of node, namely CPI (Classes, Properties and Individuals). A node is treated as a Class if and only if it is not an anonymous node. A node is treated as a Property if it is an instance of rdf:Property (RDF Schema is the RDF vocabulary description language used to represent relationships between groups of resources). An Individual is a node which is an instance of some user-defined class. RDF statistics are obtained by converting a SWD into an RDF graph. Let (gag) be a given SWD that we put into a graph, and let C(gag), P(gag) and I(gag) be its Classes, Properties and Individuals respectively. The Ontology Ratio R(gag) is then obtained as:
Ontology [4] is defined as concept which deals with occurrence of events, their instances and user defined relations among concepts. It represents background knowledge on Semantic level. Semantic level consists of semantic entities, their concepts, relations instead of simple words which are used in thesauri. 3.1 Prototype Systems Importance: - The prototype systems are developed to avoid problems occurred in Ontology languages like OWL (Ontology Web Language), RDF (Resource Description Framework) etc. These problems are as follows:• In Ontology based model, quality of results were not as much as accurate due to lack of information. • It requires large cost in creating ontology language. • There are limited facts and rules related to ontology. The prototype systems include OWLIR, SWANGLER and SWOOGLE. We have discussed about SWOOGLE [5] and its analysis in this section. Swoogle: - It is crawler based indexing and retrieval system for Semantic Web. It extracts metadata for each discovered document and gives relationship between documents. With the help of Swoogle, we can find appropriate ontologies, search data instance and characterize Semantic Web. It has three types of metadata: (a) Basic Metadata: - It considers syntactic and semantic features of SWD’s. This metadata has three categories as follows: Language Features RDF Statistics Ontology Annotations.
R(gag) = (|C(gag)| + |P(gag)|) / (|C(gag)| + |P(gag)| + |I(gag)|)
If R(gag) is 0, the document is a pure Semantic Web Database (SWDB); values close to 1 indicate a Semantic Web Ontology (SWO). Ontology Annotation: this shows properties that describe SWD's as an ontology, namely: • Label (rdfs:label) • Comment (rdfs:comment) • Version information (owl:versionInfo). (b) Relational Metadata: used to define semantic relations among SWD's. We need relational metadata since it is very difficult to analyze relations at the RDF node level; Swoogle therefore emphasises relations among SWD's, which also elaborate the RDF node-level relations. (c) Analytical Metadata: includes the SWO/SWDB classification and the ranking algorithm. 3.2 Page Rank Algorithm. This algorithm was introduced by Google. It was designed to measure the probability of how many times a random user visits a given web
Language Features: - It lists properties and features of SWD’s. They are given below:
59 323
page or Semantic Web document (SWD). The Page Rank algorithm [6] is defined as:
PR(g) = PRdirect(g) + PRlink(g). Direct part: PRdirect(g) = (1 - d), where d is the damping factor. Part via links: PRlink(g) = d * [PR(W1)/C(W1) + ... + PR(Wn)/C(Wn)], where W1, ..., Wn are the web documents that link to g and C(Wi) is the number of links going out of Wi.
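A minimal iterative sketch of the PageRank formula above. The tiny graph and the damping factor 0.85 are illustrative assumptions; real implementations add many refinements (dangling-node handling, convergence tests, normalization).

```python
def pagerank(links, d=0.85, iters=50):
    """links: dict node -> list of nodes it links to.
    PR(g) = (1 - d) + d * sum(PR(w) / C(w)) over pages w linking to g."""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    pr = {n: 1.0 for n in nodes}
    for _ in range(iters):
        new = {n: 1.0 - d for n in nodes}
        for w, outs in links.items():
            for t in outs:
                new[t] += d * pr[w] / len(outs)  # C(w) = number of out-links of w
        pr = new
    return pr

# Tiny illustrative graph: A links to B and C, B to C, C back to A.
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```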
3.3 Semantic Mapping on Ontologies
Importance: there is a need for mapping between ontologies, because an ontology gives a complete description of a given problem that can be communicated among people and application systems. Mapping of ontologies leads to the definition of a taxonomy, which is the set of concepts of an ontology arranged in a hierarchical manner; these concepts identify the data entities of interest, and each concept has its own attributes and relationships. Semantic Mapping: the process of finding a solution to a given problem with the help of related concepts used in another domain. It is needed in order to realize the full growth of the Semantic Web and to process information between ontologies. Consider a scenario: there are two authors belonging to different countries, one in Australia and the other in Denmark. The two groups develop their web pages and decide to share their web content, making use of ontologies in building the pages. There is then a mapping between the topics of the two books by the different authors: both books contain different topics, but some are common to both, through which we can find a suitable book and validate the concept of semantic mapping. It is shown in figure 2.
But this Page Rank algorithm did not prove fully useful, as it gives a non-uniform probability and does not use a probabilistic approach for determining the rank of web pages. So Swoogle designed the RRS (Rational Random Surfer) model, which reflects the various types of links between SWD's. 3.2.1 Finding the Rank of a SWO using RRS. Consider the graph in figure 1, which has nine nodes, namely A, B, C, D, E, F, G, H and I. These nodes represent web documents for which we have to find how often a user accesses the corresponding web pages. Assume the probability for the RRS to visit any of these SWO's from a SWDB is 0.001. Then the probability that the user visits D is P(D) + P(H) + P(I) = 0.003. Similarly, the probability of visiting B is P(B) + 0.003 + P(E) = 0.005, and the probability of visiting C is P(C) + P(F) + P(G) = 0.003. In this way, we can find the rank of SWO's using the Swoogle search engine.
[Figure 1: nine nodes A-I representing web documents, with directed links between them.]
Fig 1: "Finding Rank of SWO's"
[Figure 2: two book ontologies side by side - Country: Australia (University of Sydney) and Country: Denmark (University of Copenhagen), each with Author, Name, Book Title and Topics; one book's topics are About Space, Sun, Jupiter and Red Planet, the other's are Ruling Planets, Mars, Venus and Earth.]
Fig 2: "SEMANTIC MAPPING" [9]
In figure 2, we see that Mars and Red Planet are treated as equivalent concepts in different books. Such a correspondence is called semantic mapping.
4. GOOGLE (Traditional Search Engine) VS HAKIA (Semantic Search Engine): A Case Study
It seems that Google is a solution to our every problem: whatever we type into it, we get some search results, which may or may not be relevant to us. Today Google serves as a multi-purpose search engine that offers information about news, movies, games, documentation and more. Have we ever thought about how these search results are produced for us within a few seconds? The search results are ranked in order using SEO (Search Engine Optimization); the relevant results are shown on the first page, while the others appear on subsequent pages. This is because Google is content-driven: it does not produce exact answers to a given query. It can point to a method for solving our problem, but it does not solve the problem completely, because it depends on keywords and phrases; it produces results with the help of keywords and algorithms that match the given query. The only way to get effective and optimal results is to use semantic search engines. There are many search engines based on the semantic approach, such as Hakia, Zitgist, Falcons and Sindice.
Hakia: Hakia is an Internet search engine built on a semantic approach, different from traditional search engines like Google and Yahoo. It was founded in 2004 by the nuclear scientist Riza Berkan and the economist Kouri. It focuses on meaning rather than straight keywords and phrases, and it has a simple design with a silver sphere dotting the "i" in the site's logo.

4.1 Working of Google
The working of Google means producing search results for users in the context of their queries. It involves three main components: the Google crawler, the Google indexer and the Google query processor. The process runs as follows (a small sketch of the indexing steps follows Figure 3):
• Web pages are downloaded by a web crawler named GoogleBot, a web-crawling robot that finds and retrieves pages on the web and hands them to the Google indexer. GoogleBot runs across many computers that request and fetch web pages. Each web page gets an associated ID number called a docID; when a URL is entered, it is assigned a docID.
• A URL server sends lists of URLs to be fetched by the crawler. Fetched web pages are sent to a store server, which compresses the pages and stores them in a repository.
• The Google indexer uncompresses the documents, removes bad links from every web page, and stores the important information. It ignores some punctuation marks and converts all letters to lowercase.
• After indexing, the Google query processor retrieves the stored documents and returns search results with the help of the doc server.
Fig. 3: Working of Google [11] (the URL server sends lists of URLs to GoogleBot; the web server sends fetched pages to the indexer; the query processor consults the indexer and the doc server to return search results to the user)
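The crawl-index-query pipeline can be sketched in a few lines. This is a toy illustration under assumed inputs (the URLs and page texts are invented), not Google's actual code; it shows docID assignment, the lowercasing and punctuation stripping done by the indexer, and a query lookup.

    import string

    pages = {  # hypothetical fetched pages: URL -> raw text
        "http://a.example": "Semantic Web documents!",
        "http://b.example": "Ranking Web pages.",
    }

    doc_ids = {url: i for i, url in enumerate(pages)}   # URL -> docID
    inverted_index = {}                                  # term -> set of docIDs

    for url, text in pages.items():
        # Indexer step: lowercase and drop punctuation before extracting terms.
        clean = text.lower().translate(str.maketrans("", "", string.punctuation))
        for term in clean.split():
            inverted_index.setdefault(term, set()).add(doc_ids[url])

    def query(term):
        """Query-processor step: look the term up, return matching URLs."""
        ids = inverted_index.get(term.lower(), set())
        return [url for url, i in doc_ids.items() if i in ids]

    print(query("Web"))  # both pages contain the term "web"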
4.2 COMPARISON BETWEEN GOOGLE AND HAKIA
There are several differences between the two in terms of indexing, methodology, algorithms and query evaluation, listed in the table below.

Table 3: Google vs. Hakia [12]
1. Google: one of the traditional search engines; it produces results for a given query within the given context. Hakia: one of the semantic search engines; it works on a semantic approach, useful for obtaining deep information about the given query.
2. Google: the information retrieved depends on keywords, phrases and predefined algorithms, which leads to spam results. Hakia: the information retrieved is independent of keywords and predefined algorithms, producing exact results instead of spam-oriented ones.
3. Google: uses HTML and XML for the creation of metadata. Hakia: uses Semantic Web languages like OWL and RDF for the creation of metadata.
4. Google: does not focus on stop words such as "is", "or", "and", "how", since omitting them does not affect its results; it does not produce exactly what the user is looking for. Hakia: focuses on stop words and punctuation marks, taking every small character into account because it affects the search results.
5. Google: gives no suggestions until we press the Search button. Hakia: presents suggestions before the Search button is pressed.
6. Google: displays all web pages that may or may not satisfy the user's query, and selecting the relevant page from many pages is a difficult task. Hakia: shows only those results that answer the query.
7. Google: does not produce results in galleries. Hakia: presents results in a gallery, with categories of different content matching the query.
8. Google: uses keywords to expand a query, with no further methodology. Hakia: uses fuzzy logic, a problem-solving methodology that works from the original query, to expand the query and produce accurate results.
9. Google: does not highlight the words or phrases most useful in answering the query. Hakia: highlights the sentences or words that answer the user's query.
10. Google: offers no option to see other users searching for the same information. Hakia: offers a "Meet Others" feature that lets various users come together and discuss important issues.
11. Google: uses the PageRank algorithm, ranking search results by the popularity of a page. Hakia: uses the SemanticRank algorithm, ranking search results by how well the content of a page answers the given query; the better the content, the better the rank of the page.
Table 4: Google search results vs. Hakia search results
Google does not highlight the relevant parts of results for a query; Hakia highlights the results that satisfy the query.

Table 5: Difference in finding stock market reports
Google shows stock results in the form of ordinary web pages; Hakia uses MoodTrade.com to show stock market reports.
Table 6: Results via regular search in Google vs. results via deep semantics in Hakia
Google does not produce results in galleries, nor does it give links to other content matching the query; it displays results via regular search only. Hakia presents results in galleries and points to other content related to the query; it shows results via surface semantics, deep semantics and regular search.
4.3 EVALUATION OF QUERY
The following table describes the query evaluation process in Google and in Hakia.

Table 7: Comparison of query evaluation processes
Query evaluation in Google:
1. Analyze the given query.
2. Convert the query's web pages into docIDs.
3. Scan documents until the keywords match.
4. Compute each page's rank with the PageRank algorithm.
Query evaluation in Hakia:
1. Analyze the entire content of the web page.
2. Use QDEXing (Query Detection and Extraction).
3. Extract all possible queries from the content.
4. Select the relevant queries and find their relevancy using the SemanticRank algorithm.
5. CONCLUSION
This paper concludes that current search engines like Google and Yahoo do not work well with Semantic Web Documents (SWDs), since they are designed to work with natural-language text only. These traditional search engines fail to understand the structure and semantics of SWDs: they expect documents to be unstructured text, whereas there are
various kinds of documents involved in the Information Retrieval (IR) process. The documents may be unstructured, structured or semi-structured. So, for extracting SWDs from the Semantic Web, we use semantic search engines, which work on a semantic approach to produce accurate results. The information retrieved from semantic search engines like Hakia, Zitgist and Sindice is independent of keywords and phrases, which otherwise produce many results that may or may not satisfy the user's query. This paper also lists the differences between Hakia and Google. We discussed the prototype crawler Swoogle, which uses multiple crawlers to discover SWDs, analyzes them, and computes their rank using the Rational Random Surfer (RRS) model. SWDs fall into two categories: Semantic Web Ontology (SWO) documents and Semantic Web Database (SWDB) documents. Finding the rank of SWOs using the RRS model has been illustrated with a graph. The paper also shows the need for mapping between ontologies and the Semantic Web in order to completely describe the classes, properties and instances relevant to a given query.

6. REFERENCES
[1] T. Finin, J. Mayfield, C. Fink, A. Joshi and R. S. Cost, "Information Retrieval and the Semantic Web", January 2001.
[2] J. Mayfield and T. Finin, "Information Retrieval on the Semantic Web: Integrating Inference and Retrieval", Proceedings of the SIGIR 2003 Semantic Web Workshop, March 2003.
[3] P. Martin and P. Eklund, "Embedding Knowledge in Web Documents", Proceedings of the 8th International World Wide Web Conference (WWW8), pages 324-341, May 1999.
[4] S. Luke, L. Spector, D. Rager and J. Hendler, "An Introduction to Ontology", Proceedings of the First International Conference on Autonomous Agents (Agents 97), pages 59-66, 1997.
[5] T. Finin, R. S. Cost, A. Joshi and V. Doshi, "Swoogle: A Search and Metadata Engine for the Semantic Web", Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management (CIKM), 2004.
[6] L. Page, S. Brin, R. Motwani and T. Winograd, "The PageRank Citation Ranking: Bringing Order to the Web", Technical Report, Stanford InfoLab, 1999.
[7] SchemaWeb, http://www.schemaweb.info/.
[8] DAML Ontology Library, http://www.daml.org/ontologies/.
[9] J. Hendler, "Agents and the Semantic Web", IEEE Intelligent Systems, 2001.
[10] V. Kotis and G. A. Vouros, "Semantic Retrieval and Ranking of Semantic Web Documents Using Free-Form Queries", International Journal of Metadata, Semantics and Ontologies, Vol. 3, No. 2, 2008.
[11] M. Preethi and J. Akilandeswari, "Combining Retrieval with Ontology Browsing", International Journal of Internet Computing, Vol. 1, Issue 1, 2011.
[12] G. Madhu, A. Govardhan and T. V. Rajinikanth, "Intelligent Semantic Web Search Engines: A Brief Survey", International Journal of Web & Semantic Technology (IJWesT), Vol. 2, No. 1, January 2011.
[13] G. Graefe, "Query Evaluation Techniques for Large Databases", ACM Computing Surveys, 1993.
[14] D. Lin, "An Information-Theoretic Definition of Similarity", Proceedings of the 15th International Conference on Machine Learning (ICML), 1998.
[15] A. Maedche and S. Staab, "Ontology Learning for the Semantic Web", IEEE Intelligent Systems, Vol. 16, No. 2, 2001.
[16] G. Singh and V. Jain, "Information Retrieval through Semantic Web: An Overview", Proceedings of CONFLUENCE 2012 – The Next Generation Information Technology Summit, pages 23-27.
National Conference on Communication Technologies & its impact on Next Generation Computing CTNGC 2012 Proceedings published by International Journal of Computer Applications® (IJCA)
Cloud Computing in Trust Building Knowledge Discovery for Information Retrieval

Vishal Jain, Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad
Mahesh Kumar Madan, Professor and HOD, Computer Science and Engineering Department, Lingaya's University, Faridabad
ABSTRACT
This paper discusses cloud computing in trust-building knowledge discovery for information retrieval. It also contains a proposed model, algorithms, and experimental results; statistical analysis was collected to arrive at the paper's conclusion.

KEYWORDS
Information Retrieval, Knowledge Discovery, Ontology, Cloud Computing

1. INTRODUCTION
Cloud computing is a term used to refer to anything that involves offering hosted services over the Internet; such services can be public or private. Cloud computing is becoming popular in the IT industry: over the past few years, its supply and demand have seen a huge increase in infrastructure investment, and it has been drawing ever broader use in the United States.

2. INFORMATIONAL RETRIEVAL
Informational retrieval (IR) is concerned with the structure, analysis, organization, storage, searching and dissemination of information. An IR system is designed to make a stored collection of information items available to a user population desiring access. The stored information is normally assumed to consist of bibliographic items such as the books in a library or documents of many kinds; by extension, an IR system may be used to access collections of drawings, films, museum artifacts, patents and so on. In each case, the IR system is designed to extract from its files those items that most nearly correspond to user needs, as reflected in the requests submitted by the user population [1].

When there is a large store of data, such as a set of customer records, it becomes expensive to search the entire store sequentially to find a particular piece of data. One would like instead to go directly to the point where the relevant data is to be found and extract it without a search. A memory that allows this is termed random access; a better description is addressable, direct access, for there is nothing random about the way one approaches it. The store is addressable so that each record in it can be designated, or pointed to, by a symbolic address (name), and it offers direct access so that the information processor can be switched to read the desired record directly, once its name is known, without requiring a search [6] (Rouse, 2010).

3. RETRIEVAL OPERATIONS
In many conventional retrieval situations, a search request is constructed by choosing appropriate keywords and content terms and interconnecting them with Boolean connectives (and, or, not) to express the intent of the requestor. For example, a request covering "tissue culture studies of human breast cancer" may be transformed into the statement shown below (a short sketch of evaluating such a request appears at the end of Section 3.1):

{breast neoplasm or carcinoma, ductal} and {human or not (any term indicating animal or disease)} and {tissue culture or culture media or chick embryo} and English

3.1 Retrieval application
The most common type of retrieval situation is exemplified by a reference retrieval system performing "on demand" searches submitted by a given user population. Normally, only the bibliographic information is stored for each item; authors' names, titles, journals or places of publication, dates, and applicable keywords are usable for search purposes. Sometimes the words of document titles can also be searched. Less commonly, more extended text portions such as abstracts, summaries, or even full texts may be stored, in which case a text search (as opposed to a simple keyword search) becomes possible.

In any case, the responses provided by the system consist of references to the bibliographic items that match the user's queries. In most conventional situations, the retrieved information is presented to the user in no particular order of importance. An ordering in decreasing query-document similarity can, however, be obtained in the more advanced systems, and can then be used advantageously for search negotiation and feedback purposes.

Most operational retrieval services are implemented online, using console terminal devices to introduce search queries and obtain retrieval output. In that case, searching may take place interactively, in such a way that data supplied by the users during the search operation is used to improve the search output [2]. Furthermore, networks of information centers may be created by supplying suitable connections between individual centers, affording the user population a chance to access the resources of the whole network.
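To illustrate, here is a minimal sketch of evaluating such a Boolean request against a toy inverted index. The index contents and document numbers are invented, and the "not" clause is omitted for brevity; each braced group is treated as an OR of terms, and the groups are joined by AND.

    # Hypothetical inverted index: term -> set of matching document numbers.
    index = {
        "breast neoplasm": {1, 4}, "carcinoma, ductal": {2},
        "human": {1, 2, 3}, "tissue culture": {1, 2}, "chick embryo": {4},
        "english": {1, 2, 4},
    }

    def any_of(*terms):
        """OR of a braced group: the union of the terms' posting sets."""
        out = set()
        for t in terms:
            out |= index.get(t, set())
        return out

    # AND of the groups from the example request ("not" clause omitted).
    hits = (any_of("breast neoplasm", "carcinoma, ductal")
            & any_of("human")
            & any_of("tissue culture", "chick embryo")
            & any_of("english"))
    print(hits)  # {1, 2}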
3.2 Algorithm
An algorithm is the precise characterization of a method of solving a problem, presented in a language understandable to the device. In particular, an algorithm has the following properties:
1. Application of the algorithm to a particular input set or problem description results in a finite sequence of actions.
2. The sequence of actions has a unique initial action.
3. Each action in the sequence has a unique successor action.
4. The sequence terminates with either an answer to the problem or a report that the problem is unsolvable for that set of data.
This concept can be illustrated with an example: find the square root of the real number x. As stated, this problem is algorithmically either trivial or unsolvable, owing to the irrationality of most square roots. If one accepts √2 as the square root of 2, for example, the solution is trivial: the answer is the square root sign (√) concatenated with the input. In symbols, the entire algorithm is

INPUT x → OUTPUT √x → END

However, if we want a decimal expansion, then the square root of 2 can never be calculated exactly, and the requirement of a finite number of actions is violated.
3.2.1 Quality Judgments of Algorithms
Any computer program is a semi-algorithm, and any program that always halts is an algorithm (though it may not solve the problem the programmer intended). For a given solvable problem there are many algorithms (programs), not all of equal quality. The primary practical criteria by which the quality of an algorithm is judged are time and memory requirements, accuracy of solutions, and generality. To cite an extreme example: since a properly defined game of chess comprises a finite number of possible moves, there exists an algorithm to determine the "perfect" chess game; simply examine all possible move sequences in some specified order. Unfortunately, the time required to execute any algorithm based on this idea is measured in billions of years, even at today's computer speeds, and the memory requirements are similarly overbearing [4].

The accuracy of an algorithm is a characteristic often more closely related to time than to memory requirements. For instance, the square root algorithm presented above is not very accurate. Changing the test constant from 0.00005 to 0.00000000005 will produce 0.00000381 as the square root of zero, at the cost of more iterations through the loop of the algorithm. No further memory is needed, and the extra iterations require only a small fraction of a second. Further improvement may be obtained from the corresponding double-precision algorithm, at a cost in both run time and additional memory space. In each case the basic algorithmic concept is unchanged [5].
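The time/accuracy trade-off can be seen in a small sketch. This is an assumed implementation (Newton's iteration with a test constant), not necessarily the exact algorithm the text refers to: tightening the constant from 0.00005 to 0.00000000005 yields a more accurate root at the cost of extra iterations.

    def sqrt_iter(x, eps=0.00005):
        """Approximate the square root of x; eps is the test constant."""
        if x == 0:
            return 0.0
        guess = x if x >= 1 else 1.0
        while abs(guess * guess - x) > eps:   # test constant controls accuracy
            guess = (guess + x / guess) / 2   # Newton's update
        return guess

    print(sqrt_iter(2))              # ~1.41421..., within 0.00005
    print(sqrt_iter(2, eps=5e-11))   # more accurate, at the cost of more iterations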
4. DYNAMIC PROGRAMMING
Dynamic programming arises when the only algorithm one can think of is enumerating all possible configurations of the given data and testing each one to see whether it is a solution. The essential idea is to keep a table of all previously computed configurations and their outcomes. If the total number of configurations is large, the dynamic programming algorithm will need substantial time and space; but if there are only a small number of distinct configurations, dynamic programming avoids recomputing the solution to the same subproblems over and over [3].

To determine whether there are only a small number of distinct configurations, one needs to detect when the so-called principle of optimality holds. This principle asserts that every decision that contributes to the final solution must be optimal with respect to the state in which it is taken. When the principle holds, dynamic programming drastically reduces the amount of computation by avoiding the enumeration of decision sequences that cannot possibly be optimal.

As a simple example, consider computing the n-th Fibonacci number Fn, where Fn = Fn−1 + Fn−2 and F0 = F1 = 1. The first few elements of this famous sequence are 1, 1, 2, 3, 5, 8, 13, 21, 34, … The obvious recursive algorithm for computing Fn suffers from the fact that many values of Fi are computed over and over again. However, if one follows the dynamic programming strategy and keeps a table containing all values of Fi as they are computed, a linear-time algorithm results [5].
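A minimal sketch of that tabulation strategy, assuming F0 = F1 = 1 as above:

    def fib(n):
        """Compute Fn in linear time by tabulating each Fi exactly once."""
        table = [1, 1]                                  # F0 and F1
        for i in range(2, n + 1):
            table.append(table[i - 1] + table[i - 2])   # each Fi computed once
        return table[n]

    print([fib(i) for i in range(9)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34]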
5. CONCLUSION
In conclusion, the most exact answer to this topic rests on the model theory of first-order predicate calculus (FOPC). Model theory defines the structure of a possible world and the conditions under which an expression would be true in it, and takes the meaning of a logical expression to be the constraint imposed on this structure by insisting on its truth. It is used as a tool in the analysis of more complex phenomena [4].

6. REFERENCES
[1] Christauskas, C., & Miseviciene, R. (2012). Cloud Computing Based Accounting for Small to Medium Sized Business. Engineering Economics, 23(1), 14-21.
[2] Han, Y. (2011). Cloud Computing: Case Studies and Total Costs of Ownership. Information Technology & Libraries, 30(4), 198-206.
[3] Jia, Z., Xue, S., Zhang, D., & Li, Q. (2010). Study of Improvement on Programming Method from Cloud Computing to Grid Computing. Proceedings of the International Symposium on Electronic Commerce & Security Workshops, 244-248.
[4] Kumar, P., & Gupta, S. (2011). Abstract Model of Fault Tolerance Algorithm in Cloud Computing Communication Networks. International Journal on Computer Science & Engineering, 3(9), 3283-3290.
[5] Lizheng, G., Shuguang, Z., Shigen, S., & Changyuan, J. (2012). Task Schedule Optimization in Cloud Computing Basing on Heuristic Algorithm. Journal of Networks, 7(3), 547.
[6] Rouse, M. (2010). Cloud computing. Retrieved from http://searchcloudcomputing.techtarget.com.