Introducing Dynamic Ranking on Web-Pages Based on Multiple Ontology Supported Domains
Debajyoti Mukhopadhyay1,4, Anirban Kundu2,4, and Sukanta Sinha3,4
1 Calcutta Business School, D.H. Road, Bishnupur 743503, India
2 Netaji Subhash Engineering College, West Bengal 700152, India
3 Tata Consultancy Services, Whitefield Rd, Bangalore 560066, India
4 WIDiCoReL, Green Tower C-9/1, Golf Green, Kolkata 700095, India
{debajyoti.mukhopadhyay,anik76in,sukantasinha2003}@gmail.com
Abstract. A Search Engine ensures efficient ranking and retrieval of Web-pages. Page ranking is typically used to order the Web-pages displayed at the client side. We introduce a data structural model for retrieving the searched Web-pages and propose two algorithms. The first algorithm constructs the Index Based Acyclic Graph from a crawl supported by multiple ontologies; the second calculates the ranking of the Web-pages selected from the Index Based Acyclic Graph.
Keywords: Search Engine, Ontology, Relevant Page Graph Model, Index Based Acyclic Graph Model, Web-page Ranking.
1 Introduction
A Search Engine presents a list of Web-pages as the result of a user's search. In this scenario, the display order of the Web-page links is a very important factor. Different Search Engines use several ranking algorithms to rank the Web-pages properly from the users' point of view [3]. The Relevant Page Graph Model consists of multiple domain-specific Web-pages [2]. Retrieving data from this model takes a long time. Against this background, we introduce a new Index Based Acyclic Graph Model, which provides users with faster access to Web-pages. This paper presents the basic idea of searching Web-pages from the Index Based Acyclic Graph and also determines the order of the selected Web-pages at the user end.
T. Janowski, H. Mohanty, and E. Estevez (Eds.): ICDCIT 2010, LNCS 5966, pp. 104–109, 2010. © Springer-Verlag Berlin Heidelberg 2010

2 Existing Model of Relevant Page Graph Model
In this section, the Relevant Page Graph (RPaG) is described. Every Crawler [5] needs some seed URLs to retrieve Web-pages from the World Wide Web (WWW). All Ontologies [1], Weight Tables and Syntables [4, 6] are needed for the retrieval of relevant Web-pages. RPaG is generated considering only relevant Web-pages. In RPaG, each node contains the following fields: Page Identifier (P_ID), Uniform Resource Locator (URL), four Parent Page Identifiers (PP_IDs), Ontology relevance values (ONT_1_REL_VAL, ONT_2_REL_VAL, ONT_3_REL_VAL) and Ontology relevance flags (ONT_1_F, ONT_2_F and ONT_3_F). A sample RPaG is shown in Fig. 1. Each node in this figure shows four fields, i.e., Web-page URL, ONT_1_REL_VAL, ONT_2_REL_VAL and ONT_3_REL_VAL. Here, each "Ontology Relevance Value" field contains the calculated relevance value if it exceeds the "Relevance Limit Value" of its respective domain. Otherwise, the field contains "Zero (0)".
Fig. 1. Arbitrary Example of Relevant Page Graph (RPaG)
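The node layout and the thresholding rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class name `RPaGNode`, the helper `thresholded_relevance`, the sample numbers, and the reading of the relevance flag as "value is non-zero" are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


def thresholded_relevance(value: float, limit: float) -> float:
    """Keep a relevance value only if it exceeds the domain's limit, else 0."""
    return value if value > limit else 0.0


@dataclass
class RPaGNode:
    p_id: int                                            # P_ID
    url: str                                             # URL
    pp_ids: List[int] = field(default_factory=list)      # up to four PP_IDs
    ont_rel_vals: List[float] = field(
        default_factory=lambda: [0.0, 0.0, 0.0])         # ONT_k_REL_VAL

    @property
    def ont_flags(self) -> List[bool]:
        # ONT_k_F: assumed here to mark a non-zero relevance value
        return [v > 0 for v in self.ont_rel_vals]


# Hypothetical raw relevance values and per-domain limits.
raw_values = [5.0, 1.0, 3.0]
limits = [2.0, 2.0, 2.0]
node = RPaGNode(
    p_id=1,
    url="http://example.org/a",
    pp_ids=[0],
    ont_rel_vals=[thresholded_relevance(v, l) for v, l in zip(raw_values, limits)],
)
```

In this sketch the second ontology's value falls below its limit, so the node stores zero for it and the corresponding flag is unset.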
Definition 1. Weight Table - This table contains two columns; the first column denotes Ontology terms and the second column denotes the weight value of that Ontology term. The weight value must be in the interval [0,1].
Definition 2. Syntable - This table contains two columns; the first column denotes Ontology terms and the second column denotes synonyms of that Ontology term. If a particular Ontology term has more than one synonym, the synonyms are stored separated by commas (,).
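The two tables can be pictured with a small sketch. The terms, weights and synonyms below are invented sample data, and the function names `validate_weights` and `parse_syntable` are hypothetical; only the two-column layout, the [0,1] constraint and the comma separator come from the definitions.

```python
# Hypothetical Weight Table: ontology term -> weight value.
weight_table = {"cricket": 0.9, "bat": 0.6, "umpire": 0.4}

# Hypothetical Syntable as stored: ontology term -> comma-separated synonyms.
syntable_raw = {"bat": "willow,blade", "umpire": "referee"}


def validate_weights(table):
    """Check Definition 1: every weight lies in the closed interval [0, 1]."""
    return all(0.0 <= weight <= 1.0 for weight in table.values())


def parse_syntable(raw):
    """Split each comma-separated synonym string into a list of synonyms."""
    return {
        term: [s.strip() for s in synonyms.split(",") if s.strip()]
        for term, synonyms in raw.items()
    }


syntable = parse_syntable(syntable_raw)
```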
3 Proposed Approach with Analytical Study
In our approach, we construct IBAG from RPaG and then search IBAG for the Web-pages matching a given "Search String". The search string is given as input on the Graphical User Interface (GUI); as a result, the corresponding Web-page URLs are shown in the order determined by the ranking mechanism.

3.1 Index Based Acyclic Graph Model
In this section, the Index Based Acyclic Graph (IBAG) is described. A connected acyclic graph is known as a tree. A sample IBAG is shown in Fig. 2. It is generated by our algorithm, which is described in Section 3.2. The pages of an RPaG are related to some Ontologies, and the IBAG generated from that RPaG is related to the same Ontologies. Each node in Fig. 2 contains the fields Page Identifier (P_ID), Uniform Resource Locator (URL), Parent Page Identifier (PP_ID), Mean Relevance Value (MEAN_REL_VAL) and Ontology links (ONT_1_L, ONT_2_L, ONT_3_L). In each level, the Web-pages are kept sorted by "Mean Relevance Value", and the indexes that track the pages related to each domain are also stored. In Fig. 2, 'X' means that the ontology link does not currently exist. The calculation of MEAN_REL_VAL is described in Method 1.1 of Section 3.2. Using the "Maximum Mean Relevance Span Value" (α), the "Minimum Mean Relevance Span Value" (β) and the "Number of Mean Relevance Span Levels" (n), we calculate the Mean Gap Factor (ρ) = (α - β) / n. We then define the ranges β to β+ρ, β+ρ to β+2ρ, β+2ρ to β+3ρ, and so on, up to β+(n-1)ρ to α.
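The Mean Gap Factor and the resulting ranges can be worked through with a short sketch. The function names and sample values are hypothetical. The text does not say which range owns a boundary value, so this sketch assumes a boundary belongs to the upper range, with α itself assigned to the last level.

```python
def mean_gap_factor(alpha, beta, n):
    """rho = (alpha - beta) / n."""
    if n <= 0 or alpha <= beta:
        raise ValueError("need n > 0 and alpha > beta")
    return (alpha - beta) / n


def span_ranges(alpha, beta, n):
    """Ranges beta..beta+rho, beta+rho..beta+2*rho, and so on (n ranges)."""
    rho = mean_gap_factor(alpha, beta, n)
    return [(beta + k * rho, beta + (k + 1) * rho) for k in range(n)]


def span_level(mean_rel_val, alpha, beta, n):
    """Zero-based index of the range holding mean_rel_val.

    Assumption: a boundary value goes to the upper range; alpha goes to
    the last level.
    """
    if not beta <= mean_rel_val <= alpha:
        raise ValueError("value outside [beta, alpha]")
    rho = mean_gap_factor(alpha, beta, n)
    return min(int((mean_rel_val - beta) / rho), n - 1)


# Hypothetical inputs: alpha = 20, beta = 0, n = 4, hence rho = 5.
ranges = span_ranges(20, 0, 4)
level = span_level(12.5, 20, 0, 4)
```

With these sample inputs the four ranges are 0 to 5, 5 to 10, 10 to 15 and 15 to 20, and a page with a mean relevance value of 12.5 falls in the third range (index 2).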
Fig. 2. Index Based Acyclic Graph (IBAG)
3.2 Construction of IBAG from RPaG
In this section, the design of the algorithm that generates IBAG from RPaG is discussed. Its supporting methods are listed separately for better understanding.

Algorithm 1. Construction of IBAG
INPUT: Relevant Page Graph (RPaG) Constructed from Original Crawling, Number of Mean Relevance Span Levels, Maximum Mean Relevance Span and Minimum Mean Relevance Span
OUTPUT: Index Based Acyclic Graph (IBAG)
Step 1: Take Relevant Page Graph (RPaG) Constructed from Original Crawling, Number of Mean Relevance Span Levels, Maximum Mean Relevance Span and Minimum Mean Relevance Span from user and generate one Dummy Page for each Mean Relevance Span Level
Step 2: Take one Page (P) from RPaG, Call Cal_Mean_Rel_Val (Page P) and find its Mean Relevance Span Level
Step 3: If this Mean Relevance Span Level contains only the Dummy Page
           Then replace the Dummy Page with Page P and goto Step 4;
           Otherwise goto Step 5
Step 4: For Each Supported Ontology
           Set Ontology Index Field of That Level = P_ID of Page P
        End Loop
        goto Step 6
Step 5: Insert Page (P) in IBAG as follows:
           Call Find_Location (Incomplete IBAG, Page P)
           Call Find_Parent (RPaG, Incomplete IBAG, Page P)
           Call Set_Link (RPaG, Incomplete IBAG, Page P)
Step 6: goto Step 2 until all the pages in RPaG are traversed
Step 7: End

Method 1.1: Cal_Mean_Rel_Val
Cal_Mean_Rel_Val (Page P)
   MEAN_REL_VAL := Σ (Relevance Value for each Ontology) / Number of Supported Ontologies
   Return MEAN_REL_VAL
END
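Method 1.1 and Steps 1 to 3 of Algorithm 1 can be sketched in a simplified form. This is an assumption-laden illustration, not the authors' code: pages are given as a plain mapping from P_ID to relevance values, `None` stands in for the Dummy Page, each level is kept as a list sorted by descending mean, and parent assignment and ontology links (Find_Parent, Set_Link) are left out entirely. The name `build_levels` is hypothetical.

```python
def cal_mean_rel_val(rel_vals):
    """Method 1.1: sum of per-ontology relevance values over their count."""
    if not rel_vals:
        raise ValueError("at least one supported ontology is required")
    return sum(rel_vals) / len(rel_vals)


def build_levels(pages, alpha, beta, n):
    """Place each page in its Mean Relevance Span Level.

    pages: dict mapping P_ID -> list of per-ontology relevance values.
    Returns n lists of (mean, p_id), each sorted by descending mean;
    a level that never receives a page still holds its dummy (None).
    """
    rho = (alpha - beta) / n
    levels = [[None] for _ in range(n)]            # Step 1: dummy pages
    for p_id, rel_vals in pages.items():           # Step 2: take one page
        mean = cal_mean_rel_val(rel_vals)
        if not beta <= mean <= alpha:
            raise ValueError("mean relevance value outside [beta, alpha]")
        level = min(int((mean - beta) / rho), n - 1)
        if levels[level] == [None]:                # Step 3: replace dummy
            levels[level] = [(mean, p_id)]
        else:                                      # simplified Step 5
            levels[level].append((mean, p_id))
            levels[level].sort(key=lambda entry: entry[0], reverse=True)
    return levels


# Hypothetical relevance values for three pages and three ontologies.
sample_pages = {1: [6, 0, 3], 2: [12, 15, 9], 3: [3, 6, 3]}
levels = build_levels(sample_pages, alpha=20, beta=0, n=4)
```

Here pages 1 and 3 (means 3 and 4) share the lowest level, page 2 (mean 12) occupies the third level alone, and the two remaining levels keep their dummy entries.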
Method 1.2: Find_Location
Find_Location (Incomplete IBAG, Page P)
   Find the Location in the Mean Relevance Span Level of Page P such that the Mean Relevance Value of every Left Side Page is Greater Than that of Page P and the Mean Relevance Value of every Right Side Page is Less Than that of Page P
   Return Location
END

Method 1.3: Find_Parent
Find_Parent (RPaG, Incomplete IBAG, Page P)
   If More than one parent exists in RPaG Then
      For Each Parent Page
         Call Cal_Mean_Rel_Val (Parent Page of Page P in RPaG)
      End Loop
      Take Maximum MEAN_REL_VAL Page among those Parent Pages in RPaG as a Parent of Page P in IBAG
   End If
   If Page P Location is Left Most Position Then
      For Each Left Side Page in parent level IBAG of Right Side Parent Page of Page P
         If parent of P in RPaG found Then
            Add Page P as a Child of that Parent Page in IBAG and Return;
         End If
      End Loop
      Add Page P as a Child of Right Side Page Parent in IBAG
   Else If Page P Location is Right Most Position Then
      For Each Right Side Page in parent level IBAG of Left Side Parent Page of Page P
         If parent of P in RPaG found Then
            Add Page P as a Child of that Parent Page in IBAG and Return;
         End If
      End Loop
      Add Page P as a Child of Left Side Page Parent in IBAG
   Else If Left Side Page Parent of Page P in IBAG = Parent of Page P in RPaG Then
      Add Page P as a Child of Left Side Page Parent in IBAG
   Else If Right Side Page Parent of Page P in IBAG = Parent of Page P in RPaG Then
      Add Page P as a Child of Right Side Page Parent in IBAG
   Else If Left Side Page Parent of Page P in IBAG != Right Side Page Parent of Page P in IBAG Then
      Find 'Parent Page of P in RPaG' between those two Parents in IBAG
      If Found Then
         Add Page P as a Child of that Parent Page in IBAG
      Else
         Add Page P as a Child of Left Side Page Parent in IBAG
      End If
   Else
      Add Page P as a Child of Left Side Page Parent in IBAG
   End If
   Return;
END

Method 1.4: Set_Link
Set_Link (RPaG, Incomplete IBAG, Page P)
   For Each Supported Ontology
      Check Left Side Page Ontology Link Field until Link Not Found and Then
      If Link Came From Index Then
         Set Page P Ontology Link Field = Ontology Index Field of That Level and
         Ontology Index Field of That Level = P_ID of Page P
      Else
         Set Ontology Link Field of Page P in IBAG = Ontology Link Field of Left Side Tracked Page in IBAG and
         Ontology Link Field of Left Side Tracked Page in IBAG = P_ID of Page P
      End If
   End Loop
END

3.3 Procedure for Web-Page Selection and Its Related Dynamic Ranking
In this section, we describe an algorithm which selects Web-pages from IBAG for the given Relevance Range and the Ontologies selected on the user side. Finally, the Web-page URLs are shown based on their calculated rank.

Algorithm 2. Web-page Selection
INPUT: Relevance Range, Ontology Flags, Search String, Index Based Acyclic Graph (IBAG)
OUTPUT: Web Pages According to the Search String
Step 1: Take the Search String and the Index Based Acyclic Graph (IBAG) as input
Step 2: Parse the input Search String and find ontology terms. If no ontology term exists then exit
Step 3: Select all Web-pages according to their Range and Selected Ontology
Step 4: Call Cal_Rank (Input String Ontology Terms, Selected Web Pages)
Step 5: Display Web-pages according to their Rank
Step 6: End
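Steps 2 and 3 of Algorithm 2 can be illustrated on a flat list of page records. This sketch deliberately ignores the IBAG index and ontology-link traversal and only shows the filtering logic. The record layout, the whitespace tokenization, the synonym lookup through a Syntable, the function names and all sample data are assumptions.

```python
def find_ontology_terms(search_string, weight_table, syntable):
    """Step 2: return the ontology terms named in the search string.

    A word counts if it is an ontology term or a listed synonym of one.
    """
    synonym_to_term = {
        syn: term for term, syns in syntable.items() for syn in syns
    }
    found = []
    for word in search_string.lower().split():
        term = word if word in weight_table else synonym_to_term.get(word)
        if term is not None and term not in found:
            found.append(term)
    return found


def select_pages(pages, low, high, selected_ontologies):
    """Step 3: keep pages inside the relevance range that belong to at
    least one selected ontology."""
    return [
        page for page in pages
        if low <= page["mean_rel_val"] <= high
        and selected_ontologies & page["ontologies"]
    ]


# Hypothetical tables and page records.
weight_table = {"cricket": 0.9, "bat": 0.6}
syntable = {"bat": ["willow"]}
pages = [
    {"url": "u1", "mean_rel_val": 12.0, "ontologies": {"sports"}},
    {"url": "u2", "mean_rel_val": 3.0, "ontologies": {"sports"}},
    {"url": "u3", "mean_rel_val": 11.0, "ontologies": {"music"}},
]

terms = find_ontology_terms("Cricket willow price", weight_table, syntable)
selected = select_pages(pages, 10.0, 15.0, {"sports"}) if terms else []
```

An empty `terms` list corresponds to the exit condition in Step 2.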
Method 2.1: Cal_Rank
Cal_Rank (Input String Ontology Terms, Selected Web Pages)
   For Each Web Page
      For Each Input String Ontology Term
         RANK = RANK + Number of occurrences of the Input String Ontology Term in the Web page * Weight Value of Ontology Term;
      End Loop
      Set RANK Value of the Web Page and then make RANK = 0;
   End Loop
END
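Method 2.1 translates almost directly into code. In the sketch below, pages are assumed to be available as plain text keyed by URL, terms are assumed to be single lowercase words, and the result is sorted with the highest rank first; the method itself only states that pages are displayed according to rank. The sample weights and page texts are invented.

```python
import re


def cal_rank(terms, pages, weight_table):
    """Rank each page as the weighted count of the search terms it contains.

    terms: ontology terms found in the search string.
    pages: dict mapping URL -> page text.
    Returns a list of (url, rank), highest rank first (an assumption).
    """
    ranked = []
    for url, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        rank = 0.0                                  # RANK reset per page
        for term in terms:
            rank += words.count(term) * weight_table[term]
        ranked.append((url, rank))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical weights and page texts.
weights = {"cricket": 0.5, "bat": 0.25}
texts = {
    "u1": "Cricket bat and cricket ball",
    "u2": "bat bat bat",
}
ranking = cal_rank(["cricket", "bat"], texts, weights)
```

For the sample data, page u1 scores 2 × 0.5 + 1 × 0.25 = 1.25 and page u2 scores 3 × 0.25 = 0.75, so u1 is listed first.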
4 Conclusion
In this paper, a prototype of a Multiple Ontology supported Web Search Engine is presented. It retrieves Web-pages from the Index Based Acyclic Graph model. The prototype produces results faster and is highly scalable, and the ranking algorithm determines the order of the Web-page URLs.
References
1. Heflin, J., Hendler, J.: Dynamic Ontologies on the Web, Department of Computer Science, University of Maryland, College Park, MD 20742
2. Mukhopadhyay, D., Sinha, S.: A New Approach to Design Graph Based Search Engine for Multiple Domains Using Different Ontologies. In: 11th International Conference on Information Technology, ICIT 2008 Proceedings, Bhubaneswar, India. IEEE Computer Society Press, California (2008)
3. Kundu, A., Dutta, R., Mukhopadhyay, D.: An Alternate Way to Rank Hyper-linked Webpages. In: 9th International Conference on Information Technology, ICIT 2006 Proceedings, Bhubaneswar, India. IEEE Computer Society Press, California (2006)
4. WordNet, http://en.wikipedia.org/wiki/WordNet
5. Mukhopadhyay, D., Biswas, A., Sinha, S.: A New Approach to Design Domain Specific Ontology Based Web Crawler. In: 10th International Conference on Information Technology, ICIT 2007 Proceedings, Bhubaneswar, India. IEEE Computer Society Press, California (2007)
6. WordNet, http://en.wikipedia.org/wiki/George_A._Miller