Intelligent Clock Rate based user Interest Prediction for Efficient Web ...

Transylvanian Review Vol XXIV, No. 10, 2016

Transylvanian Review Centrul de Studii Transilvane| str. Mihail Kogalniceanu nr. 12-14, et.5, Cluj-Napoca Email: [email protected] Online Submission System: http://transylvanianreview.org/

Sasikumar and Karthikeyan


Intelligent Clock Rate based user Interest Prediction for Efficient Web Search using Data Mining

*1 Sasikumar P. and 2 Karthikeyan M.

1 Assistant Professor, Department of Computer Science and Engineering, Selvam College of Technology. 2 Principal, Tamil Nadu College of Engineering.

Abstract The problem of user interest prediction has been studied in different settings, and several methods have been investigated earlier. Interest prediction can be applied to various problems such as web search, product recommendation and so on. The previous approaches struggle to identify the current interest of the user and cannot determine how the interest changes. To improve the performance of interest prediction methods, a novel intelligent clock rate interest mining scheme is discussed in this paper. The intelligent clock rate interest mining algorithm first preprocesses the web log data to split it into time windows. Then the method identifies the list of interests visited by the user. For each interest identified, the clock rate algorithm computes the time spent on that interest and the clock rate induced for that interest. By monitoring the clock rate at each interval, the algorithm calculates the clock rate mining weight (CRMW) for each interest, which is based on the time rate. The computed CRMW is used to identify the future interest of the user and helps to improve the performance of web search. The method improves the performance of web search by 7% over the previous methods.
Keywords: Data mining, web search, user interest prediction, clock rate, interest mining.

*Corresponding author: Assistant Professor, Department of Computer Science and Engineering, Selvam College of Technology.


Introduction

Web search is a common activity performed by every web user. The user looks for different kinds of information, known or unknown. Whatever the category, the user simply submits a query to the search engine and gets some results. Many activities are involved in the background, but the user looks only for precise results. The user wants to feel that the search engine is always working on his queries and waiting for him. So, to produce efficient results, the search engine has to do more work: it must be precise about what the user is looking for. Producing results according to the user's requirement can be described as user interest identification and user interest prediction.

The user may look at and search for many kinds of information. Some of these interests are temporary, and some are persistent. The web search engine therefore has the responsibility of identifying the persistent interest of each web user. By identifying the persistent interest, the search engine can produce efficient web results for the user. For example, the user may search for food, sports, and yoga, each at different times. The search engine has to identify which of them is searched consistently and which has more impact than the others. By identifying the most dominant interest, the search engine can produce efficient results for the user. How the persistence of an interest can be identified is the central question here.

Data mining can be used for the problem of user interest prediction. The user's search history is stored in a web log, which holds various information about each search. From the history of web search, the method can extract information such as the URL visited, the query submitted, the time spent, the actions performed and more. Data mining can be applied to extract such information, and from the obtained results the process can generate intelligent inferences. The information gathered can be used to identify the interest of the user, and data mining can also be used to predict it. One option is a probability model, which computes, for each interest, the probability that the user visits it at the next visit. Other approaches can be used to predict the user interest in a similar way. By identifying the user interest, the user can be provided with efficient results. The user interest may change over time, so the search engine must identify it efficiently. The user interest can be identified using various measures, and the methods can use many factors. There are methods which identify user interest based on the time spent, the number of visits and so on. However, using the raw time spent value alone would not produce efficient results.

By splitting the entire time span into small time windows, the user interest in each time window can be identified. To say that the user is interested in a specific topic, it is necessary to compute the frequency of that interest in each time window. Such an approach is discussed in this paper. The clock rate is a time analysis based on the time spent by the user on a particular web page. The user spends some time on every page he visits, but he does not spend much time on all of them. The user spends more time when he feels that the page is more informative. The clock rate algorithm works by monitoring the time devoted to the web pages. It computes the weight for each interest in each time window, calculates the clock rate for each interest, and analyzes how the clock rate increases between different time windows. Based on the clock rate weight, the method identifies changes of interest and identifies the persistent interest. The interest mining process uses the clock rate scheme to determine the interest of the user in each time window. Based on the computed weight, the method selects the top rated interest for the user. By identifying the interest in each time window, user interest prediction can be performed efficiently.

Related Works

Several methods have been described for the problem of user interest prediction, and this section discusses some of them.

An Evaluation of Personalized Web Search for Individual User [1] focuses on evaluating the results of an individual user's User Conceptual Index based search and introduces three measures for the purpose. Context includes factors such as the nature of the information available, the information currently being examined, when and what applications are in use and so on. Individual-oriented search comprises elements such as the user's goals, prior and tacit knowledge, and past information seeking behaviors, among others.

Exploring Web Search Results Using Coordinated Views [2] describes HotMap, in which the frequencies of each of the query terms from the user's queries are depicted visually using color-coding. This lets users easily identify "hot" documents based on the frequent appearance of the query terms within the document surrogates. In addition to this visual representation, the search results can be dynamically re-sorted based on the query term frequencies, supporting an interactive exploration of the search results.

The application of user access patterns to web personalization is discussed in [3]. The method uses sequential patterns to perform web personalization. It incorporates a sequential pattern mining algorithm which identifies the frequent sequential web access patterns. The generated sequential patterns are organized in a tree structure, which is used to perform matching and to generate recommendations.

Efficient Multiple-Click Models in Web Search [4] presents a click model. Click logs contain the submitted query, a ranked list of returned documents, whether each of them is clicked or not, and other information that might be useful. Click models learn from user clicks to help understand and incorporate users' implicit feedback. They follow a probabilistic approach which treats user clicks as random events, and the goal is to design generative models which can approximate the underlying probabilities of clicks with high accuracy.

SimRank: A Page Rank approach based on similarity measure [5] proposes a new page rank algorithm, called SimRank, based on a similarity measure from the vector space model, to score web pages. First, a new similarity measure is used to calculate the similarity of pages and is applied to partition a web database into several web social networks (WSNs). Second, the authors improve the traditional PageRank algorithm by taking into account the relevance of a page to a given query. Third, they design an efficient web crawler to download the web data. Finally, experimental studies evaluate the time efficiency and scoring accuracy of SimRank against other approaches.

Exploring Web Search Results Using Coordinated Views [6] develops two systems to support the visual exploration of web search results: HotMap and Concept Highlighter. In both systems, the search results are provided at two levels of detail: an overview map that gives a compact and abstract representation of the top 100 documents returned by the underlying search engine, and a detail window that shows 20 to 25 documents at a time. The paper discusses how these coordinated views support the visual exploration of web search results.

User profile for personalized web search [7] observes that an effective technique to personalize search engine results is to construct a user profile that represents an individual user's preference. Using machine learning techniques, three approaches are proposed to build the user profile: the Rocchio method, the k-Nearest Neighbors method, and the Support Vector Machines method. Experimental results on a constructed dataset show that the k-Nearest Neighbors method is better than the others in efficiency and robustness.

Scalable Distributed Inference of Dynamic User Interests for Behavioral Targeting [8] describes a streaming, distributed inference algorithm which can handle tens of millions of users. The results show that the model contributes towards improved behavioral targeting of display advertising relative to baseline models that do not incorporate topical and temporal dependencies. As a side effect, the model yields human-understandable results which can be used in an intuitive fashion by advertisers.

Predicting User Interests from Contextual Information [9] presents a systematic study of the effectiveness of five variant sources of contextual information for user interest modeling. Post-query navigation and general browsing behaviors far outweigh direct search engine interaction as an information-gathering activity, so the study focuses on website recommendations rather than search results. The five contextual information sources used are: social, historic, task, collection, and user interaction. The authors evaluate the utility of these sources, and overlaps between them, based on how effectively they predict users' future interests. Their findings demonstrate that the sources perform differently depending on the duration of the time window used for future prediction and that context overlap outperforms any isolated source. Designers of website suggestion systems can use these findings to provide improved support for post-query navigation and general browsing behavior.

Personalized Feed Recommendation Service for Social Networks [10] proposes a popularity diffusion model to disseminate feeds in social networks and supports the recommendation service with a set of customized indices for feed-based information retrieval. A suite of efficient index management algorithms is developed in the framework to handle the dynamics of social networks.

Mining Evolving User Profiles in Noisy Web Clickstream Data with a Scalable Immune System Clustering Algorithm [11] notes that data mining has to be completely re-applied periodically and offline on newly generated web server logs to keep the discovered knowledge up to date. Hence, there is a crucial need for scalable, noise insensitive, initialization independent techniques that can continuously discover possibly evolving web user profiles without any stoppages or reconfigurations.

Social network and user context assisted personalization for recommender systems [12] designs a new architecture for user personalization which combines both social network data and context data. The system aggregates a user's preference data from various social networking services and then builds a centralized user profile which is accessible through public web services. It also collects the user's contextual information and stores it in a central space which is likewise accessible through public web services. Based on a Service Oriented Architecture, recommender systems can flexibly use users' preference information and context to provide more desirable recommendations.

A Framework for Mining Evolving Trends in Web Data Streams Using Dynamic Learning and Retrospective Validation [13] proposes a simple similarity measure that has the advantage of explicitly coupling the precision and coverage criteria to the early learning stages. Even though the cosine similarity and its close relatives, such as the Jaccard measure, have been prevalent in the majority of web data clustering approaches, they may fail to explicitly seek profiles that achieve high coverage and high precision simultaneously. The authors also formulate a validation strategy and adapt several metrics rooted in information retrieval to the challenging task of validating a learned stream synopsis in dynamic environments.

Predictive discrete latent factor models for large scale dyadic data [14] proposes a novel statistical method to predict large scale dyadic response variables in the presence of covariate information. The approach simultaneously incorporates the effect of covariates and estimates local structure that is induced by interactions among the dyads through a discrete latent factor model. The discovered latent factors provide a predictive model that is both accurate and interpretable.

A utility-theoretic approach to privacy in online services [15] introduces and explores an economics of privacy in personalization, where people can opt to share personal information, in a standing or on-demand manner, in return for expected enhancements in the quality of an online service. The authors focus on the example of web search and formulate realistic objective functions for search efficacy and privacy. They demonstrate how a provably near-optimal optimization of the utility-privacy tradeoff can be found in an efficient manner.

Supporting Privacy Protection in Personalized Web Search [16] proposes a PWS framework called UPS that can adaptively generalize profiles by queries while respecting user-specified privacy requirements. The runtime generalization aims at striking a balance between two predictive metrics that evaluate the utility of personalization and the privacy risk of exposing the generalized profile. Two greedy algorithms, GreedyDP and GreedyIL, are presented for runtime generalization, together with an online prediction mechanism for deciding whether personalizing a query is beneficial.

In Concept Networks for Personalized Web Search Using Genetic Algorithm [17], a concept network is created to identify users' search preferences. This concept network consists of a list of connected concepts based on the history of the users' previous searches. It helps to retrieve web sites related to the context in which the user needs information. A Genetic Algorithm (GA) is used when a user searches for something new: the GA compares the user's concept network with other users' concept networks for similarity, which helps to get better search results in the user's area of interest.

Building Concept Network-based User Profile for Personalized Web Search [18] presents a novel technique for building a user profile as a concept network for personalized search. The user profile is defined as a concept network, in which each concept is approximately represented with formal concept analysis (FCA) theory. The authors assume that a concept, called a "session interest concept", subsumes a user's query intention during a query session and can reflect the user's preference. Whenever a user issues a query, a session interest concept is generated. New concepts are then merged into the current concept network (i.e., the user profile), in which recent user preferences are accumulated. According to FCA, a session interest concept is defined as a pair of extent and intent, where the extent covers the set of documents selected by the user among the search results and the intent includes the set of keyword features extracted from the selected documents. To make the concept network grow, the similarity between a new concept and the existing concepts must be calculated, and to this end a reference concept hierarchy, the Open Directory Project, is used.

Deriving Concept-Based User Profiles from Search Engine Logs [19] focuses on search engine personalization and develops several concept-based user profiling methods that are based on both positive and negative preferences. The proposed methods are evaluated against the authors' previously proposed personalized query clustering method. Experimental results show that profiles which capture and use both the user's positive and negative preferences perform best.

A Personalized Intelligent Web Retrieval System Based on the Knowledge-Base Concept and Latent Semantic Indexing Model [20] proposes a Personalized Intelligent Web Retrieval System, in the setting of a network education resources environment, to study intelligent personalized retrieval tools that search for educational resources on the internet and provide a personalized information service for users. Compared to traditional search engines, this retrieval system can search for related concepts from the entered keywords more precisely and efficiently.

All the above-discussed approaches suffer from false identification of interests and have low accuracy in identifying the user interest.

Clock Rate based user Interest Prediction

The clock rate based user interest prediction algorithm calculates the clock rate for each of the distinct interests and, based on the weight, selects a single interest. The entire process is split into three stages, namely preprocessing, clock rate mining and user interest prediction.
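As a rough illustration of how the three stages fit together, the following Python sketch walks a toy web log through preprocessing, clock rate mining and interest prediction. It is only a sketch under stated assumptions: the record fields, the fixed window size, and in particular the formulas for the clock weight (share of a window's browsing time spent on an interest) and the clock rate (mean clock weight over all windows) are hypothetical choices made for this example, not the definitions used by the authors.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRecord:
    """One web log entry. Field names are illustrative, not from the paper."""
    timestamp: float   # seconds since the start of the log
    query: str         # query submitted (Q)
    interest: str      # topic of the visited page
    time_spent: float  # seconds spent on the page (Ts)
    actions: int       # number of actions performed (Ap)


def preprocess(log, window_size):
    """Stage 1: split the log into fixed-size time windows of feature vectors."""
    windows = defaultdict(list)
    for rec in log:
        index = int(rec.timestamp // window_size)
        # Feature vector {Q, Ts, Ap}, extended here with the interest label.
        windows[index].append((rec.query, rec.time_spent, rec.actions, rec.interest))
    return dict(windows)


def clock_weights(windows):
    """Stage 2. ASSUMPTION: clock weight = an interest's share of the
    total time spent within one window."""
    weights = {}
    for index, vectors in windows.items():
        total = sum(ts for _, ts, _, _ in vectors)
        per_interest = defaultdict(float)
        for _, ts, _, interest in vectors:
            per_interest[interest] += ts
        weights[index] = {
            i: (t / total if total else 0.0) for i, t in per_interest.items()
        }
    return weights


def clock_rates(weights):
    """Stage 3a. ASSUMPTION: clock rate = mean clock weight over all
    windows, counting a window without the interest as zero."""
    n = len(weights)
    totals = defaultdict(float)
    for per_interest in weights.values():
        for interest, w in per_interest.items():
            totals[interest] += w
    return {i: t / n for i, t in totals.items()}


def predict_interest(rates):
    """Stage 3b: choose the interest with the highest clock rate."""
    return max(rates, key=rates.get)


# Toy log: food appears in every window, sports and yoga only once each.
log = [
    LogRecord(10, "pasta recipe", "food", 120, 3),
    LogRecord(50, "cricket score", "sports", 30, 1),
    LogRecord(130, "yoga poses", "yoga", 200, 4),
    LogRecord(170, "biryani", "food", 100, 2),
    LogRecord(250, "dosa batter", "food", 90, 2),
]
rates = clock_rates(clock_weights(preprocess(log, window_size=100)))
predicted = predict_interest(rates)
print(predicted)
```

Averaging over windows is one simple way to favor a persistent interest over one that dominates a single window; a different clock rate definition would only require changing `clock_rates`.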


[Figure 1: block diagram. A User Query enters the Clock Rate based user Interest Prediction module, which comprises Preprocessing, Clock Rate Mining and User Interest Prediction and is connected to the Web Log.]

Fig. 1: Architecture of clock rate based user interest prediction approach.

Figure 1 shows the architecture of the clock rate based user interest prediction algorithm and its functional components.

Preprocessing

The preprocessing algorithm reads the web log and splits it into different time windows. Then the method extracts various features such as the query, the time spent, the actions performed and the topic of the web page. The topic represents the interest. Then the method identifies the list of interests being searched. The features extracted from each record are converted into a feature vector, which is used to compute the clock rate weight.

Algorithm:
Input: Web log Wl.
Output: Feature set Fs.
Start
  Read web log Wl.
  Split the log into different time windows.
  Initialize time window log Twl.
  For each time window Twi from Tw
    Twl(Twi) = the log records of Wl that fall within Twi.
    For each log l from Twl(Twi)
      Extract query submitted Q = l.Query.
      Extract time spent Ts = l.Time-Spent.
      Extract actions performed Ap = l.Actions.
      Construct a feature vector Fv = {Q, Ts, Ap}.
    End
  End
Stop.

The preprocessing algorithm above identifies the web log records produced in each time window. It splits the log by time window and, for each record, extracts the features to create the feature vector.

Clock Rate Mining

The clock rate mining algorithm identifies the list of interests in each time window. For each interest identified, the algorithm computes the clock weight. The clock weight represents the time spent by the user on the same interest. By calculating the clock weight for each interest over subsequent time windows, the interest of the user can be identified efficiently. Then the method calculates the clock rate from the weight and the time. This is performed for each interest identified. The calculated clock rate is used to predict the future interest of the user.

Algorithm:
Input: Feature set Fs.
Output: Clock weight set Cws.
Start
  Read feature set Fs.
  Identify unique interests UI.
  UI = the set of distinct interests in Fs.
  For each time window Twi
    For each interest Ii
      Compute clock weight Cw.
      Cw =
    End
    Add computed clock weight to the set.
    Cws = Cws ∪ {Cw}
  End
Stop.

The clock rate mining algorithm above computes the clock weight in each time window for the different interests. The computed clock weight is used to perform user interest prediction.

User Interest Prediction

The user interest prediction algorithm uses the clock weight computed at the previous stage. For each interest identified, the method calculates the clock rate using the pre-computed clock weight. Based on the clock rate calculated, the method determines the interest with the highest clock rate. The selected interest represents the topic of greatest interest to the user.

Algorithm:
Input: Clock weight set Cws.
Output: User interest UI.
Start
  For each interest I from interest set IS
    Compute clock rate.
    Cr =
  End
  Choose the interest with the highest clock rate.
  I = Max(Cr).Interest.
Stop.

The user interest prediction algorithm computes the clock rate for each interest and selects the top weighted interest as the user interest.

Results and Discussion

The proposed clock rate mining algorithm for user interest prediction and recommendation has been implemented and tested for its effectiveness. The method has been evaluated with a large number of web users, and the system has collected many web logs belonging to those users. The proposed method has produced efficient results on all the evaluated factors of web mining.

Table 1: The details of implementation parameters.

Parameter                 Value
Size of weblog            5 Million
Number of users           1500
Number of Interest        100
Time window considered    6 Months

Table 1 shows the implementation details used to evaluate the proposed method. The method used a six-month log collected by monitoring the search history of 1500 users, covering 100 interests; overall, the size of the log is 5 million.


[Graph 1: Interest Prediction Accuracy. Y-axis: interest prediction accuracy in %; series: 1000, 2000 and 5000 users.]

Graph 1: Comparison of interest prediction accuracy.

Graph 1 compares the interest prediction accuracy produced by the different methods, and it shows clearly that the proposed method has produced higher accuracy in interest prediction.

[Graph 2: Time Complexity. Y-axis: time taken in seconds; x-axis: Sim-rank, Multiple Click, Hot Map, State Graph, Clock Rate; series: 1000, 2000 and 5000 users.]

Graph 2: Comparison of time complexity of different methods.

Graph 2 compares the time complexity of the different methods, and it shows clearly that the proposed method has lower time complexity than the others.


[Graph 3: False Prediction Ratio. Y-axis: false ratio in %; x-axis: Sim-rank, Multiple Click, Hot Map, State Graph, Clock Rate; series: 1000, 2000 and 5000 users.]

Graph 3: Comparison of false prediction ratio.

Graph 3 shows the comparative analysis of the false prediction ratio produced by the different methods. It shows clearly that the proposed method has a lower false prediction ratio than the other methods.

[Graph 4: Web Search Efficiency. Y-axis: web search efficiency in %; x-axis: Sim-rank, Multiple Click, Hot Map, State Graph, Clock Rate; series: 1000, 2000 and 5000 users.]

Graph 4: Comparison of web search efficiency.

Graph 4 compares the web search efficiency produced by the various methods. The result shows clearly that the proposed clock rate algorithm has produced higher web search efficiency than the other methods.

Table 2: Comparative Results on various parameters.

                  Interest Prediction     False Prediction       Time Complexity
                  Accuracy in %           Ratio %                in seconds
Method Name       1000   2000   5000      1000   2000   5000     1000   2000   5000
Sim Rank          72     74     78        28     32     35       92     140    190
Multiple Click    77     78     82        21     24     27       77     120    140
Hot Map           81     84     86        16     19     22       61     110    130
State Graph       98.9   99.3   99.6      2.8    4.5    6        15     42     56
Clock Rate        99.6   99.7   99.9      2.1    3.2    4.5      13     32     41

(Column groups are given for 1000, 2000 and 5000 users.)


Table 2 shows the comparative results on the different parameters produced by the different methods for a varying number of users. The result shows clearly that the proposed method has produced better results than the other methods on every parameter.
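The size of the differences reported in Table 2 can be made concrete with a little arithmetic. The Python sketch below copies the table values and computes, for each baseline, the mean accuracy gain of Clock Rate in percentage points and the mean relative time saving. The summary statistics chosen here (simple means over the three user counts) are an assumption of this example and not a measure defined in the paper; the code only restates the reported numbers and does not re-run any experiment.

```python
# Rows copied from Table 2. Each entry holds three lists, for 1000, 2000
# and 5000 users: accuracy in %, false prediction ratio in %, time in s.
TABLE_2 = {
    "Sim Rank": ([72, 74, 78], [28, 32, 35], [92, 140, 190]),
    "Multiple Click": ([77, 78, 82], [21, 24, 27], [77, 120, 140]),
    "Hot Map": ([81, 84, 86], [16, 19, 22], [61, 110, 130]),
    "State Graph": ([98.9, 99.3, 99.6], [2.8, 4.5, 6], [15, 42, 56]),
    "Clock Rate": ([99.6, 99.7, 99.9], [2.1, 3.2, 4.5], [13, 32, 41]),
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def summarize(table, proposed="Clock Rate"):
    """For each baseline, return (mean accuracy gain in points,
    mean relative time saving in %) of the proposed method."""
    p_acc, _, p_time = table[proposed]
    summary = {}
    for name, (acc, _, time) in table.items():
        if name == proposed:
            continue
        gain = mean(p_acc) - mean(acc)
        saving = mean([100.0 * (t - p) / t for t, p in zip(time, p_time)])
        summary[name] = (gain, saving)
    return summary


for method, (gain, saving) in summarize(TABLE_2).items():
    print(f"{method}: +{gain:.2f} accuracy points, {saving:.1f}% less time")
```

Read this way, the gap to the closest baseline, State Graph, is under one accuracy point on average, so most of the reported advantage over that method lies in the time figures.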

Conclusion

We presented an intelligent clock rate based interest prediction method to improve the performance of web search. The proposed method receives the user query, submits it to a standard search engine and returns the retrieved results to the user. The pages visited, the actions performed, the time spent, the number of clicks made and so on are traced and recorded in the web log data set. The method first splits the web log into a number of time windows and extracts the features of each log record. For each record, the method extracts the query, URL, actions made and time spent to generate the feature vector. Then the method computes the clock weight for each interest in each time window. Using the computed clock weight, the method computes the clock rate for each interest. Finally, a single interest is selected based on the clock rate. The method produces efficient results in user interest prediction and reduces the false prediction ratio.

References

1) S. Sendhilkumar and T.V. Geetha, An Evaluation of Personalized Web Search for Individual User, International Conference on Artificial Intelligence and Pattern Recognition (AIPR07), FL, USA, pages 484-490, 2007.
2) Orland Hoeber and Xue Dong Yang, Exploring Web Search Results Using Coordinated Views, Fourth IEEE International Conference on Coordinated & Multiple Views in Exploratory Visualization, pages 3-13, 2006.
3) Wang Xiao-gang, Web mining based on user access patterns for web personalization, Computing, Communication, Control, and Management (CCCM), ISECS International Colloquium on, Vol. 1, pages 194-197, 2009.
4) Fan Guo, X. Lou, Efficient Multiple-Click Models in Web Search, ACM international conference on web search and data mining, 2008.
5) Shaojie Qiao, SimRank: A Page Rank approach based on similarity measure, IEEE international conference on Intelligent Systems and Knowledge Engineering (ISKE), pages 390-395, 2010.
6) Liang Deng, Martin D. F. Wong, An Exact Algorithm for the Statistical Shortest Path Problem, ACM conference on Asia South Pacific design automation, pages 965-970, 2006.
7) Chunyang Liang, User profile for personalized web search, International conference on fuzzy systems and knowledge discovery, Vol. 3, pages 1847-1850, 2011.
8) Amr Ahmed, Yucheng Low, Scalable Distributed Inference of Dynamic User Interests for Behavioral Targeting, ACM, 2011.
9) Ryen W. White, Predicting User Interests from Contextual Information, Microsoft Research, ACM, 2009.
10) Huajing Li, Personalized Feed Recommendation Service for Social Networks, SocialCom, pages 96-103, 2010.
11) O. Nasraoui, C. Cardona, C. Rojas, and F. Gonzalez, Mining Evolving User Profiles in Noisy Web Clickstream Data with a Scalable Immune System Clustering Algorithm, Aug. 2003.
12) A. Akther, Social network and user context assisted personalization for recommender systems, IEEE, Innovations in Information Technology, pages 95-100, 2012.
13) O. Nasraoui, C. Rojas, and C. Cardona, A Framework for Mining Evolving Trends in Web Data Streams Using Dynamic Learning and Retrospective Validation, Elsevier, Computer Networks, vol. 50, issue 10, 2006.
14) D. Agarwal and S. Merugu, Predictive discrete latent factor models for large scale dyadic data, KDD, 2007.
15) Andreas Krause and Eric Horvitz, A utility-theoretic approach to privacy in online services, Journal of Artificial Intelligence Research (JAIR), vol. 39, pages 633-662, 2010.
16) Lidan Shou, He Bai, Ke Chen and Gang Chen, Supporting Privacy Protection in Personalized Web Search, IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 2, 2014.
17) Ramesh babu, Samuel, Concept Networks for Personalized Web Search Using Genetic Algorithm, Elsevier, Procedia Computer Science, vol. 46, 2015.
18) Han-joon Kim, Sungjick Lee, Byungjeong Lee, Sooyong Kang, Building Concept Network-based User Profile for Personalized Web Search, Proceedings of 9th IEEE/ACIS International Conference on Computer and Information Science, Washington DC, pages 567-572, 2010.
19) Kenneth Wai-Ting Leung and Dik Lun Lee, Deriving Concept-Based User Profiles from Search Engine Logs, IEEE Transactions on Knowledge and Data Engineering, 22 (7), pages 969-982, 2010.
20) Lihua Wu, JianPing Feng, Yunfen Luo, A Personalized Intelligent Web Retrieval System Based on the Knowledge-Base Concept and Latent Semantic Indexing Model, IEEE 7th ACIS International Conference on Software Engineering Research, Management and Applications (SERA 2009), Haikou, pages 45-50, 2009.
