International Journal of Pure and Applied Mathematics Volume 116 No. 21 2017, 751-759 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu Special Issue

Implementation of Students behavior Using Association Rule Mining Technique

1S. Kamalakkannan and 2S. Prasanna

1Vels University, Chennai, Tamil Nadu, India. [email protected]

2Vels University, Chennai, Tamil Nadu, India. [email protected]

Abstract

The behavior of users on the Internet is a major area of study. Various website organization schemes have been presented in the literature from diverse perspectives. Web mining applies data mining techniques to web data; here it is used to understand the behavior of students by means of log files. When students interact with the web, information about the interaction is stored in a special type of log file. The log files record the user name, time stamp, IP address, access request, number of bytes transferred, referring URL, result status and user agent, and they are maintained by the web servers. Our proposal focuses on the use of web mining techniques to classify web pages and web site types according to student visits. This classification helps us find patterns in student behavior and explore the relationship between the similarity of students' behavior and their use of the browser for academic and non-academic purposes.

Key Words: Student's behavior, log files, web mining, clustering, classification.


1. Introduction

Web mining is one of the vital types of data mining and is used to extract data from web pages. Data mining deals with structured data, whereas web mining deals with unstructured and semi-structured data. Web mining is classified into three groups for extracting web data: web usage mining, web structure mining and web content mining [1]. Web Usage Mining (WUM) is a data mining methodology used to discover user access patterns from web log data. A great deal of research has already been carried out in this area, and the results are employed in different applications, such as recommending web usage patterns, personalization, system improvement and business intelligence [2]. Web usage mining is applied in the following areas:

• It gives users the ability to analyze huge volumes of click-stream data and to combine the data smoothly with interpretation and statistical data from offline sources. Personalization for a user can be achieved by keeping track of previously accessed pages. These pages can be used to identify the typical browsing behavior of a user and, ultimately, to predict desired pages.
• By determining the access behavior of users, needed hyperlinks can be identified to improve the overall performance of future accesses.
• Web usage patterns are used to gather business intelligence to improve customer attraction, customer retention, sales, marketing and advertising income.
• Web usage mining is used in e-Learning, e-Government, e-Newspapers, Digital Libraries, e-Business and e-Commerce.

The data gathered through web mining is evaluated using conventional data mining techniques such as clustering, classification, association and the examination of sequential patterns [3].
Web Mining Tasks

Web mining can be decomposed into the following subtasks [4]:

a) Information Retrieval (or Resource Discovery): Search is probably the primary application of the Web and has its roots in information retrieval. Information retrieval (IR) helps users find the required information in a large collection of text documents.
b) Information Extraction (Selection and Preprocessing): This task deals with transforming the data retrieved during the information retrieval process into a form that can be easily analyzed. Information extraction aims to select appropriate data from a document, while information retrieval aims to select relevant documents.
c) Generalization (Pattern Recognition and Machine Learning): This task automatically discovers general patterns both at individual web sites and across multiple sites. Machine learning methods or data mining techniques are generally used for generalization.
d) Analysis (Validation and Interpretation): Once the patterns have been discovered, they must be examined and confirmed. The aim of this task is to validate the mined patterns.

Based on the above subtasks, web mining may be viewed as the use of data mining techniques to automatically retrieve, extract and evaluate information for knowledge discovery from web documents and services.

Web Logs

Web server data consists of the user logs generated on the web server. Logs allow the analyst to track and study the behavior of users who visit the website. Web logs act as a container for information about user interaction with the web: they store the click information generated by the user on a website, and this information is useful for mining. Web logs are plain-text (ASCII) files that contain data such as the user name, user IP address, time stamp, access request, error codes and referring URL, and they commonly reside on the web servers. Server logs are categorized as transfer log, error log, agent log and referrer log [1]. A web log is a file to which the web server writes data every time a user requests a resource from that particular site. Most logs use the Common Log Format. The entry below is part of the server logs for loganalyzer.net.

66.249.65.107 - - [08/Oct/2007:04:54:20 -0400] "GET /support.html HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
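As a rough illustration of how such an entry can be split into its fields, the following Python sketch parses the sample line with a regular expression. The pattern and the field names (host, logname, user and so on) are assumptions made for this example and are not taken from the authors' tooling; real logs may need a more tolerant pattern.

```python
import re

# Assumed pattern for a Common Log Format entry extended with
# referrer and user-agent fields (often called the combined format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A hyphen in the size field means no bytes were reported.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


sample = (
    '66.249.65.107 - - [08/Oct/2007:04:54:20 -0400] '
    '"GET /support.html HTTP/1.1" 200 11179 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
entry = parse_log_line(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Returning None for lines that do not match lets a caller skip malformed entries instead of failing part-way through a large log.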

2. Literature Survey

Ling et al. [5] observe that the user logs of a popular search engine keep track of user activities, including user queries, click-throughs from the returned list and browsing behaviors. Knowledge about user queries discovered from the logs can improve the performance of the search engine. They propose a data mining approach that produces generalized query patterns, or templates, from the raw user logs of a popular commercial knowledge-based search engine that is currently in use. Their simulation shows that such templates can improve the search engine's speed and precision and can cover queries not asked previously. The templates are also comprehensible, so web editors can easily discover the topics in which most users are interested [5].


Analyzing users' web log data and extracting their interests from their web browsing behavior is an essential and challenging research topic in web usage mining. Users visit their favorite sites and occasionally look for new sites by performing keyword searches on a search engine. Users' browsing behavior can be regarded as a graph, since visited web sites and entered search keywords are linked with each other in a time series. The technique in [6] derives subgraphs representing a user's routine visits from a site-keyword graph generated from augmented web audience measurement data (web log data). The experimental results show that the method succeeds in finding subgraphs that contain most of the sites the users are interested in [6].

The analysis of student behavior in a web-based learning environment within distance learning is one of the most significant areas of learning optimization. The aim of the article in [7] is to analyze student behavior and the use of an e-learning course in the subject Discrete Mathematics, which is one of the compulsory subjects taught at the Department of Informatics. The data and results of that study are valuable for further adjustment and improvement of the e-books. The results of the analysis of traffic on the course were estimated using association rules [7].

Browsing web pages generates a great deal of data in the log file. Evaluating log-file data helps us understand the behavior of the user. Web logs include web server access logs and application server logs. The web log is a valuable part of web mining for extracting usage patterns and studying the visiting characteristics of users. By applying classical data mining methods such as clustering and association rules, a particular user can be associated with other users who show the same behavior patterns and preferences. This user can then be offered specific links and sales promotions tailored to the individual's own preferences, based on the data supplied by the clustering or association rule algorithms [8].

The paper in [9] discusses the browsing behavior and interests of the user, the technology of web mining and the information source, i.e. web log data, and it uses the Apriori algorithm. A web graph is generated by incorporating the user's browsing behavior. The paper also explains how the estimation of user behavior is based on studying web logs [9].

3. Web Server Log Files

A web log is a file to which the web server writes data every time a user requests a resource from that particular web site. Most logs follow the Common Log Format. The entry listed below is a portion of the server logs for loganalyzer.net.

66.249.65.107 - - [08/Oct/2007:04:54:20 -0400] "GET /support.html HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Before we start, we must be familiar with the types of data available for the analysis of user behavior. Web usage data takes its input from the web server log files, or web logs. For every request from a user's browser to a web server, a response is generated automatically and recorded in what is called a web log file, web log or log file (not to be confused with blogs, of course, which are web journals occasionally also called web logs). This record takes the form of a simple single-line transaction record that is appended to an ASCII text file on the web server. This text file may be comma-delimited, space-delimited or tab-delimited.

A Standard Log-file has the Following Format

Remote host: This field contains the Internet IP address of the remote host making the request, such as "141.243.1.172". If the remote host name is available through a DNS lookup, this name is provided, such as "wpbfl245.gate.net". To obtain the domain name of the remote host rather than the IP address, the server must submit a request, using the Internet domain name system (DNS), to resolve (i.e., translate) the IP address into a host name. Since people prefer to work with domain names and computers work most efficiently with IP addresses, the DNS system provides an essential interface between humans and computers.

Logname: This field is used to store the authenticated client user name, if it is required. The logname field was designed to hold the authenticated user name that a client has to provide to gain access to directories that are password protected. If no such data is provided, the field defaults to a hyphen.

Username: This is the user name with which the user has authenticated himself.

Date/time: The field may take the format [DD:HH:MM:SS], where DD represents the day of the month and HH:MM:SS the 24-hour time. However, it is more common for the date/time field to follow the format "DD/Mon/YYYY:HH:MM:SS offset", where the offset is a positive or negative constant indicating in hours how far ahead of or behind Greenwich Mean Time (GMT) the local server is. For example, a date/time field of "09/Jun/1988:03:27:00 -0500" indicates that a request was made to a server at 3:27 a.m. on June 9, 1988, and that the server is 5 hours behind GMT.

HTTP request: The HTTP request field contains the information that the client's browser has requested from the web server.

A web server log is a valuable resource for web usage mining because it explicitly records the browsing behavior of the students. The information recorded in server logs reflects the (possibly concurrent) access to a web site by multiple users.
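The date/time format with an offset can be converted to GMT mechanically. The short Python sketch below parses the example for a server 5 hours behind GMT, written here with an explicit sign as -0500. The helper name parse_log_time is an assumption for illustration, and the %b directive expects English month abbreviations under the default locale.

```python
from datetime import datetime, timezone


def parse_log_time(field):
    """Parse 'DD/Mon/YYYY:HH:MM:SS offset' into a timezone-aware datetime."""
    return datetime.strptime(field, "%d/%b/%Y:%H:%M:%S %z")


# Example from the text: 3:27 a.m. local time, server 5 hours behind GMT.
local = parse_log_time("09/Jun/1988:03:27:00 -0500")
gmt = local.astimezone(timezone.utc)

print(local.isoformat())  # local server time with its offset
print(gmt.isoformat())    # the same instant expressed in GMT/UTC
```

Normalizing every entry to GMT in this way makes requests from servers in different time zones comparable before any session or pattern analysis.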


These log files can be stored in different formats; one of these formats is shown in Table 1.

Table 1: Simulated Web Server Logs

Time   URL                                                User  Type
4:10   http://www.fabbers.com/tech/STL_format             PC1   Education
5:00   http://ferryhalim.com/oris.htm                     PC2   Entertainment
5:30   http://Kaspersky-lab.com                           PC1   Education
5:40   http://astroscience.com/yantra.asp                 PC2   Education
6:00   http://ferryhalim.com/oris.htm                     PC1   Entertainment
6:45   https://www.lifewire.com/top-social                PC1   Social
7:00   https://www.linkedin.com/                          PC2   Social
7:20   https://www.populationmedia.org/news/weekly-news   PC2   News
9:00   http://timesofindia.indiatimes.com/home/headlines  PC1   News
9:30   http://www.alijazeera.net/news/pages               PC1   News
9:45   http://www.complete-review.com/                    PC2   Entertainment
10:00  http://www.dotnetperls.com                         PC1   Education

Table 1 above shows that index.asp receives the most interest from students. The Visitors section helps determine who accessed the website. We propose our own classification of websites, which is shown in Figure 1. Some of the new categories introduced in the classification are General and Professional, Business, E-books and Programs in the Academic category, and Social Networking, Community and Entertainment in the Non-academic category. We include Social Networking as an important category because it has evolved in recent years and covers web sites that enable a community of users to collaboratively generate content and share it.
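To make the classification step concrete, the following Python sketch tallies the simulated entries of Table 1 by type and then groups them into Academic and Non-academic. The mapping of "Education" to Academic and of every other type to Non-academic is an assumption made for this example; the paper's full category scheme is richer than this.

```python
from collections import Counter

# Rows of Table 1 as (time, user, type); URLs are omitted for brevity.
TABLE_1 = [
    ("4:10", "PC1", "Education"),
    ("5:00", "PC2", "Entertainment"),
    ("5:30", "PC1", "Education"),
    ("5:40", "PC2", "Education"),
    ("6:00", "PC1", "Entertainment"),
    ("6:45", "PC1", "Social"),
    ("7:00", "PC2", "Social"),
    ("7:20", "PC2", "News"),
    ("9:00", "PC1", "News"),
    ("9:30", "PC1", "News"),
    ("9:45", "PC2", "Entertainment"),
    ("10:00", "PC1", "Education"),
]

# Assumption for this sketch: only "Education" counts as Academic.
ACADEMIC_TYPES = {"Education"}


def group_of(site_type):
    """Map a Table 1 type to the Academic or Non-academic group."""
    return "Academic" if site_type in ACADEMIC_TYPES else "Non-academic"


type_counts = Counter(site_type for _, _, site_type in TABLE_1)
group_counts = Counter(group_of(site_type) for _, _, site_type in TABLE_1)

for site_type, count in type_counts.most_common():
    print(f"{site_type}: {count}")
for group, count in group_counts.items():
    print(f"{group}: {100 * count / len(TABLE_1):.1f}%")
```

The percentages printed here describe only the twelve simulated rows of Table 1 and are not expected to match the figures reported for the full data set.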

4. Classification Techniques

Classification is the task of mapping a data item into one of a few predefined classes. In the web domain, one is interested in developing a profile of users belonging to a specific class or category. This requires the extraction and selection of features that best describe the properties of the given class or category. Classification techniques play a major role in web analytics applications for modeling users according to various predefined metrics.

Association Rule Mining Techniques

Association rule discovery and statistical correlation analysis can find groups of web page types that are commonly accessed together; association rule mining can be used to discover the correlation between page types found in a web log. This, in turn, enables web sites to organize their content more efficiently. The most common approaches to association discovery are based on the Apriori algorithm. This algorithm finds groups of items (page views occurring in the preprocessed log) that appear together frequently in many transactions (i.e., that satisfy a user-specified minimum support threshold). Such groups of page types are referred to as frequent itemsets.
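A minimal Apriori-style sketch in Python may help to make the support threshold concrete. It searches for frequent itemsets level by level and then splits each one into rules, keeping those whose confidence (the support of the whole itemset divided by the support of the antecedent) reaches a chosen minimum. The four transactions, the thresholds and all function names are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        level = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent -> consequent rules."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    consequent = itemset - antecedent
                    rules.append((antecedent, consequent, support, confidence))
    return rules


# Hypothetical sessions: the set of page types visited in each one.
sessions = [
    frozenset({"Education", "News"}),
    frozenset({"Education", "News", "Social"}),
    frozenset({"Entertainment", "Social"}),
    frozenset({"Education", "News", "Entertainment"}),
]
frequent = frequent_itemsets(sessions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.8)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

The prune step relies on the fact that every subset of a frequent itemset must also be frequent, which is what keeps the candidate list small as the itemsets grow.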


Association rules that satisfy a minimum confidence threshold are then generated from the frequent itemsets.

Mining of Association Rules

Two-step Approach

Step 1: Frequent Itemset Generation. Generate all itemsets whose support ≥ minsup.
Step 2: Rule Generation. Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset.

Prepare the Simulated Log-file

We convert the log files into a flat Excel file with the extension .xls, as shown in the table:

Table 2: Excel File of Simulated Web Server Logs
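The flattening of log entries into a spreadsheet-readable file can be sketched as follows. This example writes CSV with Python's standard csv module as a stand-in, because producing a genuine .xls file needs a third-party library; the column names and the two sample rows (taken from Table 1) are assumptions for illustration.

```python
import csv
import io

# Assumed column names for the flat file.
FIELDS = ["time", "url", "user", "type"]


def rows_to_csv(rows, out):
    """Write log rows (dicts keyed by FIELDS) to a text stream as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"time": "4:10", "url": "http://www.fabbers.com/tech/STL_format",
     "user": "PC1", "type": "Education"},
    {"time": "5:00", "url": "http://ferryhalim.com/oris.htm",
     "user": "PC2", "type": "Entertainment"},
]

# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

To produce a file on disk instead, the same function can be given a stream opened with open(path, "w", newline=""), and the resulting CSV file opens directly in Excel.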

The dominant use (63.6%) of the internet is for non-academic rather than academic purposes. The most frequently visited websites fall under the Non-academic classification (social, movie, journal, news, job search, e-commerce) and the Academic classification (journal, program, open source and free coding, e-books).

[Figure 1: Student's Behavior Analysis based on Types. Chart axes: Visitors and Types.]

From Figure 1, we see that most students browsed non-academic websites rather than academic websites. Among non-academic websites, the ones students visited most frequently are News (18.77%); among academic websites, the ones students visited most are journals (11.82%).


5. Conclusion

We used methods such as classification, association rule mining and statistical correlation analysis, which can discover groups of web page types that are commonly accessed together. Association rule mining can be used to discover the correlation between site types found in a web log. This, in turn, enables web sites to organize their content more efficiently. Web data are a valuable source for analyzing student behavior on the web.

References

[1] Deeptisahu, Shweta meena, Detecting Users Behavior from Web Access Logs with Automated Log Analyzer Tool, International Journal of Computer Science and Information Technologies 5(4) (2014), 5106-5109.

[2] Dr. Punit Goyal, Identification of Human Behavior using Analysis of Web log Data Mining, International Journal of Information Technology 1(1) (2013).

[3] Suneetha K.R., Krishnamoorthi R., Identifying user behavior by analyzing web server access log file, International Journal of Computer Science and Network Security 9(4) (2009), 327-332.

[4] Rachit Goel, Enhanced Web Mining Technique To Clean Web Log File, International Journal of Computer Applications 96(16) (2014).

[5] Ling C.X., Gao J., Zhang H., Qian W., Zhang H., Mining generalized query patterns from web logs, Procd. of the 34th Annual Hawaii International Conference on System Sciences (2001).

[6] Murata T., Saito K., Extracting keywords of web users' interests and visualizing their routine visits, International Conference on Control, Automation, Robotics and Vision (2006), 1-6.

[7] Reichel J., Kuna P., Analysis of students behaviour in virtual environment, IEEE 12th International Conference on Emerging eLearning Technologies and Applications (2014), 419-423.

[8] Zubi Z.S., Raiani M.S.E., Using Web Logs Dataset Via Web Mining for User Behavior Understanding, Int J Comput Comm 8(2014), 103-111.

[9] Ladekar A., Pawar P., Raikar D., Chaudhari J., Web Log Based Analysis of User's Browsing Behavior, International Journal of Computer Applications 115(11) (2015).
