2012 45th Hawaii International Conference on System Sciences
Introduction to the Web Mining Minitrack
Dave King
JDA Software
Web Mining is the application of data mining and information extraction techniques aimed at discovering patterns and knowledge from the Web. Traditionally, Web Mining is divided into three classes:
• Web Content Mining - discovery of useful information from text, image, audio or video data in the Web.
• Web Structure Mining - analysis of the node and connection (graph) structure underlying single web sites, as well as larger collections of interrelated sites.
• Web Usage Mining - often called Web analytics, involves extracting useful information from server logs and other sources detailing usage patterns.
Overall, the minitrack is designed to encompass papers of a quantitative, theoretical or applied nature whose content falls within one or more of the above classes. Examples of more specific topics of interest include, but are not limited to, the following:
• Text mining of Web and social media content
• Opinion mining and sentiment analysis
• Web usage analysis
• Link analysis
• Analysis of search behavior
• Predictive analytics based on Web and social media content and search behavior
• Recommendation analysis
• Visual analysis of Web structure, usage, and content
• Semantic representations of Web content and linkages
• Analysis of Web-based collective intelligence
The four papers accepted in the inaugural year of this minitrack are:
978-0-7695-4525-7/12 $26.00 © 2012 IEEE DOI 10.1109/HICSS.2012.380
• Forecasting the Unemployment Rate by Neural Networks Using Search Engine Query Data - Proposes and tests a novel neural-network-based method for forecasting the unemployment rate using search engine query data. The empirical results show that the proposed method outperforms traditional time series forecasting methods and can improve the efficiency and effectiveness of the prediction.
• A Text Mining Model for Strategic Alliance Discovery - Proposes a text mining model that automatically extracts strategic organizational alliances from news articles. The model is evaluated in terms of recall, precision and F-measure. The paper also shows that the widely cited Thomson SDC database covers less than 23% of total alliances.
• Diversification of Web Search Results through Social Interest Mining - Presents a novel search results diversification technique that integrates social interest mined from query logs with a probabilistic model based on query-URL bipartite graphs. Experimental results show that this technique outperforms existing techniques in terms of both the relevance and the diversity of the result documents retrieved by a query.
• Using Sequence Analysis to Classify Web Usage Patterns across Websites - Applies sequence analysis to identify and categorize the distinct and similar web browsing patterns of 200 users in China over 30 consecutive days. The results reveal four key, unique web navigation behavior categories, namely search-information browsing, social-information browsing, e-commerce-information browsing, and direct browsing. These categories are examined from demographic and behavioral perspectives.
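For readers less familiar with the recall, precision and F-measure perspective mentioned for the strategic alliance paper, the following is a minimal sketch of the standard definitions of these measures. It is not the paper's implementation: the function name and the example alliance pairs are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(extracted, reference):
    """Return (precision, recall, F1) for extracted items against a reference set.

    Both arguments are iterables of hashable items; duplicates are ignored.
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)

    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of reference items that were found.
    recall = true_positives / len(reference) if reference else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: alliances as (organization, organization) pairs.
extracted = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")}
reference = {("A", "B"), ("B", "D"), ("D", "E")}

p, r, f = precision_recall_f1(extracted, reference)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

In this made-up example two of the four extracted pairs appear in the reference set, so precision is lower than recall, and the F-measure falls between the two.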