Analyzing the Web from Start to Finish Knowledge Extraction from a Web Forum using KNIME Feb 25, 2014
Bernd Wiswedel KNIME.com AG, Zurich, Switzerland
Agenda
• KNIME Overview
• Demo / Intro
• KNIME forum analysis … using KNIME
• Q/A
A Brief History of KNIME
2004: KNIME development commences
2006: KNIME v1 released
2006: Spin-off in Konstanz, Germany
2006-2007: First commercial partners
2008: KNIME moves to Zurich
2010: Enterprise products released
2011: KNIME.com AG founded
2013: KNIME comes to the West Coast…
+3000 Organizations Using KNIME
• ~30% Life Science
• ~70% Business Intelligence, Analytics
+50 Very Active Community Developers
„KNIME saved my life in a world of scripts that I do not want to learn!“ 2012
Who’s Using KNIME?
• >22,000 individuals
• ~3,000 organizations worldwide
• ~400 KNIME.com customers
The KNIME Platform
KNIME loads and integrates data from diverse data sources:
• Different databases
• Various file formats (CSV, XML, SDF, etc.)
KNIME provides a huge repository of easy-to-use modules for
• Data preprocessing
• Data fusion
• Data transformation
In addition to standard data mining techniques, KNIME adds cutting-edge data analysis algorithms. (…thanks to its academic roots)
Interactive views provide data overviews and insights into the learned models. Interactive linking and brushing techniques allow for powerful exploration of models and data.
KNIME
Due to its open API and “node-in-a-sandbox” approach, additional (also external) tools are easily integrated, e.g.
• Access to the statistics tool R
• Complete integration of the machine learning library WEKA
• Application-area-specific integration, e.g. CDK (Chemistry Development Kit), RDKit, ImageJ, …
KNIME is Eclipse-based: integrating other Eclipse projects such as BIRT, DTP, etc. provides even more functionality.
KNIME Selected Node Highlights
Over 1000 native and embedded nodes included:
• Statistics
• Data Mining
• Time Series
• Image Processing
• Neighborgrams
• Web Analytics
• Text Mining
• Network Analysis
• Social Media Analysis
• WEKA
• R
• Database Support
• ETL
• Text Processing
• Data Generation
• XML Read/Write
• PMML Read/Write
• Social Media Analysis
• Business Intelligence
• Community Nodes
• 3rd Party Nodes
Advanced Visualization
Demo.
KNIME Forum Analysis http://tech.knime.org/forum
KNIME Forum Analysis
Challenges:
• Get data into KNIME
• Extract simple statistics (how many posts, response time, response length)
• Classify topics and detect topic shifts
• Identify content and users
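The "simple statistics" challenge can be pictured outside of KNIME with a few lines of Python. This is only a minimal sketch, not the workflow shown in the demo: the record layout (thread id, author, timestamp, text), the sample rows, and the function name `forum_stats` are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical forum records: (thread_id, author, timestamp, text).
# The earliest post in each thread is treated as the opening post.
posts = [
    ("t1", "alice", datetime(2013, 5, 1, 9, 0), "How do I read an XML file?"),
    ("t1", "bob", datetime(2013, 5, 1, 11, 30), "Use the reader node for that."),
    ("t2", "carol", datetime(2013, 6, 2, 14, 0), "How do I loop over files?"),
    ("t2", "alice", datetime(2013, 6, 3, 14, 0), "Try a loop over the file list."),
    ("t2", "bob", datetime(2013, 6, 4, 8, 0), "See the example workflows too."),
]


def forum_stats(posts):
    """Count posts and threads, and average first-response time and reply length."""
    threads = {}
    for thread_id, author, ts, text in posts:
        threads.setdefault(thread_id, []).append((ts, author, text))

    first_response_hours = []
    reply_lengths = []
    for entries in threads.values():
        entries.sort(key=lambda e: e[0])  # chronological order within a thread
        opening, replies = entries[0], entries[1:]
        if replies:
            delta = replies[0][0] - opening[0]
            first_response_hours.append(delta.total_seconds() / 3600)
        # Response length measured in whitespace-separated words (an assumption).
        reply_lengths.extend(len(text.split()) for _, _, text in replies)

    return {
        "posts": len(posts),
        "threads": len(threads),
        "mean_first_response_hours": mean(first_response_hours) if first_response_hours else None,
        "mean_reply_words": mean(reply_lengths) if reply_lengths else None,
    }


if __name__ == "__main__":
    print(forum_stats(posts))
```

Measuring response length in words rather than characters is a design choice of this sketch; either works as long as it is applied consistently across years.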
Forum Analysis – Classify Posts
• Use text mining to classify forum posts into categories such as ‘io’, ‘manipulation’, ‘mining’, …
• No training set available, so (mis-)use the KNIME node descriptions as one
• See the evolution of discussion topics over the years
Forum Analysis – Classify Posts
Want to classify forum posts (only the first post, no comments)…
Forum Analysis – Classify Posts … using KNIME node description text as labeled training set
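The idea of borrowing node descriptions as labeled training text can be sketched in plain Python. The slides do not say which classifier the workflow uses, so this example swaps in a simple bag-of-words profile per category with cosine similarity. The description snippets, the stop-word list, and the function names are hypothetical stand-ins, not actual KNIME node documentation.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for node description text, keyed by category label.
node_descriptions = {
    "io": [
        "Reads data from a CSV file into a table",
        "Writes a table to a database",
    ],
    "manipulation": [
        "Filters rows of a table by a column value",
        "Joins two tables on a column",
    ],
    "mining": [
        "Trains a decision tree model",
        "Clusters rows with k-means and outputs a cluster model",
    ],
}

# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "to", "on", "by", "with", "and",
             "from", "into", "i", "how", "can", "do"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def train(descriptions):
    """Build one term-frequency profile per category from its descriptions."""
    return {
        label: Counter(tok for doc in docs for tok in tokenize(doc))
        for label, docs in descriptions.items()
    }


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(post, profiles):
    """Return the category whose profile is most similar to the post text."""
    vec = Counter(tokenize(post))
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


profiles = train(node_descriptions)

if __name__ == "__main__":
    print(classify("How can I read a CSV file?", profiles))
```

A post that shares no vocabulary with any profile scores zero everywhere and falls back to the first category, so a real setup would want a minimum-similarity threshold or an "unknown" label.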
Demo.
Forum Analysis – Content & Users
• Look at individual categories (KNIME General, Developer, Reporting, …)
• Learn what is discussed
• See who is contributing
Forum Analysis – Content & Users
Input are all discussions in one forum category…
Forum Analysis – Content & Users Output is a multi page report with tag cloud and user connection graph
Combines KNIME’s text and network mining extensions
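The two outputs of this step, term weights for a tag cloud and a user connection graph, can be approximated with the Python standard library. This is a rough sketch under stated assumptions: the slides do not define what connects two users, so here two users are linked when they post in the same discussion. The sample posts, the stop-word list, and the function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical posts from one forum category: (thread_id, author, text).
posts = [
    ("t1", "alice", "Report template does not render the chart"),
    ("t1", "bob", "Check the chart data in the report"),
    ("t2", "carol", "Export report to PDF"),
    ("t2", "alice", "PDF export works for me"),
    ("t2", "bob", "Same here"),
]

# Tiny illustrative stop-word list (an assumption).
STOPWORDS = {"the", "in", "to", "for", "me", "does", "not", "here", "same"}


def tag_counts(posts, top=5):
    """Term frequencies that could size the words in a tag cloud."""
    words = Counter()
    for _, _, text in posts:
        words.update(
            w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS
        )
    return words.most_common(top)


def user_graph(posts):
    """Weighted edges between users who posted in the same thread."""
    members = {}
    for thread_id, author, _ in posts:
        members.setdefault(thread_id, set()).add(author)
    edges = Counter()
    for authors in members.values():
        # Sort so each undirected pair has one canonical key.
        for pair in combinations(sorted(authors), 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    print(tag_counts(posts))
    print(dict(user_graph(posts)))
```

The edge weight counts shared threads, which gives a layout algorithm something to draw frequent collaborators closer together. Other definitions, such as "replied directly to", would need reply-to information that this sample data does not carry.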
Demo.
Thank You
For more information:
http://www.KNIME.com
KNIME.com AG
Technoparkstr. 1
8005 Zurich
Switzerland
Tel: +41-44-445-2660
Fax: +41-44-445-2662