Integrating Software Engineering Data Using Semantic Web Technologies
Yuan-Fang Li, Monash University, Melbourne, Australia
Hongyu Zhang, Tsinghua University, Beijing, China
MSR 2011, May 2011
• A huge amount of SE data has been accumulated over the years: Bugzilla, mailing lists, source code, requirements, CVS/SVN, execution traces, crash data, developer and customer data, logs, metrics, …
• The abundance of SE data enables us to extract useful information from it (mining software repositories) in order to better understand and manage software engineering activities.
Collecting and integrating SE data is a non-trivial task:
• There is great variability in SE data: data may come from different sources, serve different purposes, and use different formats, languages, etc.
• The data are also often disparate and distributed.
The lack of an open, commonly agreed schema hinders the integration of SE data.
Semantic Web
Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. They are built on standards such as RDF, SPARQL, OWL, and SKOS, and have been successfully applied in many domains to support data integration and knowledge management.
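To make the idea of an RDF data store concrete, the following sketch models RDF statements as plain (subject, predicate, object) tuples in a toy in-memory store, where `None` acts as a wildcard in lookups, loosely like a single SPARQL triple pattern. It is an illustration only, not real RDF or SPARQL tooling; the `TripleStore` class and the `ex:` names are hypothetical.

```python
from typing import Iterator, Optional, Set, Tuple

# An RDF statement is a (subject, predicate, object) triple.
Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store (hypothetical stand-in for a real RDF store)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None matches anything."""
        for t in self.triples:
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


store = TripleStore()
store.add("ex:Point", "rdf:type", "ex:Class")
store.add("ex:Point", "ex:hasAttribute", "ex:Point.x")
print(sorted(store.match(p="rdf:type")))  # [('ex:Point', 'rdf:type', 'ex:Class')]
```

Because every fact has the same three-part shape, data from unrelated sources can be added to one store without a fixed table schema.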
Overall Structure
• Ontology Definition
• Data Collection and Translation
• Data Integration
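The Ontology Definition step could look roughly like this minimal sketch. It assumes a hypothetical namespace `http://example.org/se#` and made-up class and property names (`Class`, `Method`, `dependsOn`, `loc`, `wmc`, …), and writes OWL declarations as plain triples using the standard RDF, RDFS, OWL, and XSD namespace URIs. It is not the ontology actually used in this work.

```python
# Standard W3C namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical namespace for the SE ontology in this sketch.
SE = "http://example.org/se#"


def define_ontology():
    """Return a tiny, hypothetical SE ontology as a set of (s, p, o) triples."""
    onto = set()
    for cls in ("Class", "Method", "Attribute", "Package"):
        onto.add((SE + cls, RDF + "type", OWL + "Class"))
    # Object properties link two resources: (name, domain, range).
    for name, dom, rng in [("hasMethod", "Class", "Method"),
                           ("hasAttribute", "Class", "Attribute"),
                           ("inPackage", "Class", "Package"),
                           ("extends", "Class", "Class"),
                           ("dependsOn", "Class", "Class")]:
        onto.add((SE + name, RDF + "type", OWL + "ObjectProperty"))
        onto.add((SE + name, RDFS + "domain", SE + dom))
        onto.add((SE + name, RDFS + "range", SE + rng))
    # Datatype properties attach literal values such as metrics.
    for name in ("loc", "wmc"):
        onto.add((SE + name, RDF + "type", OWL + "DatatypeProperty"))
        onto.add((SE + name, RDFS + "domain", SE + "Class"))
        onto.add((SE + name, RDFS + "range", XSD + "integer"))
    return onto


def declared_properties(onto):
    """Properties that translated data is allowed to use."""
    kinds = {OWL + "ObjectProperty", OWL + "DatatypeProperty"}
    return {s for (s, p, o) in onto if p == RDF + "type" and o in kinds}


onto = define_ontology()
print(len(onto), len(declared_properties(onto)))  # 25 7
```

Translated data can then be checked against `declared_properties` before it is stored, so that data collected by different tools share one agreed vocabulary.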
A Case Study on Eclipse /1
Proof of concept: an initial study of Eclipse 3.0, a large open-source project containing more than 10,000 files. We integrate the following data from Eclipse 3.0:
• Object-oriented language data
  - Model OO language elements such as class, method, attribute, visibility, …
  - Use MOOSE to collect OO language data
• Program dependency data
  - Model depended-upon and dependent classes
  - Use Dependency Finder to collect the dependency data
• Metrics data
  - Model various complexity metrics
  - Use Understand for Java to collect metrics data
Note that the data come from different sources.
A Case Study on Eclipse /2
We developed a program to automatically convert the different datasets into RDF triples and store them in a native (on-disk, persistent) Sesame triple store.
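The conversion program itself is not shown. As a hypothetical sketch, the code assumes simplified inputs for each tool: class/superclass pairs standing in for MOOSE output, dependent/depended-upon pairs for Dependency Finder, and a `Name,LOC,WMC` CSV for Understand. These are not the tools' actual formats. Because every input is keyed by the same fully qualified class name, the resulting triples merge into one graph. Instead of a Sesame store, the sketch serializes the graph as N-Triples, a standard RDF text format that triple stores can import. The `example.org` URIs and sample values are made up.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
SE = "http://example.org/se#"        # hypothetical ontology namespace
CODE = "http://example.org/code/"    # hypothetical base URI for code entities


def uri(name):
    # One URI per fully qualified name, so facts about the same class from
    # different tools attach to the same RDF node.
    return CODE + name


def from_oo_facts(facts):
    """facts: (class, superclass or None) pairs; a hypothetical MOOSE-like input."""
    for cls, sup in facts:
        yield (uri(cls), RDF_TYPE, SE + "Class")
        if sup is not None:
            yield (uri(cls), SE + "extends", uri(sup))


def from_dependencies(pairs):
    """pairs: (dependent, depended-upon) class names; a hypothetical input."""
    for src, dst in pairs:
        yield (uri(src), SE + "dependsOn", uri(dst))


def from_metrics_csv(text):
    """CSV with columns Name, LOC, WMC; a hypothetical metrics input."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (uri(row["Name"]), SE + "loc", int(row["LOC"]))
        yield (uri(row["Name"]), SE + "wmc", int(row["WMC"]))


def to_ntriples(triples):
    """Serialize to N-Triples; ints become xsd:integer typed literals.

    Only URIs and integers occur here, so no string-literal escaping is needed.
    """
    lines = []
    for s, p, o in sorted(triples, key=lambda t: (t[0], t[1], str(t[2]))):
        obj = f'"{o}"^^<{XSD_INT}>' if isinstance(o, int) else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


# Integration: made-up outputs of the three "tools" merged into one graph.
graph = set()
graph.update(from_oo_facts([("demo.Shape", None), ("demo.Circle", "demo.Shape")]))
graph.update(from_dependencies([("demo.Circle", "demo.Shape")]))
graph.update(from_metrics_csv("Name,LOC,WMC\ndemo.Circle,620,14\n"))
print(to_ntriples(graph), end="")
```

The printed text could be saved as a `.nt` file and loaded into a persistent store; keeping the translation per source means a new tool only needs one more converter function.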
Querying the Semantic Repository /1
Having integrated the data, we can then perform queries to understand the software project.
SPARQL queries can be issued over the integrated RDF dataset.
Query examples:
• Find the top 10 Eclipse 3.0 classes that are larger than 500 LOC (lines of code) and have a WMC (weighted methods per class) greater than 10, ordered by descending LOC.
• Find all classes that use the public attribute x defined in class org.eclipse.swt.graphics.Point.
Querying the Semantic Repository /2
More query examples:
• Find classes in Eclipse 3.0 that depend on package org.apache.tools.ant but not on package org.eclipse.ant.core, and that have more than one subclass.
• Find subclasses of org.eclipse.jdt.internal.compiler.ASTVisitor that will be affected if method traverse() in class org.eclipse.jdt.internal.compiler.ast.ASTNode is changed.
• A huge amount of SE data has been accumulated.
• We propose applying Semantic Web techniques to represent and integrate SE data.
• We believe that effective representation and integration of SE data can pave the way for more powerful analysis, mining, and reasoning.
Thank you!
Hongyu Zhang
School of Software, Tsinghua University
Beijing 100084, China
Web: http://info.thss.tsinghua.edu.cn/hongyu