Integrating Software Engineering Data Using Semantic Web Technologies

Yuan-Fang Li, Monash University, Melbourne, Australia
Hongyu Zhang, Tsinghua University, Beijing, China

MSR 2011, May 2011

•  A huge amount of SE data has been accumulated over the years.

•  The abundance of SE data enables us to extract useful information in order to better understand and manage software engineering activities.

[Figure: sources of SE data for mining software repositories: Bugzilla, Mailings, Source Code, Requirements, CVS/SVN, Execution traces, Crash, Developer, Logs, Metrics, Customer, …]

Collecting and integrating SE data is a non-trivial task:

•  Great variability in the SE data: data may come from different sources, for different purposes, and in different formats, languages, etc.
•  The data are often disparate and distributed as well.

The lack of an open, commonly agreed schema hinders the integration of SE data.

Semantic Web

•  Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data.
•  They are empowered by standard technologies such as RDF, SPARQL, OWL, and SKOS.
•  They have been successfully applied in many domains to provide a solution to data integration and knowledge management.

Overall Structure

•  Ontology Definition
•  Data Collection and Translation
•  Data Integration

A Case Study on Eclipse /1

Proof of concept: an initial study of Eclipse 3.0, a large open source project containing more than 10,000 files. We integrate the following data from Eclipse 3.0:

•  Object-oriented language data
   •  Model OO language elements such as class, method, attribute, visibility…
   •  Use MOOSE to collect OO language data
•  Program dependency data
   •  Model dependable and dependent classes
   •  Use Dependency Finder to collect the dependency data
•  Metrics data
   •  Model various complexity metrics
   •  Use Understand for Java to collect metrics data

Note that the data may come from different sources.

A Case Study on Eclipse /2

We developed a program to automatically convert the different datasets into RDF triples and store them in a native (on-disk, persistent) Sesame triple store.
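To make the conversion step concrete, here is a minimal sketch of turning tool output into RDF triples serialized as N-Triples. It is not the authors' converter: the namespace `http://example.org/se#`, the property names `loc`, `wmc` and `dependsOn`, and the sample record are all assumptions made for illustration, since the slides do not give the actual ontology IRIs. Loading the resulting lines into a triple store is left out.

```python
from urllib.parse import quote

# Hypothetical namespace and property names (not taken from the slides).
SE = "http://example.org/se#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


def class_iri(qualified_name):
    """Build an IRI for a Java class from its fully qualified name."""
    return f"<{SE}class/{quote(qualified_name)}>"


def metric_triples(record):
    """Yield N-Triples lines for one row of metrics data."""
    subject = class_iri(record["class"])
    yield f"{subject} <{RDF_TYPE}> <{SE}Class> ."
    for metric in ("loc", "wmc"):
        if metric in record:
            value = int(record[metric])
            yield f'{subject} <{SE}{metric}> "{value}"^^<{XSD_INT}> .'


def dependency_triples(pairs):
    """Yield N-Triples lines for (dependent, dependable) class pairs."""
    for dependent, dependable in pairs:
        yield f"{class_iri(dependent)} <{SE}dependsOn> {class_iri(dependable)} ."


# Made-up sample data standing in for the output of a metrics tool
# and a dependency extractor.
sample_metrics = {"class": "org.example.Foo", "loc": 620, "wmc": 14}
sample_deps = [("org.example.Foo", "org.example.Bar")]

if __name__ == "__main__":
    for line in metric_triples(sample_metrics):
        print(line)
    for line in dependency_triples(sample_deps):
        print(line)
```

Because every dataset is mapped onto the same class IRIs, triples produced from different tools end up describing the same resources, which is what makes the later queries across datasets possible.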

Querying the Semantic Repository /1

Having integrated the data, we can then perform queries to understand the software project.

•  SPARQL queries can be issued over the integrated RDF dataset.

Query examples:

•  Find the top 10 Eclipse 3.0 classes that are larger than 500 LOC (lines of code) and have WMC (weighted methods per class) larger than 10, ordered by descending LOC value.
•  Find all classes that use the public attribute x defined in class org.eclipse.swt.graphics.Point.
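The first query can be sketched as follows. The slides do not show the query text, so the SPARQL string below is only one plausible formulation, and the `se:` prefix with its `loc` and `wmc` properties is a hypothetical vocabulary. The Python function evaluates the same conditions over an in-memory list of `(subject, predicate, object)` tuples, so the logic can be checked without a triple store.

```python
# One possible SPARQL formulation, using an assumed vocabulary.
SPARQL_TOP_CLASSES = """
PREFIX se: <http://example.org/se#>
SELECT ?cls ?loc ?wmc WHERE {
  ?cls se:loc ?loc ;
       se:wmc ?wmc .
  FILTER (?loc > 500 && ?wmc > 10)
}
ORDER BY DESC(?loc)
LIMIT 10
"""


def top_classes(triples, min_loc=500, min_wmc=10, limit=10):
    """Pure-Python equivalent of the query above over (s, p, o) tuples."""
    loc, wmc = {}, {}
    for subject, predicate, obj in triples:
        if predicate == "se:loc":
            loc[subject] = obj
        elif predicate == "se:wmc":
            wmc[subject] = obj
    # Join on the class, then apply both filters.
    rows = [
        (cls, loc[cls], wmc[cls])
        for cls in loc
        if cls in wmc and loc[cls] > min_loc and wmc[cls] > min_wmc
    ]
    # ORDER BY DESC(?loc) followed by LIMIT.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Made-up triples for illustration; not real Eclipse 3.0 measurements.
sample = [
    ("ex:A", "se:loc", 900), ("ex:A", "se:wmc", 25),
    ("ex:B", "se:loc", 1200), ("ex:B", "se:wmc", 8),
    ("ex:C", "se:loc", 1500), ("ex:C", "se:wmc", 40),
    ("ex:D", "se:loc", 300), ("ex:D", "se:wmc", 30),
]

if __name__ == "__main__":
    for row in top_classes(sample):
        print(row)
```

The size metric and the complexity metric may come from different tools; the query only works because both are attached to the same class resource in the integrated dataset.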

Querying the Semantic Repository /2

More query examples:

•  Find classes in Eclipse 3.0 that depend on package org.apache.tools.ant but not on the org.eclipse.ant.core package, and that have more than one subclass.
•  Find subclasses of org.eclipse.jdt.internal.compiler.ASTVisitor that will be affected if method traverse() in class org.eclipse.jdt.internal.compiler.ast.ASTNode is changed.
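The first of these queries combines dependency data with inheritance data and includes a negation. The sketch below evaluates it over in-memory `(subject, predicate, object)` tuples rather than a triple store. The predicates `se:dependsOn`, `se:inPackage` and `se:subClassOf` are assumed names, and treating "depends on a package" as "depends on at least one class in that package" is an interpretation, not something the slides specify.

```python
from collections import defaultdict


def classes_depending_on(triples, wanted_pkg, excluded_pkg, min_subclasses=2):
    """Classes that depend on wanted_pkg, not on excluded_pkg,
    and have at least min_subclasses direct subclasses."""
    package_of = {}
    depends_on = defaultdict(set)
    subclasses = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "se:inPackage":
            package_of[subject] = obj
        elif predicate == "se:dependsOn":
            depends_on[subject].add(obj)
        elif predicate == "se:subClassOf":
            # Index by parent so subclasses can be counted per class.
            subclasses[obj].add(subject)

    result = []
    for cls, targets in depends_on.items():
        # Packages reached through this class's dependencies.
        packages = {package_of.get(target) for target in targets}
        if (
            wanted_pkg in packages
            and excluded_pkg not in packages
            and len(subclasses.get(cls, ())) >= min_subclasses
        ):
            result.append(cls)
    return sorted(result)


# Made-up class names; only the two package names come from the slides.
ANT = "org.apache.tools.ant"
ANT_CORE = "org.eclipse.ant.core"
sample = [
    ("ex:Task", "se:inPackage", ANT),
    ("ex:Runner", "se:inPackage", ANT_CORE),
    ("ex:A", "se:dependsOn", "ex:Task"),
    ("ex:A1", "se:subClassOf", "ex:A"),
    ("ex:A2", "se:subClassOf", "ex:A"),
    ("ex:B", "se:dependsOn", "ex:Task"),
    ("ex:B", "se:dependsOn", "ex:Runner"),
    ("ex:B1", "se:subClassOf", "ex:B"),
    ("ex:B2", "se:subClassOf", "ex:B"),
    ("ex:C", "se:dependsOn", "ex:Task"),
    ("ex:C1", "se:subClassOf", "ex:C"),
]

if __name__ == "__main__":
    print(classes_depending_on(sample, ANT, ANT_CORE))
```

In SPARQL the exclusion would typically be written with `OPTIONAL` plus `!bound(...)`, or with `FILTER NOT EXISTS` in SPARQL 1.1; the set-based test above plays the same role.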

•  A huge amount of SE data has been accumulated.
•  We propose applying Semantic Web techniques to represent and integrate SE data.
•  We believe effective representation and integration of SE data can pave the way for more powerful analysis, mining and reasoning.


Thank you!

Hongyu Zhang
School of Software, Tsinghua University
Beijing 100084, China
Email: [email protected]
Web: http://info.thss.tsinghua.edu.cn/hongyu