Widura Schwittek Stefan Eicker

Paper presentation at the CBSE 2013 conference, Vancouver, CA, 06/18/2013

© paluno

A Study on Third Party Component Reuse in Java Enterprise Open Source Software

Agenda
• Background
• Study implementation
• Study results
• Threats to validity
• Wrap up & Future Work

Background: Why study third party component reuse?

• Known from practice: third party components (TPCs) are heavily reused in industry projects
• Because of [2]
  • reduced costs,
  • faster time-to-market, and
  • better software quality
• Some say: TPC reuse is a key success factor in software development [1]

[1] Gartner. 2008. The Evolving Open-source Software Model. Predicts from December 2008.
[2] Li, J., Conradi, R., Bunse, C., Torchiano, M., Slyngstad, O. P. N., and Morisio, M. 2009. Development with Off-the-Shelf Components: 10 Facts. IEEE Softw. 26, 2, 80–87.

Background: A few studies exist on third party component reuse (excerpt)

• Study 1 [3]:
  • Sample: 20 heterogeneous Open Source Java projects
• Study 2 [4]:
  • Sample: 106 curated heterogeneous Open Source Java projects + 178 proprietary Java based systems
• Study 3 [10]:
  • Sample: apps in the Android Market

→ Many applications heavily rely on third party components

[3] Heinemann, L. et al. On the Extent and Nature of Software Reuse in Open Source Java Projects. In Top productivity through software reuse (International Conference on Software Reuse) 6727. Springer, Berlin, 207–222.
[4] Raemaekers, S. et al. 2012. An Analysis of Dependence on Third-party Libraries in Open Source and Proprietary Systems. In Proceedings of the Sixth International Workshop on Software Quality and Maintainability (SQM 2012).
[10] Ruiz, I. J. M., et al. 2012. Understanding reuse in the Android Market. In 2012 20th IEEE International Conference on Program Comprehension (ICPC). Proceedings; June 11-13, 2012, Passau, Germany, IEEE, 113–122.

Background: Our study goals

→ Contribute further empirical data to software reuse research
→ Be specific to one application type:
  → Enterprise web applications
→ Build a basis for further research – explorative study

Study implementation: Sample selection

• Enterprise Open Source web applications
  • "Open Source" because it's simply available
  • "Enterprise" because we're interested in high quality software
  • "web applications" because of the third party component "jungle"
• 36 projects selected based on an internet survey
  • Enterprise domain such as
    • Business Intelligence,
    • Knowledge Management,
    • Content Management
  • Popularity: used by big companies (testimonials, references etc.), large download rate
  • Still actively developed

Study implementation: Retrieval of reuse data

Step 1
• Download and extract WAR file
• Locate third party component artifacts

Step 2
• Identify third party components from artifacts
• Use available meta-data (e.g. from Maven repository) to support identification

Step 3
• Remove artifacts that do not belong to a third party component
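The first retrieval step can be sketched in Python as follows. This is a minimal illustration and not the tool used in the study: it assumes that the third party component artifacts are the JAR files bundled under WEB-INF/lib/ of the WAR file, and the function names and the in-memory demo archive are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath


def list_library_artifacts(war_file):
    """Return the sorted JAR file names found under WEB-INF/lib/ in a WAR archive.

    Assumption: bundled third party artifacts live in WEB-INF/lib/.
    """
    with zipfile.ZipFile(war_file) as war:
        return sorted(
            PurePosixPath(name).name
            for name in war.namelist()
            if name.startswith("WEB-INF/lib/") and name.endswith(".jar")
        )


def build_demo_war():
    """Build a tiny in-memory WAR (hypothetical content) so the sketch runs offline."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as war:
        war.writestr("WEB-INF/web.xml", "<web-app/>")
        war.writestr("WEB-INF/lib/slf4j-api-1.7.2.jar", b"")
        war.writestr("WEB-INF/lib/icu4j-3_8_1.jar", b"")
        war.writestr("index.jsp", "")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for jar in list_library_artifacts(build_demo_war()):
        print(jar)
```

For a real project, a path to the downloaded WAR file can be passed instead of the demo archive. Artifacts packaged elsewhere in the archive would need additional rules.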

Study implementation: Retrieval of reuse data

• (Semi-)automatic approach has advantages
  • Retrieval is repeatable
  • Large number of projects can be analyzed
• Major challenge: mapping artifacts to third party components
  • Mapping table

Artifact name      Third party component name
---------------    --------------------------
icu4j              com.ibm.icu
icu4j-3_8_1        com.ibm.icu
icu4j-charsets     com.ibm.icu
icu4j-localespi    com.ibm.icu
slf4j-api          slf4j
slf4j-jcl          slf4j
slf4j-log4j12      slf4j

The first four artifacts form the ICU third party component, the last three the SLF4J third party component.
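A mapping table like the one above could be applied as in the following Python sketch. The table entries are taken from the slide; everything else is an assumption for illustration, in particular the rule that a version suffix starts at the first "-" followed by a digit, and the function names. Unmapped artifacts are collected separately so that the manual part of the process stays visible.

```python
import re

# Mapping table from the slide: artifact name -> third party component name.
ARTIFACT_TO_TPC = {
    "icu4j": "com.ibm.icu",
    "icu4j-3_8_1": "com.ibm.icu",
    "icu4j-charsets": "com.ibm.icu",
    "icu4j-localespi": "com.ibm.icu",
    "slf4j-api": "slf4j",
    "slf4j-jcl": "slf4j",
    "slf4j-log4j12": "slf4j",
}

# Hypothetical rule: a version suffix begins at the first "-<digit>".
VERSION_SUFFIX = re.compile(r"-\d.*$")


def artifact_name(jar_file_name):
    """Derive the artifact name from a JAR file name."""
    stem = jar_file_name[:-4] if jar_file_name.endswith(".jar") else jar_file_name
    if stem in ARTIFACT_TO_TPC:  # exact entries win, e.g. "icu4j-3_8_1"
        return stem
    return VERSION_SUFFIX.sub("", stem)


def group_by_tpc(jar_file_names):
    """Group JAR files by component; return (groups, unmapped)."""
    groups, unmapped = {}, []
    for jar in jar_file_names:
        tpc = ARTIFACT_TO_TPC.get(artifact_name(jar))
        if tpc is None:
            unmapped.append(jar)  # needs a new mapping or manual removal
        else:
            groups.setdefault(tpc, []).append(jar)
    return groups, unmapped


if __name__ == "__main__":
    jars = ["icu4j-3_8_1.jar", "slf4j-api-1.7.2.jar",
            "slf4j-log4j12-1.6.1.jar", "my-app-core-1.0.jar"]
    print(group_by_tpc(jars))
```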

Study results: What have we found out?

• 36 web applications reuse 3311 unique artifacts
• 3311 unique artifacts could be mapped to 651 third party components
  • Using 863 mappings
• Average of 70 components per web application have been reused
  • From 16 to 161
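Summary figures of this kind (number of distinct components, average, minimum and maximum per application) can be derived from a per-application component list. The Python sketch below shows one way to do this; the application names and component sets are made-up toy data, not the study data, and the function name is hypothetical.

```python
from statistics import mean


def summarize(components_per_app):
    """Compute summary figures from a dict: application -> set of TPC names."""
    counts = {app: len(tpcs) for app, tpcs in components_per_app.items()}
    all_tpcs = set().union(*components_per_app.values())
    return {
        "applications": len(counts),
        "unique_components": len(all_tpcs),
        "average": mean(counts.values()),
        "minimum": min(counts.values()),
        "maximum": max(counts.values()),
    }


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = {
        "app-a": {"slf4j", "com.ibm.icu", "commons-io"},
        "app-b": {"slf4j"},
        "app-c": {"slf4j", "commons-io"},
    }
    print(summarize(toy))
```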

Study results: What have we found out?

• Which of the analyzed applications relied on the highest number of TPCs?

Web Application            Version     Domain   # of Comp.
-----------------------    --------    ------   ----------
Alfresco                   4.2c        CM       161
Liferay Portal CE          6.1.1GA2    CM       144
XWiki Enterprise           4.5RC1      SM       130
dotCMS                     2.2         CM       127
Pentaho CE                 4.8.0       BI       118
openKM                     6.2.2       KM       97
jallInOne SOA              2.8.2       ERP      91
Kuali People Management    1.2.2       HR       89
Kuali Coeus                5.0.1       Other    89
Kuali Student              2.0.0M5     Other    87
openCMS                    8.5.1       CM       83
openOLAT                   8.3.3       LM       79
logicalDOC                 6.6.1       CM       75
Magnolia                   4.5.7       CM       71
Jenkins                    1.501       SD       70
Ametys                     3.4.0       CM       70
DSpace XMLUI               3.1         CM       69
Hipergate                  6.0.0RC1    CRM      68
Kuali Mobility             2.0         Other    66
hippoCMS                   7.7.0       CM       61
openWGA CE                 6.0.7       CM       60
DSpace JSPUI               3.1         CM       60
Tntnconcept                0.21.16     Other    55
Nexus                      2.3.1_01    SD       52
Bonita Open Solutions      5.9.1       BPM      49
JRoller                    5.0.1       SM       44
hippoCMS site              7.7.0       CM       42
Daisy                      2.4.2       CM       41
Walrus CMS                 1.5         CM       41
Ametys site                3.4.0       CM       37
jallInOne                  2.8.2       ERP      29
vosaoCMS                   0.9.14      CM       28
Jamwiki                    1.2.4       SM       27
Agorum                     7.0.4       CM       16

Study results: What have we found out?

• Which of the analyzed applications relied on the highest number of TPCs?

Web Application            Version     Domain   # of Comp.
-----------------------    --------    ------   ----------
Alfresco                   4.2c        CM       161
Liferay Portal CE          6.1.1GA2    CM       144
XWiki Enterprise           4.5RC1      SM       130
dotCMS                     2.2         CM       127
Pentaho CE                 4.8.0       BI       118
openKM                     6.2.2       KM       97
jallInOne SOA              2.8.2       ERP      91
Kuali People Management    1.2.2       HR       89
Kuali Coeus                5.0.1       Other    89
Kuali Student              2.0.0M5     Other    87
openCMS                    8.5.1       CM       83

Study results: What have we found out?

• Why does Alfresco CMS use 161 TPCs?
  • 25% document processing such as PDF, Office
  • 21% XML processing and WS
  • 16% accessing external services such as Google Docs, Facebook, Twitter and SlideShare

Study results: What have we found out?

• Which TPC was reused most (top 40)?

Library                       # reused
--------------------------    --------
commons-collections           34
commons-codec                 34
httpcomponents                32
commons-lang                  32
commons-beanutils             32
commons-io                    31
commons-fileupload            29
commons-logging               28
dom4j                         26
commons-digester              25
org.apache.log4j              25
slf4j                         25
org.apache.xerces             25
org.jdom                      25
org.apache.lucene             24
commons-pool                  23
net.sf.ehcache                23
javax.mail                    22
org.apache.oro                22
commons-dbcp                  21
org.springframework           21
org.apache.poi                21
aopalliance                   20
org.antlr                     19
org.objectweb.asm             19
org.codehaus.woodstox         19
org.bouncycastle              19
javax.xml.xml-apis            18
commons-compress              18
net.java.dev.rome             18
xpp                           17
geronimo.specs                17
hibernate                     16
net.sf.cglib                  15
com.thoughtworks.xstream      15
org.apache.xalan              15
jaxen                         15
stax                          15
net.sourceforge.nekohtml      14
org.apache.pdfbox             14

• 19 Apache Foundation TPCs
• 9 XML processing libs
• Building blocks: caching, web, ORM, crypto, RSS, search
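A ranking like the one above can be produced by counting, for each TPC, how many applications contain it. The Python sketch below assumes that "# reused" means the number of analyzed applications that reuse a component; this reading, the function names and the toy data are assumptions and not taken from the study.

```python
from collections import Counter


def reuse_counts(components_per_app):
    """Count in how many applications each TPC occurs."""
    counter = Counter()
    for tpcs in components_per_app.values():
        counter.update(set(tpcs))  # each TPC counts at most once per application
    return counter


def top_reused(components_per_app, n):
    """Return the n most reused TPCs as (name, count), ties ordered by name."""
    counts = reuse_counts(components_per_app)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = {
        "app-a": ["slf4j", "com.ibm.icu", "commons-io"],
        "app-b": ["slf4j", "slf4j"],
        "app-c": ["slf4j", "commons-io"],
    }
    for name, count in top_reused(toy, 2):
        print(name, count)
```

Deduplicating per application matters because one component may be present as several artifacts in the same WAR file.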


Study results: What are the resulting follow-up questions?

• 70 components per web application on average
  • How can this be effectively managed regarding updates, especially those that are security fixes?
  • How risky is it to not keep track?
• Some TPCs are used often, some less; some web applications use more TPCs than others
  • Why is it like this? Can patterns be identified?
  • Can this knowledge contribute to the component identification process in other projects?

Threats to validity: Internal validity

• Manual input
  • Grouping artifacts to TPC
  • Removing non-TPC artifacts

→ Making manual input explicit and traceable

Threats to validity: External validity

• Small sample size
  • Drawing general conclusions on software reuse is not possible
  • But we get a good impression of TPC reuse in Java based Open Source web applications
• Preselection is biased by the authors

→ Extend study (see next chapter)

Wrap up

• Conducted a study on TPC reuse in Java based Enterprise Open Source web applications
• A tool was developed to support data generation for the survey
• The study results show that reuse happens, supporting other studies and practitioners' impressions
• Further questions developed
  • How to cope with huge amounts of TPCs?
  • Can patterns in the data support the identification process of TPCs?

Future Work

• Address the new questions
  • Web platform for Reuse Documentation/Architectural Documentation
  • Recommender System to support TPC identification
• Extend/replicate study to other platforms/other application types/commercial software

Thanks for your attention! Feel free to ask questions!


Widura Schwittek [email protected] http://www.paluno.de