Quality Attribute Variability in Software Product Lines

Aalto University publication series DOCTORAL DISSERTATIONS 149/2015

Quality Attribute Variability in Software Product Lines Varying Performance and Security Purposefully Varvana Myllärniemi

A doctoral dissertation completed for the degree of Doctor of Science (Technology) to be defended, with the permission of the Aalto University School of Science, at a public examination held at the lecture hall T2 of the school on 6 November 2015 at 12.

Aalto University School of Science Department of Computer Science Preago Research Group

Supervising professor Prof. Marjo Kauppinen Thesis advisor Prof. Tomi Männistö, University of Helsinki, Finland Preliminary examiners Prof. Eduardo Almeida, University of Bahia, Brazil Prof. Emer. Kai Koskimies, Tampere University of Technology, Finland Opponent Prof. Ivica Crnkovic, Chalmers University of Technology, Sweden

Aalto University publication series DOCTORAL DISSERTATIONS 149/2015 © Varvana Myllärniemi ISBN 978-952-60-6413-0 (printed) ISBN 978-952-60-6414-7 (pdf) ISSN-L 1799-4934 ISSN 1799-4934 (printed) ISSN 1799-4942 (pdf) http://urn.fi/URN:ISBN:978-952-60-6414-7 Unigrafia Oy Helsinki 2015 Finland

Abstract Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi

Author Varvana Myllärniemi Name of the doctoral dissertation Quality Attribute Variability in Software Product Lines: Varying Performance and Security Purposefully Publisher School of Science Unit Department of Computer Science Series Aalto University publication series DOCTORAL DISSERTATIONS 149/2015 Field of research Software Engineering Manuscript submitted 12 June 2015

Date of the defence 6 November 2015 Permission to publish granted (date) 25 August 2015 Language English

Article dissertation (summary + original articles)

Abstract

Software product line engineering is a plan-driven paradigm for producing varying software products. Software product lines typically differentiate the products by their functionality. However, customers may have different needs regarding performance, security, reliability, or other quality attributes. Building a software product line that is able to efficiently produce products with purposefully different quality attributes is a challenging task.

The aim in this dissertation was to study why and how to vary quality attributes purposefully in a software product line. The study focused on two quality attributes, performance and security. We conducted a systematic literature review on quality attribute variability, conducted two case studies on performance variability, and constructed a design theory and artifacts addressing security variability.

The results indicate that quality attributes can be purposefully varied to serve different customer needs, to conduct price differentiation, and to better balance product and design trade-offs. Additionally, quality attributes can be varied to adapt to varying operating environment constraints. The quality attribute differences can be communicated to the customers as observable product properties, as internal resources, or as the target operating environments. In particular, security can be distinguished as countermeasures. In the product line architecture, quality attribute differences can be designed through software or hardware design tactics or by relying on indirect variation. Just designing the differences may not be enough to ensure that the product has the given quality attributes; the impact of other variability may need to be handled at the product-line or product level.

Our contributions are as follows. Instead of focusing on how to represent quality attribute variability, we focused on understanding the phenomenon of how specific quality attributes vary. We identified several differences between performance and security variability, for example, that security is more difficult to distinguish to the customers but more straightforward to design and derive. We combined design and customer viewpoints: the reason to vary and the means to communicate to the customers should be analyzed from both the technical and non-technical viewpoints. Finally, we drew evidence-based generalizable knowledge from the industrial context.

Keywords software product lines, quality attributes, variability ISBN (printed) 978-952-60-6413-0 ISBN (pdf) 978-952-60-6414-7 ISSN-L 1799-4934 Location of publisher Helsinki Pages 204

ISSN (printed) 1799-4934 Location of printing Espoo

ISSN (pdf) 1799-4942 Year 2015

urn http://urn.fi/URN:ISBN:978-952-60-6414-7

Tiivistelmä (Abstract in Finnish) Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi

Author Varvana Myllärniemi Name of the doctoral dissertation (in Finnish) Laatuvarioituvuus ohjelmistotuoteperheissä: Suorituskyvyn ja tietoturvan tarkoituksellinen varioituvuus Publisher School of Science Unit Department of Computer Science Series Aalto University publication series DOCTORAL DISSERTATIONS 149/2015 Field of research Software Engineering Manuscript submitted 12 June 2015 Permission to publish granted 25 August 2015

Date of the defence 6 November 2015 Language English

Article dissertation (summary + original articles)

Abstract (translated from the Finnish): Software product lines are a way to produce, in a planned manner, software products that differ from one another. Typically, the products of a product line are differentiated by their functions. However, customers may have different needs regarding the performance, security, reliability, or other quality attributes of the products. It is challenging to build a product line that can produce, efficiently and in a managed way, products that differ from each other in quality.

The goal of this dissertation was to study why and how quality attributes are varied purposefully in product lines. The research focused on two quality attributes: performance and security. We conducted a systematic literature review on quality attribute variability and two case studies on performance variability, and we constructed artifacts and related theories on security variability.

According to the results, purposeful quality attribute variability can be used to respond to customers' different needs, to differentiate products by price, and to balance design trade-offs. Quality can also be varied so that a product adapts better to its operating environment. Differences in quality attributes can be communicated to the customer as externally visible product properties, as internal product resources, or as the operating environments of the products. Above all, differences in security can be described as countermeasures against security threats. Quality differences can be designed into the product line architecture using software, hardware, or indirect quality variability. The architectural design of quality differences alone is not enough to guarantee the desired quality levels in the products; the effect of other variability on quality must be taken into account either at the product line or the product level.

The significance of the research is as follows. We did not focus on modeling quality variability but sought to understand the phenomenon itself, that is, how qualities vary. Performance and security vary partly in different ways; for example, it can be more difficult to describe security differences to customers, but designing security differences and deriving individual products can be more straightforward. Our research emphasized understanding both the design and the customer: both technical and non-technical viewpoints should be taken into account when deciding whether to vary quality and how to communicate this to customers. Finally, we produced evidence-based generalizable knowledge on industrial product lines.


Preface

The path to this dissertation has been long and winding. At one point, I was ready to give up. However, what does not kill you makes you stronger, and I survived. By making all the mistakes myself, I learned to become a real researcher. I have learned to follow and obey two masters: existing scientific knowledge and empirical data. Only they can set a researcher free. I want to thank everybody who has supported me on this path.

First and foremost, I would like to thank both my supervisors. Professor Tomi Männistö started this path with me. He helped me to find my thesis topic, and to understand how fundamental and challenging the idea of quality attribute variability is. Under his supervision, I learned all the tricks of the trade of doing research. After Tomi joined the University of Helsinki, he was appointed as my instructor. Nevertheless, Tomi's input in the later stages of this thesis was similar to that of an official supervisor. Professor Marjo Kauppinen was appointed as my supervisor. Marjo supported me when I was writing this dissertation. She excels in helping any author to distill the essential and to clarify the message. With her, I have really learned the necessity of well-defined research questions. In addition, her enthusiasm for all matters is very enlightening.

I would also like to thank all my co-authors in the publications. In particular, Mikko Raatikainen has been my colleague through thick and thin, and I have learned many things through his comments and observations. Juha Savolainen has always provided valuable ideas and several opportunities for data collection.

Behind the publications, the research involved collaboration with several people and companies. I am grateful for their support. When building the artifact, Timo Asikainen was there with me in the early stages of this research. I was also fortunate to collaborate with Nokia Research Center.


The data collection for the case studies would not have been possible without two companies, Nokia and Fathammer. I sincerely thank all my interviewees, the representatives of the companies, and the fellow researchers who helped me along the way.

The preliminary examination of this dissertation was conducted by Professor Emeritus Kai Koskimies and Professor Eduardo Almeida. Their comments and valuable feedback enabled me to improve this dissertation and prepared me for the final defense.

This research was financially supported by TEKES (the Finnish Funding Agency for Innovation), ITEA (a EUREKA Cluster programme in the area of Software-intensive Systems and Services), and DIGILE (a Strategic Center for Science, Technology and Innovation in Finland). I am grateful for this support.

Last but not least, I would not have succeeded without my family: my husband Ville and my daughters Vivia and Vellamo. When I thought there was no path forward, Ville talked me through it, told me to keep pushing and not to give up.

Espoo, September 29, 2015,

Varvana Myllärniemi


Contents

Preface
Contents
List of Publications
Author's Contribution
1. Introduction
   1.1 Background
   1.2 Research Motivation
   1.3 Research Problem and Questions
   1.4 Research Scope
   1.5 Structure of the Dissertation
2. Research Methods
   2.1 Overview of the Methods
   2.2 Systematic Literature Review
       2.2.1 Research Design
       2.2.2 Research Process
   2.3 Case Studies
       2.3.1 Case Selection
       2.3.2 Data Collection
       2.3.3 Data Analysis
   2.4 Design Science
       2.4.1 Artifacts and Theory
       2.4.2 Evaluation
       2.4.3 Research Process
   2.5 Overview of the Results
3. Terminology
   3.1 Quality Attributes and Software Architectures
   3.2 Performance
   3.3 Security
4. Previous Work on Quality Attribute Variability
   4.1 Explanation for Varying Quality Attributes (RQ1)
   4.2 Distinguishing Quality Variants (RQ2)
   4.3 Designing Variability and Deriving Variants (RQ3, RQ4)
5. Performance Variability
   5.1 Explanation for Varying Performance (RQ1)
   5.2 Distinguishing Performance Variants (RQ2)
   5.3 Designing Performance Variability (RQ3)
   5.4 Deriving Performance Variants (RQ4)
6. Security Variability
   6.1 Artifacts and Theory
   6.2 Explanation for Varying Security (RQ1)
   6.3 Distinguishing Security Variants (RQ2)
   6.4 Designing Security Variability (RQ3)
   6.5 Deriving Security Variants (RQ4)
   6.6 Evaluation
7. Discussion
   7.1 Answers to the Research Questions
   7.2 Validity and Reliability
8. Conclusions
   8.1 Contributions
   8.2 Future Work
Bibliography
Publications

List of Publications

This thesis consists of an overview and of the following publications which are referred to in the text by their Roman numerals.

I Myllärniemi, Raatikainen, Männistö. A Systematically Conducted Literature Review: Quality Attribute Variability in Software Product Lines. In Software Product Line Conference (SPLC), Brazil, pp. 41–45, August 2012.

II Myllärniemi, Savolainen, Raatikainen, Männistö. Performance variability in software product lines: proposing theories from a case study. Accepted for publication in Empirical Software Engineering, 47 pages, online February 2015.

III Myllärniemi, Raatikainen, Männistö. Inter-organisational Approach in Rapid Software Product Family Development—A Case Study. In International Conference on Software Reuse (ICSR), Italy, pp. 73–86, June 2006.

IV Myllärniemi, Raatikainen, Männistö. Representing and Configuring Security Variability in Software Product Lines. In Conference on the Quality of Software Architectures (QoSA), Canada, pp. 1–10, May 2015.

V Myllärniemi, Prehofer, Raatikainen, van Gurp, Männistö. Approach for Dynamically Composing Decentralised Service Architectures with Cross-Cutting Constraints. In European Conference on Software Architectures (ECSA), Cyprus, pp. 180–195, September 2008.


Author’s Contribution

Publication I: “A Systematically Conducted Literature Review: Quality Attribute Variability in Software Product Lines” Myllärniemi acted as the first author. The second author commented on the manuscript and the third author commented on the review protocol. All other activities were conducted by the first author.

Publication II: “Performance variability in software product lines: proposing theories from a case study” Myllärniemi acted as the first author. The second author participated in the case study data collection. The third and the fourth author commented on the manuscript. All other activities were conducted by the first author.

Publication III: “Inter-organisational Approach in Rapid Software Product Family Development—A Case Study” Myllärniemi acted as the first author. The data collection and analysis were conducted in collaboration with all authors. The second author commented on the manuscript and wrote the methodology section. The third author commented on the manuscript. All other activities were conducted by the first author.


Publication IV: “Representing and Configuring Security Variability in Software Product Lines” Myllärniemi acted as the first author. The author built the artifact based on the previous work within the research group. The second and the third author commented on the manuscript. All other activities were conducted by the first author.

Publication V: “Approach for Dynamically Composing Decentralised Service Architectures with Cross-Cutting Constraints” Myllärniemi acted as the first author. The first author built the artifact and wrote the manuscript, but the evaluation was done in collaboration with the other authors. The artifact was based on the previous work within the research group. The third and the fifth author commented on the manuscript. All other activities were conducted by the first author.


1. Introduction

1.1 Background

Quality attributes, such as performance, security and availability, play a significant role in satisfying the stakeholders' needs. Quality attributes can be defined as characteristics that affect an item's quality (IEEE Std 610.12-1990, 1990). Software architecture, that is, the structures of a software system that include software elements and their relations, is critical to the realization of many quality attributes (Bass et al., 2003).

At the same time, many software product companies face the diversity of customer needs. Instead of developing a single product as a compromise of these heterogeneous needs, companies may decide to offer several software products with slightly varying capabilities. Software product lines are an industrially relevant paradigm for efficiently developing such varying products.

A software product line (Bosch, 2000; Clements and Northrop, 2001), also known as a software product family, is a set of products, or product variants, that share a managed set of features and a product line architecture (Figure 1.1). The product variants are developed by reusing assets, such as software components and code. Instead of developing products independently or by opportunistically reusing the assets, the development of the product variants and the reuse of the assets take place in a prescribed, preplanned manner (Clements and Northrop, 2001). The act of planning and developing the reusable assets is called domain engineering, while the act of developing the product variants based on the plans and assets is called application engineering or derivation. The product line architecture is an important part of software product lines (Clements and Northrop, 2001), since the architecture lays out the plan of how to reuse the assets to achieve the desired features.


Figure 1.1. The basic concepts of the research setting. Within this dissertation, quality attribute variability is defined as the ability to create product variants with different quality attributes.

A software product line must be able to handle commonality and variability. Commonality represents those aspects that are shared among the product variants. Having commonality between the products enables reuse and development efficiency. In contrast, variability is the ability of a system to be efficiently extended, changed, customized or configured for use in a particular context (Svahnberg et al., 2005). Thus, variability enables differentiation between the product variants and customization for different customer needs.

One of the key challenges in software product line engineering is the efficient management of variability. For this purpose, feature modeling (Kang et al., 1990; Benavides et al., 2010) has become a de facto standard in the research community. A feature can be seen as a system characteristic that is relevant to the users or other stakeholders and is used to capture commonalities or discriminate among product variants (Kang et al., 1990; Czarnecki et al., 2005). In addition to managing features, the product line architecture must be designed to support variability, that is, the ability to create different product variants from the reusable assets.
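To make these concepts concrete, the following minimal sketch (in Python, with hypothetical feature names that do not come from any of the case product lines) shows one way to express a feature model: commonality as mandatory features, variability as optional features and alternative groups, and a product variant as a feature selection that satisfies the model.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class FeatureModel:
    mandatory: Set[str]           # commonality: present in every product variant
    optional: Set[str]            # variability: freely selectable
    alternatives: List[Set[str]]  # variability: exactly one feature per group

    def all_features(self) -> Set[str]:
        group_features = {f for group in self.alternatives for f in group}
        return self.mandatory | self.optional | group_features

    def is_valid_variant(self, selection: Set[str]) -> bool:
        return (
            self.mandatory <= selection            # all common features present
            and selection <= self.all_features()   # no unknown features
            and all(len(selection & g) == 1 for g in self.alternatives)
        )

# Hypothetical web shop product line with one security variation point.
shop = FeatureModel(
    mandatory={"catalog", "checkout"},
    optional={"wishlist"},
    alternatives=[{"basic_auth", "token_auth"}],
)

print(shop.is_valid_variant({"catalog", "checkout", "token_auth"}))  # True
print(shop.is_valid_variant({"catalog", "checkout"}))                # False: no auth selected
```

Industrial feature models add richer constraints, such as requires and excludes dependencies or cardinalities, but the core idea of validating a selection against the declared variability is the same.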


1.2 Research Motivation

Variability in software product lines has been a focus of intense research activity in recent years, encompassing all activities in variability management (Chen and Babar, 2011). However, the research has mostly concentrated on the variability of functional product characteristics, and the variability of quality attributes has received less attention. At least from the research point of view, and from the point of view of cases reported in the research, products in a software product line are differentiated from each other mostly through their functional capabilities. Quality attributes are kept more or less similar, or at least their variability is not purposeful and explicitly managed.

Despite this, different customers and market segments may have different needs regarding quality attributes (Figure 1.1). For example, a weather station targeting the general consumer market has less stringent data reliability requirements than a military weather station (Kuusela and Savolainen, 2000). The differences in the customer quality needs can be resolved in a software product line in two ways. The first alternative is to produce product variants with a common quality level: the product line architecture may be designed to address a common, "worst case" quality requirement (Hallsteinsen et al., 2006a). The second alternative is to build the software product line to produce products with purposefully different quality attributes (Figure 1.1). In the domain engineering phase, the product line architecture must be designed to support quality attribute variability. Based on the definition of variability by Svahnberg et al. (2005), we define quality attribute variability as the ability to create product variants with different quality attributes. In the application engineering phase, this ability is used to derive product variants that have the quality attributes that meet specific customer needs.

However, certain aspects of quality attribute variability make it more challenging than functional variability. Firstly, many quality attributes are continuous (Regnell et al., 2008), which means that different customer needs can be addressed with the same product. As a concrete example, if customer A requires that the response time of a function should be 500 ms, and customer B requires the response time to be 1000 ms, a product with a 500 ms response time will satisfy both customers. Since the customer value from the quality attributes can also be continuous (Regnell et al., 2008), customer A might be willing to accept a less valuable product with a 750 ms or 1000 ms response time. This is in contrast to functionality, where variants often cannot be ordered or substituted for each other. To summarize, there has to be a good motivation to purposefully vary quality attributes in a software product line.

Secondly, quality attributes may be difficult to explicate in a way that the customer understands the differences between the products and is able to select a variant that matches her needs. In many domains, the product quality attributes are described in imprecise and vague terms. Even if there are metrics to characterize a quality attribute, such as uptime percentage for availability, the customer may not be able to understand or relate the measures to her needs. Some quality attributes, such as security, are notoriously difficult to characterize with simple measures; instead, their requirements need to be elaborated taking into account several dimensions (Fabian et al., 2010). Therefore, it is worthwhile to know how the quality attribute differences in the product variants can be distinguished to the customers.

Thirdly, the product line architecture must be designed to support quality attribute variability (Figure 1.1). Because of the architectural nature of quality attributes (Bass et al., 2003), the design for quality attribute variability may in the worst case affect many assets in the product line architecture. Designing and managing such widespread variability may make the cost of quality attribute variability prohibitive (Hallsteinsen et al., 2006a). Thus, the strategies of designing quality attribute variability in the product line architecture and deriving product variants using this designed ability need to be efficient enough.
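The continuity argument can be illustrated with a small sketch (the numbers are hypothetical, mirroring the example above): because response-time requirements are ordered, a single variant satisfies every customer whose requirement is at or above the variant's level, which is generally not true of functional variants.

```python
def satisfies(product_response_ms: float, required_ms: float) -> bool:
    # Lower is better: the product meets any requirement at or above its level.
    return product_response_ms <= required_ms

variant_ms = 500  # one product variant
for customer, required_ms in [("A", 500), ("B", 1000)]:
    print(customer, satisfies(variant_ms, required_ms))  # A True, B True
```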

1.3 Research Problem and Questions

The research problem in this dissertation can be stated as follows:

Research problem: Why and how to vary quality attributes purposefully in a software product line?

Thus, this dissertation studies purposeful quality attribute variability in software product lines. In more detail, the research questions are set as follows (Figure 1.2).


RQ1: Why to vary quality attributes purposefully in a software product line?
RQ2: How to distinguish the quality attributes of the product variants to the customers?
RQ3: How to design the quality attribute differences in the product line architecture?
RQ4: How to derive a product variant with given quality attributes using the product line architecture?

RQ1 is about understanding the reason to purposefully create the product variants with different quality attributes. RQ2 is about characterizing the quality attributes and their differences in a way that they can be communicated to the customer. Thus RQ2 takes an external view of the product variants. In comparison, RQ3 studies the internal, designed ability to create the externally visible quality attribute differences in the product variants. RQ3 focuses on the domain engineering activities and on the product line architecture design. On the application engineering side, RQ4 is about deriving the product variants utilizing the designed ability in the product line architecture. The derivation must ensure that the required quality attributes are met by the product variants.

Figure 1.2. The research questions related to the basic concepts of the research setting.


Table 1.1. How the publications address the research questions.

                              I (All)   II (Perf.)   III (Perf.)   IV (Sec.)   V (Sec.)
RQ1: Why to vary?             X         X            +             +           +
RQ2: How to distinguish?      +         +            +             X           +
RQ3: How to design?           +         X            +             +           +
RQ4: How to derive?           +         +            X                         X

Notation: "X" = the publication has been explicitly built to answer the particular research question; "+" = the publication provides insight into the research question, but as an additional finding to its original research goal.

The research questions are answered in the publications as follows (Table 1.1). All publications have been explicitly built to answer at least one of the research questions. Conversely, all research questions are explicitly answered in at least one publication. Additionally, all publications either directly answer or give some insight into most research questions.

1.4 Research Scope

The scope of this study is limited to software product lines. In a software product line, there has to be a way to tell whether a particular software system is a member of the product line (Weiss, 2008); the commonalities and variabilities that characterize the members of the product line are known (McGregor et al., 2002; Weiss, 2008); there is an underlying product line design that takes advantage of the commonality and variability (Clements and Northrop, 2001; McGregor et al., 2002; Weiss, 2008); and the organization makes a distinction between domain and application engineering (Clements and Northrop, 2001; McGregor et al., 2002).

Moreover, this study focuses on purposeful quality attribute variability: the product line has been built to purposefully create product variants with different quality attributes. In general, it is possible that the product variants have different quality attributes, but not on purpose. This is because any variability in the product line may cause indirect variation in the quality attributes (Niemelä and Immonen, 2007): for example, selecting optional functionality may increase the memory footprint of an application (Siegmund et al., 2013).
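As a rough illustration of such indirect variation (the per-feature footprint numbers below are made up, not data from the study), a quality attribute such as memory footprint can differ between variants purely as a by-product of functional selections, even though no one varied it on purpose:

```python
# Hypothetical per-feature footprints in kilobytes.
FOOTPRINT_KB = {"core": 900, "reporting": 250, "sync": 120}

def footprint(selection):
    # The quality attribute emerges from the functional selection.
    return sum(FOOTPRINT_KB[feature] for feature in selection)

print(footprint({"core"}))                       # 900
print(footprint({"core", "reporting", "sync"}))  # 1270: quality varied indirectly
```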


Table 1.2. How this dissertation is structured to answer the research questions.

Chapter in this dissertation:   Chapter 4     Chapter 5     Chapter 6
Quality attributes:             All           Performance   Security
Research approach:              Review        Empirical     Empirical
Publications:                   I, II         II, III       IV, V

RQ1: Why to vary quality attributes purposefully
in a software product line?     Section 4.1   Section 5.1   Section 6.2
RQ2: How to distinguish the quality attributes
of the product variants to the customers?
                                Section 4.2   Section 5.2   Section 6.3
RQ3: How to design the quality attribute differences
in the product line architecture?
                                Section 4.3   Section 5.3   Section 6.4
RQ4: How to derive a product variant with given quality
attributes using the product line architecture?
                                Section 4.3   Section 5.4   Section 6.5

Due to the diversity of quality attributes, it is challenging to propose constructs that are applicable to all quality attributes similarly. Therefore, we decided to focus on two quality attributes: performance (including resource consumption, time behavior, and capacity (ISO/IEC 25010, 2011)) and security. The results of the empirical studies are stated to cover only performance and security, and the generalizability to other quality attributes is left as future work.

There were several reasons for this decision. Performance and security are both properties of the product (ISO/IEC 25010, 2011) and measurable at runtime (Bass et al., 2003), but they also have remarkable differences in the way they are specified and improved in the design (Bass et al., 2003). For example, security has a close relationship with functionality, whereas performance is more emergent in the software architecture. Performance also seems to be the most commonly varied quality attribute in the examples and cases presented in the literature (Publication I).

The focus on performance and security is also visible in the way the publications address the research questions. Publication I answers the research questions for all quality attributes, Publication II and Publication III answer the research questions for performance, and Publication IV and Publication V answer the research questions for security.


1.5 Structure of the Dissertation

The rest of this dissertation is organized as follows (Table 1.2). Chapter 2 outlines the research methods used in this dissertation and explains how these methods contribute to different kinds of results. Chapter 3 defines the central concepts needed to understand the research questions and the results. Thereafter, Chapters 4, 5 and 6 lay out the answers to the research questions. Chapter 4 reviews the previous work on quality attribute variability in software product lines. Chapters 5 and 6 describe the results of the empirical studies, the former for performance variability, the latter for security variability. Chapters 4, 5 and 6 are organized along the research questions, with each research question addressed in a separate subsection. Chapter 7 synthesizes the answers to each research question based on the empirical studies (Chapters 5 and 6): this synthesis combines the results for performance and security. Thereafter, the synthesis is compared with the previous work (Chapter 4). Additionally, Chapter 7 describes the validity and reliability of the results. Finally, Chapter 8 concludes and identifies areas of future work.


2. Research Methods

2.1 Overview of the Methods

This dissertation utilized three different research methods. Since different research methods focus attention on different aspects of the phenomenon, multiple-method research deals with the richness of the real world and helps in different phases of the research (Mingers, 2001). A parallel research design was used (Mingers, 2001): different quality attributes were treated with different methods, and the methods were carried out in parallel with the results feeding into each other.

The three methods utilized were systematic literature reviews, case studies, and design science (Figure 2.1). Firstly, a systematic literature review (Wohlin and Prikladnicki, 2013; Kitchenham, 2004) focused on all quality attributes (Publication I) and on performance (Publication II). Secondly, two descriptive and explanatory case studies (Yin, 1994; Patton, 1990; Runeson and Höst, 2009) focused on performance variability: Case Nokia (Publication II) and Case Fathammer (Publication III). Thirdly, design science (Hevner et al., 2004; Peffers et al., 2007) focused on security variability (Publications IV, V).

We mostly did not utilize triangulation between these three methods, as has been suggested in order to improve validity (Jick, 1979). This was because the case studies and design science covered different quality attributes and produced different kinds of results. The only triangulation between different methods was to use data from the systematic literature review in the case study theory building (Figure 2.1).


Figure 2.1. The overview of the research methods used in this dissertation and their focus on specific quality attributes.

2.2 Systematic Literature Review

Systematic literature reviews (SLRs) start by defining a review protocol, are based on a defined search strategy, document their search strategy so that readers can evaluate its rigor and completeness, use explicit inclusion and exclusion criteria to select primary studies, and assess the quality of the primary studies (Kitchenham, 2004). Such practices aim at four objectives (Kitchenham, 2004; Staples and Niazi, 2007). Firstly, one should aim at completeness, that is, all relevant primary studies should be included. The second goal is objectiveness, so that no researcher bias exists. Thirdly, the aim is replicability, so that the method can be repeated with similar results. Finally, validity should be assessable from the outside.

2.2.1 Research Design

At the core of the systematic literature review is the search strategy, that is, the means of identifying a set of potentially relevant primary studies. A search strategy can employ the following: searching publication databases with search strings, backward and forward snowballing, manually reading specific venues, personal contact, and previous knowledge of relevant studies (Kitchenham, 2004; Wohlin and Prikladnicki, 2013; Jorgensen and Shepperd, 2007; Kitchenham et al., 2009). Out of these, searching databases is the prevailing way that dominates the descriptions of the search protocols. However, backward and forward snowballing (Wohlin and Prikladnicki, 2013) have been recommended as the primary method for finding relevant primary studies in software engineering.

The search strategy in this dissertation omitted database searches altogether and utilized manual reading and backward and forward snowballing. It was very difficult to come up with a search string that would firstly yield known relevant primary studies and secondly return a manageable number of potential studies (Publication I). Quality attribute variability is discussed with very heterogeneous terms and is rarely the main focus in the study. Since we decided to use snowballing, the search protocol was not dependent on any specific terms or synonyms utilized to characterize quality attribute variability.

Another key element in the systematic literature review is the selection strategy, that is, the means of selecting the relevant studies from the set identified through the search strategy. This involves both the process of screening the studies as well as the inclusion and exclusion criteria applied in screening. The inclusion and exclusion criteria explicate what is relevant within the study scope. The criteria are especially important for search strategies that utilize backward and forward snowballing, since the included studies drive the search process into new directions.

The selection strategy in this dissertation did not exclude any primary studies based on the title and abstract only. To increase completeness, all potentially relevant studies were retrieved and the full content was read through for all studies encountered in the search process. Thus, studies that did not mention quality attribute variability in the title or abstract but nevertheless contributed were not excluded. The inclusion and exclusion criteria (Table 2.1) were set to address all quality attributes and to include all studies that address quality attribute variability in their contribution.

Table 2.1. The inclusion and exclusion criteria utilized in the selection strategy.

Inclusion criteria (the study says explicitly OR uses an example / case):
  - There is purposeful variability of quality attributes in a software product line; OR
  - Different products in a software product line have purposefully different quality attributes.

Exclusion criteria:
  - The study does not explicitly mention that the quality attribute variability takes place in a software product line / family; OR
  - Quality attribute variability is not part of the study contribution / results; OR
  - The study is not a peer-reviewed publication, or the contribution is not assessable from the study.

Figure 2.2. The process of conducting the systematic literature review.

2.2.2 Research Process

In the following, we outline how the research was conducted based on the search and selection strategies (Figure 2.2): we manually read conference proceedings and then extended the initial set by backward and forward snowballing. In the first phase, we manually read through 221 studies (Table 2.2), that is, all full studies in the Software Product Line Conference up until 2010. We chose only one venue, the leading conference on this topic, since we wanted to read all content and hence to find all relevant studies. The selection resulted in 29 primary studies; the analysis and synthesis from these primary studies were reported in Publication I.

In the second phase, we utilized snowballing to extend the set of relevant primary studies beyond what was published in the Software Product Line Conference. After applying the revised set of inclusion and exclusion criteria to the 29 studies identified through manual reading, 26 primary studies were selected as the initial start set (Wohlin, 2014) for snowballing. Thereafter, we conducted iterative and incremental backward and forward snowballing as outlined in Table 2.2. The order of the iterations followed the guidelines by Wohlin (2014).

For backward snowballing, the primary studies in the start set were processed as follows.


Table 2.2. The backward and forward snowballing iterations taken to select the 139 primary studies.

Search action            Start set       Candidates for selection   Selected as new
Manual reading           -               221                        29
  (Manual reading: 29 primary studies selected; after the revised exclusion criteria, 26 primary studies formed the start set (Wohlin, 2014).)
Backward snowballing     26              92                         28
Backward snowballing     28              74                         7
Backward snowballing     7               17                         1
Backward snowballing     1               -                          -
  (Backward iterations: 36 primary studies selected as new.)
Forward snowballing      62 (= 26+36)    342                        54
Forward snowballing      54              69                         9
Forward snowballing      9               1                          -
  (Forward iterations: 63 primary studies selected as new.)
Backward snowballing     63              155                        13
Backward snowballing     13              30                         1
Backward snowballing     1               -                          -
  (Backward iterations: 14 primary studies selected as new.)
Forward snowballing      14              52                         1
Forward snowballing      1               -                          -
  (Forward iterations: 1 primary study selected as new.)
Backward snowballing     1               -                          -

In total: 140 primary studies selected; 1 primary study excluded in analysis.
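As a cross-check of the totals in Table 2.2: the start set of 26 studies from manual reading grew by 36 + 63 + 14 + 1 = 114 studies over the snowballing rounds, giving 26 + 114 = 140 selected primary studies; one of these was excluded during the detailed analysis, leaving the 139 primary studies used in this dissertation.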

The reference list of each primary study was pruned based on the recommendations by Wohlin (2014): firstly by looking at the publication type, and thereafter by looking at the context of the actual reference in the primary study. If an item in the reference list passed both criteria, it was deemed a candidate for selection. After all reference lists in the start set had been examined, the candidates for selection were recorded, duplicates were removed, and new, previously unprocessed studies were retrieved. The inclusion and exclusion criteria were then applied, based on the full content, to all retrieved studies.

For forward snowballing, the primary studies in the start set were processed as follows. Two citation databases were used: ISI Web of Science and Scopus. The forward citations covered studies published up until February 2013. For each primary study in the start set, the studies that cited it in either database were recorded, duplicates were removed, and new, previously unprocessed studies were retrieved.


The inclusion and exclusion criteria were then applied, based on the full content, to all retrieved studies. As a result of snowballing, 140 primary studies were selected. However, during the detailed analysis, one primary study was deemed not to contribute to quality attribute variability. This resulted in 139 selected primary studies. The selected primary studies were analyzed in two stages: in Publication I for all quality attributes and in Publication II for performance. For this dissertation, the analysis from Publication I and Publication II was revised, and a few important primary studies published after February 2013 were added manually.
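The iterative search procedure described above can be summarized as a simple loop that runs until no new studies are found. The sketch below is an illustration only: expand and is_relevant are hypothetical placeholders for following reference lists or citations and for applying the inclusion and exclusion criteria to the full text.

```python
def snowball(start_set, expand, is_relevant):
    """Iteratively grow the set of selected studies until no new ones appear."""
    selected = set(start_set)
    frontier = set(start_set)
    while frontier:
        candidates = {found for study in frontier for found in expand(study)}
        new = {s for s in candidates - selected if is_relevant(s)}
        selected |= new
        frontier = new  # only newly selected studies seed the next iteration
    return selected

# Backward snowballing: expand follows reference lists.
# Forward snowballing: expand follows citations, e.g., from citation databases.
# is_relevant applies the inclusion and exclusion criteria to the full text.
```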

2.3 Case Studies

Case studies have emerged as one form of empirical software engineering research (Runeson and Höst, 2009). A case study investigates a contemporary, non-manipulable phenomenon in its real-life context and is characterized by the lack of clear boundaries between the phenomenon and its context (Yin, 1994). Consequently, case studies have more variables of interest than data points and have to rely on multiple sources of evidence (Yin, 1994). Further, case studies are suitable for "how" and "why" types of research questions, that is, for all research questions in this dissertation. Theory plays an important role in the case study methodology (Yin, 1994; Urquhart et al., 2010). In empirical software engineering, two distinct research designs can be identified: theory building (observational path) and theory testing (hypothetical path) (Stol and Fitzgerald, 2013). This study took the theory building approach: we built theories from the empirical data and observations, and theory testing was left as future work.

2.3.1 Case Selection

Two cases were selected: Case Nokia and Case Fathammer (Table 2.3). The case selection combined criterion sampling and convenience sampling (Patton, 1990). For criterion sampling, the case needed to exhibit purposeful quality attribute variability in a software product line. For convenience sampling, both Case Nokia and Case Fathammer were easily accessible due to existing personal connections, and there were no confidentiality issues in collecting the data and publishing the results.


Table 2.3. The characteristics of the selected case study cases.

Domain:
  Case Nokia: Telecommunication infrastructure.
  Case Fathammer: Mobile games.
Company:
  Case Nokia: More than 50,000 employees.
  Case Fathammer: Fewer than 50 employees.
Product line:
  Case Nokia: IP-BTS, a customizable and configurable base station in 3G mobile radio access networks.
  Case Fathammer: X-Forge 3D game platform and several game titles built on top of it.
Customers:
  Case Nokia: Operators.
  Case Fathammer: Operators, device manufacturers, and game portal users.
Customer value in:
  Case Nokia: Capacity, coverage, support for upgrades, reliability, guaranteed quality.
  Case Fathammer: Playability, game attractiveness, gaming satisfaction.
Main variability:
  Case Nokia: Radio access standards, capacity, coverage, dimensioning.
  Case Fathammer: Target mobile device hardware and software, sales channel customizations.
Product line characteristics:
  Case Nokia: The product line contains both software and dedicated hardware. Long-lived investment products, with variability management focusing on reconfiguration.
  Case Fathammer: The capabilities of the target mobile devices vary drastically. No evolution of product variants, light-weight variability handling.
Product line status:
  Case Nokia: The product line was designed and evaluated, but was canceled due to market reasons before production started; data collection took place approximately ten years after this.
  Case Fathammer: At the time of data collection: 80 licensees of X-Forge, 15 game titles shipped. Several years later, the case company was acquired and merged.

The units of analysis were set as follows. For Case Nokia, the unit of analysis was capacity variability in a software product line. For Case Fathammer, the unit of analysis was set to cover the software product line and its variability management practices. The product line in Case Nokia was discontinued before it reached production (Table 2.3), but the reasons were not related to the unit of analysis. This post mortem nature decreased the study validity (Section 7.2), but also enabled us to access confidential documentation. Hence, selecting another case to improve validity would have decreased the amount of available data.


Table 2.4. The data collection in the cases.

Case Nokia:
  - Approximately 300 pages of internal documents, including the product line software architecture document, a detailed subsystem architecture document, and a product specification document
  - First-hand experience of the second author; informal discussions recorded through notes
  - Publicly available information about the domain, e.g., by Holma and Toskala (2000)
  - Two validation sessions, where the product line chief architects reviewed the case account
  - Written clarifying questions exchanged by e-mail and answered by the chief architects

Case Fathammer:
  - A 3-hour joint interview with the process manager and derivation manager; recorded and transcribed
  - Public and non-public documentation
  - One validation session with the interviewees, where clarifying questions and uncertain issues were resolved

2.3.2 Data Collection

Several sources of data were used (Table 2.4). For Case Nokia, there were two main data sources: internal technical documents about the product line, and the observational first-hand experience of one co-author who had participated in the product line architecture evaluation. These data sources were triangulated with two validation sessions and personal contact with the chief architects who had designed the case product line. The validation also enabled us to collect further data. In addition, publicly available information about the domain was used to augment the internal documents.

For Case Fathammer, the main data collection method was a 3-hour interview with two product line managers. The interview questions were modified from an existing research framework. The interview results were triangulated against public and internal documents about the product line: we checked that the findings were consistent and added further information from the documents to the analysis. The results were later validated with the interviewees.

2.3.3 Data Analysis

For Case Nokia, we adopted some of the analysis principles from grounded theory (Urquhart et al., 2010; Strauss and Corbin, 1998). In particular, we built a theory from the data without any existing hypotheses, utilized data comparison to identify and saturate concepts and relations, and used additional slices of data to guide the analysis (Urquhart et al., 2010).


However, the suggested practices of coding (Strauss and Corbin, 1998) were not adopted to the full extent. It was more appropriate for the data to use light-weight open coding and to analyze the relations between the concepts separately. The threats to validity stemming from this are discussed in Section 7.2.

In the first phase, the analysis was carried out to describe capacity variability in a base station product line, that is, to build a case account without generalization beyond the case domain. This analysis utilized light-weight coding, constant comparison between the data sources, and the addition of new data samples from the validation with the chief architects. The resulting concepts and relations were kept relatively low-level (Urquhart et al., 2010), that is, describing only the case domain.

In the second phase, the analysis was carried out to describe performance variability in software product lines, that is, to generalize the results beyond the case domain and to build theoretical models. For this purpose, the 139 primary studies from the systematic literature review (Section 2.2) served as new data. This analysis involved identifying case accounts and examples of performance variability from the primary studies through light-weight coding. We also re-visited the studies to saturate emerging concepts and explanations during theory building. As a result, three theoretical models were proposed (Publication II): two theoretical models to explain (Gregor, 2006) and one theoretical model to describe (Gregor, 2006) the phenomenon. We proposed these models to describe the results using domain-independent, explicitly defined concepts and to enable analytical generalization from the case study.

For Case Fathammer, we followed the principles of the grounded theory approach using deductive coding of data (Strauss and Corbin, 1998) and utilized data comparison to identify and saturate concepts and relations (Urquhart et al., 2010). However, the resulting descriptions were relatively case-specific and low-level (Urquhart et al., 2010). Thus, the theory building part of grounded theory was omitted (Urquhart et al., 2010). Later on, Case Fathammer was used to identify new categories and saturate existing ones in the theoretical models proposed in Publication II.

Figure 2.3. The research design within this dissertation, following the design science framework (Hevner et al., 2004) and the distinction between theory and artifacts (Gregor and Jones, 2007).

2.4 Design Science

Design science as a research paradigm seeks to create artifacts that extend the human and organizational capabilities in managing, using or developing information systems (Hevner et al., 2004). The relevance of design science is ensured by focusing on business needs that are heretofore unsolved, or solved less effectively, and by evaluating the utility of the artifact against the needs of the application environment (Hevner et al., 2004). The rigor of design science is ensured by utilizing existing scientific knowledge when constructing and evaluating the artifacts (Hevner et al., 2004) and by building design theories about the artifacts (Gregor and Jones, 2007).

Design science was chosen since we wanted both to understand the phenomenon and to build concrete solutions to the problem. In addition, we were able to build on and further develop existing artifacts and principles, which enabled a wider scope of the results.

2.4.1 Artifacts and Theory

At the core of design science is the artifact: the artifact can be a construct, a model, a method, or an instantiation, such as a prototype tool (Hevner et al., 2004). Further, a distinction is made between the concrete artifact and the theory about the artifact (Gregor and Jones, 2007): the design theory lays out the constructs, relations, and scope of the design knowledge codified in and needed to instantiate the concrete artifact.


Table 2.5. The characteristics of the cases used for evaluating the theory and concrete artifacts.

Product line:
  Case Magento: Magento, a configurable and extensible web shop product line; available as enterprise, hosted, and open-source versions.
  Case Shopping Mall: A customizable service portal within a shopping mall; in particular, a service to search for offers and information.
Market position:
  Case Magento: A commercially successful framework and an ecosystem.
  Case Shopping Mall: Not an industrial product line, but an example envisioned by Nokia Research Center.
Product line characteristics:
  Case Magento: A vast amount of varying web shop functionality, only a small amount of security variability. The web shop owners and administrators can configure this variability through the web-based Admin Panel.
  Case Shopping Mall: Users have varying preferences on authentication. The number and capabilities of the participating services may vary.

2.4.2 Evaluation

The evaluation of the theory and artifacts utilized two cases, Case Magento and Case Shopping Mall (Table 2.5). Case Magento was based on a commercially successful configurable web shop, Magento. Magento is a variability-rich, highly configurable product line and ecosystem. Most of the explicitly configurable variability concerns functionality: only a small part of the web shop security can be varied. The data needed to instantiate the artifacts for Case Magento consisted of the source code, the official documentation, and the demonstration version of the Magento Admin Panel. In contrast, Case Shopping Mall was a case example, that is, not a real-life software product line. Case Shopping Mall originated from Nokia Research Center; the particular case example was inspired by the search service described by van Gurp et al. (2008).

The criteria for the evaluation were set as three testable propositions (Gregor and Jones, 2007) that characterized the effect of the concrete artifacts. The testable propositions followed the form "if you want to achieve Y in situation Z, the use of X helps" (Gregor and Jones, 2007). With the testable propositions, the evaluation was conducted at two levels.

Firstly, the feasibility of the instantiation was assessed. That is, the design theory was instantiated as concrete artifacts that included the KumbangSec configurator and the configuration models and tasks for Case Magento and Case Shopping Mall. A realistic instantiation tests the potential problems in the generalized design and shows that the artifact is worth considering (Gregor and Jones, 2007). Further, the instantiation for Case Magento represented a "slice of life", which as such "is most likely to be convincing" (Shaw, 2003).

Secondly, a descriptive comparison to the current state was used, in particular, by comparing with the current state of practice in Case Magento. That is, the design theory and artifacts were evaluated descriptively using an informed argument (Hevner et al., 2004).

The feasibility of the instantiation and the comparison to the current state were both evaluated qualitatively. No quantitative evaluation was conducted, for example, in the form of systematic performance testing, mainly because it was deemed less suitable for assessing the testable propositions. In addition, the instantiation and the evaluation were conducted by the researchers: therefore, the concrete artifacts were neither tried out in the real application environment nor evaluated with the real stakeholders.

2.4.3 Research Process

The research proceeded as suggested by Peffers et al. (2007). Firstly, the problem and the motivation were identified in collaboration with Nokia Research Center. This collaboration also served as a way to define the objectives for a solution iteratively. In the design theory, the objectives were captured as testable propositions.


Figure 2.4. The overview of the research results in this dissertation. [Figure: a matrix of the results per research question and method. The SLR on all quality attributes (Chapter 4) produced a synthesis of the literature for each of RQ1-RQ4. The case studies on performance (Chapter 5) produced a theoretical model to explain (RQ1), descriptions of Case Nokia and Case Fathammer (RQ2, RQ4), and a theoretical model to classify (RQ3). The design science on security (Chapter 6) produced explanations of Case Magento and Case Shopping Mall (RQ1) and descriptions based on the artifacts and the design theory (RQ2-RQ4). The synthesis and comparison over performance and security (Chapter 7) produced a synthesized model to explain (RQ1), a synthesized description (RQ2, RQ4), a synthesized model to classify (RQ3), and a comparison for each research question.]

Secondly, the design and development took place iteratively. The initial version of the modeling conceptualization and the configurator tool was constructed and applied to the initial case examples. Later on, the modeling conceptualization was simplified and applied to Case Shopping Mall. Finally, the design theory was formulated and finalized while applying the theory and the artifacts to Case Magento. The demonstration of the artifacts took place by instantiating them for the case examples; this was iterated with the design and development. The evaluation utilized the testable propositions of the design theory. Finally, the communication took place via a number of research publications.

2.5 Overview of the Results

Given three different methods, there were three different types of results in this dissertation. The different results from the different methods are illustrated as the first three columns in Figure 2.4.

Firstly, the systematic literature review produced an analysis and synthesis of the previous work (Chapter 4). The results were organized along the research questions, and the answers were given to address all quality attributes in general.

Secondly, the case studies produced descriptions and explanations for performance variability (Chapter 5). For RQ1 and RQ3, the results were given as theoretical models for explaining and classifying the phenomenon (Gregor, 2006). For RQ2 and RQ4, the results were given as descriptions of the case accounts.

Thirdly, the design science produced a design theory and artifacts for security variability (Chapter 6). For RQ2, RQ3 and RQ4, the results were given as descriptions based on the design theory and the artifacts. For RQ1, the results were given as descriptions based on the cases, since the artifacts did not explicitly address this research question.

Last, the results of the case studies and the design science were synthesized over performance and security (last column in Figure 2.4). The synthesis was conducted qualitatively, by identifying dimensions that revealed the differences and similarities between the results, and by identifying hierarchies and relations between the concepts across the results. Thus, the synthesis was conducted as a form of cross-case analysis (Huberman and Miles, 1994). The results of the synthesis included one model to explain, one model to classify, and two synthesized descriptions given as tables. Finally, the synthesis was compared with the previous work (Chapter 7).


3. Terminology

The following defines the central concepts upon which the results are built.

3.1 Quality Attributes and Software Architectures

Quality attributes are defined as characteristics that affect an item's quality (IEEE Std 610.12-1990, 1990). Quality is the degree to which the system satisfies the stated and implied needs of its various stakeholders and thus provides value (ISO/IEC 25010, 2011). Quality attributes are often defined via attribute taxonomies (ISO/IEC 9126-1, 2001; ISO/IEC 25010, 2011; Boehm et al., 1978; McCall et al., 1977), which then define the constituent subattributes in more concrete terms or with concrete measures. To complicate matters, more or less the same concept has been referred to with a multitude of terms: quality attributes (ISO/IEC 9126-1, 2001; IEEE Std 610.12-1990, 1990; Bass et al., 2003), quality characteristics (ISO/IEC 25010, 2011; ISO/IEC 9126-1, 2001), quality factors (IEEE Std 1061-1998, 1998), and quality properties (ISO/IEC 25010, 2011; Rozanski and Woods, 2011). Within this dissertation, we use the term "quality attribute".

A closely related concept is a quality requirement, also known as a nonfunctional requirement (Mylopoulos et al., 1992; Berntsson Svensson et al., 2012). A quality requirement can be defined as a requirement that a quality attribute is present in software (IEEE Std 1061-1998, 1998).

There are different kinds of quality attributes. Some quality attributes manifest themselves in the product, whereas some manifest themselves in the interaction when the product is used (ISO/IEC 25010, 2011). Product quality attributes can be divided into those that are observable or measurable at runtime and those that are not (Bass et al., 1998, 2003; ISO/IEC 25010, 2011). Examples of the former include performance and security, whereas examples of the latter include modifiability and testability. Some consider business qualities, such as time to market and cost (Bass et al., 2003), to be part of quality attributes as well.

Software architecture has been defined as the fundamental organization of a system that is embodied in the system elements, relationships, and in the principles guiding the system design and evolution (ISO/IEC 42010, 2011). That is, software architecture is about different software system structures (Bass et al., 2003; Rozanski and Woods, 2011). Software architecture can also be seen as the set of early design decisions people perceive as hard to change (Fowler, 2003; Jansen and Bosch, 2005). Software architectures are created solely to meet the concerns of their stakeholders and to balance any conflicts in an acceptable way (Rozanski and Woods, 2011).

Many quality attributes are architectural, meaning that the software architecture is critical to their realization (Bass et al., 2003): a significant part of the quality attributes are determined by the choices made during the architecture design. To improve quality attributes, design tactics and patterns encapsulate reusable design strategies and solutions (Bass et al., 2003; Rozanski and Woods, 2011). However, many design decisions improve one quality attribute at the expense of another: for example, most of the availability tactics (Bass et al., 2003) increase overhead and complexity and thus may decrease performance. Such situations are called trade-offs, and they are usually resolved by finding a global, multi-attribute optimum (Barbacci et al., 1995).
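
As a tiny illustration of resolving such a trade-off by a multi-attribute optimum, consider the following sketch; it is our own toy example, and the design names, weights, and scores are invented, not taken from the cited sources.

    # Score two candidate designs over several quality attributes and pick
    # the global, multi-attribute optimum (cf. Barbacci et al., 1995).
    # All weights and scores are invented for the example.
    WEIGHTS = {"performance": 0.5, "availability": 0.3, "modifiability": 0.2}
    DESIGNS = {
        "replicated": {"performance": 6, "availability": 9, "modifiability": 7},
        "monolithic": {"performance": 9, "availability": 5, "modifiability": 6},
    }

    def score(design: dict) -> float:
        # Weighted sum: each attribute contributes according to its priority.
        return sum(WEIGHTS[qa] * value for qa, value in design.items())

    best = max(DESIGNS, key=lambda name: score(DESIGNS[name]))
    print(best, score(DESIGNS[best]))  # -> monolithic 7.2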

3.2 Performance

Performance is considered one of the most important quality attributes in the industry (Berntsson Svensson et al., 2012). Performance is defined as the degree to which a system or component accomplishes its designated functions within given constraints, such as speed, accuracy, or memory usage (IEEE Std 610.12-1990, 1990). Performance is relative to the amount of hardware or software resources used to meet those constraints (ISO/IEC 25010, 2011). Performance is one of the quality attributes for which it is relatively straightforward to define quantitative measures: examples include response time, throughput, and jitter (Barbacci et al., 1995).

Performance is divided into the subattributes of time behavior, resource utilization, and capacity (ISO/IEC 25010, 2011). Time behavior is either the latency of responding to an event or the throughput of processing events in a given time interval (Bass et al., 2003; Barbacci et al., 1995). Resource utilization refers to the amount of resources the system uses to perform its functions (ISO/IEC 25010, 2011); typical resources include both static and dynamic memory. Capacity is the degree to which the maximum limits of a product or system parameter meet requirements (ISO/IEC 25010, 2011). As a concrete example, capacity can be defined as the maximum achievable throughput without violating the latency requirements (Barbacci et al., 1995).

The software architecture is critical to the realization of performance (Bass et al., 2003; Smith and Williams, 2002). Performance is affected by many aspects of the architecture: the type and amount of communication among components, the functionality that has been allocated to the components, and the allocation of the shared resources (Bass et al., 2003). Hence, performance is an architectural, emergent quality attribute. Several design tactics and patterns have been proposed to improve performance (Bass et al., 2003; Smith and Williams, 2002; Rozanski and Woods, 2011): they often involve decreasing resource demand or increasing or parallelizing resources (Bass et al., 2003).
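
As a worked illustration of the capacity definition above, consider the following sketch; it is our own toy example with invented load and latency numbers, not data from the cited sources.

    # Toy illustration of capacity as the maximum achievable throughput
    # without violating the latency requirement (cf. Barbacci et al., 1995).
    measured = {  # offered load (requests/s) -> observed latency (ms)
        100: 40,
        200: 55,
        300: 70,
        400: 95,
        500: 180,
    }
    LATENCY_LIMIT_MS = 100  # the stated latency requirement

    # Capacity: the largest load level whose latency stays within the limit.
    capacity = max(load for load, latency in measured.items()
                   if latency <= LATENCY_LIMIT_MS)
    print(capacity)  # -> 400 (requests/s)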

3.3 Security

Security is the capability of the software system to protect information and data so that unauthorized persons or systems cannot read or modify them and authorized persons or systems are not denied access to them (ISO/IEC 9126-1, 2001). Security is the composite of confidentiality, integrity, and availability: confidentiality is the absence of unauthorized information disclosure, integrity is the absence of unauthorized system alterations, and availability is the readiness for correct service for authorized actions (Avizienis et al., 2004).

Security can also be characterized through assets, threats, countermeasures, and vulnerabilities (ISO/IEC 15408-1, 1999; Firesmith, 2004). The core of security is protecting assets from threats: assets are sensitive information or resources (ISO/IEC 15408-1, 1999). A threat exploits vulnerabilities in the system and materializes as an attack (ISO/IEC 15408-1, 1999). Countermeasures are introduced in the software to reduce vulnerabilities (ISO/IEC 15408-1, 1999). Countermeasures are also termed security mechanisms (Firesmith, 2004), security controls (Fabian et al., 2010), or security use cases (Sindre and Opdahl, 2005).

Given the complexity of the software security definition, it is not easy to characterize or measure security with a single quantitative metric. This poses a challenge for security requirements elicitation and definition (Fabian et al., 2010). Some of the best-known methods for defining security requirements include misuse cases (Alexander, 2003; Sindre and Opdahl, 2005), Common Criteria (ISO/IEC 15408-1, 1999), and quality factors (Firesmith, 2004, 2005).

Countermeasures, such as authentication, authorization, or obfuscation, are the link between the requirements and the design, because countermeasures are responsible for fulfilling the security requirements (Firesmith, 2004). A countermeasure is a technique that meets or opposes a threat, vulnerability, or an attack by eliminating or preventing it, by minimizing the harm, or by discovering and reporting it (IETF RFC 4949, 2007). Countermeasures can be defined at different levels of abstraction (Fabian et al., 2010): for example, the requirements on preventing, detecting, or reacting to security incidents (Firesmith, 2005) are requirement-level countermeasures. At the design level, countermeasures correspond to security tactics and patterns: concrete examples include Authenticator or Defense in Depth (Hafiz et al., 2007).
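
As a minimal design-level illustration of a countermeasure, the following sketch guards an asset with an authentication check. It is our own deliberately naive toy, not the Authenticator pattern as specified by Hafiz et al. (2007), and the credential store is invented.

    # A countermeasure at the design level: an authentication check that
    # opposes the threat of unauthorized access to a sensitive asset.
    AUTHORIZED_TOKENS = {"alice": "s3cret"}  # invented credential store

    def read_asset(user: str, token: str) -> str:
        # Prevent the attack: refuse access when authentication fails.
        if AUTHORIZED_TOKENS.get(user) != token:
            raise PermissionError("authentication failed")
        return "sensitive asset contents"

    print(read_asset("alice", "s3cret"))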


4. Previous Work on Quality Attribute Variability

In this section, a synthesis of the previous work on quality attribute variability is given based on the systematic literature review. Based on the review results, three observations can be made.

Firstly, although a large number of primary studies (139) was selected into the review, most of the studies had quality attribute variability only as a minor part of their contribution. Nevertheless, the topic has received research attention: there are even other literature reviews that address the topic as a major (Etxeberria et al., 2007) or minor (Montagud and Abrahão, 2009; Asadi et al., 2012) contribution.

Secondly, most of the selected studies discuss quality attributes in general: a typical case is to propose a method or a construct that is implied to be applicable to all quality attributes. Yet, it is unknown whether a blanket solution can cover all quality attributes equally well (Berntsson Svensson et al., 2012). Only a handful of studies focus on specific quality attributes, for example, on security variability (Mellado et al., 2008; Fægri and Hallsteinsen, 2006; Wang et al., 2006) or on performance variability (Siegmund et al., 2013; Tawhid and Petriu, 2011; Street and Gomaa, 2006).

Thirdly, industrially relevant yet rigorously obtained empirical evidence on quality attribute variability is lacking. There are studies that describe quality attribute variability within its real-life industrial context (Kishi et al., 2002; Sinnema et al., 2006; Niemelä et al., 2004; Hallsteinsen et al., 2006a). However, such studies mostly do not describe data collection, data analysis, or validity threats. There are also studies that utilize an example of varying quality attributes and mention or imply an industrial product line (Lee and Kang, 2010; Siegmund et al., 2012b; Jarzabek et al., 2006; Tun et al., 2009; Kuusela and Savolainen, 2000; White et al., 2009). Yet, it is not clear whether these studies are merely statements backed up by exemplary experience (Fettke et al., 2010), slices of real life (Shaw, 2002), or examples influenced by industrial software product lines.

In the following, we summarize and synthesize the current body of knowledge for each research question.

4.1 Explanation for Varying Quality Attributes (RQ1)

It is relatively challenging to analyze the explanations for the decision to vary quality attributes purposefully (RQ1). For many studies, it was difficult to distinguish whether they were addressing purposeful or unintended quality attribute variability. Unintended quality attribute variability may be caused by indirect variation (Niemelä and Immonen, 2007). As a concrete example, Etxeberria and Sagardui (2008) describe an arcade game product line in which the refresh time should always stay under 100 ms, yet other variability causes the actual refresh time to unintendedly vary within that limit. Moreover, the studies rarely analyze the reason to vary explicitly: the need to vary quality attributes may simply be taken as given, without explanation (Sincero et al., 2007).

Some studies justify purposeful quality attribute differences through examples, anecdotes, and small case studies. For example, Halmans and Pohl (2003) give an example of availability variation that is due to different geographical locations, while Siegmund et al. (2012b) justify the need to produce database systems for embedded, real-time, and mobile devices as the reason to vary resource consumption and performance.

However, there are studies that discuss the explanations in more general terms. Lee and Kang (2010) and Lee et al. (2014) describe how variability in the usage context causes variability in quality attributes: usage contexts include the user, physical, social, and business contexts as well as the product operating environment. An example of the user context is the peak service rush hour causing a varying minimum waiting time (Lee and Kang, 2010). An example of the business context involves different customer segments and their price points (Lee and Kang, 2010). Niemelä and Immonen (2007) justify varying quality requirements through different business domains: for example, high availability and recovery are required in the emergency domain, whereas in the entertainment domain, the service availability only needs to be of a medium rate (Niemelä and Immonen, 2007). Further, even within the same domain, the standards and regulations that constrain quality attributes vary between geographical areas (Niemelä and Immonen, 2007).

Figure 4.1. Explanations for the decision to purposefully vary quality attributes in the primary studies (slightly modified from Publication I). [Figure: the explanation for the decision to purposefully vary quality attributes is either differences in the user and customer needs or differences in the hardware or resources that affect or constrain the quality attributes.]

We identified two different kinds of explanations for varying quality attributes (Figure 4.1). Firstly, there may be differences in the user or customer needs. These reasons may stem from the user, business, and social contexts (Lee and Kang, 2010), and may include geographical market segments (Kang et al., 2002), different types of usage (Kuusela and Savolainen, 2000), or different service domains (Niemelä and Immonen, 2007). Also, the customer needs may change over time (Ishida, 2007). Secondly, there may be differences in the hardware or resources that affect or constrain the product quality attributes. Differences in the mobile device hardware (White et al., 2007), embedded system capabilities (White et al., 2009; Tun et al., 2009; Siegmund et al., 2012b), or network capacity and battery power (Hallsteinsen et al., 2006b) may cause the need to adapt quality attributes, in particular, memory consumption and time behavior.

The explanations in Figure 4.1 are also used to justify performance and security variability. For performance, and in particular for resource consumption, the differences in the hardware or resources are the prevailing way to explain the need to vary (White et al., 2007; Tun et al., 2009; Sinnema et al., 2006; Siegmund et al., 2012b). Surprisingly, it is rare that performance variability is linked directly to the user or customer needs: examples include different customer needs with regard to data volumes (Ishida, 2007) or the resulting price for the better-performing variant (Bartholdt et al., 2009). For security, different user and customer needs are mentioned as the reason to vary, for example, as differences in the legislation and in the users' privacy preferences (Wang et al., 2006; Hendrickson et al., 2009). Additionally, security variability is often motivated by the need to balance trade-offs between security and other quality attributes, for example, varying encryption to mitigate the increase in the response time (Bartholdt et al., 2009; Myllärniemi et al., 2006).

Finally, Niemelä and Immonen (2007) argue that quality attributes that are not visible at runtime are less prone to be purposefully varied, since typically only the internal stakeholders have an interest in them. In contrast, the needs for runtime-observable quality attributes, such as security and performance, are usually different for each product variant (Niemelä and Immonen, 2007).

Table 4.1. Identified approaches in the previous work for modeling customer-relevant quality attribute variability (extended from Publication I).

Proposed modeling concept:
- Quality attribute feature
- Quality attribute softgoal
- Quality attribute requirement
- Quality attribute information attached to other features

Quality attributes modeled as:
- Soft: qualitative, with a measure on the nominal or ordinal scale; may not have clear-cut satisfaction criteria
- Hard: quantitative, with a measure on the interval or ratio scale; measurable and verifiable

Differences modeled as:
- Different levels of the same quality attribute
- Different quality attributes

4.2 Distinguishing Quality Variants (RQ2)

Distinguishing the quality attributes of the product variants to the customers (RQ2) is addressed in the primary studies mostly from the modeling point of view. This is because variability representation is one part of variability management (Chen and Babar, 2011), which in turn is necessary for understanding and communicating the differences between the products. When communicating the variability to the customer, one should focus on the customer-relevant aspects of variability, that is, on essential variability, and not on technical variability (Halmans and Pohl, 2003). In the primary studies, customer-relevant quality attribute variability is modeled in several ways: as features, as softgoals, as requirements, or by attaching quality attribute information to other features (Table 4.1).

Firstly, varying quality attributes can be represented as features. In order to be meaningful to the customer, quality attributes should be represented as problem domain features (Lee et al., 2014). Similarly to traditional features (Benavides et al., 2010), the variability of quality attribute features can involve mandatory, optional, or alternative relations. For example, there could be an optional quality attribute feature Minimum waiting time (Lee and Kang, 2010) or alternative quality attribute features Low, medium, high usability (Etxeberria and Sagardui, 2008).

Secondly, varying quality attributes can be represented as softgoals. Instead of capturing "what the system does", softgoals capture intentionality, that is, "why the system does it" (Gonzales-Baixauli et al., 2004). Therefore, softgoals have been argued to be suitable for selecting the product variants (Laguna and González-Baixauli, 2008). Softgoals represent requirements that do not have clear-cut satisfaction criteria. Instead, softgoals are satisfied when there is sufficient positive evidence and little negative evidence (Mylopoulos et al., 2001). Because of this, softgoals are often used for quality attributes that have no quantifiable measures, such as security and usability (Gonzales-Baixauli et al., 2004; Yu et al., 2008): an example is User authenticity.

Thirdly, varying quality attributes can be represented as requirements (González-Huerta et al., 2012; Kuusela and Savolainen, 2000). In particular, Mellado et al. (2008, 2010) define varying security requirements through the Common Criteria (ISO/IEC 15408-1, 1999) concepts: varying security requirements are distinguished through threats, assets, countermeasures, and objectives.

Finally, quality attribute variability can also be modeled by attaching information about quality attributes to other features (Table 4.1). A popular way is to simply attach quantitative or qualitative information about the impact of the feature on the overall quality attributes (Siegmund et al., 2012b; White et al., 2009; Bagheri et al., 2010). For example, feature Diagnostics increases the binary footprint by 191 KB and improves reliability (Siegmund et al., 2012b). Attaching feature impacts is used especially for resource consumption, perhaps because resource consumption is one of the feature-wise quantifiable attributes (Siegmund et al., 2012b).

To compare the approaches in Table 4.1, varying quality attribute features, softgoals, and requirements are typically closer to essential variability. As a drawback, it must be known how High usability affects the product line solution-space features, such as capabilities and design decisions (Lee et al., 2014; Asadi et al., 2011). For this purpose, qualitative contributions are often used: for example, a solution-space feature may strongly support or hurt a quality attribute feature (Lee et al., 2014).

Within the models, quality attributes can be treated either as soft or hard (Table 4.1). Soft quality attributes have no clear-cut satisfaction criteria and only impose restrictions on how behavioral requirements should be met (Jarzabek et al., 2006).

Such soft quality attributes are typically described qualitatively (Siegmund et al., 2012b) with a measure on the ordinal scale (Stevens, 1946), for example, as "high security". In contrast, product line quality attributes can also be quantifiable (Siegmund et al., 2012b): they can be characterized on the interval or ratio scale (Stevens, 1946) and measured unambiguously, for example, as "latency less than 100 ms". The primary studies typically treat security as a soft attribute, whereas performance is treated as a hard attribute.

Moreover, the differences between quality attributes can be modeled in two ways (Table 4.1): either as different levels of the same quality attribute or as different quality attributes (Niemelä and Immonen, 2007). To exemplify the former, the encryption length can be 128 or 256 (Sun et al., 2009), the response time can be less than 5, 15 or 30 seconds (Gimenes et al., 2008), and the service recovery rate may be low, medium or high (Niemelä and Immonen, 2007). To exemplify the latter, one product may require monitoring and controlling functionality to ensure availability, whereas another product does not (Niemelä and Immonen, 2007).
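
To make these modeling alternatives concrete, the following sketch is our own illustration; the values echo the examples above, but the encoding itself is not taken from any primary study.

    # Toy encoding of the two ways of modeling quality attribute differences.
    # (a) Different levels of the same quality attribute: a hard,
    # quantitative attribute with alternative levels on the ratio scale.
    RESPONSE_TIME_LEVELS_S = (5, 15, 30)  # exactly one level is selected

    # (b) Different quality attributes: an optional feature (monitoring and
    # controlling functionality for availability) whose presence
    # distinguishes one variant from another.
    # A soft attribute, in contrast, only has an ordinal-scale measure.
    SECURITY_LEVELS = ("low", "medium", "high")  # soft: ordinal, no metric

    def variant(response_time_s: int, monitoring: bool, security: str) -> dict:
        """Bind the variability above into one product variant description."""
        assert response_time_s in RESPONSE_TIME_LEVELS_S
        assert security in SECURITY_LEVELS
        return {"response_time_s": response_time_s,
                "availability_monitoring": monitoring,
                "security": security}

    print(variant(15, monitoring=True, security="high"))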

4.3 Designing Variability and Deriving Variants (RQ3, RQ4)

From the viewpoint of the product line architecture, the design of quality attribute variability (RQ3) and the derivation of product variants (RQ4) are rarely covered explicitly in the previous studies. Instead, both aspects are mostly addressed in conjunction or through feature models. This is because features can be used to represent, besides problem domain entities, also solution domain entities, even to the level of design decisions and technologies (Lee et al., 2014; Jarzabek et al., 2006; Kang et al., 2002). Therefore, many studies on feature models and quality attribute variability actually describe design. As a concrete example, Linux Ubuntu packages, that is, architectural entities, are modeled as varying features with a certain impact on memory footprint (Quinton et al., 2012). As another example, redundancy controls active and standby, that is, design tactics for availability, are modeled as alternative implementation technique features (Kang et al., 2002). Additionally, there are studies that focus on the product line architecture design activities (Kishi and Noda, 2000; Kishi et al., 2002). Also the link from the feature models to the product line architecture has been studied (Kang et al., 1990, 1998, 2002).

The architectural, emergent nature of quality attributes makes design and derivation more challenging. It has been argued that quality attributes cannot be directly derived, that is, derived by simply selecting single features that represent product quality attributes in the feature models (Sincero et al., 2010). This is because quality attributes are the result of and impacted by many functional features (Sincero et al., 2010). In other words, quality attributes are indirectly affected by other variability in the software product line (Niemelä and Immonen, 2007). This impact of other variability must be taken into account in the design and derivation approach.

To address the impact of other variability, most studies use model-based, externalized approaches. In particular, the impact of other variability is explicitly represented in the feature models as feature impacts. A feature impact characterizes how a particular feature contributes to a specific quality attribute: for example, selecting feature Credit Card adds 50 ms to the overall response time (Soltani et al., 2012). There can be two kinds of feature impacts, depending on the nature of the specific quality attribute. Firstly, there can be qualitative feature impacts, for example, feature Verification improves reliability (Siegmund et al., 2012b); these can be used as a guideline during the product derivation. Secondly, there can be quantitative feature impacts that can be either directly measured or inferred from other measurable properties: for example, one can compute to which extent a feature influences the memory footprint of an application (Siegmund et al., 2012b). However, there are also quality attributes that are quantifiable but not measurable per feature: for example, it has been claimed that response time can only be measured per product variant (Siegmund et al., 2012b).

To complicate matters further, the impact of one feature may depend on the presence of other features (Siegmund et al., 2013, 2012a; Sincero et al., 2010; Etxeberria and Sagardui, 2008). Thus, the derivation must take feature interactions into account. The features in a software product line are not independent of each other, and their combinations may have an unexpected effect on quality attributes compared with having them in isolation. For example, when both features Replication and Cryptography are selected, the overall memory footprint is 32 KB higher than the sum of the footprints of each feature when used separately (Siegmund et al., 2012b). Feature interactions may occur when the same code unit participates in implementing multiple features, when a certain combination of features requires additional code, or when two features share the same resource (Siegmund et al., 2012b, 2013). An approach to approximate and measure feature interactions has been proposed and evaluated for memory footprint, main memory consumption, and time behavior (Siegmund et al., 2012a, 2013).

Managing feature impacts and interactions may require explicit representation and dedicated tool support, as manifested by the Intrada product line derivation (Sinnema et al., 2006). It may be possible to manage the feature impacts manually, by trying to codify tacit knowledge into heuristics or by comparing with predefined reference configurations (Sinnema et al., 2006). However, when Intrada needed to create a high-performance product variant, the complications of the manual impact management caused the derivation to take up to several months, compared with only a few hours (Sinnema et al., 2006). Even when supported with a tool that was able to evaluate the performance of a given configuration, only a few experts were capable of conducting a directed optimization towards the high-performance configuration (Sinnema et al., 2006).

Based on this, we identified two distinct strategies for both designing quality attribute variability and deriving it from the design. Firstly, quality attribute variability can be designed and derived through design tactics and patterns, that is, by varying architectural patterns (Hallsteinsen et al., 2003; Matinlassi, 2005) and tactics (Kishi and Noda, 2000; Kishi et al., 2002). This approach is used both for performance (Kishi and Noda, 2000; Kishi et al., 2002; Ishida, 2007) and for security (Fægri and Hallsteinsen, 2006). The challenge with varying tactics and patterns is that they may crosscut the architecture (Hallsteinsen et al., 2006a), and thus may be costly to implement, test, manage, and derive. Also, the impact of other variability may need to be managed separately.

Secondly, quality attribute variability can be designed and derived by using indirect variation only; we have termed this the indirect variation strategy. Instead of using explicit design mechanisms, this approach primarily relies on other variability in the product line to create the required quality attribute differences. That is, indirect variation (Niemelä and Immonen, 2007) is used as a way to purposefully vary quality attributes. For example, a variant with a different memory footprint is derived by leaving features Replication and Cryptography out of the product (Siegmund et al., 2012b). Indirect variation is especially used for resource consumption (Tun et al., 2009; White et al., 2007; Siegmund et al., 2013), but other quality attributes are covered as well (Siegmund et al., 2012b).
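
The following sketch illustrates such an externalized impact model; it is our own code. The 32 KB Replication/Cryptography interaction and the 191 KB Diagnostics impact follow the examples of Siegmund et al. (2012b) quoted above, whereas the base footprint and the individual Replication and Cryptography impacts are invented.

    # Estimate a variant's memory footprint from per-feature impacts plus
    # pairwise feature interactions (cf. Siegmund et al., 2012a,b, 2013).
    IMPACT_KB = {"Base": 500, "Replication": 120,
                 "Cryptography": 90, "Diagnostics": 191}

    # An interaction: selecting both features costs more memory than the
    # sum of their individual impacts.
    INTERACTION_KB = {frozenset({"Replication", "Cryptography"}): 32}

    def footprint_kb(selected: set) -> int:
        total = sum(IMPACT_KB[f] for f in selected)
        total += sum(delta for pair, delta in INTERACTION_KB.items()
                     if pair <= selected)  # interaction applies if both chosen
        return total

    # 500 + 120 + 90 + 32 KB: the interaction is added on top of the impacts.
    print(footprint_kb({"Base", "Replication", "Cryptography"}))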


To create desired products with the indirect variation strategy, one must estimate and represent the impacts and interactions from other variability. Most of the studies use feature models as the basis. During derivation, one needs to aggregate the single feature impacts onto the overall product quality and manage the interactions. The derivation can be about finding a variant that meets specific quality attributes, for example, finding a product that has 64 MB or smaller memory usage (Sinnema et al., 2006), or about optimizing over one or more quality attributes, for example, finding the most accurate possible face recognition system that can be constructed with a given budget (White et al., 2009).

From the computational point of view, the algorithms needed for finding and optimizing variants from feature models are computationally very expensive. Earlier solvers based on constraint satisfaction problems resulted in solution times exponential in the size of the problem (Benavides et al., 2005). White et al. (2009) showed that finding an optimal variant that adheres to feature model and system resource constraints is an NP-hard problem. Therefore, several approximation algorithms have been proposed to find partially optimized feature configurations (Guo et al., 2011; White et al., 2009). Other proposals utilize hierarchical task network planning (Soltani et al., 2012). In particular, when the derivation takes place at runtime, the scalability of the approach becomes an issue (Wang et al., 2006; Hallsteinsen et al., 2006b).
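
As a minimal sketch of such a derivation task, the following brute-force toy (our own code with invented feature names and numbers) finds the most accurate valid variant within a memory budget. It enumerates all selections and is thus exponential in the number of features, which is precisely why the approximation algorithms cited above exist.

    from itertools import combinations

    # name -> (memory cost in MB, accuracy contribution); invented numbers
    FEATURES = {"FaceDetect": (30, 10), "HiResModel": (60, 25), "Cache": (5, 20)}
    REQUIRES = {"HiResModel": "FaceDetect"}  # a simple feature-model constraint
    BUDGET_MB = 64

    def valid(sel: frozenset) -> bool:
        # A selection is valid when every selected feature's dependency holds.
        return all(dep in sel for feat, dep in REQUIRES.items() if feat in sel)

    candidates = [frozenset(sel)
                  for r in range(len(FEATURES) + 1)
                  for sel in combinations(FEATURES, r)]
    best = max((sel for sel in candidates
                if valid(sel)
                and sum(FEATURES[f][0] for f in sel) <= BUDGET_MB),
               key=lambda sel: sum(FEATURES[f][1] for f in sel))
    print(sorted(best))  # -> ['Cache', 'FaceDetect'] within the 64 MB budget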


5. Performance Variability

This dissertation focused on performance and security. In the following, the results for performance variability are given based on two case studies (Case Nokia and Case Fathammer). The results are given as generalized theoretical models and as case-specific descriptions.

5.1 Explanation for Varying Performance (RQ1)

To study why performance is purposefully varied, we proposed a theoretical model of the explanations behind the decision to purposefully vary performance in software product lines. The proposed model consisted of generalized explanations (Figure 5.1), definitions, scope, and instantiations (Publication II). Each of the explanations was instantiated in either Case Nokia or Case Fathammer. Three types of generalized explanations behind the decision to vary were identified: explanations related to the customer needs and characteristics, explanations related to the product and design trade-offs, and explanations related to the operating environment.

Firstly, the customer needs and characteristics, that is, the problem domain, can motivate the decision to vary (Figure 5.1). Performance may be varied to serve different or evolving customer performance needs. Also the ability to conduct product or price differentiation motivates varying performance: differentiation is supported by the customers' ability to understand the performance differences and their willingness to pay more for better performance.

Secondly, the product and design trade-offs, that is, the solution domain, can motivate the decision to vary (Figure 5.1). Performance can be varied to balance two kinds of trade-offs: trade-offs between performance and other quality attributes, and trade-offs between performance and costs. The latter can be caused by the expensive hardware needed to improve performance.

Figure 5.1. RQ1: Generalized explanations for the decision to purposefully vary performance in a software product line (adapted from Publication II). All explanations were instantiated in either Case Nokia or Case Fathammer; see the scope and other instantiations in Publication II. [Figure: the explanations related to the customer needs and characteristics are differences in the customer performance needs (caused, e.g., by differences in the amount of requests or data); the evolution of the customer performance needs over time for long-lived investment products; differences in how customers are willing to pay for better performance; and the ability of the customer to understand the performance differences. The explanations related to the product and design trade-offs are the trade-off between performance and other quality attributes, and the trade-off between performance and production costs (caused, e.g., by the hardware design). The explanation related to the operating environment constraints is differences in the resources available in the product operating environment that constrain performance.]

Thirdly, constraints in the operating environment can also explain the decision to vary (Figure 5.1). If there are differences in the resources available in the product operating environment and these resources constrain performance, the situation can be resolved by adapting the product performance.

In Case Nokia and Case Fathammer, there were several explanations that contributed to the decision to purposefully vary performance. In Case Nokia, the main explanations were related to the problem domain. Different base stations initially had different capacity requirements, and the capacity needs grew over time. Moreover, the operators were willing to pay more for better capacity, since capacity was tied to the operators' business and revenue mechanisms. The case company was able to characterize the capacity with unambiguous metrics and guarantee the capacity to the operators. All this supported price differentiation. The case company offered the operators the possibility to upgrade the base station capacity in the future: this flexibility in pricing added to the customer satisfaction and acted as a differentiating feature. In the solution domain, production cost trade-offs acted as one explanation: when capacity differences were achieved with hardware, the conflict between capacity and hardware costs could be resolved with variability. To summarize, Case Nokia was about price differentiation of capacity, both in space and in time.

In Case Fathammer, the customers did not have any explicitly stated differences in the game performance needs. Instead, the main reason was related to the drastic differences in the tightly-constrained target mobile devices. In addition, game playability and graphics attractiveness were valuable to the market success, yet they were in conflict with resource consumption and game refresh rate. Since the device capabilities differed so much from each other, a single solution would have either been too heavy for the low-end devices or had too poor graphics for the high-end devices. To summarize, Case Fathammer was about maximizing the use of the varying device capabilities in the product operating environment.

5.2 Distinguishing Performance Variants (RQ2)

To study how the performance of the product variants could be distinguished to the customers, we describe Case Nokia and Case Fathammer from two viewpoints: what the varying performance attributes were, and how they were distinguished to the customers (Table 5.1).

Table 5.1. RQ2: Distinguishing the product performance variants in the case study cases.

Case Nokia:
- The varying performance attribute: Phone call capacity: the maximum number of phone calls the base station can process per time unit, that is, the maximum phone call throughput.
- Distinguished to the customers: As externally observable capacity: as phone call capacity when selling the products. As internally observable resources: as channel elements when (re)configuring the products; channel elements measure the internal resources needed for a certain phone call capacity and are independent of other product parameters.

Case Fathammer:
- The varying performance attribute: Resource consumption: game heap memory consumption, application size when downloading, and application size when installed on the device. Refresh rate: the game action and graphics refresh rate.
- Distinguished to the customers: As the target operating environment: as the target mobile device to which the game resource consumption and refresh rate were adapted.

In Case Nokia, the base stations had different phone call capacity (Table 5.1). Phone call capacity was defined as the maximum phone call throughput, that is, using an established, externally observable performance measure. The phone call capacity differences were communicated to the customers in two ways: directly as phone call capacity or as channel elements. Since the ability to serve as many phone calls as possible was one of the most valuable aspects to the operators, phone call capacity was used as a selling point when network elements were purchased. However, when configuring base stations, the base station capacity was communicated to the customer technical representatives as channel elements. A channel element was an abstraction of the internal resources needed to deliver a certain phone call capacity. Channel elements dictated phone call capacity independently of other network planning parameters, such as base station interference and power. Hence, channel elements were easier to configure separately. To summarize, the varying phone call capacity was communicated either directly, as externally observable capacity, or as the internally observable resources needed to deliver the capacity.

In Case Fathammer, the varying performance attributes were game resource consumption and refresh rate (Table 5.1). Three resource consumption measures were varied: runtime memory consumption, game binary size, and download size. Refresh rate was varied as the game graphics and game action refresh rate. These differences in resource consumption and refresh rate were not explicated to the customers as such. Instead, the product performance was communicated to the customers only as the target device to which the resource consumption and refresh rates were adapted, for example, by stating that the game was meant for the Nokia 6600 mobile phone. To summarize, the varying resource consumption and refresh rates were communicated indirectly as the target operating environment.

To synthesize the results in Table 5.1, the varying performance attributes included capacity, resource consumption, and time behavior, that is, established, externally observable product attributes with quantitative measures. However, these attributes were not necessarily used to distinguish the variants to the customers. Performance differences were communicated to the customers in three different ways: as the externally observable product performance, as the internally observable resources needed to deliver that performance, and as the target operating environment to which the product performance was adapted.


Figure 5.2. RQ3: Generalized strategies for designing performance variability in product line architectures; see the definitions in Table 5.2. The cases in which each strategy was used are also indicated. [Figure: a performance variability design strategy is either a hardware design tactic (Case Nokia) or software design; software design is either a software design tactic or indirect variation (only in the literature); a software design tactic is either a downgrading tactic (Case Nokia) or a trade-off tactic (Case Fathammer).]

5.3 Designing Performance Variability (RQ3)

To describe how performance differences can be designed in the product line architecture (RQ3), we proposed a theoretical model that classified the strategies for designing performance variability in product line architectures. The proposed model consisted of a classification (Figure 5.2), definitions (Table 5.2), and a description of the scope and instantiations (Publication II).

Table 5.2. RQ3: Definitions of the generalized strategies illustrated in Figure 5.2.

- Performance Variability Design Strategy: The explicit product line architecture design means of purposefully creating performance differences between the product variants. A software product line can apply several different strategies simultaneously.
- Hardware Design Tactic: Differences in performance are achieved by hardware scaling, that is, by having different installed hardware in the product variants. This corresponds to varying design tactic Increase Available Resources (Bass et al., 2003).
- Software Design: Differences in performance are achieved by varying software.
- Software Design Tactic: Create differences in performance by one or more purposefully introduced, varying software design tactics (Bass et al., 2003) that affect performance; performance variability is managed through these tactics. Can be either about downgrading or about trading off tactic consequences (see below).
- Downgrading Tactic: Vary software design tactics with the purpose of decreasing performance without affecting other quality attributes. Can be done by limiting both software and hardware resources and processes through the operating system or middleware.
- Trade-off Tactic: Vary software design tactics with the purpose of decreasing performance but improving other quality attributes or lowering the production costs.
- Indirect Variation: Differences in performance are primarily achieved by indirect variation, that is, as an emergent byproduct of other software variability. Indirect variation is managed through externalized impacts and interactions, often using feature models as a basis.

In Case Nokia, there were two strategies for designing capacity variability in the base station architecture. Firstly, major differences in capacity were achieved by scaling the base station hardware responsible for speech processing up and out; this is called the hardware design tactic strategy (Table 5.2). Secondly, differences in capacity were achieved through software means, in particular, by downgrading the maximum system capacity achieved with the full hardware configuration; this is called the downgrading tactic strategy (Table 5.2). Downgrading was achieved by having a dedicated component that monitored and limited the channel element usage and programmatically disabled the dedicated hardware resources. Since the other software components were unaware of the actual hardware resources, the available channel elements could be changed at runtime without affecting other operations in the base station.

In Case Fathammer, several design tactics were used to design resource consumption and time behavior variability in the mobile games. Many of these tactics altered the game graphics. For example, resource consumption was varied by changing the materials, textures, object models, and game levels. The game refresh rate was varied by changing the number of polygons and the rendering algorithms. Since all these tactics improved game visual attractiveness and playability at the expense of performance, Case Fathammer was an example of the trade-off tactic strategy (Table 5.2).

Additionally, the proposed model defined and classified the prevailing strategy in the previous work, that is, indirect variation (Table 5.2). This strategy is described in Section 4.3.

To summarize the proposed model in Figure 5.2, performance differences can be designed with software or hardware. Moreover, software can either utilize purposeful design tactics to vary performance or let performance differences emerge indirectly from other variability. Finally, software design tactics can either trade off performance against other quality attributes or simply downgrade performance without trying to affect other quality attributes.
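
The downgrading idea can be sketched as follows; this is our own illustration, not the actual base station component, which is not public. A licensing component exposes only the licensed share of the physical capacity, so the components that allocate channel elements never see the physical resources, and the limit can be rebound at runtime.

    class ChannelElementPool:
        """Expose only the licensed share of the physical channel elements."""

        def __init__(self, physical: int, licensed: int):
            self._physical = physical
            self._licensed = min(licensed, physical)
            self._in_use = 0

        def upgrade_license(self, licensed: int) -> None:
            # Runtime rebinding: the capacity changes without touching the
            # components that allocate channel elements for phone calls.
            self._licensed = min(licensed, self._physical)

        def allocate(self) -> bool:
            if self._in_use >= self._licensed:
                return False  # downgraded capacity: reject beyond the license
            self._in_use += 1
            return True

        def release(self) -> None:
            self._in_use = max(0, self._in_use - 1)

    pool = ChannelElementPool(physical=128, licensed=64)
    print(pool.allocate())     # True while licensed capacity remains
    pool.upgrade_license(128)  # more capacity is purchased at runtime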


Figure 5.3. RQ3: Explanations for the selected design strategies in the cases (adapted from Publication II). [Figure: in Case Nokia, the trade-off between performance and hardware production costs, together with the ability to develop scalable software, explains the hardware design tactic; performance variability being motivated by differentiation, not by trade-offs, explains the downgrading tactic; and the need for the customers to rebind performance easily at runtime without changing product functionality explains the software design tactic. In Case Fathammer, performance variability being motivated by trade-offs in the software design explains the trade-off tactic. The relations denote explanation, not predictive causality.]

The reasons why Case Nokia and Case Fathammer selected different design strategies are outlined in Figure 5.3. In Case Nokia, since the hardware costs were a major driver in the base station products, varying capacity through hardware was straightforward. Additionally, scaling the base station hardware was an established practice in the domain. However, when the hardware configuration was fixed and the capacity was varied through downgrading, there were no cost differences between the variants, and the lower-capacity variants had "too good" hardware. With the downgrading strategy, capacity variability was motivated by price differentiation: different price points and the possibility to upgrade were valued by the customers. Finally, the use of a software design tactic to vary capacity enabled reconfiguring the base station at runtime without affecting other product functionality.

In Case Fathammer, resource consumption and time behavior variability were motivated partly by the drastic differences in the mobile devices and partly by the inherent trade-offs between game graphics and performance. An example of such an inherent trade-off was that having more game levels increased the application size. Therefore, it was straightforward to adapt the game performance to different devices by varying these inherent trade-offs.

5.4 Deriving Performance Variants (RQ4)

RQ4 is about deriving products that meet given performance needs using the designed product line architecture variability. In Case Nokia, since the base stations were expensive, long-lived products and the capacity needs increased over time, capacity reconfiguration was essential (Table 5.3).


Table 5.3. RQ4: How the cases derived product variants that met given performance needs using the product line architecture variability.

Derivation task:
- Case Nokia: System-supported (re)configuration: installing or upgrading base stations, either remotely at runtime or manually, by the customer. The configuration logic and support were implemented into the system.
- Case Fathammer: Manual porting and optimization: manually adapting a game to a specific mobile device and sales channel after the outsourced game production; relied on the skills of the Fathammer engineer.

Binding architecture variability:
- Case Nokia: Through design tactics: setting software parameters or installing hardware.
- Case Fathammer: Through design tactics: tuning the code parameters and compiling, but also creating new implementation or game content.

Handling the impact of other variability:
- Case Nokia: Minimized in the product line design: impacts were minimized in the design and ignored during derivation.
- Case Fathammer: Tested and tuned for a product: the high impact of other game variability was manually checked during derivation.

Ensuring the performance needs are met:
- Case Nokia: Testing at the product-line level: testing beforehand to guarantee the promised capacity; not all variants tested.
- Case Fathammer: Testing when deriving a product: the performance, graphics, and playability of each variant were tested iteratively, based on which the final decisions on the variant were made.

The reconfiguration was done by the customer technical representative, and it involved either purchasing new capacity licenses at runtime or upgrading the base station hardware. In Case Fathammer, the derivation was about adapting a game to a specific mobile device and sales channel in the post-production phase (Table 5.3). No maintenance of the games took place. Thus, the derivation tasks were vastly different: Case Nokia focused on supporting automated, customer-conducted reconfiguration for the evolving capacity needs, whereas Case Fathammer focused on light-weight, manual performance adaptation by the product line engineer.

In both cases, the product derivation bound the performance variability in the product line architecture (RQ3). In Case Nokia, the desired performance level was achieved by upgrading the hardware or by changing the number of channel elements that were visible to all software components. Thus, performance was set by binding the variability introduced through the hardware and downgrading strategies. In Case Fathammer, the desired balance between frame rate, memory consumption, playability, and visual appeal was achieved by binding the variability of several design tactics. These varying tactics had been introduced to implement the verification configurations.

Performance in general is affected by the software architecture and thus may be heavily impacted by other variability. Hence, the impact of other variability may need to be managed and checked in addition to binding the design tactic variability. Nevertheless, explicit or externalized impact management was not necessary in either case: the impacts were either minimized in the product line design or manually checked per product. In Case Nokia, the design was built to minimize the impact of other software variability on phone call capacity. Since dedicated resources were reserved for handling the phone calls, the variability of other base station functionality mostly did not affect phone call capacity. In Case Fathammer, the impact of other variability was high, but it was manually tested and tuned for each product during derivation.

Finally, both cases used testing to ensure that the performance of the variant was sufficient, but in very different ways. In Case Nokia, the variants had to be tested beforehand, since the customer did the reconfiguration herself at runtime and the capacity was guaranteed. Instead of testing all capacity variants against all configurations, it was sufficient to test only the maximum, the minimum, and some downgraded design tactic variants. This was due to minimizing the impact of software variability in the design. In Case Fathammer, testing was an integral part of the derivation: during post-production, the suitable levels of performance were decided based on iterative testing and tuning. The tuning relied on the domain knowledge and judgment of the post-production engineer.


6. Security Variability

This dissertation focused on performance and security. In the following, the results for security variability are given based on the artifacts and the design theory. The artifacts and the theory were built through design science and evaluated with two cases (Case Magento and Case Shopping Mall). We briefly introduce the artifacts and the theory, describe the results for each research question, and summarize the evaluation results.

6.1 Artifacts and Theory

We constructed a set of concrete artifacts for representing and configuring software product lines with security and functional variability. The most important concrete artifact was the KumbangSec configurator tool, but also the concrete, case-specific configuration models and configurations were considered as artifacts (Figure 6.1). Besides building the concrete artifacts, we separated the generalizable knowledge of the artifacts into a design theory (Gregor and Jones, 2007). The design theory consists of the KumbangSec modeling conceptualization and a number of principles (Table 6.1). The concrete artifacts were instantiations of the theory knowledge (Figure 6.1). The scope of the artifacts and the theory was limited to configurable software product lines, that is, to software product lines where the application engineering requires very little or no implementation effort (Bosch, 2002). Therefore, we built a configurator tool that supports the automated configuration task, that is, the task of deriving a product variant during application engineering. A configuration describes one particular software product variant, while a configuration model describes the structure, the variability, and the rules upon which valid product variants can be constructed from the product line.
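As a minimal Python sketch of this distinction (our own illustration; KumbangSec itself uses a dedicated modeling language, and the identifiers below are hypothetical), a configuration model carries the variability, while a configuration binds it:

from dataclasses import dataclass

@dataclass(frozen=True)
class AttributeDefinition:
    # Domain engineering: an attribute that MAY vary, with its allowed values.
    name: str
    allowed_values: frozenset

@dataclass
class ConfigurationModel:
    # Describes the structure, variability and rules of the whole product line.
    attributes: tuple

@dataclass
class Configuration:
    # Describes ONE product variant: every attribute is bound to a value.
    values: dict

    def satisfies(self, model: ConfigurationModel) -> bool:
        return all(self.values.get(a.name) in a.allowed_values
                   for a in model.attributes)

model = ConfigurationModel(
    (AttributeDefinition("portalLogin", frozenset({"openID", "passwd", "none"})),))
variant = Configuration({"portalLogin": "openID"})
assert variant.satisfies(model)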

[Figure 6.1: a diagram relating the concrete artifacts (the KumbangSec modeling language, the KumbangSec configuration model, the KumbangSec configurator, the KumbangSec configuration, and the KumbangSec configuration task) to the design theory they embody and instantiate: the KumbangSec modeling conceptualization and the principles of using countermeasures to represent and distinguish security variants to the customers, of separating the configuration model and a configuration, of building and using a configurator operating on stable model semantics, of separating the configuration knowledge from the configurator tool implementation, and of translating the modeling concepts into answer set programs.]

Figure 6.1. The concrete artifacts and the design theory proposed in this study.

Given a configuration model, the KumbangSec configurator tool can be used to efficiently derive configurations that both satisfy and are justified by the configuration model. The proposed design theory codifies several principles that the concrete artifacts instantiate (Figure 6.1). At the core of the design theory is the KumbangSec modeling conceptualization. Similarly to knowledge-based configuration (Soininen et al., 1998), the design theory codifies the principle of separating the domain engineering models, which contain variability, from the application engineering models, in which all variability is bound. Further, the configuration knowledge is described separately from the configurator implementation. The configuration task is defined utilizing the stable model semantics (Simons et al., 2002): a KumbangSec configuration is calculated as a stable model from an answer set program. All this is embodied in the principles of building and utilizing a configurator for the configuration task.


Table 6.1. The design theory of the concrete artifacts (Publication IV), described using the guidelines by Gregor and Jones (2007).

Theory purpose: The theory aims at representing and configuring software product lines with security and functional variability.

Theory scope: The theory is applicable to configurable software product lines with varying functional and security requirements and composable or parameterizable software architecture entities.

Constructs and principles: The principles of using countermeasures to represent and distinguish security variants to customers. The KumbangSec modeling conceptualization. The principle of separating the concepts for configuration model and configuration. The principles of building and using a configurator operating on stable model semantics. The principle of separating the configuration knowledge from the configurator implementation. The principles of translating the modeling concepts into answer set programs.

Testable propositions: In the situation described by the theory scope: P1: in order to represent and distinguish security variants to the customers, the use of countermeasures helps; P2: in order to represent the design of security and functional variability, the KumbangSec conceptualization helps; P3: in order to configure consistent products to meet given security and functional needs, the KumbangSec configurator helps.

Justificatory knowledge: Countermeasures as a characterization of software security (ISO/IEC 15408-1, 1999). Kumbang (Asikainen et al., 2007; Myllärniemi et al., 2007). Using stable models and answer set programs for product configuration (Simons et al., 2002; Soininen et al., 2001, 1998).

Expository artifact instantiations: The KumbangSec configurator. Configuration models for Case Magento and Case Shopping Mall. Configuration tasks for Case Magento and Case Shopping Mall.

Instantiated artifact mutability: A configurator can be implemented in different ways to implement the principles of the theory. Different configuration models for different product lines can be instantiated within the limits of the modeling conceptualization.

Consequently, the design theory is based on knowledge-based configuration (Soininen et al., 1998; Sabin and Weigel, 1998; Felfernig et al., 2014) and answer set programming (Simons et al., 2002; Soininen et al., 2001). In addition, the KumbangSec modeling conceptualization and tool set were extended from Kumbang (Asikainen et al., 2007; Myllärniemi et al., 2007), which is meant to represent and configure the variability of functional features and components. In the following, we describe how the concrete artifacts and the design theory answered the research questions.


Table 6.2. RQ1: Explanations for the decision to purposefully vary security as countermeasures in Case Magento and Case Shopping Mall.

Explanation: Trade-offs between security and other quality attributes caused by the countermeasures. Instantiation in the cases: Case Magento: using encrypted communication for all authenticated users was believed to impose a performance penalty on the web shop browsing; using two-way authentication increased security but decreased user efficiency by requiring an additional step in the administration authentication process.

Explanation: Trade-offs between security and cost caused by the countermeasures. Instantiation in the cases: Case Magento: the use of SSL (Secure Socket Layer) encryption required an SSL certificate that imposed additional operation and installation costs.

Explanation: Differences in the customer countermeasure needs caused by differences in the user identification. Instantiation in the cases: Case Shopping Mall: different users had different existing accounts (passwords, OpenID), while some users preferred not to identify themselves at all. Case Magento: using an identification token generated by the Google Authenticator application in the two-way login may not be possible or preferable for all web shop administrators.

Explanation: Differences in the customer countermeasure needs caused by differences in the sensitive assets. Instantiation in the cases: Case Magento: since credit card information was deemed a sensitive asset, the variability of credit card information motivated varying the encryption.

6.2 Explanation for Varying Security (RQ1)

The artifacts did not explicitly address the reason to vary security (RQ1). However, the reasons to vary countermeasures in Case Magento and Case Shopping Mall were analyzed. Based on the cases, Table 6.2 lists the explanations for the decision to purposefully vary security as countermeasures. Firstly, there were two kinds of trade-offs that could be resolved with countermeasure variability. Countermeasures can impact usability, performance or other quality attributes negatively. When a countermeasure that caused a trade-off was varied, it was possible to have one variant that maximized security and another variant that maximized the other quality attribute. Further, countermeasures can impose additional development or operation costs. Hence, countermeasure variability was introduced to balance the trade-off between security and cost differently in different product variants. Secondly, there were differences in the customer countermeasure needs.


Many security countermeasures require that the users identify themselves. If the users had different identification needs, for example, different existing accounts or preferences, such countermeasures could be varied. Further, countermeasures are introduced in the system to reduce vulnerabilities and to protect the assets against threats (ISO/IEC 15408-1, 1999). Therefore, differences in the sensitive assets could lead to differences in the customer countermeasure needs. To summarize, security variability was motivated either through trade-offs or through differences in the customer countermeasure needs.

6.3 Distinguishing Security Variants (RQ2)

To study how security variants could be distinguished to the customer (RQ2), we describe the artifacts from the following viewpoints: what the varying security attribute was and how this attribute was distinguished to the customers (Table 6.3). Firstly, the varying security attribute was defined to be a countermeasure (Table 6.3). We defined the varying countermeasure as a requirement or specification of an action or technique that opposes a threat, an attack or a vulnerability by preventing, detecting or reacting to it. Informally, countermeasures describe what the system does to prevent malicious things from happening. Countermeasures were selected for several reasons. Countermeasures describe the behavior of the product, compared with threats and attacks, which are properties of the outside world. Further, countermeasures are easier to recognize than vulnerabilities, since they are purposefully introduced in the product. Finally, the variability of assets, threats and vulnerabilities could be operationalized as the variability of countermeasures.

Table 6.3. RQ2: Distinguishing the product security variants in the artifacts.

The varying security attribute: countermeasures, that is, actions or techniques to oppose a threat, an attack or a vulnerability; varied similarly to functionality. Example instantiations (Figure 6.2): authentication, the encryption of data in transmission, session validation, network access restriction.

Distinguished to the customers: as countermeasures, represented at the requirement or specification level and selected in the configurator tool during the configuration task. In Case Magento, countermeasures were already in use; they included both externally and internally observable countermeasures.


[Figure 6.2(a): a diagram of the countermeasure types and their variability in Case Magento: WebShopCountermeasures is composed of encryption (EncryptCommunication or NoEncryptedCommunication), adminAuth (AdminAuthentication with an optional auth2Way part TwoWayAdminAuthentication and attribute encryptAdmin : { yes, no }), sessionValidation (BrowserSessionValidation with attribute protectionLevel : { nothing, medium, high, custom } and an optional customSettings part CustomBrowserSessionValidation with attributes checkRequestIPAgainstSessionIP, checkRequestBrowserAgainstSessionBrowser and useSessionIDinURL), and an optional restrictAccess part (RestrictAccessToAllowedIP); a constraint ties protectionLevel = custom to the presence of customSettings.]

An excerpt from the textual configuration model of Case Magento:

countermeasure type EncryptCommunication {
  contains (EncryptAfterAuthentication, EncryptOnlyInCheckout) encryptCustomers;
  attributes Boolean encryptAdmin;
  implementation
    value(component-root.static.core.core.config, web_secure_base_url) = https;
    has_instances(SSLCertificate);
    value(encryptAdmin) = yes => value(component-root.static.core.core.config, web_secure_use_in_adminhtml) = 1;
    value(encryptAdmin) = no => value(component-root.static.core.core.config, web_secure_use_in_adminhtml) = 0;
  description
    "Encrypts the traffic between the browser and the server."
    "Requires an installed and authorized SSL certificate, which may impose additional operational costs."
    "May impact the response time of the page requests negatively."
    "Attribute encryptAdmin can be used to indicate whether encryption is also used when the administrator is logged in."
}

[Figure 6.2(b): the countermeasure types and their variability in Case Shopping Mall: LoginOption with attribute portalLogin : { openID, passwd, none } and constraints that tie each attribute value to the presence and type of the login instance (portal.search.mallSearch->login).]

Figure 6.2. RQ2: Countermeasure types and their variability for (a) Case Magento and (b) Case Shopping Mall (excerpt from Publication IV, Publication V). An excerpt from the textual configuration model is also shown.

In our modeling conceptualization, the security of a product variant is conceptualized as the set of countermeasures in a configuration (Figure 6.3). The allowed countermeasure variability can be defined in the configuration model through the composition, attributes, and inheritance of countermeasure types. Also constraints among countermeasures and between countermeasures and functional features can be defined. Thus, countermeasures are varied similarly to functional features. To distinguish the security differences to the customers (Table 6.3), countermeasures can be used directly in the configurator tool as the means to select variants. To ease communication, the selectable countermeasures need to be modeled at the requirement or specification level. Case Magento already used countermeasures to distinguish configurable security options to the customer. However, all Magento countermeasures had been stated in very technical terms, for example, as Use Secure URLs in Frontend. When modeling Case Magento, we represented the countermeasures at the requirement or specification level, for example, as EncryptCommunication. Some of the resulting countermeasures in Case Magento described behavior that was externally observable in the product, for example, AdminAuthentication.

[Figure 6.3: a diagram of the modeling conceptualization: the domain engineering concepts KumbangSecModel, KumbangSecType, ComposableType (with part, attribute and constraint definitions), AttributeType, FeatureType and CountermeasureType, and the corresponding application engineering concepts KumbangSecConfiguration, KumbangSecInstance, ComposableInstance, AttributeInstance, FeatureInstance and CountermeasureInstance.]

Figure 6.3. RQ2: Countermeasures and their variability in the modeling conceptualization (excerpt from Publication IV).

Some countermeasures described behavior that could be fully observed only internally, for example, BrowserSessionValidation.
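As an illustration of such a constraint (a Python sketch of our own; the actual rule is stated in the KumbangSec modeling language, as in Figure 6.2), the BrowserSessionValidation rule that ties protectionLevel = custom to the presence of customSettings can be checked as follows:

def session_validation_consistent(protection_level: str,
                                  custom_settings_present: bool) -> bool:
    """BrowserSessionValidation rule: custom settings are present exactly
    when the protection level is 'custom' (biconditional constraint)."""
    return (protection_level == "custom") == custom_settings_present

assert session_validation_consistent("custom", True)
assert session_validation_consistent("high", False)
assert not session_validation_consistent("nothing", True)  # violates the rule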

6.4 Designing Security Variability (RQ3)

RQ3 is about designing the countermeasures in the product line architecture. The artifacts treated a countermeasure as a requirement or a specification of preventing, detecting or reacting to security incidents. At the design level, such a requirement or specification is implemented by utilizing one or more security tactics, which are design strategies to resist, detect, react to or recover from attacks (Bass et al., 2003). Moreover, the artifacts focused on software architecture. Therefore, the security variability design strategy implied by the artifacts was to use software design tactics (Figure 6.4). In the cases, the following security tactics were used in the design. Case Magento varied several countermeasures (Figure 6.2); the corresponding design tactics include Authenticate Users, Maintain Data Confidentiality, and Limit Access (Bass et al., 2003).

[Figure 6.4: the security variability design strategy advocated in the artifacts, realized as a software design tactic (exhibited in the artifacts, Case Magento, and Case Shopping Mall).]

Figure 6.4. RQ3: How security variability was designed as advocated in the artifacts.


Table 6.4. RQ4: How the artifacts and the design theory support deriving product variants that meet given security needs using the product line architecture variability.

Derivation task: knowledge-based configuration; based on a configuration model, finding a consistent and complete configuration as the stable model of answer set programs. The configuration represents the countermeasures, features and architecture of the product. Supported by a configurator tool.

Binding architecture variability: through modeled design tactics; binding the variability in a model. The model captures how each countermeasure is implemented in the product line architecture through explicit constraints. Also cross-cutting constraints are supported.

Handling the impact of other variability: modeled as constraints; similarly to functionality, modeled as constraints in the product line architecture and resolved for a product.

Ensuring the security needs are met: model-based consistency; by keeping the requirements, the configuration model and the configurations consistent. The correctness of the models must be tested separately; instantiating and testing are outside the scope.

Case Shopping Mall varied the authentication countermeasure (Figure 6.2); the corresponding design tactic is Authenticate Users (Bass et al., 2003).

6.5 Deriving Security Variants (RQ4)

RQ4 is about deriving the product variants that meet given security needs using the product line architecture variability. Using the artifacts, the product derivation is performed as a knowledge-based configuration task (Table 6.4): the derivation relies on the configuration knowledge captured in the configuration model, and very little or no new development effort is needed. For example, the configuration model in Figure 6.5 describes the knowledge that is needed to derive varying shopping mall search services with different authentication mechanisms. Given a configuration model, the configuration task is about finding a configuration that satisfies and is justified by the configuration model and the requirements. For example, the configuration task in Case Shopping Mall would involve finding a configuration that takes into account the architectural constraints between the shop services and satisfies the given requirements on authentication (Figure 6.5).


[Figure 6.5: the configuration model of Case Shopping Mall. The configuration knowledge of countermeasures defines the countermeasure type LoginOption with attribute portalLogin : { openID, passwd, none } and constraints tying each value to the presence and type of the login instance (portal.search.mallSearch->login). The configuration knowledge of components and architectural constraints defines the component types ComposedSearchService (with parts mallSearch : MallSearchService and shopSearch[0,N] : ShopSearchService), MallSearchService, the abstract ShopSearchService, and its subtypes ShopALotSearchService, ZMartSearchService, YMartSearchService (constrained not to use LoginShopOpenID) and GadgetsRUsSearchService (constrained to have a login); cross-cutting constraints require all shop search services to use the same authentication mechanism as the mall search service.]

Figure 6.5. RQ4: An example configuration model that contains all configuration knowledge needed to derive product variants in Case Shopping Mall (Publication V).

An essential part of the derivation is the configurator tool: it operationalizes the configuration knowledge during the configuration task. In Case Magento, the web shop owner or her technical representative configured the web shop herself, which meant that all domain knowledge and rules needed to be managed by a tool. In Case Shopping Mall, there was a need to perform the configuration task automatically by another system. The main task of the configurator is to support the derivation of consistent and complete configurations.


[Figure 6.6: the process of building and using the configurator. Preparation at the product-line level: the configurator translates the KumbangSec configuration model (in the KumbangSec modeling language) into weight constraint rules (CM), enumerates the possible instances and attributes (GF), and grounds both into an answer set program CM ∪ GF. During the derivation (configuration) task: a human or another system selects a feature, countermeasure, attribute value or component; the configurator adds the selection to the requirements R, calculates stable models C to check consistency and completeness, finds and visualizes the consequences of R, and cancels any selection that would make C inconsistent. For a complete and consistent C, the configurator indicates that all requirements have been selected and exports a KumbangSec configuration; instantiating or reconfiguring the product variant is outside the scope.]

Figure 6.6. RQ4: The principles of building and using the configurator tool for the configuration task to derive security variants (excerpt from Publication IV).

Informally, in a consistent and complete configuration, no rules of the configuration model are violated and all necessary selections are made. During the configuration task, the user of the configurator tool can select required features, countermeasures, attribute values, and components (Figure 6.6). For example, OpenID can be selected as the authentication mechanism for the shopping mall service (Figure 6.5). The configurator tool checks the consistency and completeness of the resulting configuration and calculates the consequences (Figure 6.6). For example, since YMartSearchService does not support OpenID as an authentication mechanism, selecting both OpenID and YMartSearchService would render the configuration inconsistent (Figure 6.5). For a complete and consistent configuration, the configurator tool can export a configuration, that is, a description of the features, countermeasures and architecture of the product variant.

Instead of implementing the reasoning behind the configurator tool from scratch, the configuration task was defined through answer set programs and stable models (Figure 6.7). This enabled the use of an efficient inference engine (Simons et al., 2002). Figure 6.6 illustrates how the configurator uses the answer set programs and stable models. Before the configuration task begins, the configuration model is translated into an answer set program. During the configuration task, the answer set program is used to calculate stable models that satisfy the requirements entered so far. Due to the characteristics of the stable models (Simons et al., 2002), the resulting configuration is both consistent and complete.


Given a KumbangSec configuration model as a set of weight constraint rules CM, a set of ground facts GF representing the possible instances from the types in the configuration model, and a set of rules R representing the requirements, is there a KumbangSec configuration C, that is, a stable model of CM ∪ GF, such that C satisfies R?

Figure 6.7. RQ4: The definition of the configuration task based on stable models and answer set programs; the way the configurator supports this task is illustrated in Figure 6.6.
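To illustrate this definition, the following brute-force Python sketch enumerates stable models of a small ground program and keeps those consistent with the requirements. It is our simplified illustration only: the actual configurator translates the model into weight constraint rules and delegates the search to an efficient smodels-style engine, and the Case Shopping Mall encoding below, including the marker atom used for the integrity constraint, is hypothetical.

from itertools import product

def reduct(rules, m):
    # Gelfond-Lifschitz reduct: drop rules whose negative body intersects m,
    # then delete the negative bodies of the remaining rules.
    return [(head, pos) for (head, pos, neg) in rules if not (neg & m)]

def least_model(positive_rules):
    # Least model of a negation-free program via naive fixpoint iteration.
    m, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if pos <= m and head not in m:
                m.add(head)
                changed = True
    return m

def stable_models(rules, atoms):
    # A candidate set of atoms is a stable model iff it equals the least
    # model of its own reduct.
    for bits in product([False, True], repeat=len(atoms)):
        m = {a for a, chosen in zip(atoms, bits) if chosen}
        if least_model(reduct(rules, m)) == m:
            yield m

# CM ∪ GF, heavily simplified: exactly one login option is chosen, and
# combining YMartSearchService with OpenID is marked inconsistent
# (cf. Figure 6.5). Each rule is (head, positive body, negative body).
atoms = ["openID", "passwd", "yMart", "inconsistent"]
cm_gf = [
    ("openID", frozenset(), frozenset({"passwd"})),  # openID :- not passwd.
    ("passwd", frozenset(), frozenset({"openID"})),  # passwd :- not openID.
    ("inconsistent", frozenset({"openID", "yMart"}), frozenset()),
]
requirements = {"yMart"}  # R: the customer has selected YMartSearchService
program = cm_gf + [(a, frozenset(), frozenset()) for a in requirements]

# The configurations are the stable models that satisfy R and do not
# violate the integrity constraint (here: do not contain the marker atom).
configs = [m for m in stable_models(program, atoms) if "inconsistent" not in m]
assert configs == [{"yMart", "passwd"}]  # YMart forces password-based login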

In order to use the product line architecture in the derivation (Table 6.4), the design of countermeasure variability needs to be represented in the configuration model. Constraints can be used to state how the components implement the countermeasures. Also cross-cutting constraints may be defined to support countermeasures that have a wider impact in the architecture: for example, all components participating in the search must use the same authentication method (Figure 6.5). Our modeling conceptualization is agnostic about the exact semantics of a component. This is because different kinds of software system elements, for example, firewalls and installed certificates, can participate in the implementation of the countermeasures. During derivation, the architectural variability in the model is bound in order to achieve the desired countermeasures in the product variant (Table 6.4). The impact of other variability can be handled similarly to functionality: if other variability in the product line architecture impacts how the countermeasures are realized, this can be modeled as explicit constraints. For example, a constraint to handle the impact of other variability can state that the varying component YMartSearchService is not compatible with OpenID (Figure 6.5). To ensure that countermeasure variability is correctly represented in the configuration model, the countermeasure implementation must be tested separately. However, instantiating and testing the product was left outside the artifact scope (Figure 6.6).

6.6 Evaluation

To evaluate the design theory and the concrete artifacts, Case Magento and Case Shopping Mall were used. The proposed design theory (Table 6.1) contains three testable propositions, P1, P2 and P3, to explicate the impact and utility of the artifacts. We used these testable propositions as the evaluation criteria (Table 6.5). The evaluation was conducted at two levels: through the feasibility of the instantiation and through a comparison to the current state of practice in Case Magento.


Table 6.5. The testable propositions in the design theory (Table 6.1) were used to evaluate the research questions.

Testable propositions:
P1 (RQ2): To represent and distinguish security variants to the customers, the use of countermeasures helps.
P2 (RQ3): To represent the design of security and functional variability, the KumbangSec conceptualization helps.
P3 (RQ4): To configure consistent products to meet given security and functional needs, the KumbangSec configurator helps.

Evaluation results:
RQ1: No artifact instantiation or evaluation; results drawn from the cases.
RQ2 (P1): Feasibility: possible to represent all security variability in Case Magento and Case Shopping Mall as countermeasures. Comparison to the current state: focus on the requirement and specification level and not on technical countermeasures; an explicit construct to elicit and make security variability visible.
RQ3 (P2): Feasibility: possible to represent the design of countermeasures in the product line architecture. Comparison to the current state: can represent fine-grained dependencies in the product line architecture.
RQ4 (P3): Feasibility: possible to select security needs as countermeasures; possible to find consistent and complete configurations. Comparison to the current state: checking constraints and dependencies between selections; the separation of more customer-friendly countermeasure selections from the configuration file options.

The evaluation results are summarized in Table 6.5. It was possible to represent security variability and its design in Case Magento using countermeasures and the modeling conceptualization. It was also possible to configure consistent products using the configurator tool. Several improvements over the current state in Case Magento were identified. The configurator tool checked the dependencies between the selections and enabled the user to enter her needs at the requirement or specification level. The countermeasures in Case Magento were originally represented at a very technical level: we proposed requirement-level countermeasures to describe security variability at a customer-relevant level. Nevertheless, it remained a challenge to represent some of the varying countermeasures in Case Magento at a customer-relevant level. For example, the countermeasure BrowserSessionValidation may be too technical for many customers.


7. Discussion

7.1 Answers to the Research Questions

The results so far have addressed performance and security variability separately. In the following, the research questions are answered by synthesizing the results over both quality attributes. The answers are given as models or descriptions. The synthesis focuses only on performance and security, and the resulting models and descriptions indicate which findings apply to which quality attribute. In particular, we do not aim to generalize the results to other quality attributes. The answers are also compared with our review of the previous work on quality attribute variability, and any similarities and differences are discussed.

Why to vary quality attributes purposefully in a software product line? (RQ1)

As the result for performance, we proposed a theoretical model of explanations for the decision to purposefully vary performance in software product lines: the explanations were instantiated either in Case Nokia or in Case Fathammer. As the result for security, we identified explanations to purposefully vary security as countermeasures from the cases used to evaluate the artifacts (Case Magento and Case Shopping Mall).

To synthesize the results for performance and security, we propose a model that explains the decision to vary performance or security purposefully (Figure 7.1). There are three classes of explanations. Firstly, customer needs and characteristics, that is, the problem domain, can explain the decision to vary. Differences in the customer needs, either in space (between customers) or in time, are two such explanations.


[Figure 7.1 shows three classes of explanations. Explanations related to the customer needs and characteristics: differences in the customer performance or countermeasure needs (for performance caused, e.g., by differences in requests; for security caused by differences in assets or user identification), instantiated in Cases Nokia (perf), Magento (sec) and Shopping Mall (sec); evolution of the customer performance needs over time, instantiated in Case Nokia (perf); differences in how customers are willing to pay for better performance, instantiated in Case Nokia (perf); and the ability of the customer to understand the performance differences, instantiated in Case Nokia (perf). Explanations related to product and design trade-offs: a trade-off between performance or security and cost (for performance caused, e.g., by hardware design; for security caused by countermeasures), instantiated in Cases Nokia (perf) and Magento (sec); and a trade-off between performance or security and other quality attributes, instantiated in Cases Fathammer (perf) and Magento (sec). Explanations related to the operating environment constraints: differences in the product operating environment resources that constrain performance, instantiated in Case Fathammer (perf).]

Figure 7.1. RQ1: A model to explain the decision to purposefully vary performance or security, synthesized from Figure 5.1 and Table 6.2. The cases and quality attributes are also illustrated.

Two other explanations describe customer characteristics that support differentiation. Secondly, product and design trade-offs in the solution domain can explain the decision: trade-offs imply situations in which all customer needs cannot be satisfied with one product variant. Thirdly, varying constraints from the operating environment can also explain the decision. These explanations describe situations in which the product quality attributes may need to be adapted.

In the following, we compare these results with our review of the previous work (Section 4.1). The purpose of Figure 7.1 is to present a synthesized model of generalized, domain-independent explanations. In contrast, the previous work either does not report the reason to vary quality attributes or gives the reasons only through brief examples. We used such examples as example instantiations when building the generalized model. The most comprehensive model by Lee and Kang (2010); Lee et al. (2014) proposes that differences in the usage context explain quality attribute variability. A usage context can be a user, physical, social or business context or a product operating environment (Lee and Kang, 2010). Compared with our model, it seems that Lee and Kang (2010) focus more on the reasons why customer or user needs vary, whereas we treat different customer quality needs as one explanation category.


However, it was difficult to compare the models explicitly, since the model by Lee and Kang (2010); Lee et al. (2014) was not described beyond illustrative examples. Hence, we contributed by defining the explanations explicitly and by giving concrete explanations specifically for performance and security. We also pointed out internal trade-offs as one explanation. According to our study, the decision to vary should be analyzed from the customer point of view, but at the same time trade-offs and operating environment constraints affect the decision as well. Different customer, business and user needs are mentioned in a few studies (Lee and Kang, 2010; Lee et al., 2014; Niemelä and Immonen, 2007; Wang et al., 2006), but most studies focus only on the technical reasons to vary. As examples of such technical explanations, the studies have mentioned design trade-offs (Bagheri et al., 2010; Bartholdt et al., 2009), the scarce resources of mobile devices (White et al., 2007; Hallsteinsen et al., 2006b), and the various environments in which the product must operate (Siegmund et al., 2012b). Especially performance variability is often implied to be driven by the trade-offs or constraints that force the variation, instead of being driven by the customer needs. In comparison, we contributed by combining both customer and technical explanations in our model. Also, Case Nokia showed that performance variability can be driven primarily by the customer needs and by the ability to conduct price differentiation.

Our results show that price differentiation can be an important reason to vary quality attributes purposefully. Some previous studies mention differentiation into high-end and low-end products (Kishi and Noda, 2000; Kuusela and Savolainen, 2000), but otherwise price differentiation is not explicitly reported as an explanation to vary. Price differentiation, that is, charging a higher price for better quality, is a powerful way for vendors to improve profitability (Phillips, 2005). Price differentiation without any differences in the production costs is called price discrimination (Belobaba et al., 2009). As exemplified by Case Nokia, when the customer needs and characteristics enable price differentiation, quality attributes may be varied even with no trade-offs involved and even when the cost to produce the variants is the same.

The model in Figure 7.1 covers only performance and security variability, and some of the explanations were instantiated only for performance. This is in contrast to the proposals by Lee and Kang (2010); Lee et al. (2014); Niemelä and Immonen (2007), which are implied to be applicable to all quality attributes. It is not yet known


whether all the explanations or explanation classes in Figure 7.1 are applicable to other quality attributes. For example, is the evolution of the customer needs a more probable driver for performance variability, given that the load, the amount of data, and the resource consumption tend to increase in many modern products? Finally, Figure 7.1 consists only of explanations identified from the cases: it is possible that there are other reasons as well. For example, Niemelä and Immonen (2007); Wang et al. (2006) justify quality variability through different legislations. Given the classification in Figure 7.1, varying legislation could be considered part of the business operating environment. Similarly, security variability could be motivated by the customers' willingness to pay more for better security. Although the cases did not exhibit these explanations, the categorization in Figure 7.1 seems to be applicable to them as well.

How to distinguish the quality attributes of the product variants to the customers? (RQ2)

As the result for security, we described how the design theory and artifacts distinguished security to the customers. As the result for performance, we described how performance was distinguished in Case Nokia and Case Fathammer.

To synthesize the results for performance and security, we describe how the variants were distinguished to the customers (Table 7.1). The varying performance attributes had established, externally observable quantitative measures, but these measures were not necessarily used when communicating to the customers. Performance could also be distinguished as the internal resources needed to deliver performance or as the target operating environment to which product performance was adapted. Moreover, countermeasures can be used to communicate security differences to the customers. This is because countermeasures describe the observable behavior of the product, that is, what the system does to oppose threats and attacks.

In the following, we compare these results with our review of the previous work (Section 4.2). In the previous work, we identified feature and softgoal models as the predominant ways to distinguish and explicate varying quality attributes. Our proposed design theory and artifacts used feature modeling as a basis: countermeasures were modeled as "security features". Since countermeasures describe system behavior, it seems reasonable to model them similarly to functional features.


Table 7.1. RQ2: Distinguishing the product performance and security variants; synthesized from Table 5.1 and Table 6.3.

Case Nokia (performance). The varying quality attribute: phone call capacity. Distinguished to the customers: as externally observable product performance; as the internal resources needed to deliver performance.

Case Fathammer (performance). The varying quality attribute: resource consumption, refresh rate. Distinguished to the customers: as the target operating environment to which performance was adapted.

Artifacts (security). The varying quality attribute: countermeasures. Distinguished to the customers: as countermeasures, either externally or internally observable.

However, our study did not focus on the modeling notations, but on understanding the properties that vary and how these properties need to be communicated. After gaining this understanding, it is easier to propose a suitable modeling means.

It has been said that the customer specification for a product quality attribute is either a quantifiable, measurable metric or a qualitative statement on a discrete, ordered scale (Siegmund et al., 2012b): examples include "latency less than 100 ms" and "high security". Based on our results, this division into hard and soft quality attributes is not very clear-cut. The resource consumption and refresh rates in Case Fathammer were quantifiable and measurable, but the differences were communicated to the customers as target operating environments, that is, as non-measurable, non-ordered statements. In contrast, countermeasures were discrete and often not ordered, but could still be measured and observed from the product. Hence, the countermeasures were distinguished similarly to functionality. It is possible that some other quality attributes, such as safety, can also be distinguished this way.

We chose to represent and communicate security differences as countermeasures. In comparison, Mellado et al. (2008, 2010) define varying security requirements using countermeasures, but also using security goals, assets, threats, and vulnerabilities. Security requirements engineering is geared towards understanding and eliciting the customer needs before prematurely deciding on the security solutions (Firesmith, 2003). Within software product lines, and especially within configurable software product lines, the solutions to ensure security are mostly in place when the product variants are distinguished to the customers.


Therefore, using only countermeasures seems more suitable in our study context: countermeasures describe what the system does in a way that can be observed and verified from the product.

According to our study, quality attribute differences can be distinguished by describing the product itself, that is, as product quality attributes (ISO/IEC 25010, 2011). The product differences can be either externally or internally observable: this corresponds to the division into external and internal quality attributes (ISO/IEC 9126-1, 2001). Finally, it is also possible to distinguish the products not by describing the product itself, but by describing its environment.

Halmans and Pohl (2003) state that the customer is interested in essential variability, not in technical variability. Our results indicate that it is not straightforward to determine essential quality attribute variability. Even if established external product measures exist, these measures are not necessarily the best way to communicate to the customer. In some cases, it may be more informative to tell that the product is optimized for a certain environment. In some other cases, the internal resources may describe the product more precisely. The latter is similar to the product line requirement "Weather station shall be a single chip system" (Kuusela and Savolainen, 2000): instead of describing product reliability, the requirement describes the internal resources.

Our study indicates that the balance between essential and technical variability is especially challenging for security: countermeasures are by nature close to the internal design, and countermeasures can be defined at different levels of abstraction (Fabian et al., 2010). The original countermeasures in Case Magento had been described in very technical terms. We represented such technical countermeasures at the requirement or specification level. Requirement-level countermeasures correspond to security requirements about prevention, detection or reaction (Firesmith, 2005), or to security use cases that mitigate misuse cases (Sindre and Opdahl, 2005). Despite this, it may be a challenge to abstract all internal behavior away from the countermeasures. Moreover, some internal behavior may even be helpful for the customer to fully understand the product security differences. Understanding the needed level of countermeasure abstraction calls for future work.


[Figure 7.2: a classification of the design strategies. A performance or security variability design strategy is either a hardware design tactic (Case Nokia, performance) or software design; software design is either a software design tactic (artifact, security) or indirect variation (only in the literature, many attributes); a software design tactic is further either a downgrading tactic (Case Nokia, performance) or a trade-off tactic (Case Fathammer, performance).]

Figure 7.2. RQ3: A model to classify the strategies for designing performance and security variability in product line architectures, synthesized from Figure 5.2 and Figure 6.4.

How to design the quality attribute differences in the product line architecture? (RQ3)

As the result for performance, we proposed a theoretical model that classified the strategies for designing performance variability. Most strategies were instantiated in Case Nokia or in Case Fathammer. As the result for security, the design theory implied the use of security design tactics to implement countermeasure variability.

To synthesize the results for performance and security, we propose a model that classifies the strategies for designing performance and security variability in product line architectures (Figure 7.2). To summarize, quality attribute differences can be designed either with software or with hardware. Moreover, software design can either utilize purposeful design tactics to vary product quality or let quality attribute differences emerge indirectly from other variability. Finally, software design tactics can either trade off quality attributes or simply downgrade one quality attribute without trying to affect others.

In the following, we compare these results with our synthesis of the previous work (Section 4.3). In the previous work, indirect variation has been the prevailing yet implicitly stated way to vary many quality attributes, and in particular, many performance attributes. The existing approaches seem to focus on creating quality attribute differences indirectly by varying other features in the product. Consequently, quality attribute variability seems to be emergent in the product line, not something that is purposefully introduced in the design.


This may be because feature modeling has been the dominant approach in the research community, and the research has simply extended feature models with quality attribute impacts. Indirect variation may be a useful strategy for varying or optimizing several quality attributes in variability-rich software product lines. The indirect variation strategy may also be suitable for very emergent, heavily architectural quality attributes: for example, dynamic memory consumption may be affected by almost all architectural entities.

In contrast to the prevailing studies, Case Nokia and Case Fathammer both utilized explicit design tactics to vary performance. That is, the quality attribute variability was explicitly "designed in" as purposefully introduced variability mechanisms with which the quality attributes could be altered. When the variability is a key selling point to the customer or represents a hard constraint, it makes more sense to explicitly design the mechanisms that create the needed differences than to just rely on indirect variation. However, the difference between the indirect variation and software design tactic strategies is not always clear-cut. For example, a varying algorithm (Siegmund et al., 2012b; Sinnema et al., 2006) can be treated as a varying performance tactic as well as a varying feature that causes indirect performance variation. In fact, it may be possible to both apply purposeful design tactics and thereafter use indirect variation to create additional differences.

In the previous work, tactics and patterns have been identified as a way to design quality attribute variability (Hallsteinsen et al., 2003; Matinlassi, 2005; Kishi and Noda, 2000; Kishi et al., 2002). Also the idea of implementing variant-specific countermeasures in the architecture design through security tactics has been proposed (Fægri and Hallsteinsen, 2006). As a novel addition, our study identified two different kinds of design tactic strategies, downgrading and trading off, and gave industrial cases as instantiations of both kinds. In particular, downgrading was a suitable design strategy when the aim was to differentiate performance in pricing.

The previous work mostly focuses on software as the means to design and implement quality attribute variability. Only a few studies imply that quality attribute differences can be created by having different hardware (Kuusela and Savolainen, 2000; Ishida, 2007). More often, the studies treat varying hardware as a constraint for resource consumption and time behavior (Botterweck et al., 2008; Sinnema et al., 2006; Karatas et al., 2010), not as the means to create quality attribute differences.


Table 7.2. RQ4: Deriving product variants with given performance or security, synthesized from Table 5.3 and Table 6.4.

Derivation task. Case Nokia (performance): system-supported (re)configuration. Case Fathammer (performance): manual porting and optimization. Artifacts (security): knowledge-based configuration with a separate configurator tool.

Binding the product line architecture variability. Case Nokia: through software and hardware design tactics. Case Fathammer: through software design tactics. Artifacts: through modeled design tactics.

Handling the impact of other variability. Case Nokia: minimized in the product line architecture design. Case Fathammer: tested and tuned manually for each product. Artifacts: modeled as constraints in the product line architecture.

Ensuring the quality attributes are met. Case Nokia: testing at the product-line level. Case Fathammer: testing when deriving the product. Artifacts: model-based consistency.

In contrast, we proposed hardware as a generalized strategy to design capacity and time behavior variability: this was an important strategy in Case Nokia and in its domain. The model in Figure 7.2 covers only the strategies for performance and security variability, and some of the strategies were instantiated only for performance. It is not yet known whether the strategies in Figure 7.2 can be generalized to other quality attributes. For example, is varying the hardware design a reasonable way to vary usability?

How to derive a product variant with given quality attributes using the product line architecture? (RQ4)

As the result for security, we proposed a design theory and artifacts to configure product variants to meet given security needs. As the result for performance, we described how the product variants were derived in Case Nokia and Case Fathammer.

To synthesize the results for performance and security, we describe how the product variants were derived to meet given quality attributes using the product line architecture (Table 7.2).


The derivation tasks were vastly different: they ranged from manual porting and optimization to customer-conducted, automated configuration or reconfiguration. Despite the differences, an essential part of the derivation task was to bind the variability of the software or hardware tactics introduced in the product line design. Besides binding the design tactics, the impact of other variability may need to be handled separately: this can be done at the product-line or product level. With performance variability, the impact of other variability could either be manually tested and tuned during derivation, or minimized in the product line architecture design and completely ignored during derivation. Countermeasures were impacted by other variability similarly to functionality: any dependencies between the countermeasure realization and other architectural elements could be modeled as constraints.

In the following, we compare these results with our synthesis of the previous work (Section 4.3). The majority of the previous work focuses on feature model derivation, externalized feature impacts, and the algorithms needed to perform model-based feature derivation. In contrast, we focused on the product line architecture as the means to derive and to handle the impact of other variability. Moreover, only the artifacts utilized models as the primary means of derivation.

It has been argued that quality attributes cannot be derived directly in the same way as functional features, that is, by simply selecting single features that represent product quality attributes in the feature models (Sincero et al., 2010). This is because the impact of other variability may affect the resulting product quality attributes. To contrast this, our results show that direct derivation from the design is possible, and that derivation can be conducted without externalizing or managing the impact of other variability. Although performance is an architecturally emergent quality attribute, the impact of other variability was not an issue in the cases. In Case Nokia, the impact of software variability on capacity was minimized in the design, and it was possible to directly derive capacity by selecting the desired downgrading level. Minimizing the impacts also helped testing: not all product combinations needed to be tested. In Case Fathammer, the impact of other variability on resource consumption and time behavior was high, but the impacts were manually checked during derivation without any externalized models.


Thus, the difficulty of performing the manual configuration of performance described by Sinnema et al. (2006) was not encountered in Case Fathammer. When security is derived as countermeasures, it is not as emergent in the architecture as performance. This is because countermeasures describe product behavior. Our study indicated that countermeasures can be derived similarly to functionality: direct derivation is possible, and the impact of other variability on the countermeasure realization can be captured with constraints. Therefore, knowledge-based configuration is a suitable approach for security variability. One novelty of our work is to combine software variability management and knowledge-based configuration, which is an area that calls for future work (Hubaux et al., 2012).

7.2 Validity and Reliability

Validity means the degree to which the interpretation of the data corresponds to the phenomenon (Carmines and Zeller, 1979). Thus, validity is the degree to which relevant evidence supports a claim (Shadish et al., 2002). Validity can be studied at least as construct validity, internal validity and external validity (Shadish et al., 2002; Yin, 1994). Construct validity is about establishing correct operational measures for the concepts being studied (Yin, 1994): the particular instances on which the data are collected should represent the higher-order constructs (Shadish et al., 2002). Internal validity is about causality: does the observed correlation between study constructs reflect a causal relationship (Shadish et al., 2002; Yin, 1994)? External validity refers to generalizing the study findings beyond the immediate case or setting (Yin, 1994), that is, whether the constructs and relations (Gregor, 2006) hold over variations in populations and settings (Shadish et al., 2002). In contrast, reliability concerns the extent to which an experiment, a study protocol, or a measurement can be repeated with similar results under similar conditions (Carmines and Zeller, 1979; Yin, 1994). In the following, we discuss the threats to validity and reliability that are specific to each research method. Thereafter, we discuss the threats specific to the overall synthesis.

Validity and Reliability in the Systematic Literature Review. The systematic literature review in our study affects the validity of the results only partly. The results in this dissertation were mostly drawn from the case studies and the design science.


As an exception, examples from the primary studies were used when building the theoretical models from the case studies. Hence, the previous work was mostly used to compare against and to argue the novelty of the results, not to build the results. Nevertheless, we assess the validity and reliability of our systematic literature review with the four questions used by Kitchenham et al. (2009).

Firstly, are the inclusion and exclusion criteria described and appropriate (Kitchenham et al., 2009)? We believe this is a crucial aspect in a literature review that utilizes snowballing. Therefore, we spent effort and several iterations on formulating the criteria. Secondly, is the literature search likely to have covered all relevant studies (Kitchenham et al., 2009)? This is mostly decided by the ability of the snowballing protocol (Wohlin and Prikladnicki, 2013; Wohlin, 2014) and the way we applied it in this study. Some indication of the completeness is given by the high number of selected primary studies (139) compared with the number of selected studies (196) about any variability, not just variability in software product lines (Galster et al., 2014). We did not exclude any studies based on metadata only, which meant more detailed scrutiny of the primary studies. Thirdly, is the quality or validity of the primary studies assessed (Kitchenham et al., 2009)? Within the scope of this study, a full quality assessment was not done; only the level of empirical evidence was evaluated. Fourthly, are the individual studies and their data described (Kitchenham et al., 2009)? To keep focus, the primary studies were not described individually, but only through the synthesis.

Validity and Reliability in the Case Studies. The results drawn from the case studies included a theoretical model of explanations, a theoretical model of descriptions, and descriptions of Case Fathammer and Case Nokia. In the following, we analyze the validity and reliability of these results (Table 7.3).

There are several threats to construct validity. As the first threat, Case Nokia was conducted as a post mortem: the product line was discontinued before it was taken into production. There is a threat that the collected data did not correctly represent concepts related to operation, customers, selling, or other characteristics of successful software product lines. This threat was mitigated by having the chief architects contrast the results to other products in the case company portfolio: capacity variability configuration through channel elements was a common phenomenon in subsequent and successful base stations.


The data sources also caused threats to construct validity. Although interviews are typically a good source of rich qualitative data, we had only one interview in Case Fathammer and none in Case Nokia. The lacking richness was alleviated by having an involved participant as an author in Case Nokia. Also, as suggested by Yin (1994), we used multiple data sources, triangulated the data sources against each other, and validated the results with the key informants. As another threat, the data sources were mostly technical in nature. How well can technical data sources be used to measure the customer considerations that were part of the results for RQ1 and RQ2? However, both chief architects and product managers must have an understanding of the stakeholders' needs.

Also the scope of the data collection may pose a threat to construct validity: performance variability was only a minor part of the collected data in Case Fathammer. Similarly, RQ2 and RQ4 were only minor aspects in Case Nokia. Nevertheless, there was enough data to answer the research questions to the extent given in this dissertation. Moreover, when writing this dissertation, the data on Case Fathammer was revisited to check against the higher-order concepts.

As the final threat to construct validity, the data analysis in Case Nokia utilized only light-weight coding. A large part of the analysis was conducted through writing and informal discussions. Thus, are the operationalized high-level constructs grounded in the data (Urquhart et al., 2010)? To alleviate this threat, the data was revisited during analysis to check against the newly formed concepts, and the results were validated.

In contrast, internal validity was not a major concern. The results from the case studies did not involve any purely causal inferences in the form "if X, then Y", but were mostly given as descriptions. Although the results for RQ1 were given as a theoretical model of explanations, the explanations were stated in a form that does not necessarily imply causal inference. Instead, the explanations were about insufficient and unnecessary (Shadish et al., 2002) but affecting factors that contributed to the motivation: "Y was motivated by X". The explanations were additionally validated with the architects to ensure they were correctly built. However, there may be other rival explanations (Yin, 1994) that were not identified in this study.

External validity is about generalizing the findings of a case study to other cases; however, the generalization should be done analytically to theories and not to populations (Yin, 1994; Runeson and Höst, 2009). To enable analytical generalization, the results to RQ1 and RQ3 were given in a generalized form as theoretical models (Table 7.3). If the results of a study are stated as theories, external validity is about establishing the domain to which the theories can be applied (Gregor, 2006). Therefore, the theory constructs were described in a domain-independent form, and the scope of each description and explanation was described as limits to generalization in Publication II. However, since the scope was established partly based on examples in the literature and partly analytically, some limits to generalizability may have been established incorrectly. In contrast, the results to RQ2 and RQ4 were given as case-specific descriptions (Table 7.3), and analytical generalization was left as future work. For reliability, the main tactic was to produce all data in written form and to establish a case study database (Yin, 1994).

Table 7.3. The threats to validity and reliability of the results drawn from the case studies and the design science.

All RQs
  Performance (case studies). Overall threats: case study conducted post mortem; low number of interviews; only a minor aspect of the focus of Case Fathammer; only light-weight coding in Case Nokia.
  Security (design science). Overall threats: not evaluated with real stakeholders or in the real environment; no meticulous data collection and analysis procedures.
RQ1
  Performance. Results: theoretical model to explain. Specific threats: data sources mostly technical.
  Security. Results: explanations of Case Magento and Case Shopping Mall. Specific threats: not from the artifacts but from the evaluation cases; only technical data sources.
RQ2
  Performance. Results: descriptions of Case Nokia and Case Fathammer. Specific threats: data sources mostly technical; only a minor aspect in the case study focus.
  Security. Results: description based on the artifacts and design theory. Specific threats: (no additional identified).
RQ3
  Performance. Results: theoretical model to classify. Specific threats: one class only identified in the literature.
  Security. Results: description based on the artifacts and design theory. Specific threats: only a minor aspect in the artifacts.
RQ4
  Performance. Results: descriptions of Case Nokia and Case Fathammer. Specific threats: only a minor aspect in the case study focus.
  Security. Results: description based on the artifacts and design theory. Specific threats: (no additional identified).


Validity and Reliability in the Design Science.
The results from the design science were descriptions based on the proposed design theory and artifacts, and explanations based on the cases used to evaluate the artifacts. In the following, we analyze the validity and reliability of these results (Table 7.3).

Construct validity can be studied as the extent to which the particular instances on which the data are collected represent the higher-order constructs (Shadish et al., 2002), that is, how well Case Magento, Case Shopping Mall, and the instantiated configurator tool represent the constructs in our design theory. An obvious concern for construct validity is whether the modeled countermeasures, features, and components correspond to the reality of Case Magento. Compared with real case studies (Yin, 1994), the procedures of data collection were not as meticulous. Moreover, the modeling and configuration tasks were done by the researchers, and the resulting artifacts were not validated with the real stakeholders. Nevertheless, since the Magento source code acted as one data source, it was relatively straightforward to elicit the components and their variability. Also, the Admin Panel and configuration file options served to identify the varying countermeasures and features. Constraints could be identified from the documentation.

As another threat to construct validity, both Case Magento and Case Shopping Mall had a relatively low number of varying countermeasures, and their role in the overall variability was quite small. Thus, how representative are they as cases of security variability? Nevertheless, Case Magento is a "slice of life" (Shaw, 2003) of security variability, at least compared with Case Shopping Mall.

As the final threat to construct validity, the answers to research question RQ1 were not based on the proposed theory or artifacts, but on the characteristics of the cases. Since the data sources were mostly technical in nature, they did not much address the explanations for security variability. Moreover, the instantiation and evaluation procedures did not validate these results.

In design science, internal validity concerns whether the causal relationships in the design theory have been correctly evaluated. In our design theory, the testable propositions were stated as: "if you want to achieve Y in situation A, the use of X helps". As a threat to internal validity, these causal relationships were not very well evaluated. The evaluation was done descriptively by the researchers by comparing with the current state of Case Magento. Thus, there was no application in the real environment nor evaluation with the real stakeholders.

Finally, external validity can be studied as whether the proposed theory is applicable to different settings, that is, whether the outcome of applying the concrete artifacts is similar over variations in populations and settings (Shadish et al., 2002). It is not yet known whether the artifacts are applicable to other cases that fit the stated theory scope. To improve reliability, the data needed for the instantiation and configuration was recorded, along with the resulting artifacts.

Validity of the Synthesized Results.
The answers to the research questions were synthesized to cover both performance and security (Section 7.1). For RQ1 and RQ3, the synthesized results were given as models that combined the explanations and classifications. This poses a threat to external validity: some classes and explanations may not be generalizable to both performance and security. To address this threat, we identified which parts of the models were specific to either performance or security. However, the limits of generalization for the security descriptions and explanations were not established explicitly. For RQ2 and RQ4, the synthesized results were given as descriptions that were compared but not combined. Therefore, no specific threats to external validity stemmed from this synthesis.


8. Conclusions

8.1 Contributions

Our aim was to study why and how to vary quality attributes purposefully in a software product line. The study focused on performance and security as quality attributes. We conducted a systematic literature review on quality attribute variability, conducted two case studies on performance variability, and constructed a design theory and artifacts to represent and configure security variability. As the contribution, we proposed one model to explain the decision to vary, one model to classify the design strategies, and descriptions of how to distinguish and derive the product variants.

The results show there are several reasons to purposefully vary performance and security. These attributes may be varied to better serve different customer needs or to conduct price differentiation. Additionally, the decision to vary may be motivated by technical matters, such as the need to better balance trade-offs, or the need to better adapt to different operating environment constraints.

Two challenges must be addressed in the product line architecture: how to achieve the desired differences between the products and how to handle the impact of other variability. Performance and security differences can be designed by introducing varying software or hardware tactics, or by relying on indirect variation. The impact of other variability can be either minimized or represented as constraints in the product line architecture design. Another option is to manually test and tune the impacts during product derivation. The quality attribute differences can be communicated to the customers as observable product properties, which for security can be countermeasures. Alternatively, the differences can be communicated as the internal resources or as the target operating environments.

This dissertation focused on performance and security. Although some aspects of the results applied to both attributes, there were several differences. Performance has many established, quantifiable measures, which can be used directly to communicate the performance differences to the customers. Due to its emergent, architectural nature, performance is more difficult to design and derive. In addition to creating performance differences with explicit tactics, the impact of other variability may need to be handled to ensure the product has the given performance. Nevertheless, hardware scaling is a relatively straightforward way to design capacity and time behavior variability. In comparison, security is more difficult to distinguish to the customers. We proposed that countermeasures can be used to communicate observable, verifiable product differences. However, the challenge is to keep the countermeasures relevant to the customers. When security is varied as countermeasures, design and derivation are more straightforward, and even direct derivation is possible.

Compared with the state of the art, the novelty of our contribution is as follows. We focused on characterizing the complex phenomenon of quality attribute variability in its real context and with explicit data collection and analysis methods. It has been argued that identifying and characterizing problems should be the primary concern, and building the solution should come only after that (Fuggetta, 1999). As a concrete example of our position, the majority of the studies on quality attribute variability focus on representation through feature models. Instead of focusing on how to represent, we focused on understanding the varying attributes and how such attributes need to be communicated. After gaining this understanding, a suitable modeling means could be proposed.

In addition, we explicitly asked the question "why" and continued by asking the question "how". Most studies start from the assumption that quality attributes need to be varied. At the same time, most studies focus on features and abstract away the architecture design, which is the key to varying quality attributes. In order to successfully scope the product line and to efficiently implement the products, the research on quality attribute variability needs to cover both customers and design. Besides addressing the product line architecture, we analyzed the reasons to vary and the means to distinguish from the customer point of view. Such a combination of both technical and non-technical viewpoints is a novel aspect of this dissertation.


The implications for the state of the practice are twofold. When organizations plan quality attribute variability, they should not focus only on the technical possibilities to vary. Instead, one should understand the needs of the customers as well as the technical constraints and trade-offs. Only then can an informed decision be made about whether to vary quality attributes. The organizations also need to decide how to create differences in quality attributes and handle the impact of other variability in an efficient way. We identified different ways to do this. For example, downgrading is a suitable way to design performance variability for price differentiation, and minimizing the impact of other variability makes derivation and testing easier.
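As a minimal sketch of the downgrading tactic, the code below derives a lower-priced product variant by capping capacity in configuration while shipping the same software to all variants. The variant names, capacities, and prices are invented for illustration and do not describe any of the studied product lines.

    # A minimal sketch of downgrading for price differentiation: the
    # same software is configured with different capacity caps.
    # All names and numbers are hypothetical.

    PRODUCT_VARIANTS = {
        "entry":    {"max_concurrent_sessions": 100,  "price_eur": 1000},
        "standard": {"max_concurrent_sessions": 500,  "price_eur": 3000},
        "high_end": {"max_concurrent_sessions": 2000, "price_eur": 8000},
    }

    class SessionPool:
        """Admission control that enforces the configured capacity cap."""
        def __init__(self, variant):
            self.cap = PRODUCT_VARIANTS[variant]["max_concurrent_sessions"]
            self.active = 0

        def admit(self):
            if self.active >= self.cap:
                return False  # capacity purposefully downgraded
            self.active += 1
            return True

    pool = SessionPool("entry")
    admitted = sum(pool.admit() for _ in range(150))
    print(admitted)  # 100: the entry variant caps capacity in software

Because the admission-control code is shared across variants and only the configured cap differs, the impact of other variability on the capacity difference is minimized, which keeps derivation and testing simple.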

8.2 Future Work

Although we contributed by studying quality attribute variability in its real context, there is still ample room for more case studies on this topic. One interesting aspect is to study the industrial software product lines that have decided to vary quality attributes. In which kinds of product lines is price differentiation with quality attributes reasonable? Is price differentiation more suitable within business-to-business domains, where the customer can estimate the return on investment for a higher-quality product? Is quality attribute variability more common in embedded or critical domains, where operating environment constraints and product trade-offs may be more dominant? In such domains, what is the role of hardware as both the reason to vary and as the strategy to design quality attribute variability?

Within our study, we focused on building a theory from the empirical data (Stol and Fitzgerald, 2013). Therefore, further research is needed to test the theoretical models and hence to close the theorizing cycle. This would involve setting hypotheses based on the proposed models and evaluating them in real environments. Such theory testing could be conducted as a qualitative study, for example, as another case study, or as a quantitative survey.

Similarly, we focused on performance and security only, and thus applicability to other quality attributes calls for future work. To what extent can the proposed models be generalized to cover other quality attributes? Are there certain classes of quality attributes, such as security, safety, and reliability, that can be varied similarly? Within industrial software product lines, are the varied quality attributes always runtime-observable, or does it also make sense to vary development-time quality attributes?

Moreover, the dual nature of design and derivation should be further studied. The tension between explicit design tactics and emergent, indirect variation in the product line architecture calls for future work, in particular within real-life contexts. For example, is relying merely on indirect variation to design quality attribute differences more suitable for resource consumption? Further, given the difficulties of testing even functional variability, testing quality attribute variability is a real challenge. How should quality attribute variability be designed to minimize the number of combinations that need to be tested? What kind of testing can or must be conducted as part of domain engineering, and how should the impact of other variability be taken into account? In which cases is it sufficient to resort to testing and tuning during product derivation?


Bibliography

Alexander I. 2003. Misuse cases: use cases with hostile intent. IEEE Software, 20(1), 58–66.
Asadi M, Bagheri E, Gašević D, Hatala M, and Mohabbati B. 2011. Goal-driven Software Product Line Engineering. In: ACM Symposium on Applied Computing (SAC).
Asadi M, Bagheri E, Mohabbati B, and Gašević D. 2012. Requirements Engineering in Feature Oriented Software Product Lines: An Initial Analytical Study. In: Software Product Line Conference (SPLC) – Volume 2.
Asikainen T, Männistö T, and Soininen T. 2007. Kumbang: A Domain Ontology for Modelling Variability in Software Product Families. Advanced Engineering Informatics Journal, 21(1), 23–40.
Avizienis A, Laprie J-C, Randell B, and Landwehr C. 2004. Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Transactions on Dependable and Secure Computing, 1(1), 11–33.
Bagheri E, Di Noia T, Ragone A, and Gašević D. 2010. Configuring Software Product Line Feature Models Based on Stakeholders' Soft and Hard Requirements. In: Software Product Line Conference (SPLC).
Barbacci M, Longstaff T, Klein M, and Weinstock C. 1995. Quality Attributes. Tech. rept. CMU/SEI-95-TR-021. SEI.
Bartholdt J, Medak M, and Oberhauser R. 2009. Integrating quality modeling with feature modeling in software product lines. In: International Conference on Software Engineering Advances (ICSEA).
Bass L, Clements P, and Kazman R. 1998. Software Architecture in Practice. Addison-Wesley.
Bass L, Clements P, and Kazman R. 2003. Software Architecture in Practice. 2nd edn. Addison-Wesley.
Belobaba P, Odoni A, and Barnhart C. 2009. The Global Airline Industry. John Wiley & Sons.
Benavides D, Trinidad Martín-Arroyo P, and Ruiz Cortés A. 2005. Automated Reasoning on Feature Models. In: Conference on Advanced Information Systems Engineering (CAiSE).


Benavides D, Segura S, and Ruiz Cortés A. 2010. Automated analysis of feature models 20 years later: A literature review. Information Systems, 35(6), 615–636.
Berntsson Svensson R, Gorschek T, Regnell B, Torkar R, Shahrokni A, and Feldt R. 2012. Quality Requirements in Industrial Practice – An Extended Interview Study at Eleven Companies. IEEE Transactions on Software Engineering, 38(4), 923–935.
Boehm B, Brown J, Kasper H, Lipow M, Macleod G, and Merrit M. 1978. Characteristics of Software Quality. North-Holland Publishing Company.
Bosch J. 2000. Design and Use of Software Architectures: Adopting and Evolving a Product-Line Approach. Addison-Wesley.
Bosch J. 2002. Maturity and Evolution in Software Product Lines: Approaches, Artefacts and Organization. In: Software Product Line Conference (SPLC).
Botterweck G, Thiel S, Nestor D, bin Abid S, and Cawley C. 2008. Visual Tool Support for Configuring and Understanding Software Product Lines. In: Software Product Line Conference (SPLC).
Carmines EG, and Zeller RA. 1979. Reliability and Validity Assessment. Sage Publications.
Chen L, and Babar MA. 2011. A systematic review of evaluation of variability management approaches in software product lines. Information and Software Technology, 53(4), 344–362.
Clements P, and Northrop L. 2001. Software Product Lines—Practices and Patterns. Addison-Wesley.
Czarnecki K, Helsen S, and Eisenecker UW. 2005. Formalizing Cardinality-Based Feature Models and Their Specialization. Software Process: Improvement and Practice, 10(1), 7–29.
Etxeberria L, and Sagardui G. 2008. Variability Driven Quality Evaluation in Software Product Lines. In: Software Product Line Conference (SPLC).
Etxeberria L, Sagardui G, and Belategi L. 2007. Modelling Variation in Quality Attributes. In: Workshop on Variability Modelling of Software-intensive Systems (VaMOS).
Fabian B, Gurses S, Heisel M, Santen T, and Schmidt H. 2010. A comparison of security requirements engineering methods. Requirements Engineering, 15(1), 7–40.
Fægri TE, and Hallsteinsen S. 2006. A Software Product Line Reference Architecture for Security. In: Käkölä T, and Dueñas JC (eds), Software Product Lines — Research Issues in Engineering and Management. Springer.
Felfernig A, Hotz L, Bagley C, and Tiihonen J. 2014. Knowledge-based Configuration: From Research to Business Cases. Morgan Kaufmann.
Fettke P, Houy C, and Loos P. 2010. On the Relevance of Design Knowledge for Design-Oriented Business and Information Systems Engineering – Conceptual Foundations, Application Example, and Implications. Business and Information Systems Engineering, 2(6), 347–358.


Firesmith D. 2003. Engineering Security Requirements. Journal of Object Technology, 2(1), 53–68.
Firesmith D. 2004. Specifying reusable security requirements. Journal of Object Technology, 3(1), 61–75.
Firesmith D. 2005. A taxonomy of security-related requirements. In: Workshop on High Assurance Systems (RHAS'05).
Fowler M. 2003. Design—Who needs an architect? IEEE Software, 20(5), 11–13.
Fuggetta A. 1999. Some Reflections on Software Engineering Research. SIGSOFT Software Engineering Notes, 24(1), 74–77.
Galster M, Weyns D, Tofan D, Michalik B, and Avgeriou P. 2014. Variability in Software Systems—A Systematic Literature Review. IEEE Transactions on Software Engineering, 40(3), 282–306.
Gimenes IM, Fantinato M, and de Toledo MBF. 2008. A Product Line for Business Process Management. In: Software Product Line Conference (SPLC).
González-Baixauli B, do Prado Leite JCS, and Mylopoulos J. 2004. Visual variability analysis for goal models. In: Requirements Engineering Conference (RE).
González-Huerta J, Insfran E, Abrahão S, and McGregor JD. 2012. Nonfunctional Requirements in Model-driven Software Product Line Engineering. In: International Workshop on Nonfunctional System Properties in Domain Specific Modeling Languages.
Gregor S. 2006. The Nature of Theory in Information Systems. MIS Quarterly, 30(3), 611–642.
Gregor S, and Jones D. 2007. The Anatomy of a Design Theory. Journal of the Association for Information Systems, 8(5), 312–335.
Guo J, White J, Wang G, Li J, and Wang Y. 2011. A genetic algorithm for optimized feature selection with resource constraints in software product lines. Journal of Systems and Software, 84(12), 2208–2221.
Hafiz M, Adamczyk P, and Johnson RE. 2007. Organizing Security Patterns. IEEE Software, 24(4), 52–60.
Hallsteinsen S, Fægri TE, and Syrstad M. 2003. Patterns in Product Family Architecture Design. In: Workshop on Software Product Family Engineering (PFE).
Hallsteinsen S, Schouten G, Boot GJ, and Fægri TE. 2006a. Dealing with Architectural Variation in Product Populations. In: Käkölä T, and Dueñas JC (eds), Software Product Lines – Research Issues in Engineering and Management. Springer.
Hallsteinsen S, Stav E, Solberg A, and Floch J. 2006b. Using product line techniques to build adaptive systems. In: Software Product Line Conference (SPLC).
Halmans G, and Pohl K. 2003. Communicating the Variability of a Software-Product Family to Customers. Software and Systems Modeling, 2(1), 15–36.


Hendrickson S, Wang Y, van der Hoek A, Taylor R, and Kobsa A. 2009. Modeling PLA Variation of Privacy-Enhancing Personalized Systems. In: Software Product Line Conference (SPLC).
Hevner AR, March ST, Park J, and Ram S. 2004. Design Science in Information Systems Research. MIS Quarterly, 28(1), 75–105.
Holma H, and Toskala A (eds). 2000. WCDMA for UMTS: Radio Access for Third Generation Mobile Communications. Wiley.
Hubaux A, Jannach D, Drescher C, Murta L, Männistö T, Heymans P, Czarnecki K, Nguyen T, and Zanker M. 2012. Unifying Software and Product Configuration: A Research Roadmap. In: Configuration Workshop (ConfWS).
Huberman AM, and Miles MB. 1994. Qualitative Data Analysis. 2nd edn. Sage Publications.
IEEE Std 1061-1998. 1998. IEEE Standard for a Software Quality Metrics Methodology.
IEEE Std 610.12-1990. 1990. IEEE Standard Glossary of Software Engineering Terminology.
IETF RFC 4949. 2007. Internet Security Glossary, Version 2.
Ishida Y. 2007. Software Product Lines Approach in Enterprise System Development. In: Software Product Line Conference (SPLC).
ISO/IEC 15408-1. 1999. Information technology — Security techniques — Evaluation criteria for IT security — Part 1: Introduction and general model.
ISO/IEC 25010. 2011. Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — System and software quality models.
ISO/IEC 42010. 2011. Systems and Software Engineering — Architecture Description.
ISO/IEC 9126-1. 2001. Software engineering — Product quality — Part 1: Quality model.
Jansen A, and Bosch J. 2005. Software Architecture as a Set of Architectural Design Decisions. In: Working IEEE/IFIP Conference on Software Architecture (WICSA).
Jarzabek S, Yang B, and Yoeun S. 2006. Addressing quality attributes in domain analysis for product lines. IEE Proceedings Software, 153(2), 61–73.
Jick TD. 1979. Mixing Qualitative and Quantitative Methods: Triangulation in Action. Administrative Science Quarterly, 24(4), 602–611.
Jorgensen M, and Shepperd M. 2007. A Systematic Review of Software Development Cost Estimation Studies. IEEE Transactions on Software Engineering, 33(1), 33–53.
Kang KC, Cohen SG, Hess JA, Novak WE, and Peterson AS. 1990. Feature-Oriented Domain Analysis (FODA) Feasibility Study. Tech. rept. CMU/SEI-90-TR-21, ADA 235785. Software Engineering Institute.


Kang KC, Kim S, Lee J, Kim K, Shin E, and Huh M. 1998. FORM: A feature-oriented reuse method with domain-specific reference architectures. Annals of Software Engineering, 5(1), 143–168.
Kang KC, Lee J, and Donohoe P. 2002. Feature-Oriented Product Line Engineering. IEEE Software, 19(4), 58–65.
Karatas AS, Oguztuzun H, and Dogru A. 2010. Mapping Extended Feature Models to Constraint Logic Programming over Finite Domains. In: Software Product Line Conference (SPLC).
Kishi T, and Noda N. 2000. Aspect-Oriented Analysis of Product Line Architecture. In: Software Product Line Conference.
Kishi T, Noda N, and Katayama T. 2002. A Method for Product Line Scoping Based on a Decision-Making Framework. In: Software Product Line Conference (SPLC).
Kitchenham B. 2004. Procedures for Performing Systematic Reviews. Tech. rept. TR/SE-0401 / 0400011T.1. Keele University / NICTA.
Kitchenham B, Pearl Brereton O, Budgen D, Turner M, Bailey J, and Linkman S. 2009. Systematic literature reviews in software engineering—A systematic literature review. Information and Software Technology, 51(1), 7–15.
Kuusela J, and Savolainen J. 2000. Requirements engineering for product families. In: International Conference on Software Engineering (ICSE).
Laguna MA, and González-Baixauli B. 2008. Product Line Requirements: Multi-Paradigm Variability Models. In: Workshop on Requirements Engineering.
Lee J, Kang KC, Sawyer P, and Lee H. 2014. A holistic approach to feature modeling for product line requirements engineering. Requirements Engineering, 19(4), 377–395.
Lee K, and Kang KC. 2010. Using Context as Key Driver for Feature Selection. In: Software Product Line Conference (SPLC).
Matinlassi M. 2005. Quality-driven Software Architecture Model Transformation. In: Working IEEE/IFIP Conference on Software Architecture (WICSA).
McCall JA, Richards PK, and Walters GF. 1977. Factors in Software Quality. Tech. rept. TR-77-369. RADC.
McGregor JD, Northrop LM, Jarrad S, and Pohl K. 2002. Initiating software product lines. IEEE Software, 19(4), 24–27.
Mellado D, Fernández-Medina E, and Piattini M. 2008. Towards Security Requirements Management for Software Product Lines: A Security Domain Requirements Engineering Process. Computer Standards and Interfaces, 30(6), 361–371.
Mellado D, Fernández-Medina E, and Piattini M. 2010. Security requirements engineering framework for software product lines. Information and Software Technology, 52(10), 1094–1117.
Mingers J. 2001. Combining IS Research Methods: Towards a Pluralist Methodology. Information Systems Research, 12(3), 240–259.


Montagud S, and Abrahão S. 2009. Gathering Current Knowledge About Quality Evaluation in Software Product Lines. In: Software Product Line Conference (SPLC).
Myllärniemi V, Männistö T, and Raatikainen M. 2006. Quality Attribute Variability within a Software Product Family Architecture. In: Conference on Quality of Software Architectures (QoSA), vol. 2.
Myllärniemi V, Raatikainen M, and Männistö T. 2007. KumbangTools. In: Software Product Line Conference (SPLC), vol. 2.
Mylopoulos J, Chung L, and Nixon B. 1992. Representing and Using Nonfunctional Requirements: A Process-Oriented Approach. IEEE Transactions on Software Engineering, 18(6), 483–497.
Mylopoulos J, Chung L, Liao S, Wang H, and Yu E. 2001. Exploring alternatives during requirements analysis. IEEE Software, 18(1), 92–96.
Niemelä E, and Immonen A. 2007. Capturing quality requirements of product family architecture. Information and Software Technology, 49(11–12).
Niemelä E, Matinlassi M, and Taulavuori A. 2004. Practical Evaluation of Software Product Family Architectures. In: Software Product Line Conference (SPLC).
Patton MQ. 1990. Qualitative Evaluation and Research Methods. 2nd edn. Sage Publications.
Peffers K, Tuunanen T, Rothenberger MA, and Chatterjee S. 2007. A Design Science Research Methodology for Information Systems Research. Journal of Management Information Systems, 24(3), 45–77.
Phillips R. 2005. Pricing and Revenue Optimization. Stanford University Press.
Quinton C, Rouvoy R, and Duchien L. 2012. Leveraging Feature Models to Configure Virtual Appliances. In: Workshop on Cloud Computing Platforms (CloudCP).
Regnell B, Berntsson-Svensson R, and Olsson T. 2008. Supporting Roadmapping of Quality Requirements. IEEE Software, 25(2), 42–47.
Rozanski N, and Woods E. 2011. Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives. 2nd edn. Addison-Wesley.
Runeson P, and Höst M. 2009. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering, 14(2), 131–164.
Sabin D, and Weigel R. 1998. Product Configuration Frameworks — A Survey. IEEE Intelligent Systems, 13(4), 42–49.
Shadish WR, Cook TD, and Campbell DT. 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin.
Shaw M. 2002. What Makes Good Research in Software Engineering? International Journal on Software Tools for Technology Transfer, 4(1), 1–7.
Shaw M. 2003. Writing Good Software Engineering Research Papers. In: International Conference on Software Engineering (ICSE).


Siegmund N, Kolesnikov SS, Kästner C, Apel S, Batory D, Rosenmüller M, and Saake G. 2012a. Predicting performance via automated feature-interaction detection. In: International Conference on Software Engineering (ICSE).
Siegmund N, Rosenmüller M, Kuhlemann M, Kästner C, Apel S, and Saake G. 2012b. SPL Conqueror: Toward optimization of non-functional properties in software. Software Quality Journal, 20(3–4), 487–517.
Siegmund N, Rosenmüller M, Kästner C, Giarrusso PG, Apel S, and Kolesnikov SS. 2013. Scalable prediction of non-functional properties in software product lines: Footprint and memory consumption. Information and Software Technology, 55(3), 491–507.
Simons P, Niemelä I, and Soininen T. 2002. Extending and Implementing the Stable Model Semantics. Artificial Intelligence, 138(1–2), 181–234.
Sincero J, Spinczyk O, and Schröder-Preikschat W. 2007. On the Configuration of Non-Functional Properties in Software Product Lines. In: Software Product Line Conference (SPLC), volume 2.
Sincero J, Schröder-Preikschat W, and Spinczyk O. 2010. Approaching Non-functional Properties of Software Product Lines: Learning from Products. In: Asia-Pacific Software Engineering Conference (APSEC).
Sindre G, and Opdahl AL. 2005. Eliciting security requirements with misuse cases. Requirements Engineering, 10(1), 34–44.
Sinnema M, Deelstra S, Nijhuis J, and Bosch J. 2006. Modeling Dependencies in Product Families with COVAMOF. In: Engineering of Computer Based Systems (ECBS).
Smith CU, and Williams LG. 2002. Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Addison-Wesley.
Soininen T, Tiihonen J, Männistö T, and Sulonen R. 1998. Towards a General Ontology of Configuration. AI EDAM, 12(4), 357–372.
Soininen T, Niemelä I, Tiihonen J, and Sulonen R. 2001. Representing Configuration Knowledge With Weight Constraint Rules. Tech. rept. SS-01-01. AAAI.
Soltani S, Asadi M, Gašević D, Hatala M, and Bagheri E. 2012. Automated Planning for Feature Model Configuration based on Functional and Non-Functional Requirements. In: Software Product Line Conference (SPLC).
Staples M, and Niazi M. 2007. Experiences using systematic review guidelines. Journal of Systems and Software, 80(9), 1425–1437.
Stevens SS. 1946. On the Theory of Scales of Measurement. Science, 103(2684), 677–680.
Stol K-J, and Fitzgerald B. 2013. Uncovering Theories in Software Engineering. In: SEMAT Workshop on General Theory of Software Engineering (GTSE).
Strauss A, and Corbin J. 1998. Basics of Qualitative Research. 2nd edn. Sage Publications.
Street J, and Gomaa H. 2006. An Approach to Performance Modeling of Software Product Lines. In: Workshop on Modeling and Analysis of Real-Time and Embedded Systems.


Sun H, Lutz R, and Basu S. 2009. Product-Line-Based Requirements Customization for Web Service Compositions. In: Software Product Line Conference (SPLC).
Svahnberg M, van Gurp J, and Bosch J. 2005. A taxonomy of variability realization techniques. Software—Practice and Experience, 35(8), 705–754.
Tawhid R, and Petriu DC. 2011. Automatic Derivation of a Product Performance Model from a Software Product Line Model. In: Software Product Line Conference (SPLC).
Tun TT, Boucher Q, Classen A, Hubaux A, and Heymans P. 2009. Relating Requirements and Feature Configurations: A Systematic Approach. In: Software Product Line Conference (SPLC).
Urquhart C, Lehmann H, and Myers MD. 2010. Putting the theory back into grounded theory: guidelines for grounded theory studies in information systems. Information Systems Journal, 20(4), 357–381.
van Gurp J, Prehofer C, and di Flora C. 2008. Experiences with Realizing Smart Space Web Service Applications. In: Consumer Communications and Networking Conference (CCNC).
Wang Y, Kobsa A, van der Hoek A, and White J. 2006. PLA-based Runtime Dynamism in Support of Privacy-Enhanced Web Personalization. In: Software Product Line Conference (SPLC).
Weiss DM. 2008. The Product Line Hall of Fame. In: Software Product Line Conference (SPLC).
White J, Schmidt DC, Wuchner E, and Nechypurenko A. 2007. Automating Product-Line Variant Selection for Mobile Devices. In: Software Product Line Conference (SPLC).
White J, Dougherty B, and Schmidt DC. 2009. Selecting highly optimal architectural feature sets with Filtered Cartesian Flattening. Journal of Systems and Software, 82(8), 1268–1284.
Wohlin C. 2014. Guidelines for Snowballing in Systematic Literature Studies and a Replication in Software Engineering. In: Conference on Evaluation and Assessment in Software Engineering.
Wohlin C, and Prikladnicki R. 2013. Systematic literature reviews in software engineering. Information and Software Technology, 55(6), 919–920.
Yin RK. 1994. Case Study Research. 2nd edn. Sage Publications.
Yu Y, do Prado Leite JCS, Lapouchnian A, and Mylopoulos J. 2008. Configuring features with stakeholder goals. In: Symposium on Applied Computing (SAC).


Publication I

Myllärniemi, Raatikainen, Männistö. A Systematically Conducted Literature Review: Quality Attribute Variability in Software Product Lines. In Software Product Line Conference (SPLC), Brazil, pp. 41–45, August 2012.

© 2012 ACM.

Reprinted with permission.


A Systematically Conducted Literature Review: Quality Attribute Variability in Software Product Lines

Varvana Myllärniemi, Mikko Raatikainen, Tomi Männistö
Aalto University, P.O. Box 15400, 00076 Aalto, Finland

[email protected], [email protected], [email protected]

ABSTRACT

Typically, products in a software product line differ by their functionality, and quality attributes are not intentionally varied. Why, how, and which quality attributes to vary has remained an open issue. A systematically conducted literature review on quality attribute variability is presented, in which primary studies were selected by reading the full content of all full studies in the Software Product Line Conference. The results indicate that the success of feature modeling influences the proposed approaches, that different approaches suit specific quality attributes differently, and that empirical evidence on industrial quality variability is lacking.

Categories and Subject Descriptors
D.2.13 [Software Engineering]: Reusable Software

General Terms
Theory, Performance, Security

Keywords
Quality attribute, Variability, Systematic literature review

1. INTRODUCTION

Software product lines have emerged as an important approach for efficiently developing varying software products [4]. In order to cope with this variation, software product line assets must explicitly manage and handle variability. Variability is the ability of a system to be efficiently extended, changed, customized or configured for use in a particular context [35]. Quality attributes, such as performance or reliability, are often defined via a taxonomy of characteristics, e.g., [17]. Most studies in software product lines concentrate on functional variability. Thus, the state of the art on quality attribute variability needs clarification. Further, there are reasons against intentional quality variability within industrial product lines: quality variability that affects the architecture is difficult to realize and should be avoided [13]. Thus, the existence of industrial quality variability needs to be studied.

We present a systematically conducted literature review on quality attribute variability in software product lines. The method is adapted from the systematic literature review (SLR) guidelines [25]. The results conceptualize and elaborate a classification that includes dimensions for the rationale for and the means of capturing quality attribute variability. Further, specific quality attributes and empirical evidence on varying quality attributes are studied. There exists one literature review on quality attribute variability in software product lines [8]; however, its focus was only on modeling aspects, and the method was not reported. Further, there exists a systematic literature review on quality evaluation of software product lines [29], but the analysis did not cover quality attribute variability.

2. METHOD

The research method was a systematically conducted literature review, i.e., an adaptation of the SLR guidelines [25]. The major modification was not to use search strings; instead, primary studies were selected by reading through all content in all full studies in the SPLC conferences. The reason for this was twofold. Firstly, we did not want to exclude any studies based on the title and abstract only, since we knew several relevant studies, e.g. [30], that were not recognizable by their title or abstract. Secondly, it was impossible to construct a search string that would both return the relevant papers and yield an amount of studies whose full content could be read through. As an example, one potential search string resulted in 122190 hits in the IEEE database when applied to all content, and 408 hits when searching on metadata only and thus excluding studies based on abstract and title.

There were also minor modifications to [25]. Firstly, we did not assess the quality of the primary studies nor exclude any studies based on quality, mostly because the main focus of the primary studies was not on quality attribute variability. Secondly, we intentionally concentrated on qualitative instead of quantitative analysis. Quantitative analysis is beneficial for accumulating empirical evidence. In an immature topic, the primary studies tend to be qualitative or formative, proposing, e.g., a method. Therefore, the analysis and conclusions of the secondary study were qualitative, and generalizations were analytical [40].

The research questions were set as follows.
RQ1: What is the rationale for varying quality attributes in software product lines?
RQ2: How is quality attribute variability in software product lines captured?
RQ3: Which specific quality attributes are varying?
RQ4: What empirical evidence exists about varying quality attributes in industrial software product lines?

Table 1 describes how the primary studies were selected.

Table 1: Study selection
Retrieving full studies from SPLC 2000 to SPLC 2010 = 221 studies
Reading all content (title, abstract, body, figures) = 21 match, 18 borderline studies
Reading the match and borderline studies again = 29 match studies, selected for analysis

The first author did the manual reading and applied predefined inclusion and exclusion criteria in the selection:
Inclusion: says explicitly or uses an example/case where quality attributes vary in a software product line. Varying means that different products in a software product line have different levels of quality attributes.
Exclusion: contributes to quality variability only by using common feature definitions: a feature "is a user-visible aspect, quality, or characteristic" [19] or "is specified by a set of functional and quality requirements" [4]. These studies had too little contribution to be subjected to analysis.
Exclusion: does not address quality attributes observable via execution [3]. Varying quality attributes are argued to be observable via execution [6]. In fact, no concrete counterexamples were found in the manual reading.

Out of 221 initial studies, 29 studies were subjected to further analysis. The analysis was conducted in a bottom-up fashion, by first conceptualizing findings in the primary studies, and then elaborating categories from such conceptualizations. To assess researcher bias, a small-scale replication of the study selection was conducted by the second author. The first and second authors agreed on the important match studies, but the first author accepted a slightly wider range of borderline studies. Also, no false negatives were found.

3. RESULTS

In the following, each section presents the results for one research question.

3.1 Rationale for varying quality attributes

Two reasons to vary quality attributes are identified. The first rationale (Figure 1) is variation in the user and usage needs: some users prefer or require a different level of quality than others. Such a rationale may stem from different user, physical, social and business contexts [27], different geographical market segments [20], "high-end" and "low-end" segments [23, 24, 26], different privacy legislations and users' preferences [38, 15], dynamic usage changes, like a user's hands becoming busy with other things [14], and even the evolution of data volumes and load over time [16].

The second rationale (Figure 1) is variation in hardware, resources, or functionality that directly affects the product quality attributes. Examples include different CPU and RAM capabilities that affect performance and memory consumption [39]; variation in network capacity, batteries running low, or devices running out of memory [14]; variation in product hardware and chipsets, which may cause the need to tune software to achieve optimal performance [22]; and variation in functional features, which may affect error rate and memory consumption [32].

The rationale also affects the strategy for achieving quality variability. The first rationale, variation in the user and usage needs, does not cause any variants as such, but quality attribute variability has to be intentionally brought to the product line. This strategy is "differentiation" (Figure 1). The differentiation strategy needs a solution to create and derive variability in quality attributes. Examples of differentiation strategies include deriving feature configurations [27, 2], web personalization components [38], or service compositions [10, 34] to match a particular quality attribute requirement.

The second rationale, variation in hardware, resources and functionality, causes variation in quality attributes as such. Thus, there are two possible strategies (Figure 1). The first strategy is "impact management": just manage the quality attribute variation caused by the rationale, without building explicit mechanisms that try to affect the resulting quality attributes. For example, variation in the game refresh time is caused by and evaluated against feature variability [7], without trying to actively select a particular feature configuration. The second strategy is "adaptation": the software is intentionally adapted to varying hardware, resources, or functionality, and explicit variation points or selection mechanisms are built for this purpose. For example, [14, 39] both derive a component composition that takes into account the varying amount of available memory.

Figure 1: Rationale and possible strategies

3.2 Capturing quality attribute variability

Varying quality attributes are typically captured, i.e., modelled or specified in an explicit form. Two dimensions are identified (Figure 2).

Firstly, quality attributes can be captured as quality attribute features. Quality attributes are specified as feature-like entities, organized into trees, and variability is represented with mandatory, optional, and alternative relations. Quality attribute features can also have a refinement hierarchy: a quality feature can be composed into concerns, which can then be composed into scenarios [7]. As an example of this category, the optional feature "Usability" has the alternative child features "Usability for handicapped", "Usability for APT resident", and "Usability for VIPs" [27]. Quality attribute features are used in [27, 7, 10, 12, 11, 9, 20]. Not all studies call quality attribute features "features", but the topology and relations resemble features very much.

Secondly, quality attributes can be captured by embedding them into other entities. Typically, quality is captured as a feature or component attribute, and variability is achieved by varying either attribute values or features and components. As an example, leaf features are attached with attributes that characterize the memory consumption of that feature; composite features then specify how memory consumption is aggregated over their constituent child features [37]. Embedded approaches are used for features in [37, 2, 1], and for components in [39, 14, 38, 15]. Thirdly, quality attributes are captured by other means: as variation points [34], dependencies [32], or via hardware constraints on software features [21, 5]. Quality attributes can also be specified by enumerating product variants [23, 24]; this is typically used for scoping purposes.

The second dimension in Figure 2 characterizes the possible variant space for quality attribute variants. Firstly, the variant space can be characterized with a numeric metric, like an integer for memory consumption [37, 21, 14]. Secondly, the variant space can consist of discrete, ordered levels, like "high, medium, low" for usability [7], "5s, 15s, 30s" for response time [10], and "128, 256" for encryption type [34]. Thirdly, the variants can be captured as discrete characteristics without any specific order: "Keeping n user session data" for privacy [38], and "Minimum waiting time" for performance [27].

It is unknown whether all combinations of the two dimensions (Figure 2) make sense. For example, the studies do not combine quality attribute features with numeric variants, but use either discrete ordered levels or discrete characteristics instead. This is perhaps due to the discrete and finite nature of quality features. When quality attributes are embedded, for example, as attributes in features, the variants can be both numeric and discrete. Finally, discrete, ordered variants, like "high, medium, low", are used both as quality features and embedded into other entities.

Figure 2: Dimensions of capturing quality variability
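As a rough illustration of the embedded dimension, the following sketch attaches a memory-consumption attribute to leaf features and aggregates it over a feature tree against a hardware limit, in the spirit of the aggregation described for [37]. The feature names and numbers are invented for illustration only.

    # A minimal sketch of embedding a quality attribute into features:
    # leaf features carry a memory-consumption attribute, and composite
    # features aggregate it. Names and numbers are hypothetical.

    class Feature:
        def __init__(self, name, memory_kb=0, children=()):
            self.name = name
            self.memory_kb = memory_kb
            self.children = list(children)

        def total_memory_kb(self):
            # Composites aggregate the consumption of the selected
            # child features; leaves return their own attribute value.
            return self.memory_kb + sum(
                c.total_memory_kb() for c in self.children)

    # One possible product configuration (selected features only).
    product = Feature("Messaging", children=[
        Feature("TextMessages", memory_kb=120),
        Feature("Attachments",  memory_kb=340),
    ])

    # Hardware sets a limit that the aggregated value must meet.
    DEVICE_MEMORY_KB = 512
    print(product.total_memory_kb() <= DEVICE_MEMORY_KB)  # True: 460 <= 512

A configurator could use the same aggregation to reject feature selections that exceed the memory limit of a given hardware variant.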

3.3 Specific quality attributes that vary

The primary studies refer to several specific varying quality attributes (Table 2).

Memory consumption variability is caused by variation in hardware or resources (Figure 1), to which the software must be adapted. There are two possible approaches. Firstly, hardware is explicitly captured as features (e.g., memory of either 1024kB or 2048kB) that are related to functional features with constraints (e.g., DiagnosticsAccess excludes 1024kB) [5, 21]. Secondly, hardware or resources are not present, but just set a limit for the overall software memory consumption; the memory consumption of individual features or components is then aggregated to meet this limit [14, 37, 39].

Performance is varied in different ways: all categories in Table 2 are utilized in the primary studies. Performance is also varied for different reasons: the rationale may be varying hardware and resources, such as CPU [21, 37, 39] and network connection [21, 30], but the rationale may also be different user needs [16, 27]. An observation is that architecture is often discussed in conjunction with performance variability, e.g., variability in response time and data search [23, 24], and variability in load and data volumes [16].

Security variability is often addressed through functional variability: security is either realized with or affected by functionality. An example is [9], where security variability is achieved by varying functionality, like different access control methods. Another example is privacy variability in personalized web systems [38, 15], where personalization is characterized by privacy-violating functionality (e.g. "Cross-site tracking"). Further, security may have an order, like "low, medium, high" [1, 34], but even these levels are related to functionality (e.g., to encryption and dispatch service [34]).

Usability variability is mostly linked to functionality. "Usability for handicapped people" is enabled by extended elevator door time, whereas "Usability for APT residents" is enabled by cancelling elevator calls [27]. The alternatives "Speech-based UI" and "Text-based UI" imply different user interaction [14]. Usability is also captured as subjective user experience, like optional "Passenger comfort" [27] or "Attractive graphics" [7], which is then realized by functional features.

Other quality attributes, like reliability, correctness, accuracy, or error rate, are also mentioned [2, 20, 27, 30, 32].

Table 2 relates the specific quality attributes to the dimensions described in Figure 2. Some categories are more heavily used for certain quality attributes. Memory consumption and performance lean more towards numeric and embedded approaches. In contrast, security and usability are often handled as discrete, separate features, perhaps since they are often treated through functionality. This may indicate that there is no "one-size-fits-all" approach for quality variability.

Table 2: Specific varying quality attributes and dimensions from Figure 2 against the primary studies. Some primary studies mention quality variability only at a general level [12, 28, 29, 31, 36].

Capturing                  | Memory consumption | Performance                 | Security        | Usability
Quality attribute features |                    | [10, 27]                    | [9, 10]         | [7, 20, 27]
Embedded to entities       | [14, 37, 39]       | [2, 14, 21, 37, 39]         | [1, 15, 38]     | [14]
Other                      | [5, 21, 24]        | [21, 23, 24, 34]            | [34]            |

Possible variant space     | Memory consumption | Performance                 | Security        | Usability
Numeric metric             | [14, 21, 37, 39]   | [14, 21, 37, 39]            |                 | [14]
Discrete ordered levels    | [5, 24]            | [2, 10, 23, 24, 34]         | [1, 10, 34]     | [7]
Discrete characteristics   |                    | [27]                        | [9, 15, 38]     | [7, 14, 20, 27]
Not enough information     | [32]               | [6, 11, 16, 22, 26, 30, 32] | [6, 11, 16, 30] | [6, 26]

3.4 Cases, empirical evidence

The empirical evidence on industrial software product lines with varying quality attributes is scarce.

Firstly, two studies discuss direct empirical evidence on varying quality attributes. In the first study, three companies that have quality variability are described [30]. The varying quality attributes are observable via execution (security, performance, accuracy), and variation between wireless and wired communication causes quality variation [30]. Unfortunately, capturing or realizing quality variability is not covered. A second study seems to contain evidence on accuracy and coverage variability in the railway domain [24].

Secondly, there are studies that utilize an example and mention an industrial software product line. However, it is not known which parts of the examples are directly from the industrial cases, and which parts have been modified to highlight certain issues. Hence, such examples cannot be considered direct empirical evidence. The examples include satellite communication software [37], elevator control software [27], an enterprise system platform [16], and an automotive product line [5].

Finally, some studies use an example to illustrate or validate an approach. We do not consider such examples as empirical evidence of the existence of industrial quality variability.

4. DISCUSSION ON THE RESULTS

One observation is that the specific quality attribute, e.g., performance or security, needs to be considered, rather than quality attributes in general. Another observation is that the success of feature modeling seems to have a strong influence

on the primary studies. We discuss these observations as follows.

Firstly, quality attributes are often proposed to be captured as mandatory, optional or alternative quality features. It is yet unclear which specific quality attributes are amenable to being captured in such a hierarchy. Especially optionality may be problematic. Regardless of whether the optional feature "Response time of status update < 3s" is selected, the status update function will always have a response time, and that response time may even accidentally be less than three seconds. In contrast, it is much easier to say when functionality is present or not present in the product.

Secondly, quality attributes may be transformed directly into functional features, e.g., security transformed into access control features [9]. Transformation from quality requirements to functionality is one architectural strategy [4]. Such a transformation may be suitable for security or usability, but may be difficult for performance or resource consumption.

Thirdly, quality attributes are proposed to be embedded into functional features. This may be suitable for quality attributes that are specified in conjunction with functions, e.g., a feature "Call handling" with an attribute "response time". With such a feature, it is easy to specify and select a variant, but more difficult to realize the variability. If the "Call handling" functionality is scattered among several components, it is difficult to implement varying "response time" values. In contrast, some studies embed quality attributes into components or services, thus localizing the variant realization in the scope of one component. However, such approaches need to characterize quality on a per-component basis, and to calculate the overall quality attribute value for a particular composition. As an example, how does one embed usability as attributes into components, so that the overall product usability can be aggregated from these attributes?

Finally, the realization of quality attribute variability is often analyzed only against functional features. But quality attributes cannot be just byproducts of features. Since the software architecture is the key to achieving many quality attributes [3, 4], realizing quality attribute variability requires explicit solutions in the product line architecture or components. For example, how can variability in data volumes [16] or the analysis of service performance [23, 24] be handled by considering only the relation to functional features?

5. DISCUSSION ON THE METHOD

Traditional SLRs aim at completeness, objectiveness, replicability, and the possibility to assess validity [25, 33]. We argue that our method fulfills all of these objectives, except completeness, in the same way as traditional SLRs do.


The completeness of our method is different, since we intentionally set the scope to find all relevant studies from one forum instead of all possible forums. This scope causes a threat to external validity and generalizability. However, in certain respects, the completeness of our method is deeper than in traditional SLRs: if a study is not recognizable by its title or abstract, or if it uses different terminology, a search-based review will exclude the study. Thus, traditional systematic reviews aim at breadth in completeness, whereas our method aims at depth in completeness.

One systematic review uses, instead of search strings, manual reading of more than 100 journals [18]. Compared to our method, the manual reading in [18] excludes studies based on titles and abstracts. We did not exclude any studies based on titles and abstracts, but read all content through; consequently, we had to limit the manual reading to one forum.

Traditional SLRs search on metadata, and include and exclude studies based on titles, abstracts and keywords before studying the content. However, this has several drawbacks. Firstly, abstracts and titles do not necessarily convey the contribution in such a way that search, inclusion, and exclusion can be based on them. If the main contribution of the study differs from the topic of the secondary study, even a well-reported study may not be recognizable by its title and abstract. Most of the primary studies in our work were not recognizable by their title or abstract; this supports our decision to exclude papers based on full content only. Secondly, studies in software engineering often use unestablished terms, and even more so for an immature topic such as quality attribute variability. Hence, it can be argued that search-based reviews are more suitable for analyzing topics that have an established name, like CMM and CMMI in [33]. Finally, if one wants to search for cases or examples, it may not be possible to use only conceptual terms in the search.

6. CONCLUSIONS

A systematically conducted literature review on quality attribute variability in software product lines was presented. The analysis covered the rationale for and the means of capturing quality variability, the specific quality attributes, and the empirical evidence. Firstly, due to the diversity of quality attributes, there may not be a "one-size-fits-all" approach for quality attribute variability: specific quality attributes are treated differently in the primary studies. Secondly, the success of feature modeling heavily influences the proposed approaches. Questions still remain: which quality attributes can be captured as or analyzed against features, and how is architectural variability covered? Thirdly, there is a need for empirical descriptive accounts, e.g., by means of case studies [40], instead of new methods. Empirical evidence is needed on why and which quality attributes vary, what their variants are, and how they are realized in the architecture. Otherwise the research community cannot evaluate which approaches are needed and suitable.

As future work, we are extending the identification of primary studies with backward and forward references and studies from relevant authors. This will extend the scope to other forums, and thus shed light on the generalizability of our results.

7. REFERENCES

[1] E. Bagheri, M. Asadi, D. Gasevic, and S. Soltani. Stratified analytic hierarchy process: Prioritization and selection of software features. SPLC, 2010.
[2] E. Bagheri, T. Noia, A. Ragone, and D. Gasevic. Configuring software product line feature models based on stakeholders' soft and hard requirements. SPLC, 2010.
[3] L. Bass, P. Clements, and R. Kazman. Software Architecture in Practice. 2nd edition, 2003.
[4] J. Bosch. Design and Use of Software Architectures: Adopting and Evolving a Product-Line Approach. 2000.
[5] G. Botterweck, S. Thiel, D. Nestor, S. Abid, and C. Cawley. Visual tool support for configuring and understanding software product lines. SPLC, 2008.
[6] L. Etxeberria and G. Sagardui. Product-line architecture: New issues for evaluation. SPLC, 2005.
[7] L. Etxeberria and G. Sagardui. Variability driven quality evaluation in software product lines. SPLC, 2008.
[8] L. Etxeberria, G. Sagardui, and L. Belategi. Modelling variation in quality attributes. VaMoS, 2007.
[9] Y. Ghanam and F. Maurer. Linking feature models to code artifacts using executable acceptance tests. SPLC, 2010.
[10] I. Gimenes, M. Fantinato, and M. Toledo. A product line for business process management. SPLC, 2008.
[11] M. Griss. Implementing product-line features by composing aspects. SPLC, 2000.
[12] A. Gruler, A. Harhurin, and J. Hartmann. Development and configuration of service-based product lines. SPLC, 2007.
[13] S. Hallsteinsen, S. Schouten, G. Boot, and T. Fægri. Dealing with architectural variation in product populations. Software Product Lines—Research Issues in Engineering and Management. 2006.
[14] S. Hallsteinsen, E. Stav, A. Solberg, and J. Floch. Using product line techniques to build adaptive systems. SPLC, 2006.
[15] S. Hendrickson, Y. Wang, A. van der Hoek, R. Taylor, and A. Kobsa. Modeling PLA variation of privacy-enhancing personalized systems. SPLC, 2009.
[16] Y. Ishida. Software product lines approach in enterprise system development. SPLC, 2007.
[17] ISO/IEC 9126-1. Software engineering — product quality — part 1: Quality model, 2001.
[18] M. Jorgensen and M. Shepperd. A systematic review of software development cost estimation studies. IEEE Transactions on Software Engineering, 33(1), 2007.
[19] K. Kang, S. Cohen, J. Hess, W. Novak, and A. Peterson. Feature-oriented domain analysis (FODA) feasibility study. Technical Report CMU/SEI-90-TR-21, SEI, 1990.
[20] K. Kang, P. Donohoe, E. Koh, J. Lee, and K. Lee. Using a marketing and product plan as a key driver for product line asset development. SPLC, 2002.
[21] A. S. Karatas, H. Oguztuzun, and A. Dogru. Mapping extended feature models to constraint logic programming over finite domains. SPLC, 2010.
[22] K. Kim, H. Kim, and W. Kim. Building software product line from the legacy systems: Experience in the digital audio and video domain. SPLC, 2007.
[23] T. Kishi and N. Noda. Aspect-oriented analysis of product line architecture. SPLC, 2000.
[24] T. Kishi, N. Noda, and T. Katayama. A method for product line scoping based on a decision-making framework. SPLC, 2002.
[25] B. Kitchenham. Procedures for performing systematic reviews. Technical Report TR/SE-0401, 2004.
[26] J. Lee, K. Kang, and S. Kim. A feature-based approach to product line production planning. SPLC, 2004.
[27] K. Lee and K. Kang. Usage context as key driver for feature selection. SPLC, 2010.
[28] F. Linden. Engineering software architectures, processes and platforms for system families. SPLC, 2002.
[29] S. Montagud and S. Abrahao. Gathering current knowledge about quality evaluation in software product lines. SPLC, 2009.
[30] E. Niemelä, M. Matinlassi, and A. Taulavuori. Practical evaluation of software product family architectures. SPLC, 2004.
[31] J. Oldevik and O. Haugen. Higher-order transformations for product lines. SPLC, 2007.
[32] M. Sinnema, S. Deelstra, J. Nijhuis, and J. Bosch. COVAMOF: A framework for modeling variability in software product families. SPLC, 2004.
[33] M. Staples and M. Niazi. Experiences using systematic review guidelines. Journal of Systems and Software, 80(9), 2007.
[34] H. Sun, R. Lutz, and S. Basu. Product-line-based requirements customization for web service compositions. SPLC, 2009.
[35] M. Svahnberg, J. van Gurp, and J. Bosch. A taxonomy of variability realization techniques. Software—Practice and Experience, 35(8), 2005.
[36] S. Thiel and A. Hein. Systematic integration of variability into product line architecture design. SPLC, 2002.
[37] T. T. Tun, Q. Boucher, A. Classen, A. Hubaux, and P. Heymans. Relating requirements and feature configurations: A systematic approach. SPLC, 2009.
[38] Y. Wang, A. Kobsa, A. van der Hoek, and J. White. PLA-based runtime dynamism in support of privacy-enhanced web personalization. SPLC, 2006.
[39] J. White, D. Schmidt, E. Wuchner, and A. Nechypurenko. Automating product-line variant selection for mobile devices. SPLC, 2007.
[40] R. Yin. Case Study Research. 2nd edition, 1994.


Publication II

Myllärniemi, Savolainen, Raatikainen, Männistö. Performance variability in software product lines: proposing theories from a case study. Accepted for publication in Empirical Software Engineering, 47 pages, Online February 2015.

© 2015 Springer.

Reprinted with permission.


Empir Software Eng
DOI 10.1007/s10664-014-9359-z

Performance variability in software product lines: proposing theories from a case study

Varvana Myllärniemi · Juha Savolainen · Mikko Raatikainen · Tomi Männistö

© Springer Science+Business Media New York 2015

Abstract In software product line research, product variants typically differ by their functionality, and quality attributes are not purposefully varied. The goal is to study purposeful performance variability in software product lines, in particular, the motivation to vary performance and the strategy for realizing performance variability in the product line architecture. The research method was a theory-building case study that was augmented with a systematic literature review. The case was a mobile network base station product line with capacity variability. The data collection, analysis and theorizing were conducted in several stages: the initial case study results were augmented with accounts from the literature. We constructed three theoretical models to explain and characterize performance variability in software product lines; the models aim to be generalizable beyond the single case. The results first describe capacity variability in a base station product line. Thereafter, theoretical models of performance variability in software product lines in general are proposed. Performance variability is motivated by customer needs and characteristics, by trade-offs and by varying operating environment constraints. Performance variability can be realized by hardware or software means; moreover, the software can either realize performance differences in an emergent way, through the impacts of other variability, or by utilizing purposeful, varying design tactics. The results point out two differences compared with the prevailing literature.

Communicated by: Ebrahim Bagheri, David Benavides, Per Runeson and Klaus Schmid

V. Myllärniemi (✉) · M. Raatikainen
Aalto University, P.O. Box 15400, FI-00076 Aalto, Espoo, Finland
e-mail: [email protected]

M. Raatikainen
e-mail: [email protected]

J. Savolainen
Danfoss Power Electronics A/S, Nordborg, Denmark
e-mail: [email protected]

T. Männistö
University of Helsinki, Helsinki, Finland
e-mail: [email protected]


Firstly, when the customer needs and characteristics enable price differentiation, performance may be varied even with no trade-offs or production cost differences involved. Secondly, due to the dominance of feature modeling, the literature focuses on the impact management realization. However, performance variability can also be realized through purposeful design tactics that downgrade the available software resources, and by having more efficient hardware.

Keywords Case study · Software product line · Variability · Software architecture

1 Introduction

Companies that develop software products face a diversity of customer needs. Instead of offering a single product as a compromise between the varying needs, companies offer several products with slightly varying capabilities. Software product lines have gained popularity as an approach to efficiently developing such varying products. A software product line is a set of software-intensive products that share a common, managed set of features, a common software architecture and a set of reusable assets (Bosch 2000; Clements and Northrop 2001). Instead of developing products independently, the products of a product line are developed by reusing existing product line assets in a prescribed way; these assets include software components, requirements, test cases, and other reusable artifacts.

In a software product line, the architecture and the assets must be able to cover commonality and variability. Commonality represents those aspects that are shared among the products; thus, commonality enables reuse and consequently increases development efficiency. Variability is the ability of a system to be efficiently extended, changed, customized or configured for use in a particular context (Svahnberg et al. 2005). Thus, variability represents those aspects that enable product differentiation and customization for different customer needs. Variability manifests itself on many different levels (van Gurp et al. 2001): in the requirements, in the architecture, and in the implementation. One of the key challenges in product line engineering is the efficient management and realization of variability. Consequently, variability has been a focus of intense research in recent years.

However, the variability of quality attributes, and in particular the variability of performance, has received less research attention. From the research point of view, and from the point of view of industrial cases reported in the research, the product variants seem to differ from each other mostly through their functional capabilities (Galster et al. 2014), and performance is kept more or less similar, or at least its variability is not purposeful and explicitly managed.

There are certain aspects of purposeful performance variability in software product lines that call for investigation. Firstly, performance is continuous (Regnell et al. 2008): instead of being either included in or excluded from a product variant, performance is measured as different shades of product goodness. Thus, different customer needs may often be addressed with the same product: if customer A requires that the response time of function X be at most 500 ms, and customer B requires 1000 ms, a product with a response time of 500 ms will satisfy both needs. By contrast, functional variants often cannot be ordered or substituted for each other. For the product line owner, all additional and explicitly managed variability, for example, the ability to produce both 500 ms and 1000 ms variants, adds to the cost and complexity of the product line. Therefore, the motivation to purposefully vary performance should be studied.


Secondly, if performance is decided to be varied in a software product line, the differences between the products must be realized as variability in the product line architecture and assets. Because of the architectural nature of performance and other quality attributes, the realization of performance variability may crosscut throughout the product line architecture. It has been argued that quality attribute variability that affects software architecture is difficult to realize and hence is to be avoided (Hallsteinsen et al. 2006a). Thus, the strategy for realizing performance variability needs to be studied.

Finally, the literature on performance variability, and on quality attribute variability in general, mostly lacks empirical evidence, at least evidence that is explicitly reported and drawn from industrial product lines (Myllärniemi et al. 2012). Even if studies address quality attribute variability, they often do not describe the study context or the research design (Galster et al. 2014). By contrast, there is even evidence of industrial contexts in which quality variability was not needed (Galster and Avgeriou 2012). The lack of empirical evidence also applies to software product line engineering in general (Ahnassay et al. 2013). Further, the existing empirical evidence on software product line engineering tends to focus on constructing and evaluating methods, techniques and approaches (Ahnassay et al. 2013); that is, it is mostly about validating artifacts (Hevner et al. 2004) or prescriptive design theories (Gregor 2006). Thus, there is a need to observe the phenomenon in its real-life context (Yin 1994), that is, to study how quality attribute variability exhibits in industrial product lines.

To address the above issues, this paper presents a theory-building case study (Yin 1994; Urquhart et al. 2010). The goal is to study the motivation to purposefully vary performance and the realization of performance variability in software product lines. The research questions are as follows:

RQ1 Which characteristic of performance is decided to be varied?
RQ2 Why is performance decided to be varied?
RQ3 What is the strategy for realizing performance variability within the product line architecture?
RQ4 Why is the realization strategy chosen?

To answer the research questions, the study was conducted as a single-case case study (Yin 1994; Runeson and Höst 2009) that was augmented with a systematic literature review (Wohlin and Prikladniki 2013). We conducted a post mortem case study in the domain of 3G (3rd generation) mobile telephone networks. The case company is Nokia, formerly Nokia Solutions and Networks, and the product line in the focus of this study was a base station in the 3G radio access network. This software product line was designed to exhibit purposeful capacity variability.

In addition to describing the results as capacity variability for base station product lines, we propose a number of theoretical models to address performance variability in software product lines in general. To build the theoretical models, we adopted a grounded theory approach to iteratively collecting and analyzing data (Urquhart et al. 2010): the analysis and synthesis utilized the case account as a basis, and augmented the theory categories, boundaries and example instantiations from the literature. The case study was partly explanatory, partly descriptive (Runeson and Höst 2009); consequently, the resulting theoretical proposals include both describing and explaining models (Gregor 2006).

The scope of this study was on purposeful variability, which means that unintended, indirect quality variability (Niemelä and Immonen 2007) was not addressed. Further, the focus was on performance, which includes subattributes such as response time, memory consumption, and capacity.

Finally, the scope included both software product lines and software-intensive product lines; software-intensive here implies that the product line encompasses both hardware and software.

One of the contributions is to describe and explain capacity variability in a base station product line, thus accumulating the reported empirical evidence on quality attribute variability in industrial product lines. Another contribution is the combined analysis of the existing literature and the case account. The main contribution is to propose theoretical models that describe and explain performance variability in software product lines. By building the results into these models through analytical generalization (Yin 1994), we aim at generalizing the results beyond the setting and domain of this single case study.

The proposed theoretical models indicate two fundamental differences between the prevailing research approaches and the case account. Firstly, performance variability is motivated by customer needs and characteristics, by design trade-offs and by varying operating environment constraints. Typically, the literature explains performance variability as a way to resolve the trade-offs and constraints stemming from the solution domain. Due to price differentiation to the customers and their evolving needs, performance variability may be motivated by the problem domain alone, that is, performance may be varied even with no trade-offs involved and even when the cost to produce the variants is the same. Secondly, due to the dominance of feature modeling, the existing literature focuses on realizing differences in performance by managing the impact and interactions of software features on product performance; we formulate this as the impact management realization. By contrast, the case company realized variability both through hardware and software; the software utilized a purposeful design tactic to downgrade the available resource with minimum dependencies to other variability. This realization enabled varying capacity separately from other variability and upgrading it at runtime as the needs evolved.

This paper has been extended from an earlier publication (Myllärniemi et al. 2013); all content from that publication has been thoroughly revised and updated. As a novel contribution, this paper presents the following:

– An extended review of the previous work.
– An extended data collection and analysis to include also accounts from the literature.
– Proposed theoretical models to answer the research questions in more general terms, beyond the domain of the case study.

The rest of this paper is organized as follows. Section 2 lays out the basic theoretical concepts related to the research questions and reviews previous work. Section 3 describes the research method. Section 4 describes the results as capacity variability in a base station product line. Section 5 describes the results as theoretical models of performance variability in software product lines. Section 6 discusses the validity of the results and the lessons learned, while Section 7 concludes.

2 Background and Review of Previous Work

In the following, Sections 2.1 and 2.2 describe the theoretical foundation for our research topic. Thereafter, Sections 2.3 and 2.4 present a review of the previous work on the research topic.


2.1 Performance as a Quality Attribute

Quality attributes, such as performance, security and availability, play a significant role in many industrial software products. In such a system, failing to meet one quality requirement may render the whole system useless. Quality attributes can be defined as characteristics that affect an item's quality (IEEE Std 610.12-1990 1990). However, due to the vague definition, quality attributes are often defined via attribute taxonomies (ISO/IEC 9126-1 2001; ISO/IEC 25010 2011; Boehm et al. 1978; McCall et al. 1977), which then define the constituent subattributes in more concrete terms. In the context of software products, special focus is on external quality attributes (ISO/IEC 9126-1 2001), or on product quality attributes (ISO/IEC 25010 2011), because external quality attributes can be used to distinguish products from each other to the customers. Finally, quality attributes can be divided into those that are observable at runtime and those that are not (Bass et al. 2003). The former relate to the dynamic properties of the computer system or to the quality properties in use (ISO/IEC 25010 2011), whereas the latter relate to the static properties of software (ISO/IEC 25010 2011).

Quality requirements, also known as non-functional requirements (Mylopoulos et al. 1992; Berntsson Svensson et al. 2012), characterize quality attributes: a quality requirement is a requirement that a quality attribute is present in software (IEEE Std 1061-1998 1998). For software products, the quality requirements and their target values need to be decided by taking into consideration the benefit to the market, the cost of achieving them, and even the competitor products (Regnell et al. 2008). Yet, there are several practical challenges; for example, quality requirements tend to be neglected and overlooked (Berntsson Svensson et al. 2012).

Performance is an important quality attribute in the industry (Berntsson Svensson et al. 2012). Poor performance may imply lost revenue, decreased productivity, increased development and hardware costs, and damaged customer relations (Smith and Williams 2002). Performance can be defined as the degree to which a system or component accomplishes its designated functions within given constraints, such as speed, accuracy, or memory usage (IEEE Std 610.12-1990 1990). Performance is relative to the amount of resources used to meet those constraints; example resources include other software products and the software and hardware configuration of the system (ISO/IEC 25010 2011). Thus, performance is a dynamic property of the computer system (ISO/IEC 25010 2011), which also implies that it is observable at runtime (Bass et al. 2003) and can be used to distinguish products from each other.

Performance can be divided into the subattributes of time behavior, resource utilization and capacity (ISO/IEC 25010 2011). Time behavior can refer to the latency of responding to an event, or to the throughput of processing events in a given time interval (Bass et al. 2003; Barbacci et al. 1995). Time behavior requirements may be defined relative to the amount of resources needed to meet the constraints, and to the load of the system (ISO/IEC 25010 2011; Bass et al. 2003). Resource utilization refers to the amount of resources the system uses to perform its functions (ISO/IEC 25010 2011); typical examples of resources include both static and dynamic memory.
Capacity means the degree to which the maximum limits of a product or system parameter meet requirements (ISO/IEC 25010 2011); as a concrete example, capacity can be defined as the maximum achievable throughput without violating the specified latency requirements (Barbacci et al. 1995).
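The latter definition can be made operational with a small sketch of our own (the measurement values are invented for illustration): given latency measured at increasing load levels, capacity is the highest load whose latency still meets the requirement.

```python
# Hypothetical measurements: offered load (requests/s) mapped to
# the observed latency (ms) at that load.
MEASUREMENTS = [(100, 180), (200, 210), (400, 320), (800, 460), (1600, 1250)]

def capacity(measurements, latency_limit_ms):
    """Maximum measured throughput that does not violate the
    specified latency requirement (cf. Barbacci et al. 1995)."""
    feasible = [load for load, latency in measurements
                if latency <= latency_limit_ms]
    return max(feasible, default=0)

print(capacity(MEASUREMENTS, latency_limit_ms=500))  # 800 requests/s
```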

The software architecture is critical to the realization of many quality attributes (Bass et al. 2003): a significant part of the quality attributes are determined by the choices made during the architecture design. This is also true for performance: most performance failures are caused by not considering performance early in the design (Smith and Williams 2002). Performance is affected by several software architecture design decisions: the type and amount of communication among components; the functionality that has been allocated to these components; and the allocation of the shared resources (Bass et al. 2003). In many systems, the functionality is decentralized, which means that performing a given function is likely to require collaboration among many different components (Smith and Williams 2002). Due to this architectural nature, performance should be designed in and evaluated at the architectural level (Bass et al. 2003).

To address performance during design, performance tactics and patterns encapsulate reusable solutions (Bass et al. 2003; Smith and Williams 2002). Performance tactics include decreasing resource demand; increasing or parallelizing resources; and enhancing resource arbitration (Bass et al. 2003). Further, there are many design decisions that improve other quality attributes at the expense of performance: for example, most of the availability tactics (Bass et al. 2003) increase overhead and complexity. Such situations are called trade-offs, and they are usually resolved by finding a global, multi-attribute optimum (Barbacci et al. 1995). The way these trade-offs are resolved during the architectural design forms the quality attributes of the system; later, in the implementation, individual quality attributes cannot be easily improved.

2.2 Variability in Software Product Lines

To manage and represent variability in product lines, features and feature modeling (Kang et al. 1990; Kang et al. 2002; Czarnecki et al. 2005) have become the de facto standard in the research community. A feature can be seen as a characteristic of a system that is visible to the end-user (Kang et al. 1990), or more generally, as a system property that is relevant to some stakeholder and is used to capture commonalities or discriminate among product variants (Czarnecki et al. 2005). A feature model then represents the variability and relations of features. In addition to managing variability through features, the variability of the architecture (Thiel and Hein 2002) and the implementation (Svahnberg et al. 2005) needs to be managed. Variability management typically focuses on functional variability.

Quality attribute variability in software product lines has been studied to some degree (Myllärniemi et al. 2012; Etxeberria et al. 2007). However, quality attribute variability is typically not the main contribution but only supports other, more central aspects in the study (Myllärniemi et al. 2012). Further, quality attribute variation can be both purposeful and unintentional, because any variability in the product line may also cause indirect variation in the quality attributes (Niemelä and Immonen 2007). At least the following combinations of quality attributes, variability, and product lines can be identified.

Firstly, software product lines can have purposeful quality attribute variability: this is the focus of our study. The product line has the ability to deliberately create quality attribute differences between the products, that is, the products exhibit purposefully different quality attributes to serve different needs.
The products are developed as a product line, which means that the product line and its assets need to explicitly manage and realize quality attribute variability.

Secondly, there can be product lines that do not have purposeful quality attribute variability. Either all products in the product line exhibit more or less similar quality attributes, or the quality differences between the products are unintentional. For example, the product line architecture is designed to address a common, "worst case" quality requirement (Hallsteinsen et al. 2006a).

Thirdly, there can be products with purposefully different quality attributes, but the products are not developed as a product line, for example, under one product line architecture. This kind of approach is well tailored to specific needs, but at the same time is costly, since the level of reuse will be lower. This may be an option if the needed architectural solutions are very different or conflicting.

2.3 Performance Variability in Software Product Lines

In the following, we briefly review the related work on performance variability in software product lines. Most of the studies that address quality attribute variability discuss quality attributes in general, and the contribution is not limited to specific quality attributes, such as performance. Only a handful of studies focus on specific quality attributes: as an example, Mellado et al. (2008) propose a process of security requirements engineering for software product lines. Instead, a typical case is to propose a method or a construct that is promised to be applicable to all quality attributes, and then utilize a concrete example with specific quality attributes. However, questions have been raised about whether a blanket solution can cover all quality attributes equally well (Myllärniemi et al. 2012; Berntsson Svensson et al. 2012). Nevertheless, when looking at the examples utilized in the studies, it seems that performance, and in particular time behavior and resource consumption, are two quality attributes that are often proposed to be varied in software product lines (Myllärniemi et al. 2012).

Performance variability can be represented and managed in many different ways and on many different levels; this is also noted in two literature reviews (Myllärniemi et al. 2012; Etxeberria et al. 2007). Some approaches focus more on representing and managing performance variability in the problem space, as performance requirements or goals, or as performance options that can be selected during application engineering. Other approaches focus more on how performance variability is realized, either through architectural tactics, or through the interplay of features and feature impacts in the software product line.

Firstly, performance variability can be represented by capturing how the features in a feature model impact performance: when features are varied, so is performance. A feature impact characterizes how a particular feature contributes to performance: for example, feature Credit Card contributes 50 ms to the overall response time (Soltani et al. 2012). Such feature impacts can be represented as feature attributes (Benavides et al. 2005) or listed as the properties of features (White et al. 2009; Soltani et al. 2012); moreover, the impacts can be either quantitative or qualitative (feature Encryption has a negative contribution to the response time (Jarzabek et al. 2006)). Representing the impact of features has been used for response time (Soltani et al. 2012), CPU consumption (White et al. 2009; Guo et al. 2011), memory consumption (Tun et al. 2009; Guo et al. 2011), and speed (Bagheri et al. 2010). However, the challenge is to characterize or measure the impact per feature; it has been claimed that time behavior can only be characterized per variant and not per feature (Siegmund et al. 2012b). Further, the feature impacts may depend on the presence of other features.
This is called feature interaction (Siegmund et al. 2013; Siegmund et al. 2012a; Sincero et al. 2010; Etxeberria and Sagardui 2008): a certain combination of features may create a bigger memory footprint than simply aggregating the memory footprints of the individual features would suggest. Feature interactions may occur when the same code unit participates in implementing multiple features; when a certain combination of features requires additional code; or when two features share the same resource (Siegmund et al. 2012b; Siegmund et al. 2013). Feature interactions have been addressed for memory footprint, main memory consumption and time behavior (Siegmund et al. 2012a; Siegmund et al. 2013).

Another way to represent performance variability is to capture varying performance directly as "quality attribute features" or as other feature-like entities (Lee and Kang 2010; Etxeberria and Sagardui 2008; Gimenes et al. 2008): this makes it straightforward to select the desired performance variant for a product. However, the realization of the different performance variants must also be addressed, for example, by characterizing qualitatively how each functional or technological feature contributes to the quality attribute features (Lee and Kang 2010; Etxeberria and Sagardui 2008).

More in the problem domain, there are also approaches that represent performance variability as softgoals. Softgoals represent requirements that do not have clear-cut satisfaction criteria; instead, they are satisfied when there is sufficient positive evidence and little negative evidence (Mylopoulos et al. 2001). Typically, performance is represented as softgoals, which are in turn operationalized as varying goals (Yu et al. 2008) or varying tasks (González-Baixauli et al. 2007); these varying goals and tasks are then mapped to solution-space features. Alternatively, the softgoals can be operationalized directly as varying features (Jarzabek et al. 2006). The operationalization captures the qualitative impact, such as hurt or help, from the varying goals, tasks or features onto the performance softgoals. However, since performance is one of the quality attributes for which it is relatively easy to define clear-cut satisfaction criteria with quantifiable measures, softgoals are more commonly used for the variability of other quality attributes, such as security or usability. Yet, even performance requirements may be represented in a less specific form as softgoals during the early stages of product line development, for example, High performance, and later converted into features with clear-cut satisfaction criteria (Jarzabek et al. 2006). Finally, it is also possible to attach performance information directly to variation points, that is, to an orthogonal variability model (Roos-Frantz et al. 2012).

In addition to representing variability, one must be able to derive or optimize products with specific characteristics. However, algorithms that take into account quantitative impacts or optimization are computationally very expensive. Earlier CSP-based solvers resulted in solution times exponential in the size of the problem (Benavides et al. 2005). White et al. (2009) showed that finding an optimal variant that adheres to both the feature model constraints and the system resource constraints is an NP-hard problem. Therefore, approximation algorithms (Guo et al. 2011; White et al. 2009) as well as HTN planning (Soltani et al. 2012) have been proposed.
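The feature-level approaches above share a common computational core, which the following sketch of ours illustrates (the feature names, footprints and the interaction delta are invented, loosely following the additive models discussed by Siegmund et al.): a variant-level quality value is estimated from per-feature impacts plus corrections for interacting feature combinations.

```python
# Hypothetical per-feature memory impacts (kB) and one pairwise
# interaction delta: the combination costs more than the sum of
# its parts, as described for feature interactions above.
FEATURE_KB = {"Encryption": 220, "Compression": 150, "Logging": 60}
INTERACTION_KB = {frozenset({"Encryption", "Compression"}): 45}

def footprint(selected):
    """Estimate a variant's footprint as the sum of per-feature
    impacts plus the deltas of all triggered interactions."""
    base = sum(FEATURE_KB[f] for f in selected)
    extra = sum(delta for combo, delta in INTERACTION_KB.items()
                if combo <= selected)
    return base + extra

print(footprint(frozenset({"Encryption", "Compression"})))  # 415, not 370
```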
In contrast to the multitude of approaches that study performance variability at the feature level, there is much less explicit discussion on the realization of performance variability in the product line architecture. In principle, varying architectural design decisions can be represented as varying features: for example, feature Euclidean describes a certain variant of a face recognition algorithm (White et al. 2009). This is because features can be used to capture design decisions (Jarzabek et al. 2006), domain technology or implementation techniques (Lee and Kang 2010). Thus, it is not always clear-cut whether an approach focuses on features or on architectural design decisions in realizing performance variability. Nevertheless, different architectural tactics have been used to analyze how varying performance requirements can be met (Kishi and Noda 2000; Kishi et al. 2002). Also, the effect of different algorithms (White et al. 2009; Bagheri et al. 2010) or different patterns (Hallsteinsen et al. 2006b) on time behavior and resource consumption has been captured.


The role of hardware is sometimes discussed in conjunction with performance variability; varying hardware constrains the resource consumption and thus affects how software features can be selected (Botterweck et al. 2008; Karatas et al. 2010). For example, hardware features 1024kB and 2048kB represent the choice between different memory components in a system (Botterweck et al. 2008). Thereafter, explicit constraints relate the software and hardware features: for example, software feature DiagnosticsAccess excludes 1024kB (Botterweck et al. 2008). Such explicit constraints may stem from known incompatibility issues, or from the externalized knowledge on reference configurations, as described by Sinnema et al. (2006). The varying hardware can also be outside the scope of the product line, for example, when the application resource consumption is constrained by the mobile device capabilities (White et al. 2007).
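Such cross-tree constraints between software and hardware features can be checked mechanically, as the following sketch of ours shows (only the DiagnosticsAccess excludes 1024kB constraint comes from Botterweck et al. (2008); the function and the surrounding structure are hypothetical):

```python
# Alternative hardware memory features and an excludes constraint
# between a software feature and a hardware feature.
HARDWARE_OPTIONS = {"1024kB", "2048kB"}
EXCLUDES = {("DiagnosticsAccess", "1024kB")}

def is_valid(software_features, hardware_feature):
    """Reject configurations that violate an excludes constraint."""
    assert hardware_feature in HARDWARE_OPTIONS
    return all(sw not in software_features or hw != hardware_feature
               for sw, hw in EXCLUDES)

print(is_valid({"DiagnosticsAccess"}, "1024kB"))  # False
print(is_valid({"DiagnosticsAccess"}, "2048kB"))  # True
```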
2.4 Empirical Evidence on Quality Attribute Variability in Industrial Product Lines

In the following, we review the empirical evidence on quality attribute variability in industrial product lines. In particular, we are interested in the evidence on the existence, characteristics, and practices of quality attribute variability in industrial product lines. Typically, such evidence has been produced following the observational research path (Stol and Fitzgerald 2013), which may range from informal experience reports to rigorous case studies. Nevertheless, we are also interested in those methods or prescriptive theories that have been tested in an industrial context, for example, with experiments.

The empirical evidence on quality attribute variability in industrial product lines is scarce (Myllärniemi et al. 2012). There are case studies and reported industrial experience on product lines and variability, also in the telecommunication domain (Jaring and Bosch 2002), but the focus of the reported empirical evidence has not been on quality attribute variability. Due to the lack of studies, we review the empirical evidence on quality attribute variability in general, instead of focusing only on performance variability.

Moreover, the rigor of the empirical evidence differs. As discussed by Runeson and Höst (2009), the term "case study" is an overloaded word in software engineering research: the presented case studies range from ambitious and organized case studies to small toy examples, cf. (Yin 1994; Dubé and Paré 2003; Runeson and Höst 2009). In fact, it is very common in software engineering that a "case study" is merely an example used to provide a proof-of-concept for a method or a construct, as described by Hevner et al. (2004). According to Yin (1994), a case study studies a phenomenon within its real-life context. For studies in software engineering, the context is typically a company. Thus, a case study in software engineering should describe the case company and how the phenomenon of interest exhibits there. Since the phenomenon and its context are not always distinguishable from each other, data collection and data analysis strategies become an integral part of case studies (Yin 1994). Thus, a case study should explain the data collection and analysis procedures and establish a chain of evidence from the data to the results.

There are a few studies (Kishi et al. 2002; Sinnema et al. 2006; Niemelä et al. 2004; Myllärniemi et al. 2006b; Hallsteinsen et al. 2006a) that can be characterized as case studies and that describe a product line company with quality attribute variability, that is, that say directly that the phenomenon of quality attribute variability happens within its real-life industrial context. In a similar fashion, there are studies that describe a specific open-source software that is explicitly mentioned to have quality attribute variability (Siegmund et al. 2012b; Sincero et al. 2010). By contrast, there is evidence on industrial contexts in which variability in quality attributes was not needed (Galster and Avgeriou 2012).

However, in most of these studies, quality attribute variability is only a minor characteristic, and the main contribution lies elsewhere. Some studies merely mention quality attribute variability briefly (Niemelä et al. 2004). In many studies, the varying quality attribute is mentioned to include performance (Sinnema et al. 2006; Myllärniemi et al. 2006b; Niemelä et al. 2004; Siegmund et al. 2012b; Sincero et al. 2010). However, the motivation and the realization of quality attribute variability are not discussed in these studies, or are discussed only very briefly. For the motivation, the refresh rate and memory consumption of 3D mobile game software is varied to maximize the game attractiveness and playability on all devices with varying capabilities (Myllärniemi et al. 2006b). For the realization, the variability of adaptability, availability, suitability and interoperability causes architectural variation in a product line, which is realized with varying patterns (Hallsteinsen et al. 2006a). Also, performance and memory consumption variability may be the result of selecting various installation options in infrastructure-oriented software product lines (Siegmund et al. 2012b; Sincero et al. 2010). However, none of these studies, with the exception of (Myllärniemi et al. 2006b; Galster and Avgeriou 2012), provides an adequately explicit description of the method with which the case study data was collected or analyzed. Therefore, it is hard to assess the level of rigor and the resulting construct validity in terms of correspondence to the real phenomenon.

There are also studies (Lee and Kang 2010; Jarzabek et al. 2006; Tun et al. 2009; Kuusela and Savolainen 2000) that propose a method or a construct, utilize an example of varying quality attributes, and mention or imply an industrial product line behind the example. In a similar manner as in case studies, the rigor of the empirical evidence differs (Shaw 2002; Fettke et al. 2010). Consequently, it is not clear whether these studies are actually rigorous empirical studies of a contemporary phenomenon, or merely statements backed up by exemplary experience (Fettke et al. 2010), or slices of real life or toy examples (Shaw 2002) influenced by industrial software product lines.

Finally, in addition to the empirical evidence about quality attribute variability in industrial contexts, there are also other kinds of empirical studies: for example, a student experiment reported, slightly surprisingly, that students were able to identify varying quality attributes better than varying functionality (Galster and Avgeriou 2011).

To summarize, there is scarcely any empirical research, and case studies in particular, that describes quality attribute variability in its real-life context and allows assessment of the data collection and analysis procedures.

3 Research Method

3.1 Research Design

This research was carried out following the case study methodology (Yin 1994; Patton 1990; Runeson and Höst 2009). The case study was augmented with a systematic literature review (Wohlin and Prikladniki 2013; Wohlin 2014) (see Fig. 1). Additionally, the analysis and theory building utilized some guidelines from the grounded theory methodology (Urquhart et al. 2010). Qualitative methods in general permit researchers to study selected issues in depth and detail (Patton 1990). A case study is a suitable approach for situations in which the phenomenon of interest is complex and non-manipulable and the understanding of the topic is still lacking (Yin 1994): quality attribute variability in industrial software product lines fits all such characteristics.


[Figure 1 is a diagram of the three theorizing levels. Data level: internal documents; informal discussions and notes; the second author's first-hand experience and notes; publicly available information about the domain; comments and answers from the chief architects; and the 139 primary studies of quality attribute variability in software product lines from a systematic literature review. The data was validated through two reviews of the case account by the chief architects. Account level: the case account of capacity variability in a base station product line, and the examples and accounts of performance variability in software product lines from the primary studies. Theory level: the proposed theoretical models of performance variability in software product lines.]

Fig. 1 The theorizing levels in this study: from the data to the accounts, and from the accounts to the more general proposed theoretical models. Also the scope and focus of each level are indicated

Further, since the research questions are about "why" and "how", a case study seemed appropriate (Yin 1994).

According to Yin (1994), the main component in case study research is a theory, both as the starting point and as the end result. In empirical software engineering, two distinct research designs can be identified: theory building (the observational path) and theory testing (the hypothetical path) (Stol and Fitzgerald 2013). For example, the theoretical model of open source software communities was built based on a case study on Apache; later on, in a follow-up study, the model was tested with Mozilla to close the theorizing cycle (Stol and Fitzgerald 2013). This research was designed to follow the observational path (Stol and Fitzgerald 2013), that is, to build theories from the empirical data and observations; theory testing was left as future work.

To follow the observational path, the empirical data and observations were obtained through a case study with the unit of analysis (Yin 1994) covering capacity variability in a mobile network base station. Thereafter, the theory building expanded the unit of analysis to cover performance variability in software product lines in general (Fig. 1). The results were formulated into three descriptive and explanatory theoretical models, which were aligned to consist of theory constructs, relations and scope (Gregor 2006). Consequently, the theorizing consisted of three levels: the data level, the account level, and the theory level (Fig. 1); these levels also expanded the case-specific scope into a more generalized one.

3.2 Case Selection

The case company was Nokia, formerly Nokia Solutions and Networks, and the selected case was a base station in the 3G radio access network. The case selection utilized snowball and convenience sampling methods (Patton 1990). Snowball sampling selects cases on the basis of asking well-informed people about suitable information-rich cases: the first author asked the second author about his knowledge of cases that exhibit quality attribute variability in a product line.


By contrast, convenience sampling selects easily accessible cases for the study. Initially, three product lines from the case company portfolio were identified as potential cases, as they exhibited quality attribute variability. However, only one case was included in the final version of this study, mainly because its results could be validated and published without confidentiality issues. Additionally, the remaining case had rich information that was available and accessible, and it was possible to collect additional data from the people who had been developing the case product line.

The case study was performed post mortem: the studied product line was designed and even partially developed, but discontinued before the production stage; the reason is discussed in Section 4. This of course creates threats to the study validity, which are discussed in Section 6.1. However, the post mortem nature of this study made it possible to access confidential project documentation, making this single case an information-rich special case. Moreover, although this particular base station was discontinued, the case company and the people who participated in the development continued to work with similar base stations. At the same time, this single case was also a typical case from the unit of analysis perspective, because similar findings seem to apply to other base stations in the case company portfolio.

3.3 Data Collection

The data collection took place iteratively, as illustrated in Table 1, and it consisted of collecting the case data and finding and selecting the primary studies (the lowest level in Fig. 1).

Table 1 The stages of the overall research process

Stage | Activity        | Detailed description
1     | Data collection | Collecting internal documents and publicly available information; conducting informal discussions; recording first-hand experience.
2     | Analysis        | Light-weight coding; identification of the main concepts; formulation of the detailed research questions and initial findings.
3     | Validation      | A review of the findings by the chief architects.
4     | Data collection | Comments and answers to a list of open questions from the chief architects.
5     | Analysis        | Identification of new concepts and findings; revision of the research questions.
6     | Validation      | A review of the findings by the chief architects.
7     | Analysis        | Constructing the first case account.
8     | Reporting       | Myllärniemi et al. (2013).
9     | Data collection | 139 primary studies of quality attribute variability in software product lines, selected through a systematic literature review (see Table 2).
10    | Analysis        | Identification of accounts and passages on performance variability from the primary studies; identification of example instantiations for existing categories; identification of new concepts and categories; theory building.

The data collection, analysis and validation activities were iterated to build the theoretical models.


3.3.1 Collecting Case Data

The main data source for the case account consisted of various documents, including a product line software architecture document, a detailed subsystem architecture document, a product line architecture evaluation document, and a product specification document. In total, roughly 300 pages of technical documentation were included in the analysis. All these documents were originally intended for internal use within the company. Further understanding about the application domain was acquired from various sources, especially from an edited book (Holma and Toskala 2000) by the employees of the case study company. In addition, open or unclear issues within the documents were discussed in informal meetings between the authors.

Secondly, the second author had participated in the architectural evaluation of the case product line, which had resulted in notes and observational first-hand experience. Hence, the second author acted as one data source. Consequently, data was also collected via informal discussions that took place between the first and the second author, as well as with one employee at the case company who was familiar with the case. Written notes about these discussions were stored. These informal discussions gave background information as well as clarified unclear issues in the documents. The discussions also explicated implicit rationales and other contextual facts not covered in the technical documents.

Further, for the validation of the findings, which also provided an opportunity to collect additional data, the results were reviewed twice by a group of the chief architects of the case product line. These chief architects were involved with the case project from the beginning to the end and thus had first-hand experience. In the review process, the first author provided a written list of questions to clarify open issues. Answers and comments were collected and refined via e-mails and phone discussions.

Finally, triangulation was used in several forms. In particular, we compared the experiences of the second author, the comments from the chief architects, and the original documents from the time the product line was designed to each other. Thus, there was triangulation between the responders and triangulation between the responders and the documents: this aimed at preventing the subjective interpretations of individuals from biasing the results. Additionally, investigator triangulation was applied: the first author's analysis was subjected to review by the other authors.

When collecting data, a case study database (Yin 1994) was established. This included all documents, notes from the informal meetings, e-mails, and other observations. All data was produced in textual form.

3.3.2 Searching and Selecting Primary Studies

Existing literature constituted another source of data (the lowest level in Fig. 1). For this purpose, we followed the systematic review guidelines (Wohlin and Prikladniki 2013; Wohlin 2014) that utilize snowballing as the primary search strategy. The data collection in our review protocol had a wider scope than our case study: primary studies were searched and selected to address quality attribute variability in software product lines in general (Fig. 1). During the analysis phase, the primary studies were analyzed only from the performance variability point of view. In the following, we briefly outline the search and selection protocol. As discussed by Myllärniemi et al. (2012), the protocol did not utilize any search strings or database searches.
Thus, the search protocol was not dependent on any specific terms utilized to characterize quality attribute variability; such terms tend to be highly heterogeneous. Further, the search protocol did not exclude any primary studies based on metadata only: all potential publications were retrieved and their full content was read before the decision to exclude was made. Thus, studies that did not mention quality attribute variability in the title or abstract but nevertheless contributed to it were not excluded; this increased the completeness.

The scope of the search strategy was set to cover purposeful quality attribute variability in software product lines. For this purpose, the following inclusion and exclusion criteria were utilized.

Inclusion criterion: The primary study says explicitly, or shows with an example or case, that there is purposeful variability of quality attributes in a software product line, or that different products in a software product line have purposefully different quality attributes. Here, purposeful quality attribute variability refers to an intentional, managed ability to choose or derive products with different quality attributes.

Exclusion criterion: The primary study does not explicitly mention that the quality attribute variability takes place in a software product line or a software product family. For example, component-based software, service-oriented software and self-adaptive architectures without any link to the software product line paradigm are excluded.

Exclusion criterion: Quality attribute variability is not part of the study contribution; for example, it is mentioned only in the related work, discussion, or future work.

Exclusion criterion: The study is not a peer-reviewed publication: for example, books, book chapters, websites and technical reports are excluded. Also, the contribution is not assessable from the study: for example, studies not written in English are excluded, as well as tutorial and panel summaries.

Table 2 illustrates the search and selection iterations in the research protocol; the order of the iterations followed the guidelines by Wohlin (2014). Firstly, the initial start set (Wohlin 2014) was identified and selected by reading through all full publications in the Software Product Line Conferences up until 2010; further details about this process are reported by Myllärniemi et al. (2012). After applying the revised set of inclusion and exclusion criteria, 26 primary studies were selected for snowballing.

For backward snowballing, the primary studies in the start set were processed as follows. The reference list of each primary study was pruned based on the recommendations by Wohlin (2014): firstly by looking at the publication type, and thereafter by looking at the context of the actual reference in the primary study. If an item in the reference list passed both criteria, it was deemed a candidate for selection. After all reference lists in the start set had been examined, the candidates for selection were recorded, duplicates were removed, and new, previously unprocessed studies were retrieved. The inclusion and exclusion criteria were then applied, based on the full content, to all retrieved studies.

For forward snowballing, the primary studies in the start set were processed as follows. Two citation databases were used: ISI Web of Science and Scopus. The forward citations covered studies published up until February 2013. For each primary study in the start set, the studies that cited it in either database were recorded, duplicates were removed, and new, previously unprocessed studies were retrieved.
The inclusion and exclusion criteria were then applied, based on the full content, to all retrieved studies. As a result of the iterations in Table 2, 140 primary studies were selected; however, during the analysis phase, one primary study was excluded based on its contribution, yielding the final 139 primary studies.
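To make the iteration logic of Table 2 concrete, the following is a minimal sketch of one snowballing phase; the helper names (neighbours_of, passes_criteria) are hypothetical and not part of the guidelines by Wohlin (2014).

    # A minimal sketch of one snowballing phase (backward or forward), assuming
    # hypothetical helpers: neighbours_of(study) returns the pruned reference
    # list (backward) or the citing studies (forward), and passes_criteria
    # applies the inclusion and exclusion criteria to the full content.
    def snowball_phase(start_set, neighbours_of, passes_criteria):
        selected = set(start_set)
        frontier = set(start_set)
        new_studies = set()
        # Iterate until an iteration yields no new primary studies (cf. Table 2).
        while frontier:
            candidates = set()
            for study in frontier:
                candidates |= set(neighbours_of(study))
            frontier = {s for s in candidates - selected if passes_criteria(s)}
            selected |= frontier
            new_studies |= frontier
        return new_studies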

Table 2 The backward and forward snowballing iterations taken to select the 139 primary studies

Search action        | Start set    | Candidates for selection | Selected as new
Manual reading       | –            | 221                      | 26
  (Manual reading: 26 primary studies selected)
Backward snowballing | 26           | 92                       | 28
Backward snowballing | 28           | 74                       | 7
Backward snowballing | 7            | 17                       | 1
Backward snowballing | 1            | –                        | –
  (Backward iterations: 36 primary studies selected as new)
Forward snowballing  | 62 (= 26+36) | 342                      | 54
Forward snowballing  | 54           | 69                       | 9
Forward snowballing  | 9            | 1                        | –
  (Forward iterations: 63 primary studies selected as new)
Backward snowballing | 63           | 155                      | 13
Backward snowballing | 13           | 30                       | 1
Backward snowballing | 1            | –                        | –
  (Backward iterations: 14 primary studies selected as new)
Forward snowballing  | 14           | 52                       | 1
Forward snowballing  | 1            | –                        | –
  (Forward iterations: 1 primary study selected as new)
Backward snowballing | 1            | –                        | –
In total: 140 primary studies selected; 1 primary study excluded in analysis

3.4 Data Analysis

The data analysis was iterated with the data collection and validation (Table 1). We adopted some of the analysis principles from grounded theory (Urquhart et al. 2010; Strauss and Corbin 1998). The analysis included understanding and uncovering the phenomena of the case; constructing a descriptive and explanatory account; making conceptual generalizations; and constructing descriptions and explanations about the phenomena in more general terms (Lee and Baskerville 2003; Gregor 2006).

In the first analysis stage, the first author analyzed all the data from the case, which was in textual form, using light-weight coding of text passages. Through coding, initial concepts were identified and compared between different sources of data, thus following the constant comparison guideline (Urquhart et al. 2010). For example, concepts that were identified in the informal discussions were also analyzed from the documents and the publicly available information. The low-level concepts were then generalized to understand the phenomenon of the case. Further analysis took place in the informal discussions where the case and generalizations were discussed. Notes of the analysis were kept and recorded in the case study database. However, to minimize researcher bias due to close involvement with the case, the first author acted as the primary investigator in the analysis.

In the second analysis stage, new case data that emerged from the validation session was added to the analysis, thus serving as an additional slice of data (Urquhart et al. 2010). Since this data collection was partly analytically driven, that is, driven by the open questions
that were raised from the first analysis, this can be considered an instance of theoretical sampling (Urquhart et al. 2010). The resulting analysis identified new low-level concepts, which were in turn compared and analyzed against other data, and generalized. During the analysis, issues with the emerging concepts were resolved through e-mail exchanges with the case chief architect, again recording all additional data in the case study database.

In the third analysis stage, the final validation comments were taken into account, and the case study account and findings were reported. These results have been reported in our earlier work (Myllärniemi et al. 2013).

In the last analysis stage, the existing literature served as additional slices of data. For this purpose, the 139 primary studies of quality attribute variability were analyzed, and accounts and examples that cover performance variability were coded in a light-weight manner in the publications. When building the theory, the primary studies were again visited to find instances of the already identified categories, thus serving as a way to saturate the categories (Urquhart et al. 2010). The primary studies also served as a way to identify a few new categories as well as to state the theory boundaries (Gregor 2006). When analyzing the primary studies, there was an explicit aim to ensure neutrality and objectivity: in particular, we analyzed accounts from the literature that were both similar and contrasting to the findings of our case study. In the resulting models, most of the explanations and characterizations were originally identified from the case account; however, additional examples were drawn from the literature to aim at theoretical saturation. Also, the literature served as a way to identify the scope of the explanations and characterizations, that is, to identify the boundaries of the theory (Gregor 2006). Further, a few categories were identified solely from the primary studies through comparison to the case account.

As a concrete example of the analysis interplay between the case account and the primary studies, the case account was first used to identify the downgrading realization (Section 5.2). Thereafter, this characterization was compared with the approaches often presented in the literature, which led to the identification and formulation of the impact management realization. Finally, the trade-off realization was identified from the literature, and the taxonomy (Fig. 7) was identified based on the concept of a design tactic (Bass et al. 2003).

As a result, we constructed three theoretical models for describing and explaining (Gregor 2006) the phenomenon. To ensure analytical generalization (Yin 1994), the constructs and relations in the models were described in domain-independent terms, aiming at raising the degree of conceptualization and scope (Urquhart et al. 2010). To mark the theory boundaries (Gregor 2006) as the settings in which the models can be applied, we identified the scope either analytically or by relying on the existing literature. Thus, building theoretical models with domain-independent constructs and interpretations and an explicit scope aimed at generalizing the results beyond the domain of the case account.

4 Results as the Case Account

In the following, we give an overview of the case product line and describe the results for each research question; these are summarized in Table 3. The case account focuses on capacity variability in a base station product line, which means the account is kept specific to the case study domain. Thereafter, Section 5 describes the results in more general terms by proposing theoretical models on performance variability in software product lines.

Table 3 Summary of the case account

RQ1: Which characteristic of performance is decided to be varying?
Capacity, that is, the maximum number of phone calls one base station can serve at a time. Characterized for a certain base station configuration as the available uplink and downlink channel elements. Capacity was a key driver for the operators in making the investments in the 3G networks.

RQ2: Why is performance decided to be varied?
Initial differences in the capacity needs due to different usage estimations. The importance of capacity to the operators, which enabled price differentiation. Operators were able to understand the characteristics related to capacity and network planning, and these characteristics were guaranteed by the vendor. The evolution of the usage and hence the capacity needs for long-lived products. Starting with smaller investments and upgrading later supported price differentiation and brought flexibility to the operators.

RQ3: What is the strategy for realizing performance variability within the product line architecture?
Both software and hardware were used to realize capacity variability. Software realization: downgrading by restricting the channel elements visible to the software components; mostly utilized at runtime variability binding. Hardware realization: different installed hardware in the products, with software scalability achieved through resource abstraction and layers; mostly utilized when the base station was taken into use.

RQ4: Why is the realization strategy chosen?
Motivation for the downgrading software realization: quick and efficient runtime rebinding, compared with the cost and difficulty of on-site maintenance for hardware upgrades; this better supported the operators in starting with smaller investments and upgrading capacity as needs evolved. The downgrading mechanism was architecturally focused and mostly independent of other software variability in the base station; this made testing easier. The runtime capacity variability was not introduced to resolve design trade-offs but to enable upgrades and price differentiation. Motivation for the hardware realization: a trade-off between capacity and production costs: an expensive Bill of Materials (BOM) for efficient hardware. Known practices for implementing the software scaling.

4.1 Overview of the Case Product Line

The case study company is Nokia, formerly Nokia Solutions and Networks. Nokia is one of the largest vendors in the domain of mobile telecommunication network products. This case study covers the product line named IP-BTS, which is a configurable base station in 3G (3rd generation) radio access networks. Products from the IP-BTS product line were base stations (Node Bs) that utilized 3G radio access technologies, such as WCDMA (Wideband Code Division Multiple Access).


The domain and scope of the IP-BTS product line are illustrated in Fig. 2. The IP-BTS base station took care of connectivity between mobile phones and the rest of the core telephony network infrastructure. For this purpose, the IP-BTS had responsibilities in three different planes: the user plane carried speech and packet data; the control plane controlled the data, connections, cells and channels; and the management plane took care of network management and base station administration. A base station typically contains a cabinet, an antenna mast and the actual antenna. Additionally, a radio network controller (RNC) was responsible for controlling several base stations, for example, handling the soft handover to another base station when a mobile phone moves out of the range of one base station. Together, the IP-BTS base stations and RNCs formed a Radio Access Network (RAN), which was responsible for handling traffic and signaling between a mobile phone and the Core Network. The IP-BTS base station was designed to work in an All-IP RAN, that is, in a radio access network based on IP (Internet Protocol).

From the design point of view, the IP-BTS base station was a complex telecommunication network element operating in a resource-constrained, embedded environment. To characterize the size of the software, the design was divided into fewer than 20 system-level components, out of which some were described in more detail in separate architecture documents; overall, the software would have consisted of millions of lines of code. The customers of IP-BTS were mobile phone operators who invested in the 3G infrastructure. The motivation that initiated the design of IP-BTS was the anticipated introduction of All-IP radio access networks (Bu et al. 2006).

Fig. 2 The products from the case product line, IP-BTS, were base stations (Node Bs) in All-IP 3G RAN

IP-BTS was designed to support several radio access standards as well as both IP-based and traditional, point-to-point RAN: the idea was to be more flexible and thus replace the first-generation Node Bs that supported only one specific RAN and radio access technology. However, the IP-BTS project was discontinued before reaching the production stage: after the operators had made the large investments in the first 3G network elements, the idea of yet another round of investments did not take off, despite the argued benefits of IP-based RAN (Bu et al. 2006). When IP-BTS was discontinued, it was at the prototype stage; this took place approximately ten years ago. In total, the lifespan of the product line was approximately two years from the initial planning to the termination of the project. Despite the discontinuation, IP-BTS was designed to a detailed level, along with implemented prototypes. Since the architecture was designed and evaluated, technical documentation existed. Additionally, similar flexibility in one base station has later been utilized successfully in the case company.

In the following, we describe the main variability of the IP-BTS product line, focusing only on the aspects relevant for this case study (Fig. 3). One major source of variability in the IP-BTS base station was the need to support several radio access standards and both traditional and IP-based RAN; this is represented as feature Radio access technology in Fig. 3. This choice also affected other functionality in the base station. For example, the user plane functionality, that is, carrying the data, was specific to the radio access protocol. Similarly, managing the base station cells, channels and resources at the control plane was also partly dependent on the radio access technology.

Fig. 3 An overview of the IP-BTS base station variability. The diagram is constructed merely to illustrate the case study results: only the features related to the results are shown, and some exact values have been obfuscated. Operators use UL/DL (uplink/downlink) channel elements for calculating the capacity: 1 voice channel (12.2 kbps) = 1 channel element.

The radio access variability was bound at build-time and was realized mostly through the composition of software components. To lessen the crosscutting effect of the radio access variability, the software design separated the common, reusable parts of the base station software from the radio access specific parts.

In addition, the IP-BTS product line had hardware variability that was bound when the hardware was installed and the base station was started (see feature HW configuration in Fig. 3). The IP-BTS base station was designed to support a varying set of hardware configurations and even new kinds of hardware components. In particular, there were dedicated hardware processor units for user plane processing to handle high-speed data streams; this was because the user plane processing was tightly constrained by requirements on capacity, throughput, latency and jitter. The drawback of hardware variability was the rebinding effort: hardware changes required physical on-site maintenance and a break of service. Typically, the on-site hardware upgrades of the base stations involve the laborious installation of new hardware units, start-up, testing, integration to the network, and taking into use. Further, the number of base stations is typically high (hundreds or even thousands), they may be geographically very scattered, and their accessibility is sometimes poor. Because of this, hardware rebinding happens quite seldom: on average, the lifetime of the hardware in the case company base stations is about eight years, although certain hot-spot areas may require more frequent hardware upgrades.

Finally, it was possible to vary the base station functionality at runtime through licenses (represented as feature Licenses in Fig. 3). This eased the rebinding task compared with hardware variability: the operators could remotely purchase and enable new functionality or capabilities in the base stations. When an operator wished to upgrade a base station through licenses, she entered her new license key to the network management system, which in turn connected to the base station; the new capabilities were then immediately available. This is called license key driven configuration (Linden et al. 2003) and it is a known practice in the telecommunication domain (Jaring and Bosch 2002). Licenses and other runtime variability were mostly used to vary the management plane as well as certain aspects of the control plane, such as channel management. To realize the licenses and other runtime variability, the design utilized parameterization and default parameter values for start-up.

4.2 Varying Performance Characteristic (RQ1)

The varying performance characteristic in the case study was the base station capacity (Table 3). In the IP-BTS base station, capacity referred to the maximum number of voice calls a base station can serve. Capacity could also have been defined as the amount of packet data one base station can route in a given time, but at the time of the IP-BTS design, 3G packet data transfer was not that common.

In general, capacity variability was not specific to the case base station, but has been a common phenomenon in the case study domain. Capacity and coverage were and still are two key drivers when the operators are making investments in mobile telecommunication networks. Therefore, the operators planned capacity and coverage in several stages and at different levels of detail.
For capacity, the network planning involved estimating the traffic density and subscriber growth forecasts; the planning result gave the needed number of base stations along with the required station configurations and dimensioning parameters, such as base station interference and power. Due to the complex calculations involved, the operators did the network and capacity planning with dedicated tools. After the hardware configurations and dimensioning parameters were known, the capacity of a base station was characterized as the number of channel elements in both uplink and
downlink directions (see feature Number of channel elements UL/DL in Fig. 3). A channel element was an abstraction of the resources that were needed to provide capacity for one voice channel. One voice channel could carry dozens of voice calls, and thus a channel element directly related to the maximum number of phone calls that could be served. The number of channel elements that a base station configuration supported was calculated at the base station start-up. Thereafter, the operators could configure the base station capacity by purchasing a new license to increase the number of channel elements.

4.3 Motivation to Vary Performance (RQ2)

The following explanations for the decision to vary capacity in the IP-BTS were identified (Table 3). First and foremost, the capacity needs for the base stations varied: different base stations had to cover different usage, that is, serve different numbers of phone calls. The usage varied both between the operators and between an operator's individual base stations.

Second, the capacity variants could be differentiated in pricing: a base station with higher capacity could be more expensive. Capacity was a key driver for the operators when making investments in the networks. The operators could estimate the business value and the return on investment (ROI) of various capacity levels. Because a higher capacity could be justified financially, the customer was willing to pay more for better capacity, and price differentiation was possible. The operators were well versed in the matters related to base station capacity; this made it easier for the case company to distinguish the capacity variants, for example, through the number of channel elements available. Additionally, the capacity-related characteristics of the base stations were guaranteed. In general, a base station must deliver the quality that the operator pays for; the operator typically tests the products herself, possibly subjecting several competing products to a test bench before making the investment decision. Since the operators understood what the base station capacity meant and trusted that the promised capacity was delivered, it was easier to conduct price differentiation.

Finally, the capacity needs evolved over time. The products in the telecommunication domain are very long-lived and have to be able to cope with evolving capacity needs after the installation. Although the usage of 3G networks was modest in the beginning, the traffic exploded with the deployment of new 3G-enabled devices. If the usage, that is, the number of phone calls made, exceeds the available capacity of a base station, users will experience unacceptable congestion. Therefore, the operators wanted to adjust the base stations to follow the evolving needs: as the usage of the networks grew, the operators could upgrade the base station capacity. This model of upgrading capacity also supported price differentiation: the ability to start with smaller initial investments and to easily purchase more capacity later better served the needs of the operators. This enabled the case company to differentiate both between its products and from its competitors.

4.4 Performance Variability Realization (RQ3)

The IP-BTS base station utilized both software and hardware means to vary capacity (Table 3); this is illustrated in Fig. 4. Traditionally, different levels of capacity in the telecommunication domain have been achieved by having different hardware in the product variants.
With the hardware realization, a base station with a more efficient hardware configuration yielded better capacity.

Fig. 4 Both software and hardware were used to realize capacity variability in IP-BTS base stations. Hardware realization: different hardware configurations, with software scalability through the abstraction of resources; utilized particularly when taking the base station into use. Software realization: downgrading the available channel elements; utilized particularly at runtime.

To implement capacity variability through hardware, the product line architecture was designed to support varying hardware configurations, including different numbers of processing units and memory and different processor types. Since there was dedicated hardware for the user plane processing, that is, for handling voice call traffic, the capacity could be upgraded by adding more or better hardware units to the user plane. This is illustrated in Fig. 3 as the varying feature User plane processor units. When the base station was started, the maximum number of channel elements available was determined from the installed hardware resources; this maximum amount is represented as feature Licensedmax UL/DL in Fig. 3. To enable the software to accommodate the varying hardware, that is, to make the software scalable, the software architecture of the IP-BTS utilized a layered architecture to limit hardware visibility and abstracted the hardware and the available resources with property files. Only the system-level software component BTS O&M (Table 4) was aware of the actual hardware configuration; for the rest of the system-level software components, the hardware capability was accessed through virtual devices and drivers. This is illustrated in Fig. 3 as features Resource management and HW management.
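To illustrate the idea, the following is a minimal sketch of such resource abstraction; the names (ResourceView, BtsOam, channel_management) are hypothetical, as the actual IP-BTS interfaces are not publicly documented at this level.

    # A minimal sketch of hiding the hardware behind an abstract resource view
    # (hypothetical names): only one component knows the installed hardware.
    from abc import ABC, abstractmethod

    class ResourceView(ABC):
        @abstractmethod
        def channel_elements(self) -> int:
            """Total channel elements supported by the installed hardware."""

    class BtsOam(ResourceView):
        # The only component aware of the concrete hardware configuration.
        def __init__(self, user_plane_units: list):
            self._units = user_plane_units  # channel elements per installed unit

        def channel_elements(self) -> int:
            # Corresponds to Licensedmax UL/DL, determined at start-up.
            return sum(self._units)

    def channel_management(view: ResourceView) -> int:
        # Other components scale to the hardware only through the abstraction.
        return view.channel_elements()

    print(channel_management(BtsOam([32, 32, 64])))  # 128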

Table 4 Software architecture elements that were responsible for the software realization of capacity variability

BTS O&M: A system component in the management plane that is responsible for capacity variability through runtime licenses as well as through managing the hardware configuration.

Option Manager, License Key Manager: Two logical components in the management plane that support setting runtime variability with licenses and corresponding options, and provide a database and corresponding operations for application-level parameters.

Resource Manager: A component in the control plane that is responsible for monitoring and restricting dedicated capacity resources, e.g., channel elements, in the user plane; a channel element corresponds to the capacity of one voice channel.


The abstraction and management of the actual hardware resources is an established practice in the telecommunication domain. However, the drawback of the hardware realization was that it did not support runtime rebinding. Therefore, a software realization was used to vary the base station capacity at runtime (Fig. 4). For this purpose, the operators could buy licenses to enable different numbers of uplink and downlink channel elements (feature Number of channel elements UL/DL in Fig. 3).

To implement capacity variability through software, the software architecture was designed to downgrade the capacity, that is, to downgrade the maximum number of uplink and downlink channel elements achieved with the current hardware configuration. At the time of designing the IP-BTS architecture, the exact downgrading mechanism was not decided, but later the decision was made to restrict the number of channel elements; this design has been used in other base stations in the case company. The variability imposed by this realization strategy is illustrated in Fig. 3 as feature Channel element downgrading. This feature was implemented by a dedicated component, Resource Manager, in the IP-BTS software architecture (Table 4). Resource Manager monitored and limited the resources used by other software components, including the channel elements. Since a set of channel elements corresponded to a certain set of hardware resources, more channel elements could be added by enabling the corresponding, dedicated hardware resources in the user plane. Moreover, this could be done at runtime, in contrast to build-time variability.

There were two important architectural aspects in realizing capacity variability with software. First, the software realization of capacity variability was not crosscutting: the aim was to implement the variability solution behind a handful of components (see Table 4). Avoiding crosscutting required that the actual hardware resources were hidden from most software components; this was realized through the software scaling described above. Second, the realization mechanism, that is, limiting the number of available channel elements, was mostly independent of other software variability in the base station. The impact of other software variability on the capacity was also kept to a minimum: since dedicated channel resources were reserved for handling the user plane traffic, the variability of the management and control plane functionality did not directly impact the user plane capacity. There is, however, one exception: as channel elements are part of the radio access standards, reducing the channel elements needed to be implemented differently for different radio access technologies. Thus, the runtime downgrading was dependent on the radio access technology used, that is, on a specific build-time variability choice. This is illustrated in Fig. 3 as feature Channel management and monitoring having separate variants for different radio access technologies.
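As an illustration, the following is a minimal sketch of license-driven downgrading under these assumptions; the names (ResourceManager, decode_license) are hypothetical and do not reflect the actual IP-BTS implementation.

    # A minimal sketch of the downgrading realization (hypothetical names):
    # a license caps the channel elements that other components may allocate,
    # never exceeding the hardware-determined maximum (Licensedmax UL/DL).
    def decode_license(license_key: str) -> int:
        # Placeholder: a real key would be validated cryptographically.
        return int(license_key.split(":")[1])

    class ResourceManager:
        def __init__(self, hw_max_ce: int, basic_ce: int):
            self.hw_max = hw_max_ce   # determined from the hardware at start-up
            self.limit = basic_ce     # default level until a license raises it
            self.in_use = 0

        def apply_license(self, license_key: str) -> None:
            # Runtime rebinding: no site visit, no break of service.
            self.limit = min(decode_license(license_key), self.hw_max)

        def allocate_channel_element(self) -> bool:
            # Called when a new voice channel is set up; fails at the limit.
            if self.in_use < self.limit:
                self.in_use += 1
                return True
            return False

    rm = ResourceManager(hw_max_ce=128, basic_ce=32)
    rm.apply_license("CAPACITY:96")  # operator remotely upgrades the capacity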
4.5 Motivation for the Realization (RQ4)

The following explanations were identified for the selected realization strategies in the IP-BTS base station (Table 3). The main motivation for the software realization was to make capacity upgrades easy for the operators; this supported both price differentiation between the products and differentiation from the competitors. Due to the remote location of base stations, an on-site hardware upgrade was costly and slow, and the upgrades required that compatible hardware components be available even several years after the installation.

Because of this, the software realization brought the flexibility depicted in Fig. 5: capacity rebinding could be done more often, and the operators could start with smaller initial investments and pay more as the needs evolved. Consequently, after the deployment of a base station, capacity rebinding was designed to be done mainly via software (see Fig. 4), and hardware upgrades were done only when the maximum license capacity was not enough.

Moreover, the software realization was not introduced to resolve design trade-offs between capacity and other quality attributes, but to enable upgrades and differentiation of capacity. Therefore, the software realization could simply utilize the downgrading strategy. Although several trade-offs existed in the base station design, they were resolved during the design in a way that fulfilled the maximum capacity requirements (represented as feature Licensedmax UL/DL in Fig. 3) in a certain hardware configuration; this design was then explicitly downgraded to reach the lower capacity levels. Moreover, although the number of channel elements was reduced in the base station, this did not change other capabilities, such as the design of the channels or the way the data was managed inside the base station. Similarly, the software realization was not about adjusting to differences in the production costs of the capacity variants: with the downgrading realization, the production costs for the different license-based capacity levels were the same, but the price of the capacity licenses varied.

For the hardware realization, the main motivation was to minimize the cost of the Bill of Materials (BOM) for the lower-priced, lower-capacity base stations, that is, to resolve the trade-off between hardware costs and capacity. This was because the hardware played a major role in the cost of the base stations. The hardware realization meant that the lower-capacity variants had a less efficient hardware configuration and a smaller BOM cost. By contrast, the software realization meant that all capacity variants had the same hardware cost, and due to price differentiation, the price-to-cost ratio was worse in the lower-capacity base stations. Another motivation for the hardware realization was its ease and efficiency: the capacity as the number of channel elements was directly affected by the user plane hardware (Fig. 3), and the domain had established practices for building software that scaled to different hardware configurations.


Fig. 5 Compared to the hardware realization (gray line), the capacity upgrades with the software realization (black line) could be made more often and the operators could start with less expensive base stations and upgrade as needed. This ensured both customer satisfaction and better price differentiation


This efficiency of the hardware realization is highlighted by the fact that even the software realization relied partly on hardware to alter capacity: adding more channel elements meant enabling the related hardware resources in software.

Additionally, the testing effort affected the decisions on the realization. The vendor must thoroughly test the base station to ensure the capacity can be guaranteed to the operators. Since the software realization downgraded capacity without affecting other quality attributes, and the mechanism of downgrading channel elements was mostly independent of other variability in the base station, the testing effort was reduced by utilizing sample-based testing (c.f. sample-based analysis by Thüm et al. (2014)). That is, instead of testing all capacity variants against all software variability in the base station, it was sufficient to test only the maximum, minimum, and selected throttled-down variants per hardware configuration and radio technology, as sketched below. This was because the realization of the channel element management was dependent on the hardware configuration and the radio access technology (Fig. 3), but independent of other software variability in the base station.

In the telecommunication domain in general, the hardware realization has been the traditional way of varying capacity, partly because of the ability to have a lower BOM and lower production costs for the lower-capacity variants, and partly because it has been straightforward to design and test. In the case study, the specific reason to utilize the software realization was to enable flexible upgrades to match evolving needs and better price differentiation for the operators. Further, the testing and implementation complexity of the software realization was alleviated by simply downgrading the resources needed to deliver the required channels.
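The following minimal sketch illustrates the idea of such sampling; the configuration names and the choice of sampled levels are hypothetical.

    # A minimal sketch of sample-based capacity testing (hypothetical names):
    # only the minimum, a mid-level downgrade, and the maximum are tested per
    # hardware configuration and radio access technology, instead of the full
    # cross-product with all other software variability.
    RADIO_TECHS = ["WCDMA", "EDGE"]
    HW_MAX_CE = {"hw_small": 64, "hw_large": 128}  # channel elements per config

    def sampled_levels(hw_max: int) -> list:
        # Illustrative sampling of the capacity variants to test.
        return [1, hw_max // 2, hw_max]

    test_plan = [(hw, tech, ce)
                 for hw, hw_max in HW_MAX_CE.items()
                 for tech in RADIO_TECHS
                 for ce in sampled_levels(hw_max)]

    print(len(test_plan))  # 12 sampled configurations in total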

5 Results as the Proposed Theoretical Models

The case account in Section 4 serves to characterize the real-world phenomenon in its context, that is, capacity variability in a base station product line. However, from the mere case account, it is difficult to see how the results can be generalized beyond this domain or to other performance attributes. To enable analytical generalization, the case study results can be built as theories (Yin 1994; Urquhart et al. 2010). Besides enabling generalization, theories allow knowledge to be accumulated in a systematic manner; this accumulated knowledge enlightens both research and practice (Gregor 2006).

To describe the results in more general terms, we propose three theoretical models to characterize and explain performance variability in software product lines (see Table 5). The models have been constructed by augmenting the analysis of the case study account with examples and accounts from the literature (see Fig. 1). Each theoretical model consists of a number of characterizations or explanations, that is, of theory constructs and relations between the constructs that fit the theory type (Gregor 2006). All proposed models address purposeful performance variability in software product lines in general, not capacity variability in base stations in particular.

The following tactics have been used to enable generalization (Table 5). First, the explanations and characterizations are defined using domain-independent concepts (Urquhart et al. 2010). Second, the scope is stated as boundaries showing the limits of generalization (Gregor 2006), e.g., by identifying settings in which the explanations and characterizations may not be applicable. Third, where appropriate, further validation has been drawn by having several example instantiations either from the case account or from the literature. The example instantiations also indicate the origin: if the case is not mentioned as an example, the explanation or

Table 5 Summary of the proposed theoretical models

Proposed theoretical model                                          | Addresses | Described in
Explaining the decision of varying performance purposefully        | RQ2       | Fig. 6, Tables 6 and 7
Characterizing the strategies for realizing performance variability | RQ3      | Fig. 7, Tables 8 and 9
Explaining the strategies for realizing performance variability    | RQ4       | Fig. 8, Table 10

All theoretical models contain: explanations or characterizations that are defined through domain-independent concepts to enable generalization (Urquhart et al. 2010); scope as the identified limits of generalization (Gregor 2006); example instantiations either from the case account or from the literature; a graphical model of the explanations and characterizations; and tables that describe the definitions, scope and example instantiations. The models describe performance variability in software product lines, and are based on the case account and accounts from the literature.

characterization originated from the literature. Most of the explanations and characterizations in the models originated from the case account.

5.1 Motivation to Vary Performance (RQ2)

When performance is varied purposefully, there is an explicit decision behind it. Based on the case account and previous studies, the decision to purposefully vary performance may be explained by the customer needs and characteristics; by product and design trade-offs; and by varying constraints stemming from the operating environment. These are illustrated in Fig. 6 and described in more detail in Tables 6 and 7.


Fig. 6 Explaining the decision of purposefully varying performance in a software product line. The identified explanations can motivate the decision but do not necessarily imply causality. Details, scope and example instantiations are given in Tables 6 and 7


Firstly, the decision to vary performance may be explained by the customer needs and characteristics (Fig. 6), that is, by explanations related to the problem domain. These explanations include different customer needs; the evolution of the customer needs; the customer's ability to understand and trust the performance differences; and the willingness of some customers to pay more for better performance. From this perspective, one overarching driver is the ability to conduct price differentiation, that is, charging a higher price for better quality (Phillips 2005), or even price discrimination (Belobaba et al. 2009), that is, price differentiation without differences in the production costs. This was also evident in the case study. Another driver is the evolution of the customer performance needs: the amount of events or data that needs to be handled tends to grow in the long run. In fact, it has been suggested that the concept of a "variation point" be renamed an "evolution point" (Kozuka and Ishida 2011). Upgrading to better performance also supports price differentiation: the customer can start with an inexpensive but less efficient product and upgrade to a premium version when the needs evolve.

Secondly, the decision to vary performance may be explained by trade-offs stemming from the products or design (Fig. 6), that is, by explanations related to the solution domain. Such trade-offs may be between performance and other quality attributes, or between performance and production costs; in particular, the latter are caused by the higher cost of more efficient hardware.

Thirdly, the decision to vary performance may be explained by varying resources in the product operating environment that constrain performance (Fig. 6). This explanation creates a non-negotiable constraint, which constitutes a good reason to adapt the product.

When looking at the prevailing literature, it seems most studies do not explicitly discuss the motivation to purposefully vary performance. When the motivation is discussed in the literature, the focus is typically on the solution domain: performance variability is motivated either by trade-offs or by the operating environment constraints. By contrast, the explanations related to the problem domain, that is, related to the customers, played a major role in the case account; this may be because the capacity was a key selling point to the customers. The value of Fig. 6 is in highlighting the diversity of situations in which it makes sense to vary performance. For each product line and context, the relevancy of the proposed explanations can be analyzed; thus, the model in Fig. 6 helps to make more informed decisions regarding the product line variability.

5.2 Performance Variability Realization (RQ3)

In order to realize product variants with different performance, the product line architecture must be able to create differences in performance. Based on the case account and the literature review, a variety of strategies for realizing performance variability were identified; a taxonomy of the strategies is given in Fig. 7. Most importantly, performance variability can be realized by software and hardware means (Table 8). This is because performance is affected both by the software design and implementation and by the available hardware resources.
Although it sounds quite obvious that performance variability can be realized through hardware differences, the literature does not really discuss this phenomenon. When hardware is discussed in conjunction with performance variability in the literature, it is not treated as a means of creating performance differences but as a constraint on resource consumption (Section 2.3). In fact, the hardware realization is only applicable to time behavior and capacity, since these are system properties (Table 8).

Table 6 Explaining the decision of purposefully varying performance: the customers

Differences in the customer performance needs, caused for example by differences in the amount of events or data, explain the decision of varying performance.
Description: If there are differences in the customer performance needs, these differences can be satisfied with different performance variants. The customer performance needs are affected, for example, by the amount of events or data that the system must handle or store.
Scope: Differences in the explicitly stated customer needs are not always the reason to have performance variants (Myllärniemi et al. 2006b). Differences in the customer performance needs do not always lead to different variants (Kishi et al. 2001).
Example instantiation: The case study; enterprise software systems (Ishida 2007); information terminals (Kishi and Noda 2000).

Differences in how customers are willing to pay for better performance explain the decision of varying performance.
Description: Differences in how customers are willing to pay for performance enable price differentiation. Price differentiation is a powerful way for vendors to improve profitability (Phillips 2005). Price differentiation can take place even without any differences in the production costs; this is called price discrimination (Belobaba et al. 2009).
Scope: The inability of the customer to explicitly understand or justify the higher price, for example, by relating it to her own business value and revenue, may decrease the willingness to pay more.
Example instantiation: The case study; an electronic patient data exchange product line: some hospitals are willing to pay 15,000 Euro more for a 0.3 s smaller latency (Bartholdt et al. 2009).

The evolution of the customer performance needs over time, together with long-lived products, explains the decision of varying performance.
Description: If the performance needs increase over time and exceed the capabilities of the product, it motivates supporting future performance upgrades. By supporting flexible performance upgrades or "pay-as-you-go" models, the vendor ensures customer satisfaction and continuity.
Scope: More relevant if the product is designed to operate for many years and the cost or effort of changing to another vendor is considerable (all indicated by the example instantiations). If the products are inexpensive and short-lived, instead of rebinding to a better performance variant, the customer can just buy a new product.
Example instantiation: The case study; enterprise software systems (Ishida 2007).

The ability to distinguish the performance differences and guarantee the performance to the customer explains the decision of varying performance.
Description: To make differentiation easier, the customers must understand the differences between the performance variants and trust that the performance is what they pay for. This is not always the case: in many consumer product domains, the notion of "quality" is often described in imprecise and vague terms, and the quality of the products is not guaranteed.
Scope: If differentiation between the products is not used, the differences in performance do not need to be communicated to the customer (Myllärniemi et al. 2006b).
Example instantiation: The case study.

Further, there are several different ways to realize performance variability with software (Table 9). In particular, the software realization in the case study was clearly different from the prevailing approaches in the literature: the realization can rely either on a specific design tactic or on managing the impacts of other variability. In the following, we describe these software realization strategies, focusing especially on those identified primarily from the literature.

As discussed in Section 2.1, performance is an architectural quality attribute (Bass et al. 2003). For time behavior, several entities in the architecture participate in the execution and thus contribute to the overall response time. For resource consumption, all included code modules in the product increase the overall binary footprint, and all memory allocations increase the overall heap or stack memory consumption. Consequently, it is possible to vary performance by varying any of the software parts that contribute to performance. In such a case, performance variability is an emergent "by-product" of other variability, or results from the impact of other variability. Functional variability may indirectly cause variation in qualities (Niemelä and Immonen 2007) and may even be an unwanted consequence: managing quality attributes in a product line is difficult, since each functional feature influences to some degree all system quality attributes (Bartholdt et al. 2009). However, if carefully managed, it is possible to use indirect variation to realize purposeful performance variability.

In the impact management realization, differences in performance are realized by managing the indirect variation, that is, the impact from varying features or components (see Table 9). To make impact management possible, one must be able to characterize or measure how each varying feature impacts the performance and know how these impacts can be aggregated into the overall product performance (see also Section 2.3). For example, if the leaf features are characterized with the memory consumption of that feature, the memory consumption of a composite feature is then the sum over its constituent features (Tun et al. 2009). In addition, the impact management realization also needs to manage feature interactions (Siegmund et al. 2012b; Siegmund et al. 2013): the features (and components) in a software product line are not independent of each other, and their combinations may have an unexpected effect on performance compared with having them in isolation. For example, when both features Replication and Cryptography are selected, the overall memory footprint is 32 KB higher than the sum of the footprints of each feature used separately (Siegmund et al. 2012b). A minimal sketch of this kind of aggregation is given below.

Based on the literature, it seems that the impact management realization is relatively straightforward for resource consumption. Siegmund et al. (2013) illustrate how one can measure the impact and interactions of individual features on footprint and main memory consumption. Thereafter, the aggregation is about summing up the impacts (Tun et al. 2009; White et al. 2007). However, the impact management realization seems to be more challenging for system-level performance properties, such as time behavior and capacity. Although Soltani et al. (2012) imply that response time can be measured for and assigned per feature, Siegmund et al. (2012b) argue that time behavior is not meaningful at a feature level, but can be characterized only per product variant.
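The following is a minimal sketch of such aggregation for memory footprint; the per-feature numbers are illustrative, while the 32 KB interaction delta for Replication and Cryptography is the figure reported by Siegmund et al. (2012b).

    # A minimal sketch of the impact management realization for footprint
    # (illustrative per-feature values): per-feature impacts are summed, and
    # known feature interactions are added as corrections.
    FEATURE_FOOTPRINT_KB = {"Core": 420, "Replication": 85, "Cryptography": 60}
    INTERACTION_DELTA_KB = {frozenset({"Replication", "Cryptography"}): 32}

    def product_footprint_kb(selected: set) -> int:
        total = sum(FEATURE_FOOTPRINT_KB[f] for f in selected)
        # Apply a correction wherever all features of a known interaction
        # are present in the selection.
        for pair, delta in INTERACTION_DELTA_KB.items():
            if pair <= selected:
                total += delta
        return total

    print(product_footprint_kb({"Core", "Replication", "Cryptography"}))  # 597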

Table 7 Explaining the decision of purposefully varying performance: trade-offs and operating environment constraints

A trade-off between performance and production costs explains the decision of varying performance.
Description: A decision on the product or design may create a trade-off between performance and production costs, for example, through more expensive hardware; this trade-off can be resolved with variability by having separate variants that optimize performance and low production cost, respectively. Otherwise, the resulting cost (and product price) may end up being prohibitively high for some customers (Hallsteinsen et al. 2006a). This explanation leads to product differentiation, that is, having different price categories based on the production costs and product quality (Phillips 2005; Belobaba et al. 2009).
Scope: Assumes the production cost differences are considerable and reflected in the pricing, and that some customer segments are willing to pay the higher price.
Example instantiation: The case study: more expensive base station hardware for better capacity. An electronic patient data exchange product line (Bartholdt et al. 2009): an expensive license for a better-performing external software component.

A trade-off between performance and other quality attributes explains the decision of varying performance.
Description: A decision that enhances other quality attributes, such as security, reliability or modifiability, may impose a penalty on performance (Bass et al. 2003; Barbacci et al. 1995); this trade-off can be resolved with variability by having separate variants that optimize performance and the other quality attributes, respectively.
Scope: Assumes the trade-off between the quality attributes is considerable, and that the customers have different, conflicting needs or different preferences over the quality attributes.
Example instantiation: An electronic patient data exchange product line (Bartholdt et al. 2009): the secure channel increased the response time by 50 percent. A terminal application that varies the length of the encryption key (Myllärniemi et al. 2006a). Having better graphics increased game attractiveness but decreased performance (Myllärniemi et al. 2006b).

Differences in the resources available in the product operating environment that constrain performance explain the decision of varying performance.
Description: Performance may be constrained by resources that are outside the control of the product line owner; examples include CPU, buses, memory, disk, and network connection. If these external resources vary, it may be necessary to adapt the product performance instead of providing a single product that consumes the least amount of resources.
Scope: The resources have to constrain performance and be outside the product line scope (for example, hardware resources for software-only product lines). The single solution that consumes the least amount of resources must have otherwise unwanted consequences.
Example instantiation: The varying operating environment resources of mobile and embedded software products are often stated as the reason to vary: train ticket reservation service (White et al. 2007); mobile games (Myllärniemi et al. 2006b); database management systems (Siegmund et al. 2012b); personal mobile assistants (Hallsteinsen et al. 2006b).


Further, Soltani et al. (2012) argue that summing up the impacts can also be applied to response time; however, this does not hold if the leaf features do not directly map to tasks that are executed sequentially, or if there is contention for system resources.

Another challenge of the impact management realization is that derivation is difficult without dedicated tool support. It may be possible to manage the impacts manually by trying to codify the tacit knowledge into heuristics, or by comparing with predefined reference configurations (Sinnema et al. 2006). However, when the case study company needed to create a high-performance variant, the manual impact management caused the product derivation to take up to several months instead of only a few hours (Sinnema et al. 2006). Even when supported with a tool that evaluated the performance of a given configuration, only a few experts were capable of performing a directed optimization towards the high-performance configuration (Sinnema et al. 2006). Even with tool support, the algorithms behind the tools may be computationally expensive, as discussed in Section 2.3.

In addition to being impacted by other variability, performance can also be altered through explicit architecture design. Several design tactics, such as decreasing resource demand or increasing resources, or patterns as their instantiations, can be used to improve performance (see Section 2.1). Further, some tactics and patterns improve other quality attributes, such as security or reliability, at the expense of performance. In a design tactic realization (see Table 9), varying architectural tactics or patterns are purposefully introduced in the design to create performance variability. Compared with the impact management realization, in which the performance differences emerge as an impact of the overall variability, a design tactic realization relies on a purposeful design mechanism through which performance can be altered. Varying architectural styles and patterns to vary quality attributes is also addressed elsewhere (Cavalcanti et al. 2011; Matinlassi 2005; Hallsteinsen et al. 2003).

Two different kinds of design tactic realizations can be identified (see Table 9). Firstly, in the downgrading realization, one or more varying design tactics are used in the design to decrease performance without trying to affect other capabilities; the case study downgrading involved disabling the available hardware resources programmatically. By contrast, the trade-off realization varies design tactics that increase or decrease performance at the expense of other quality attributes.

Fig. 7 Characterizing the strategies for realizing performance variability in a software product line. Definitions, scope and example instantiations are given in Tables 8 and 9. (Taxonomy: a performance variability realization strategy is either a hardware realization or a software realization; a software realization is either an impact management realization or a design tactic realization; and a design tactic realization is either a downgrading realization or a trade-off tactic realization.)

Table 8 Characterizing the realization: hardware and software. See the taxonomy in Fig. 7; Table 9 elaborates the software strategies

Performance variability realization strategy
Description: The explicit product line architecture design means, and the corresponding implementation, of purposefully creating differences in performance between the product variants. A software product line can apply several strategies simultaneously.

Hardware realization
Description: Differences in performance are achieved by having different installed hardware in the product variants.
Scope: Only applicable to those software product lines that include both software and hardware. Applicable to time behavior and capacity: they are system properties that are directly affected by the hardware resources and exhibit only at a system level. Not applicable to memory consumption, which is a property of the software that is constrained by the hardware: one cannot purposefully vary the software memory consumption by varying hardware.
Example instantiation: The case study. Also instantiated for reliability as hardware redundancy in weather station systems (Kuusela and Savolainen 2000).

Software realization
Description: Differences in performance are achieved by varying software; all products have the same hardware installed.
Scope: When the product line scope consists of software only, software realization is the only choice (c.f. Myllärniemi et al. 2006b). Applicable to time behavior, capacity and memory consumption (see example instantiations).
Example instantiation: The case study; database management systems (Siegmund et al. 2012b); mobile phone games (Myllärniemi et al. 2006b); graph product line (Sincero et al. 2009; Bagheri et al. 2012).

By contrast, the trade-off realization varies design tactics that increase or decrease performance at the expense of other quality attributes. As an example, 3D mobile games utilized a number of tactics related to game graphics and game levels to decrease the resource demand at the expense of game attractiveness and playability (Myllärniemi et al. 2006b).

The applicability of the design tactic realization is limited as follows. Firstly, the selected design tactic must considerably affect performance: if the tactic has only a small impact on the overall product performance, indirect variability may outweigh any performance differences achieved through the tactic. Secondly, a crosscutting tactic may cause architecture-wide variation (Hallsteinsen et al. 2006a) and hence be difficult to develop and manage; it is therefore advisable to localize the tactic realization. In the case study, one component implemented the resource downgrading, and the actual resources were abstracted from the other software components.

5.3 Motivation for the Realization (RQ4)

Table 9 Characterization of the software realization strategies: impact management and design tactics. See the taxonomy in Fig. 7

Impact management realization (is-a software realization)
Description: Differences in performance are achieved by indirect variation from software features or components: performance variability is an emergent by-product of other software variability. This realization happens by characterizing or measuring the impact of each feature or component on performance; during derivation, the individual impacts are aggregated into the overall product performance, taking into account feature or component interactions. A feature impact characterizes how a particular feature contributes to performance; a feature interaction occurs when the feature impacts depend on the presence of other features. (A minimal sketch of this aggregation follows the table.)
Scope: May be difficult for system-level performance attributes, such as time behavior or capacity. May be difficult without efficient derivation support.
Example instantiation: Database management systems (Siegmund et al. 2012b); web shop product line (Soltani et al. 2012); intelligent traffic systems (Sinnema et al. 2006).

Design tactic realization (is-a software realization)
Description: Creates differences in performance through one or more purposefully introduced, varying design tactics (Bass et al. 2003) that affect performance; performance variability is managed through these tactics. Can be either about downgrading or about trading off tactic consequences (see below).
Scope: The selected tactics should affect performance considerably compared to indirect variability. A cross-cutting tactic may be difficult to develop and manage.

Downgrading realization (is-a design tactic realization)
Description: Varies design tactics with the purpose of decreasing performance without affecting other quality attributes, for example, by limiting the available resources with software. Can be done by limiting both hardware- and software-based resources through the operating system or middleware, for example, as enabled hardware, or as the connections or processes serving incoming requests. Thus, can be both hardware neutral and hardware dependent, c.f. Jaring et al. (2004).
Scope: Downgrading the resources cannot be used to create differences in resource consumption (see Table 8). See also the scope of the design tactic realization.
Example instantiation: The downgrading of the channel elements in the case study.

Trade-off tactic realization (is-a design tactic realization)
Description: Varies design tactics with the purpose of decreasing performance but increasing other quality attributes or lowering the production costs.
Scope: See the scope of the design tactic realization.
Example instantiation: Attractiveness and resource consumption in mobile phone games (Myllärniemi et al. 2006b); memory consumption and resilience without connectivity for maintenance assistant applications (Hallsteinsen et al. 2006b); patient data exchange system (Bartholdt et al. 2009).
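To make the impact management realization concrete, the following sketch aggregates individual feature impacts into an overall product value for an additive attribute such as memory footprint. This is a minimal illustration under invented assumptions, not any published tool's algorithm: the feature names, impact values and the interaction term are made up, and real approaches such as SPL Conqueror (Siegmund et al. 2012b) rely on measured models and far more sophisticated interaction detection.

```python
# A simplified sketch of the impact management realization for an additive
# performance attribute (memory footprint). All names and values are invented.

BASE_FOOTPRINT_KB = 512  # footprint of the mandatory core

# Per-feature impacts, e.g. measured from products (cf. Sincero et al. 2010).
FEATURE_IMPACT_KB = {
    "statistics": 120,
    "encryption": 80,
    "compression": 40,
}

# A feature interaction: the impacts above hold individually, but selecting
# encryption and compression together adds a shared buffer.
INTERACTION_KB = {frozenset({"encryption", "compression"}): 25}


def derive_footprint(selected: set[str]) -> int:
    """Aggregate individual impacts and interaction corrections."""
    total = BASE_FOOTPRINT_KB
    total += sum(FEATURE_IMPACT_KB[f] for f in selected)
    total += sum(delta for combo, delta in INTERACTION_KB.items()
                 if combo <= selected)
    return total


print(derive_footprint({"statistics"}))                 # 632
print(derive_footprint({"encryption", "compression"}))  # 657
```

As the scope in Table 9 notes, such simple summation breaks down for time behavior and capacity: response times add up only when the features map to sequentially executed tasks and there is no contention for system resources.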


There may be different kinds of explanations behind the decision to realize performance variability with a certain strategy described in Section 5.2. Based on the case account and accounts from the literature, the identified explanations behind the different realization strategies are illustrated in Fig. 8 and Table 10. However, the model is not exhaustive; other explanations may also be identified.

One overarching theme is that the reason to vary performance in the first place (RQ2) also affects the decisions on the selected realization strategy (RQ4). If performance variability is motivated by a trade-off in the software, it is straightforward to vary that trade-off to realize performance differences. Similarly, if performance variability is motivated by hardware production cost differences, the realization should involve having different hardware in the products. Thus, when a trade-off in the solution domain motivates performance variability, it makes sense to vary performance through that trade-off.

By contrast, when there are no trade-offs involved, but variability is motivated by price differentiation, downgrading is a straightforward way to alter performance without affecting any other capabilities. That is, it is not always necessary to try to maximize performance in the design. If some customers are satisfied with lower performance and are willing to pay less, and there are no specific trade-offs involved, it may make sense to simply downgrade the premium version. Moreover, downgrading also supports the model of offering an inexpensive (or even completely free) version with less performance, and later letting the customers upgrade to a premium-priced version. Similar examples can be found elsewhere, for example, in the way Spotify packages the better bitrate into its premium version in order to attract the customers to pay for its services. Thus, conducting price differentiation with downgrading may actually increase the competitive edge over the competitors, since it makes it possible to target a wider set of different customer needs. This was also evident in the case study.

6 Discussion

6.1 Validity and Reliability

Validity refers to whether the results correspond to the reality. The results of a case study can be studied from different perspectives: construct validity, internal validity, external validity, and reliability (Yin 1994).

[Fig. 8 pairs each explanation with the realization decision it explains: a trade-off between performance and hardware production costs, together with the ability to develop scalable software, explains the hardware realization; performance variability motivated by trade-offs in the software design explains the software realization; the customers' need to rebind performance easily at run time without changing product functionality explains the design tactic realization; performance variability motivated by differentiation, not by trade-offs, explains the downgrading realization; and the need to vary or optimize several product characteristics, together with functionality-wise similar, rich or invisible variability, explains the impact management realization. The relations denote explanations, not predictive causality.]

Fig. 8 Explaining the decision of using a specific realization strategy. Details, scope and example instantiations are given in Table 10

Table 10 Explaining the decision of using a specific realization strategy

A trade-off between performance and hardware production costs, along with the ability to develop scalable software, explains the hardware realization.
Description: With the hardware realization, variants with lower performance typically have less expensive hardware, which means either a better profit margin or a lower product price. (This is especially relevant when the hardware is expensive or when the products are mass-market products with tight profit margins.) The gain in the production costs should outweigh the effort of developing scalable software; therefore, it must be known how to implement software scaling, for example, through explicit resource management.
Scope: Not applicable if the product line scope consists of software only, or if the performance attribute is not a system property (see Table 8).
Example instantiation: The case study.

If performance variability is motivated by trade-offs in the software design, this explains the use of the software realization.
Description: If the decision to vary performance is motivated by a trade-off, and the trade-off concerns software design, it is straightforward to realize the variability through software by varying that particular design trade-off.
Scope: Assumes it is possible to vary the design characteristic that causes the trade-off.
Example instantiation: Game graphics that improve attractiveness but decrease performance (Myllärniemi et al. 2006b); a messaging queue that improves performance but increases production costs (Bartholdt et al. 2009).

The need for the customer to rebind performance easily at runtime without changing product functionality explains the use of the design tactic realization.
Description: If the customer needs to upgrade the performance, and the aim is to provide easy and quick rebinding, this motivates the use of the software realization. Moreover, if the aim is to upgrade performance independently of other varying functionality, it is easier to use a design tactic realization. This is because the impact management realization changes performance by changing other features: for example, to decrease the resource consumption, one may have to drop out the feature Statistics (Siegmund et al. 2012b).
Scope: The assumption that hardware upgrades cannot be automated may not hold for infrastructure-as-a-service. It may be difficult at runtime to rebind a design tactic that is not independent of other variability: the variants must be tested beforehand, and exhaustive variant-based testing (Siegmund et al. 2012b) may not be possible.
Example instantiation: The case study: hardware upgrades were difficult, and the selected, mostly independent design tactic could be validated with sample-based testing beforehand.

If performance variability is motivated by differentiation, not by trade-offs, this explains the use of the downgrading realization.
Description: If performance variability is introduced to cater for price differentiation, and there are no specific trade-offs involved, performance can be varied by simply downgrading the best available performance.
Scope: Differentiation can also be realized with design trade-offs, for example, to conduct price differentiation that caters for different hardware production costs.
Example instantiation: The licensed capacity variability in the case study.

The need to vary or optimize several product characteristics, along with functionality-wise similar, rich or invisible variability, explains the use of the impact management realization.
Description: The impact management realization supports derivation that takes into account functionality and quality attributes all at once (Ognjanovic et al. 2012; Soltani et al. 2012) and enables multi-attribute optimization (Olaechea et al. 2012); a minimal sketch follows this table. The impact management realization needs other variability to alter performance; this is not an issue when the product line has functionally similar features with different performance characteristics, rich variability (Olaechea et al. 2012) or user-invisible variability (Sincero et al. 2010).
Example instantiation: Database management systems (Siegmund et al. 2012b); algorithm-oriented applications (Bagheri et al. 2010; Bagheri et al. 2012).
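As a self-contained sketch of the multi-attribute derivation mentioned above, the following brute-force search picks the feature combination with the best performance gain under a memory budget. All features, impact values and the budget are invented for illustration; practical approaches use exact or metaheuristic optimization rather than enumeration (cf. Guo et al. 2011; Olaechea et al. 2012).

```python
# A naive sketch of multi-attribute derivation over invented feature impacts:
# maximize throughput gain subject to a memory budget by enumerating all
# feature combinations. Real product lines need scalable optimization.
from itertools import combinations

FEATURES = {  # feature -> (throughput gain, memory cost in KB); invented
    "caching": (30, 150),
    "indexing": (25, 90),
    "statistics": (5, 120),
}


def best_configuration(memory_budget_kb: int):
    best, best_gain = set(), 0
    for r in range(len(FEATURES) + 1):
        for combo in combinations(FEATURES, r):
            gain = sum(FEATURES[f][0] for f in combo)
            cost = sum(FEATURES[f][1] for f in combo)
            if cost <= memory_budget_kb and gain > best_gain:
                best, best_gain = set(combo), gain
    return best, best_gain


print(best_configuration(250))  # caching + indexing, gain 55, cost 240 KB
```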

Construct validity is about establishing correct operational measures to be able to answer the research questions (Yin 1994). One threat to construct validity may be posed by the lack of interviews that provide rich qualitative data: do the measures or the interpretations from the documents really correspond to the concepts? However, the lack of richness and the risk of incorrect interpretations were alleviated by having an involved participant as an author, as well as by asking clarifying questions of the chief architects. Further, validation with the key informants (Yin 1994) was used twice as a tactic to enhance construct validity.

Triangulation and multiple sources of evidence were also used to address construct validity (Yin 1994). The threat of biased observations from the participating author, the threat of incorrect interpretation of the documents, and the threat of incorrect measures in the questions and answers with the chief architects were all mitigated by checking all sources of data against each other.

Another threat to construct validity is the post mortem nature of this case study: the product line was discontinued before it was taken into use. Even if the measures were correct, do they properly represent concepts related to operation, such as future capacity upgrades? Also, do measures on a discontinued base station correctly represent successful base stations? This threat was mitigated by the architects contrasting this specific base station with other operational base stations. Consequently, similar results seem to apply to base stations in general in the case company portfolio. Further, the units of analysis, that is, the performance variability and the related design decisions, were established similarly to successful base stations before the product line was discontinued. Finally, the unit of analysis and the conceptualizations made of it were not directly related to the reasons for discontinuing the project.

Yet another threat to construct validity was that only architects and architecture evaluators were involved in the data collection, and the documents were only architectural in nature. Therefore, can the measures be used to operationalize concepts related to customers and their intentions, that is, to answer research question RQ2? However, when studying the motivation to purposefully vary, it is often not about the intentions of the customers themselves, but about how the product line owner interprets the intentions of the customers. Further, software architects have to have a solid understanding of the stakeholders' needs and concerns (Bass et al. 2003) in order to make informed decisions.

As a final threat to construct validity, the data analysis of both the case data and the existing literature utilized only light-weight coding that served to identify low-level concepts and relations. A large part of the case account analysis was conducted through writing and informal discussions, whereas the analysis of the literature took place mostly through comparison with the case account. Thus, are the operationalized high-level concepts and relations grounded in the data (Urquhart et al. 2010)? However, during different stages of the analysis, newly identified concepts were checked against the existing data, and the case study account was validated. Additionally, when augmenting the theory with the literature, the original primary studies were revisited once again.

Internal validity is about mitigating the threats of establishing incorrect causal relationships between the constructs (Yin 1994). Although this was an explanatory case study, the point was not to establish causality. The relations between the constructs in the explaining theories were not about causality, that is, "if X, then Y". Instead, they were about insufficient and unnecessary (Shadish et al. 2002) but contributing factors: "Y was motivated by X". The motivating factors were validated with the chief architects, which validates the inferences made to create the explanations. However, there may be rival explanations (Yin 1994), that is, motivating factors that were not identified and which may also explain the phenomenon.

External validity is about generalizing the findings of a case study (Yin 1994). Even the results of a single case study can be of value when generalizing analytically (Yin 1994), because case studies are generalizable to theories, not to populations (Runeson and Höst 2009; Yin 1994). Therefore, instead of only describing the case account, the results were formulated into the proposed theoretical models (Section 5). To ensure generalization to other domains and settings, the theory constructs and relations were described in a domain-independent way, and the scope of each characterization and explanation was described as a limit to generalization (Table 5). Where possible, several instantiations from the case and the literature were utilized to further validate the models.

As a threat to external validity, it is possible that we identified the scope of our theories (Gregor 2006) incorrectly; that is, there may be other situations in which the characterizations and explanations do not hold. This is partly because we did not employ any literal or theoretical replication (Yin 1994), only utilized a number of accounts and examples from the literature, and partly deduced the scope analytically. Therefore, future empirical evidence is needed to test the proposed theory scope: are there any specific situations in which the explanations and characterizations do not apply, and what is the reason for this?

The post mortem nature of this case study may also have implications for external validity: how can the results be generalized from a product line that was designed approximately ten years ago and then discontinued? However, the case unit of analysis is representative of the case company portfolio, and the role of capacity in mobile networks is even more crucial today. Further, since the proposed models address the phenomenon in more general terms, the generalizability of the results is about the generalizability of the models: how well do the characterizations and explanations apply to modern or market-wise successful software product lines? At least within the case study domain, the characterizations and explanations seem to apply to more current base stations as well. Further, the example instantiations from the literature also mitigate the threats to external validity. Finally, we could not identify any characterization or explanation that was related to the reason for discontinuing the case product line.

Reliability in a case study is about demonstrating that the protocol can be repeated with similar results (Yin 1994). For this purpose, the main tactic was to produce all data in written form, and to establish a case study database in which all steps and actions were recorded.

The validity of the results is also affected by the literature review, but its role is slightly different from that of a standalone systematic literature review. Although the selection process in the review protocol was conducted independently of the case study, the analysis and synthesis were carried out in conjunction with the analysis of the case account. The aim of our literature review was not to provide an analysis and synthesis of the literature as a standalone contribution, but to find both confirming and contrasting findings compared with the case account. Therefore, some proposed practices for standalone systematic literature reviews, for example, regarding the way individual studies should be described, may be excessive within the scope of this study. Nevertheless, we assess the quality of this systematic literature review with the questions used by Kitchenham et al. (2009).

Firstly, are the inclusion and exclusion criteria described and appropriate (Kitchenham et al. 2009)? We believe this is a crucial aspect in a literature review that utilizes snowballing; therefore, we spent effort and several iterations on formulating the criteria. Secondly, is the literature search likely to have covered all relevant studies (Kitchenham et al. 2009)? This is mostly determined by the coverage of the snowballing protocol (Wohlin and Prikladniki 2013; Wohlin 2014). Some indication is given by the high number of selected primary studies (139) compared with, e.g., the number of selected studies (196) about any variability, not just variability in software product lines (Galster et al. 2014). We did not exclude any studies based on metadata only, which meant more detailed scrutiny of the primary studies. Thirdly, is the quality or validity of the primary studies assessed (Kitchenham et al. 2009)? Within the scope of this study, a full quality assessment was not done; only the level of empirical evidence was evaluated (Section 2.4). However, studies with poor quality did not provide example accounts to be utilized. Fourthly, are the individual studies and their data described (Kitchenham et al. 2009)? To keep this study focused on the case, individual studies were not described; however, citations were used when appropriate to ground the results in the original primary studies.

6.2 Lessons Learned

In the following, we discuss the novel insights that can be learned from our contribution. What can the research community and industrial practice gain from the case account in Section 4, and in particular, from the theoretical models proposed in Section 5?

Firstly, to argue the novelty of our contribution: to the best of our knowledge, the characterizations and explanations in Section 5 have not been explicated before. Moreover, although several example instantiations exist in the literature, and the literature was used as the main data source to identify some characterizations and explanations, our analysis and synthesis of them are completely novel: that is, the higher-level concepts and their relations in the proposed models have not been explicitly described before.
Also, our aim is to gain fundamental understanding of performance variability in its real-life context, in contrast to the studies that propose a method or technique and then validate it with an industry-based example. Even if some characterizations in the proposed models seem relatively obvious, like the hardware realization, they have remained more or less tacit knowledge. Besides explicating and synthesizing such common knowledge into a more general model, the value of our study is in showing that such a phenomenon really occurs and is relevant in industrial product lines.

One important contribution is that the decision to vary performance may be motivated by the customer needs and characteristics, by trade-offs, or by varying constraints (Fig. 6). Consequently, the decision-making requires understanding the customer needs, customer value, pricing, technical constraints, production costs and design trade-offs. This indicates that quality attribute variability is a challenging topic that requires careful analysis of both the problem and the solution domain.

By contrast, the current literature does not discuss much the reason to purposefully vary performance, and in particular, does not discuss the customer needs and characteristics. It often seems that performance variability is driven by the trade-offs or constraints that force one to vary, instead of being driven by differences in the customer needs and valuations. As an example, despite being an obvious explanation, it was somewhat difficult to find studies that explicitly state that performance variability is due to different customer needs (Table 6).

There may be several reasons for this lack of attention. Firstly, trade-offs may very well be one important source of explanations in industrial product lines that vary performance: for example, the case study on 3D mobile phone games was partly explained by a trade-off between performance and game attractiveness (Myllärniemi et al. 2006b). After all, trade-offs indicate situations in which all customer needs cannot be satisfied with one product. Another reason may be the difficulty of realizing and managing performance variability, which causes the research effort to focus on the technical matters. Further, some studies may simply make certain assumptions, for example, that the customer always wants the best possible performance instead of what fulfills her needs, and that consequently trade-offs are the only reason to vary performance. However, as Fig. 6 indicates, the customer wants what fulfills her varying needs, and the willingness to pay a certain price is tied to satisfying those needs. Finally, this gap may be due to the prevailing constructive research paradigm: it is difficult to study the customer needs without empirical research conducted in a real industrial context.

Another important contribution is identifying the variety of ways in which performance variability can be realized (Fig. 7). In the literature, the feature impact management realization is the prevailing although implicitly stated strategy. Therefore, it is interesting to see that the case company utilized a purposeful design tactic to downgrade performance: the aim was to keep the impact and interactions between capacity variability and other variability to a minimum, instead of utilizing impacts from other variability to realize capacity variability in an emergent fashion.

There may be several explanations for this gap between the literature and the case study. Firstly, the focus on the impact management realization may be due to the dominance of feature modeling in the research community: the research has simply extended feature models with quality attributes, instead of starting from the characteristics of quality attribute variability. Consequently, impact management may be difficult to use to vary time behavior and capacity (Table 9).
Secondly, the case study needed to support runtime rebinding and price differentiation: simply downgrading performance was a viable option when the aim was to let the customers start with an inexpensive product and later upgrade to a premium version (Table 10). Further, when the aim is to purposefully create and guarantee differences in performance, it may make sense to utilize a purposefully introduced design mechanism, instead of relying on emergent variability. Because of this gap, it would be extremely valuable to report more industrial cases, and to contrast their realization mechanisms to the proposed theoretical model in Section 5.2.


The proposed theoretical models also provide insight into the nature of trade-offs in performance variability. Trade-offs in the solution domain may both explain the motivation (Section 5.1) and affect the selection of the realization strategy (Sections 5.2 and 5.3). In general, many quality attributes are largely determined by the way the trade-offs are resolved in the design. The models indicate two alternative approaches to realizing performance variability with regard to these trade-offs. The first option is to have several different resolutions of the trade-offs and to switch between them, either as varying design tactics or as emergent variability. The second option is to design the architecture and resolve the trade-offs to fulfill the requirements in the full system configuration; thereafter, performance can be either downgraded with software or upgraded by having better hardware. To better serve price differentiation and future upgrades, the case study took the latter approach.

Further, the proposed models in Section 5 indicate that different performance attributes, such as time behavior, capacity and resource consumption, should be treated and analyzed separately. Some of the characterizations and explanations only apply to certain performance attributes; for example, the hardware realization is not applicable to main memory consumption (Table 8). Further, the impact management realization is more complicated for response time than for memory footprint (Table 9). Also, since the case study was about capacity, that is, about maximum throughput, this may explain the use of both the hardware realization and the hardware-dependent downgrading realization (Tables 8 and 9). After all, the hardware configuration typically sets the maximum limits to throughput and response time, whereas the actual response time and throughput usually vary between executions. Therefore, we stress the need for researchers to carefully explicate the scope and the assumptions their proposed methods make about the nature of varying quality attributes: even "performance" cannot be treated as one uniformly behaving quality attribute.

The literature about quality attribute variability does not address hardware much; moreover, hardware is mostly treated as a constraint on memory consumption. This may be because software is often used to implement value-adding features and to differentiate products in software product line engineering. However, the case account and the proposed models point out the importance of hardware in performance variability, both as a means of realization and as a driver in the decision-making. The aim of optimizing the trade-off between performance and hardware production costs motivates both the decision to vary in the first place (Table 7) and the selection of the hardware realization (Table 10). However, the often-stated assumption that "a better performance variant costs more" is not necessarily true, as indicated by the case study: performance may be varied and priced differently even when the production costs are the same.

How should the proposed models be utilized, or what is their value? For practitioners, the explanations in Section 5.1 may help in analyzing the various drivers behind the decision to vary, and also highlight the need to analyze customer needs and characteristics besides focusing only on solution domain trade-offs and constraints.
Further, the models in Sections 5.2 and 5.3 may help in understanding the variety of different realization strategies for performance variability. For the researchers, the theoretical models help in positioning both existing methods and techniques as well as the reported accounts and case examples. The proposed models can also be used when designing future empirical research, for example, by following the hypothetical path (Stol and Fitzgerald 2013) and aiming at replication or negation: this is similar to doing multiple experiments on the same topic. As a concrete example, the explanations about the motivation to vary performance (Section 5.1) could be used to construct survey or interview questions about quality attribute variability in general.


It may be that further empirical studies identify completely new characterizations and explanations, or even refute the models in Section 5 by identifying situations in which they are not applicable. This makes it possible to accumulate knowledge and to build theories incrementally (Stol and Fitzgerald 2013).

7 Conclusions

This paper studied the motivation for and the realization of purposeful performance variability in software product lines. The study was conducted as a descriptive and explanatory case study of capacity variability in a post mortem mobile network base station product line. To build theories about performance variability in more general terms, the data analysis augmented the case study account with the existing literature, following the observational path to empirical software engineering research. As a result, we proposed theoretical models to explain the motivation to vary performance and to characterize and explain the realization of performance variability. The theoretical models were constructed to be applicable beyond this single case study and to performance in general: each characterization and explanation was defined with domain-independent concepts; the scope was described as the identified limits of generalization; and example instantiations were drawn also from the existing literature.

There are several lessons to be learned. Firstly, performance variability is not only motivated by trade-offs and constraints in the solution domain; customer needs and characteristics also need to be analyzed in the decision-making. In particular, price differentiation and the need to offer future upgrades, that is, the problem domain explanations, may be a good enough reason to vary. Thus, performance variability is not only about resolving trade-offs, and the better performance variant does not always cost more to develop or produce. However, trade-offs and constraints are important explanations as well, since they indicate situations in which all customer needs cannot be satisfied with one product. The trade-offs, which can be between performance and other quality attributes, or between performance and production costs, provide both a reason to vary and a possible realization strategy in the form of a design tactic.

From the realization point of view, the prevailing yet implicitly stated way to realize performance variability in the literature is the impact management realization, that is, performance variability as the emergent result of other variability in the software product line. Instead, downgrading the available resources and having more efficient hardware proved to be a straightforward way to vary performance; this was because the performance variability was introduced to support price differentiation and future runtime upgrades, not to resolve software trade-offs. Thus, there are clear differences between the case account and the dominant approaches in the literature. Finally, the proposed theoretical models indicate that different quality attributes may need to be addressed separately: even performance cannot be treated as one uniformly behaving attribute.

As future work, further empirical research on quality attribute variability in industrial product lines is needed; the proposed models can be of value in this respect. Firstly, the characterizations and explanations can be used deductively, that is, to generate testable propositions that can be confirmed or refuted: the refuting studies are particularly important, since they help in better defining the scope of the theories. Secondly, the proposed theoretical models can be used inductively, that is, to construct new characterizations and explanations about performance variability. Also, it would be interesting to see how well the proposed models can be applied to quality attributes other than performance. Some characterizations and explanations may be specific to performance: for example, the hardware realization may be less applicable to security or usability. However, the explanations about the motivation to vary performance may be more or less directly applicable to other quality attributes as well. This calls for future work.

Acknowledgments Aki Nyyssönen, Jukka Peltola, Ari Evisalmi, and Juha Timonen from Nokia are acknowledged for evaluating the validity of and commenting on our results. Anssi Karhinen and Juha Kuusela are acknowledged for ideas and comments.

References

Ahnassay A, Bagheri E, Gasevic D (2013) Empirical evaluation in software product line engineering. Tech Rep TR-LS3-130084R4T, Laboratory for Systems Software and Semantics, Ryerson University
Bagheri E, Di Noia T, Ragone A, Gasevic D (2010) Configuring software product line feature models based on stakeholders' soft and hard requirements. In: Software Product Line Conference
Bagheri E, Di Noia T, Gasevic D, Ragone A (2012) Formalizing interactive staged feature model configuration. J Softw Evolut Proc 24(4):375–400. doi:10.1002/smr.534
Barbacci M, Longstaff T, Klein M, Weinstock C (1995) Quality attributes. Tech Rep CMU/SEI-95-TR-021, SEI
Bartholdt J, Medak M, Oberhauser R (2009) Integrating quality modeling with feature modeling in software product lines. In: International Conference on Software Engineering Advances (ICSEA). doi:10.1109/ICSEA.2009.59
Bass L, Clements P, Kazman R (2003) Software Architecture in Practice, 2nd edn. Addison-Wesley
Belobaba P, Odoni A, Barnhart C (2009) The Global Airline Industry. Wiley
Benavides D, Martín-Arroyo PT, Cortés AR (2005) Automated reasoning on feature models. In: International Conference on Advanced Information Systems Engineering (CAiSE). doi:10.1007/11431855_34
Berntsson Svensson R, Gorschek T, Regnell B, Torkar R, Shahrokni A, Feldt R (2012) Quality requirements in industrial practice – an extended interview study at eleven companies. IEEE Trans Softw Eng 38(4):923–935. doi:10.1109/TSE.2011.47
Boehm B, Brown J, Kasper H, Lipow M, Macleod G, Merrit M (1978) Characteristics of Software Quality. North-Holland Publishing Company
Bosch J (2000) Design and Use of Software Architectures: Adopting and Evolving a Product-Line Approach. Addison-Wesley
Botterweck G, Thiel S, Nestor D, bin Abid S, Cawley C (2008) Visual tool support for configuring and understanding software product lines. In: Software Product Line Conference. doi:10.1109/SPLC.2008.32
Bu T, Chan MC, Ramjee R (2006) Connectivity, performance, and resiliency of IP-based CDMA radio access networks. IEEE Trans Mob Comput 5(8). doi:10.1109/TMC.2006.108
Cavalcanti RdO, de Almeida ES, Meira SR (2011) Extending the RiPLE-DE process with quality attribute variability realization. In: Joint Conference on Quality of Software Architectures and Architecting Critical Systems (QoSA-ISARCS)
Clements P, Northrop L (2001) Software Product Lines—Practices and Patterns. Addison-Wesley
Czarnecki K, Helsen S, Eisenecker UW (2005) Formalizing cardinality-based feature models and their specialization. Softw Proc Improv Pract 10(1):7–29. doi:10.1002/spip.213
Dubé L, Paré G (2003) Rigor in information systems positivist case research: Current practices, trends, and recommendations. MIS Q 27(4):597–635
Etxeberria L, Sagardui G (2008) Variability driven quality evaluation in software product lines. In: Software Product Line Conference. doi:10.1109/SPLC.2008.37
Etxeberria L, Sagardui G, Belategi L (2007) Modelling variation in quality attributes. In: VaMoS
Fettke P, Houy C, Loos P (2010) On the relevance of design knowledge for design-oriented business and information systems engineering – conceptual foundations, application example, and implications. Bus Inf Syst Eng 2(6):347–358. doi:10.1007/s12599-010-0126-4
Galster M, Avgeriou P (2011) Handling variability in software architecture: Problems and implications. In: Working IEEE/IFIP Conference on Software Architecture (WICSA). doi:10.1109/WICSA.2011.30
Galster M, Avgeriou P (2012) A variability viewpoint for enterprise software systems. In: Working IEEE/IFIP Conference on Software Architecture (WICSA) and European Conference on Software Architecture (ECSA). doi:10.1109/WICSA-ECSA.212.43
Galster M, Weyns D, Tofan D, Michalik B, Avgeriou P (2014) Variability in software systems—a systematic literature review. IEEE Trans Softw Eng 40(3):282–306. doi:10.1109/TSE.2013.56
Gimenes IMdS, Fantinato M, de Toledo MBF (2008) A product line for business process management. In: Software Product Line Conference. doi:10.1109/SPLC.2008.10
González-Baixauli B, Laguna MA, do Prado Leite JCS (2007) Using goal-models to analyze variability. In: VaMoS
Gregor S (2006) The nature of theory in information systems. MIS Q 30(3):611–642
Guo J, White J, Wang G, Li J, Wang Y (2011) A genetic algorithm for optimized feature selection with resource constraints in software product lines. J Syst Softw 84(12). doi:10.1016/j.jss.2011.06.026
van Gurp J, Bosch J, Svahnberg M (2001) On the notion of variability in software product lines. In: Working IEEE/IFIP Conference on Software Architecture (WICSA). doi:10.1109/WICSA.2001.948406
Hallsteinsen S, Fægri TE, Syrstad M (2003) Patterns in product family architecture design. In: Software Product Family Engineering (PFE). doi:10.1007/978-3-540-24667-1_19
Hallsteinsen S, Schouten G, Boot G, Fægri T (2006a) Dealing with architectural variation in product populations. In: Käkölä T, Dueñas JC (eds) Software Product Lines – Research Issues in Engineering and Management. Springer
Hallsteinsen S, Stav E, Solberg A, Floch J (2006b) Using product line techniques to build adaptive systems. In: Software Product Line Conference. doi:10.1109/SPLINE.2006.1691586
Hevner AR, March ST, Park J, Ram S (2004) Design science in IS research. MIS Q 28(1):75–105
Holma H, Toskala A (eds) (2000) WCDMA for UMTS: Radio Access for Third Generation Mobile Communications. Wiley
IEEE Std 1061-1998 (1998) IEEE standard for a software quality metrics methodology
IEEE Std 610.12-1990 (1990) IEEE standard glossary of software engineering terminology
Ishida Y (2007) Software product lines approach in enterprise system development. In: Software Product Line Conference
ISO/IEC 25010 (2011) Systems and software engineering—systems and software quality requirements and evaluation (SQuaRE)—system and software quality models
ISO/IEC 9126-1 (2001) Software engineering—product quality—part 1: Quality model
Jaring M, Bosch J (2002) Representing variability in software product lines: A case study. In: Software Product Line Conference
Jaring M, Krikhaar RL, Bosch J (2004) Representing variability in a family of MRI scanners. Softw Pract Exper 34(1):69–100
Jarzabek S, Yang B, Yoeun S (2006) Addressing quality attributes in domain analysis for product lines. IEE Proc-Softw 153(2)
Kang K, Cohen S, Hess J, Novak W, Peterson A (1990) Feature-oriented domain analysis (FODA) feasibility study. Tech Rep CMU/SEI-90-TR-21, ADA 235785, Software Engineering Institute
Kang K, Lee J, Donohoe P (2002) Feature-oriented product line engineering. IEEE Softw 19(4)
Karatas AS, Oguztuzun H, Dogru A (2010) Mapping extended feature models to constraint logic programming over finite domains. In: Software Product Line Conference
Kishi T, Noda N (2000) Aspect-oriented analysis of product line architecture. In: Software Product Line Conference
Kishi T, Noda N, Katayama T (2001) Architectural design for evolution by analyzing requirements on quality attributes. In: Asia-Pacific Software Engineering Conference. doi:10.1109/APSEC.2001.991466
Kishi T, Noda N, Katayama T (2002) A method for product line scoping based on a decision-making framework. In: Software Product Line Conference
Kitchenham B, Pearl Brereton O, Budgen D, Turner M, Bailey J, Linkman S (2009) Systematic literature reviews in software engineering—a systematic literature review. Inf Softw Technol 51(1):7–15. doi:10.1016/j.infsof.2008.09.009
Kozuka N, Ishida Y (2011) Building a product line architecture for variant-rich enterprise applications using a data-oriented approach. In: Software Product Line Conference
Kuusela J, Savolainen J (2000) Requirements engineering for product families. In: International Conference on Software Engineering (ICSE)
Lee AS, Baskerville RL (2003) Generalizing generalizability in information systems research. Inf Syst Res 14(3):221–243. doi:10.1287/isre.14.3.221.16560
Lee K, Kang KC (2010) Using context as key driver for feature selection. In: Software Product Line Conference
Linden F, Bosch J, Kamsties E, Känsälä K, Krzanik L, Obbink H (2003) Software product family evaluation. In: Software Product-Family Engineering (PFE)
Matinlassi M (2005) Quality-driven software architecture model transformation. In: Working IEEE/IFIP Conference on Software Architecture
McCall JA, Richards P, Walters G (1977) Factors in software quality. Tech Rep TR-77-369, RADC
Mellado D, Fernández-Medina E, Piattini M (2008) Towards security requirements management for software product lines: A security domain requirements engineering process. Comput Stand Interfaces 30(6):361–371
Myllärniemi V, Männistö T, Raatikainen M (2006a) Quality attribute variability within a software product family architecture. In: Quality of Software Architectures (QoSA), vol 2
Myllärniemi V, Raatikainen M, Männistö T (2006b) Inter-organisational approach in rapid software product family development—a case study. In: International Conference on Software Reuse
Myllärniemi V, Raatikainen M, Männistö T (2012) A systematically conducted literature review: quality attribute variability in software product lines. In: Software Product Line Conference
Myllärniemi V, Savolainen J, Männistö T (2013) Performance variability in software product lines: A case study in the telecommunication domain. In: Software Product Line Conference
Mylopoulos J, Chung L, Nixon B (1992) Representing and using nonfunctional requirements: A process-oriented approach. IEEE Trans Softw Eng 18(6)
Mylopoulos J, Chung L, Liao S, Wang H, Yu E (2001) Exploring alternatives during requirements analysis. IEEE Softw 18(1):92–96
Niemelä E, Immonen A (2007) Capturing quality requirements of product family architecture. Inf Softw Technol 49(11-12)
Niemelä E, Matinlassi M, Taulavuori A (2004) Practical evaluation of software product family architectures. In: Software Product Line Conference
Ognjanovic I, Mohabbati B, Gasevic D, Bagheri E, Boskovic M (2012) A metaheuristic approach for the configuration of business process families. In: International Conference on Services Computing (SCC)
Olaechea R, Stewart S, Czarnecki K, Rayside D (2012) Modelling and multi-objective optimization of quality attributes in variability-rich software. In: Fourth International Workshop on Nonfunctional System Properties in Domain Specific Modeling Languages
Patton MQ (1990) Qualitative Evaluation and Research Methods, 2nd edn. Sage Publications
Phillips R (2005) Pricing and Revenue Optimization. Stanford University Press
Regnell B, Berntsson-Svensson R, Olsson T (2008) Supporting roadmapping of quality requirements. IEEE Softw 25(2):42–47
Roos-Frantz F, Benavides D, Ruiz-Cortés A, Heuer A, Lauenroth K (2012) Quality-aware analysis in product line engineering with the orthogonal variability model. Softw Qual J 20(3-4):519–565. doi:10.1007/s11219-011-9156-5
Runeson P, Höst M (2009) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14(2):131–164. doi:10.1007/s10664-008-9102-8
Shadish WR, Cook TD, Campbell DT (2002) Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin, Boston
Shaw M (2002) What makes good research in software engineering? Int J STTT 4(1):1–7. doi:10.1007/s10009-002-0083-4
Siegmund N, Kolesnikov S, Kästner C, Apel S, Batory D, Rosenmüller M, Saake G (2012a) Predicting performance via automated feature-interaction detection. In: International Conference on Software Engineering
Siegmund N, Rosenmüller M, Kuhlemann M, Kästner C, Apel S, Saake G (2012b) SPL Conqueror: Toward optimization of non-functional properties in software product lines. Softw Qual J 20(3-4)
Siegmund N, Rosenmüller M, Kästner C, Giarrusso PG, Apel S, Kolesnikov SS (2013) Scalable prediction of non-functional properties in software product lines: Footprint and memory consumption. Inf Softw Technol 55(3):491–507
Sincero J, Schröder-Preikschat W, Spinczyk O (2009) Towards tool support for the configuration of non-functional properties in SPLs. In: Hawaii International Conference on System Sciences (HICSS). doi:10.1109/HICSS.2009.472
Sincero J, Schröder-Preikschat W, Spinczyk O (2010) Approaching non-functional properties of software product lines: Learning from products. In: Asia-Pacific Software Engineering Conference (APSEC). doi:10.1109/APSEC.2010.26
Sinnema M, Deelstra S, Nijhuis J, Bosch J (2006) Modeling dependencies in product families with COVAMOF. In: Engineering of Computer Based Systems (ECBS)
Smith CU, Williams LG (2002) Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Addison-Wesley
Soltani S, Asadi M, Gasevic D, Hatala M, Bagheri E (2012) Automated planning for feature model configuration based on functional and non-functional requirements. In: Software Product Line Conference
Stol KJ, Fitzgerald B (2013) Uncovering theories in software engineering. In: SEMAT Workshop on General Theory of Software Engineering (GTSE)
Strauss A, Corbin J (1998) Basics of Qualitative Research, 2nd edn. Sage
Svahnberg M, van Gurp J, Bosch J (2005) A taxonomy of variability realization techniques. Softw Pract Exper 35(8)
Thiel S, Hein A (2002) Modelling and using product line variability in automotive systems. IEEE Softw 19(4):66–72
Thüm T, Apel S, Kästner C, Schaefer I, Saake G (2014) A classification and survey of analysis strategies for software product lines. ACM Comput Surv 47(1). To appear
Tun TT, Boucher Q, Classen A, Hubaux A, Heymans P (2009) Relating requirements and feature configurations: A systematic approach. In: Software Product Line Conference
Urquhart C, Lehmann H, Myers MD (2010) Putting the theory back into grounded theory: guidelines for grounded theory studies in information systems. Inf Syst J 20(4):357–381. doi:10.1111/j.1365-2575.2009.00328.x
White J, Schmidt DC, Wuchner E, Nechypurenko A (2007) Automating product-line variant selection for mobile devices. In: Software Product Line Conference
White J, Dougherty B, Schmidt DC (2009) Selecting highly optimal architectural feature sets with filtered cartesian flattening. J Syst Softw 82(8)
Wohlin C (2014) Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Conference on Evaluation and Assessment in Software Engineering
Wohlin C, Prikladniki R (2013) Systematic literature reviews in software engineering. Inf Softw Technol 55(6):919–920. doi:10.1016/j.infsof.2013.02.002
Yin RK (1994) Case Study Research, 2nd edn. Sage, Thousand Oaks
Yu Y, do Prado Leite JCS, Lapouchnian A, Mylopoulos J (2008) Configuring features with stakeholder goals. In: SAC

Varvana Myllärniemi is a researcher and teacher at the Department of Computer Science and Engineering, Aalto University, Finland. Her research interests include software architectures, quality attributes, software product lines and software engineering. Myllärniemi has an MSc in computer science from Helsinki University of Technology, and she is currently finalizing her PhD on the topic of quality attribute variability. Contact her at [email protected].


Juha Savolainen is a software director at Danfoss Power Electronics, Denmark. His research interests include software engineering, software product lines, requirements engineering and software architectures. Savolainen has a PhD in computer science from Aalto University. He’s a member of the IEEE Computer Society. Contact him at [email protected].

Mikko Raatikainen is a researcher at Aalto University, Finland. His research interests include empirical research in software engineering, software architecture and variability management. Raatikainen has an MSc (Tech.) in software engineering from Helsinki University of Technology. Contact him at [email protected].


Tomi Männistö is a professor of software engineering at the Department of Computer Science, University of Helsinki, Finland. His research interests include software architectures, variability modelling, management and evolution, configuration knowledge, and flexible requirements engineering. Männistö has a PhD in computer science from Helsinki University of Technology, currently Aalto University. He is a member of the IFIP TC2 Working Group 2.10 Software Architecture, the IEEE Computer Society and ACM. Contact him at [email protected].

Publication III

Myllärniemi, Raatikainen, Männistö. Inter-organisational Approach in Rapid Software Product Family Development—A Case Study. In International Conference on Software Reuse, ICSR, Italy, pp.73–86, June 2006.

© 2006 Springer.

Reprinted with permission.


Inter-organisational Approach in Rapid Software Product Family Development — A Case Study

Varvana Myllärniemi, Mikko Raatikainen, and Tomi Männistö

Helsinki University of Technology, Software Business and Engineering Institute (SoberIT), P.O. Box 9210, 02015 TKK, Finland
{varvana.myllarniemi, mikko.raatikainen, tomi.mannisto}@tkk.fi

Abstract. Software product families provide an efficient means of reuse between a set of related products. However, software product families are often solely associated with intra-organisational reuse. This paper presents a case study of Fathammer, a small company developing games for different mobile devices. Reuse at Fathammer takes place at multiple levels. The game framework and engine of Fathammer is reused by partner companies that in turn produce game assets to be reused by Fathammer while developing games for various devices. Very rapid development of games is a necessity for Fathammer, whereas maintainability of games is not important. The above characteristics in particular distinguish Fathammer from other case studies and practices usually presented in the product family literature. The results show the applicability and challenges of software product family practices in the context of multiple collaborating companies and a fast-changing domain.

1 Introduction

Software reuse is a means of enhancing the efficiency of software development. Several reuse techniques have emerged over the years, one of them being software product families. A software product family is a set of products that share a common, managed set of features [1], a common architecture and a set of reusable components [2]. Typically, the reuse that takes place in a software product family is intraorganisational. Recently, the possibility for more open family development has been identified. van der Linden et al. [3] note that some software product family organisations may cross company borders. Also the research challenges raised by the transition from closed system development towards open networks have been identified [4]. However, so far ideas rather than solid practices have been presented. Very few cases have been reported on software product family organisations that cross company borders. This paper provides insight to a setting in which inter-organisational reuse takes place within a software product family. We present a case study Fathammer, a Finnish company that produces 3D games for various mobile devices. Fathammer develops X-Forge, a game framework and engine, on top of which M. Morisio (Ed.): ICSR 2006, LNCS 4039, pp. 73–86, 2006. c Springer-Verlag Berlin Heidelberg 2006 

74

V. Myll¨ arniemi, M. Raatikainen, and T. M¨ annist¨ o

game titles are built. Game title development is outsourced to partner development studios around the world. By reusing game titles, Fathammer derives several game instances each of which is targeted for a certain device and market. Thus the reuse takes place at multiple levels and across company borders. In addition to the inter-organisational aspect, Fathammer displays a mix of characteristics that is not typically found in reported software product family case studies. The nature of the domain demands very short time-to-market and development cycles, and puts pressure towards cutting them down even more. However, tight schedules should not kill creativity. The case company is relatively small and ad hoc in its practices. The above-mentioned characteristics are in contrast to the well-known case studies of successful product families in the domain of medical, automotive and telecommunication systems [5]. Further, software product family engineering originated from embedded systems development [3]. Therefore, a successful software product family often seems to be characterised by a stable domain, long-lived products, long development cycles and mature engineering practices inherited from embedded systems engineering. Consequently, this case study indicates that software product family practices can be applied in the context of multiple collaborating companies and within a fast-changing domain. Hence this study refines the applicability of software product family practices. Further, the case study brings out challenges and issues that should be studied by the software product family research community. The paper is organised as follows. Section 2 describes the research methods. Section 3 gives an introduction to the case company. Section 4 reports the results of the study. Section 5 compares the results to related research. Section 6 discusses the results and identifies lessons learned. Section 7 discusses the validity of results. Finally, Section 8 draws conclusions and suggests future work.

2 Research Method

The goal of the research is to study the state of the practice of different kinds of software product families in the industry in order to sharpen the understanding of the feasibility of different kinds of software product family engineering. The study was carried out as a qualitative descriptive case study [6] at Fathammer. We applied the CASFIS framework [7], which is a framework designed for research on industrial software product families. Fathammer was chosen for the study due to its unique mix of characteristics. We identified these characteristics when the R&D director and process manager of Fathammer gave us a brief overview of their current practices before the study. The overview also gave background information that enabled us to tailor the CASFIS framework and focus the study.

The data collection was based on an interview, a validation session, documentation analysis, and a review. The primary data collection method was the interview of the process manager and derivation manager of Fathammer. The interview took about three hours. The interview questions of CASFIS [8] were slightly modified for the context of Fathammer. The interview was voice-recorded and transcribed, and notes were taken. A few months later, a validation session of roughly two hours was held, during which clarifying questions and uncertain issues were discussed. Documentation analysis covered Fathammer's public and non-public documents that were identified as relevant during the interview. Finally, the process manager reviewed this paper.

The analysis followed the principles of the grounded theory approach, using deductive coding of data [9]. The initial results were reported in the validation session. For the final analysis, data from the validation session was added. The data was analysed using ATLAS.ti [10], which is a software tool designed for qualitative data analysis.

Fig. 1. A screenshot from one game title called Stuntcar Extreme. (Copyright Fathammer, reproduced with permission.)

3 Fathammer

This section presents an overall description of Fathammer, divided according to the BAPO (Business, Artifact, Process, Organisation) [11] concerns.

3.1 Business

Fathammer (www.fathammer.com) is a Finnish company that produces 3D games for various mobile devices. The domain of mobile games requires very short time-to-market and development cycles. However, tight schedules should not outweigh creativity, since games must be addictive and fun to be successful. Games are distributed through device manufacturers, operators, or game portals, which are called sales channels in this study. Fathammer has made a strategic decision to stay independent of sales channels. The approach has been multi-device from the start, which means that Fathammer provides support for all devices that are feasible technically and sales-wise.

The goal of Fathammer is to produce games on top of proprietary game technology. Towards this end, Fathammer has developed a technology platform called X-Forge. X-Forge is licensed globally to third-party development studios. Currently there are over 80 licensees of X-Forge around the world. Over 15 game titles that use X-Forge have been shipped.

3.2 Artifact

X-Forge is a C++ game development system and a multi-device game engine. Firstly, X-Forge provides functionality common to all games, such as graphics rendering, object world and physics. Secondly, X-Forge abstracts the underlying hardware and operating system away from the games. Currently X-Forge comprises more than 150 KLOC (thousands of lines of code).

Game titles are built by reusing and extending the assets X-Forge provides. The game title architecture is largely determined by X-Forge. A large game title may include over 50 KLOC. Besides code, a game title includes graphics, sounds, and other auxiliary files.

The use of X-Forge alone cannot guarantee that the resulting game title is optimal for a certain mobile device. Further, some sales channels may require their own modifications, such as adding an operator logo to the game. Finally, many localisations and languages have to be supported. Thus each game title is specialised into a number of game instances called SKUs (Stock Keeping Units). One SKU is targeted for a certain device configuration and localisation settings, and is distributed through a certain sales channel. For each game title, approximately three to ten SKUs are produced; the exact number depends on market needs.
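To make the hardware abstraction concrete, the following minimal C++ sketch shows what such a multi-device abstraction layer could look like; the interface and class names (XfDevice, SymbianDevice, PocketPCDevice) are our own illustration and are not taken from the actual X-Forge API.

    #include <cstddef>

    // Hypothetical device-abstraction interface in the spirit of X-Forge.
    // Game code is written against this interface only; supporting a new
    // device means implementing the interface, not changing the games.
    class XfDevice {
    public:
        virtual ~XfDevice() {}
        virtual int screenWidth() const = 0;              // e.g., 176 on a phone
        virtual int screenHeight() const = 0;             // e.g., 208 on a phone
        virtual std::size_t freeMemoryBytes() const = 0;  // varies drastically per device
        virtual void present(const void* frameBuffer) = 0; // show a rendered frame
    };

    // One concrete implementation per supported platform.
    class SymbianDevice : public XfDevice { /* platform-specific code */ };
    class PocketPCDevice : public XfDevice { /* platform-specific code */ };

Under this kind of design, the porting effort is concentrated in the device implementations, while the game logic above the interface stays unchanged.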

3.3 Process

The development process of Fathammer (Fig. 2) is divided into X-Forge development and game development. X-Forge is developed continuously. This includes extending the common functionality, providing support for new platforms, and correcting found bugs. Only rarely is functionality implemented in a certain game title merged into X-Forge.

The lower part of Fig. 2 illustrates the development process for one game title and its SKUs. In the preproduction phase, the game concept and its feasibility are checked. In the production phase, the game title is developed iteratively. In the postproduction phase, SKUs are derived from the game title assets. In some cases, the derivation consists of setting appropriate parameter values and configuration settings and re-compiling the code (a sketch of such derivation is given after Fig. 2). Usually the derivation also requires a small amount of development or graphics design effort.

Fathammer does not maintain its game titles. Once a game title, i.e., its first SKU, is released, the game title is not developed anymore, other than to produce new SKUs when viable business opportunities emerge. New SKUs can be produced even years after the initial release. In general, the evolution of X-Forge does not affect previous game releases. However, in a few exceptional cases, Fathammer has ported an old game title to a new version of X-Forge in order to produce an SKU for a new device.

Fig. 2. A coarse-grained view of the development process at Fathammer. Preproduction and production combined take approximately 4 to 6 months.
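As an illustration of derivation through parameter values and re-compilation, consider the following C++ sketch; the macro names and values are illustrative assumptions, not actual Fathammer settings.

    // sku_config.h: one such header is selected or generated per SKU.
    // Deriving a new SKU for another device, localisation or sales
    // channel amounts to changing these values and re-compiling.
    #define SKU_TARGET_DEVICE      "ngage"
    #define SKU_LANGUAGE           "fi"
    #define SKU_SHOW_OPERATOR_LOGO 1

    void drawOperatorLogo();
    void drawMenuItems();

    // A sales-channel variation point in the game code.
    void drawMainMenu() {
    #if SKU_SHOW_OPERATOR_LOGO
        drawOperatorLogo();  // customisation required by a certain operator
    #endif
        drawMenuItems();
    }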

3.4 Organisation

Currently the organisation of Fathammer comprises 36 employees. The organisation reflects the division of the development process: there are separate roles for those responsible for X-Forge development, preproduction, production and postproduction. The development organisation crosses company borders. Development of one game title (the production phase in Fig. 2) is outsourced to a partner game development studio. These partner development studios are licensees of X-Forge, and they are located around the world.

4 Results

This section reports the results from the case study analysis and describes three interesting characteristics of Fathammer.

4.1 Inter-company Collaboration

Fathammer develops its software product family in a network of collaborating companies. Four types of co-operation relationships can be identified (Fig. 3).

Firstly, Fathammer sells licenses to third-party game development studios that wish to develop mobile games reusing the X-Forge platform. However, due to Fathammer's recent decision to concentrate on game production, licenses are currently sold only to major publishers and to the game development studios used for outsourcing.

Secondly, Fathammer game title development is outsourced to partner game development studios, which are also licensees of X-Forge. Fathammer makes the final decisions concerning the content of a game and the budget and schedule of the development. A benefit of outsourced game development is that Fathammer can increase the volume of its software product family without increasing its own size. Fathammer can concentrate on its core capabilities and yet produce a large game portfolio. Another benefit is that by outsourcing game development, Fathammer expands the network of licensed developers, and thus promotes the use of X-Forge as a game technology. However, Fathammer has met some major difficulties in outsourcing game title development. In particular, outsourcing the development of software that supports variability has proved to be harder than expected.

Thirdly, device manufacturers form a major source of constraints and requirements, since the underlying hardware and operating system affect mobile games considerably. Fathammer cannot affect the properties of the devices; instead, the software has to be adapted to device peculiarities. Sometimes the implementation of a device may contain a few surprises, causing difficulties in the game development.

Fourthly, the games are distributed through sales channels. These sales channels may require a customised version of the game, which essentially means producing a new SKU. As an example, an operator may want to distribute a game that portrays operator logos.

Fig. 3. Collaboration with Fathammer and other companies

4.2 Hierarchical Software Product Family

Fathammer software product family artifacts have been organised hierarchically (Fig. 4). There are two levels of reuse and three levels of artifacts in the hierarchy: game titles are built reusing X-Forge, while game titles in turn provide the means for rapid SKU production.

The software product family of Fathammer is hierarchical also from the organisational point of view. Further, this hierarchy crosses company borders. X-Forge is developed in-house. Then, X-Forge is reused by partner game development studios during the game title development. Finally, Fathammer reuses the game title assets during SKU derivation.

The hierarchical model of software product family engineering has brought several benefits to Fathammer. Firstly, the amount of variability Fathammer has to cope with is considerable. A hierarchical model eases the overall variability handling, provides a separation of concerns and eases derivation at each level. Secondly, the hierarchical model is easily combined with the geographical distribution that stems from the outsourced game title development. This diminishes the inevitable overhead of distributed development. Thirdly, a hierarchy eases managing entities with different life cycles. The long-lived and more stable part, X-Forge, is separated into its own layer of the software product family, which can be maintained and developed more independently. Short-lived game titles and SKUs are developed separately.

However, a hierarchical approach also brings challenges. A hierarchy makes the organisation and development structure more complex, and weakens the link from the lower-level to the upper-level entities. However, since Fathammer is such a small organisation, these drawbacks do not have much overall impact.

Fig. 4. Fathammer software product family is hierarchical in nature

4.3 Challenges of Rapid Variant Production

The main reason for Fathammer to reuse software is to enable rapid production of variants, since both game titles and SKUs must be produced in a short time frame. This can be achieved by efficient variability management and implementation.

The devices on which Fathammer builds its games vary drastically, ranging from cell phones and portable game consoles to PDAs and Pocket PCs. To cope with this variability, Fathammer has developed X-Forge as a multi-device platform that abstracts the hardware away. However, X-Forge alone does not suffice, since some of the hardware-related variability must be taken care of during game title development and SKU derivation. As an example, game controls should be easy to use regardless of the input controls of the device. Further, game graphics and menus have to adapt to varying display properties, such as resolution and orientation. Postponing these issues slows SKU production down remarkably.

The varying hardware also causes quality attribute variability. The differences in the device computing resources can be enormous, and thus the software may need to vary its behaviour in order to provide the best perceived quality on all devices. In effect, this requires varying the performance level and memory consumption of the software by tuning the game. Fathammer handles this variability in a straightforward manner: varying quality attributes are transformed into varying functionality. Since graphics form a major factor in the consumption of computing and memory resources, it is easy to tune the performance and memory consumption level by, e.g., changing the number of drawn polygons, the drawing algorithms, the materials and the applied textures.

To complicate the situation even further, game titles and SKUs are very short-lived; in fact, they are not maintained at all after release. We call this kind of approach a disposable software product family. A disposable software product family sets certain requirements for variability implementation and management. It is not feasible to build extensive variability mechanisms into game titles, since those mechanisms are used only for deriving a limited number of SKUs. The challenge is how to implement variability mechanisms in game titles in a light-weight yet effective manner.

Further, building the necessary variability mechanisms must be balanced with outsourcing. The initial approach was to outsource software without any variability. But when a game had not been designed to support variability, variability was very difficult to add afterwards. In other words, this approach accelerated the development of the first SKU, but delayed the development of further variants. Therefore, Fathammer decided to explicitly specify some variability for outsourced development. However, it would have been infeasible to require an implementation for all possible variability; this would have slowed down the game production too much. Therefore, a verification configurations model was introduced. The game developers are given a number of separate device configurations that represent the range of existing mobile devices. These configurations specify the most critical aspects of the device, such as screen resolution and memory size. At the moment, the number of verification configurations used is three. A game is developed from the start to support these configurations. This forces a game developer to identify variation points, i.e., the locations in the artifacts where the configurations differ from each other, and to implement variation mechanisms for them. To conclude, extending existing variation points with new variants is easier than creating new variation points from scratch.
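As an illustration of transforming varying quality attributes into varying functionality, the following C++ sketch tunes the graphics detail to the resources of the device at hand; the thresholds, names and values are our own assumptions rather than actual X-Forge code.

    #include <cstddef>

    // Hypothetical per-SKU graphics settings: richer graphics improve the
    // perceived quality but consume more computing and memory resources.
    struct GraphicsSettings {
        int  maxPolygonsPerFrame;
        bool useTextures;
    };

    // Varying performance and memory consumption is realised as varying
    // functionality: on a low-end device the game simply draws less.
    GraphicsSettings selectGraphicsSettings(std::size_t freeMemoryBytes) {
        if (freeMemoryBytes < 2 * 1024 * 1024) {   // e.g., a low-end phone
            return GraphicsSettings{1500, false};  // few polygons, flat shading
        }
        return GraphicsSettings{8000, true};       // full detail
    }

A verification configuration would, in the same spirit, fix a small set of such device parameters (e.g., screen resolution and memory size) against which the game is developed and tested from the start.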

5 Related Research

This section discusses related research, compares it with the Fathammer practices, and identifies possible mismatches.

5.1 Intra-company and Inter-company Collaboration

Reported case studies of software product families tend to operate in closed, centralised structures of development [12]. Thus the notion of software product family development comprising networks of external interoperating component suppliers has not materialised as anticipated [12].

The concept of open networks is presented in the product family evaluation framework [3]. BAPO-O, the organisation dimension of the framework, promotes level 4 as the inter-company model, and level 5 as the open business model. At level 4, software product family engineering takes place between several companies, while at level 5 the business is open for everyone who sees the advantage. However, to the best of our knowledge, there are no reported examples of level 5 approaches.

Even if the collaboration is intra-organisational, a software product family approach within a large company with separate divisions may raise conflicts that hinder the promotion of the common interest [13]. Geographical distribution can bring its own challenges to software product family practices. Nokia has tried to answer this organisational challenge by organising its units to be aligned with product family development in order to minimise the overhead of distributed development [14].

The challenges of outsourcing and global software development [15] have been covered to some extent in software engineering research. However, outsourcing in the context of software product families has gained only little research attention. To the best of our knowledge, there is no research on outsourcing the development of software that should support variability.

In comparison with related research, the Fathammer software product family is relatively open. Four kinds of collaborations shape the software product family practices of Fathammer, and one of these collaborations involves global outsourcing.

5.2 Hierarchical Software Product Family

The hierarchical model of software product families is argued to be primarily suitable for large organisations with long-lived products [2, 16]. Considerable maturity with respect to development process and management is required [16]. According to Bosch [16], systems with relatively stable requirement sets and long lifetimes are substantially more suitable than products whose requirements change frequently and drastically, e.g., due to new technological possibilities. Fathammer seems to be almost the exact opposite of the optimal environment described in [16]. Despite this, Fathammer has succeeded in creating a hierarchical software product family model that suits its needs very well.

However, a couple of success factors mentioned in [16] do apply to the Fathammer case. Firstly, the geographical distribution that is due to outsourcing is easily combined with the hierarchical model of development. Secondly, the hierarchical model is especially suitable for situations where the amount of variability is large.

A drawback of the model is that agile reactions to changed requirements are difficult to make [16]. If an asset on the top level of the hierarchy changes, the change must be propagated down all levels of the hierarchy. However, this is not an issue for Fathammer, since game titles are not evolved after the release. Even if something changes at the upper level of the hierarchy, i.e., in X-Forge, there is usually no need to accommodate existing game titles to these changes.

5.3 Challenges of Rapid Variant Production

For Fathammer, one of the biggest obstacles to rapid variant production is caused by varying hardware. Hardware enforced variability is software variability that depends on or is presupposed by the hardware configuration [17]. However, the literature on variability most often refers to hardware neutral variability, assuming that variability originates purely from software [17].

Similar problems of hardware variability have been encountered at Nokia [14]. This is called the hardware challenge, the main sources of which are keys, display and scrolling, sound, and backwards compatibility. Although in many respects Fathammer faces the same problems as Nokia, there are a few significant differences. Firstly, Nokia has control over the hardware and operating systems of the devices, whereas Fathammer does not. Secondly, Fathammer has to operate with many device manufacturers and many devices, which amplifies the differences between variants.

Hardware also creates a need for quality attribute variability at Fathammer. Although quality attributes have been studied quite extensively, surprisingly little research on quality attribute variability has been carried out. Only a few studies mention this phenomenon [18, 19]. However, it is possible that varying quality attributes are more difficult to handle than varying functionality. Unlike functionality, many qualities are architectural in nature [20]. Therefore, changing a quality attribute may require system-wide changes in the architecture.

Svahnberg et al. [21] point out that variability should not be introduced too early during the development, since the cost of managing and tracking variants throughout the variability implementation process may be too high. Since the short life span of Fathammer game titles requires light-weight variability handling, the cost of early introduction would be even more severe. However, the difficulties with outsourced development indicate that variability should not be introduced too late either.

5.4 Related Case Studies

One of the early pioneers of software product family development has been Nokia [14]. Many of the challenges faced by Nokia are similar to the ones faced by Fathammer. However, these companies are vastly different: Nokia is a huge organisation with very solid practices, whereas Fathammer products are short-lived and Fathammer has less control over their development. However, it is interesting to see that Nokia, too, regards openness as vital for future success [4].

There are case studies of small to medium-sized companies that have applied software product families successfully: MarketMaker [22] and Salion [23]. However, Fathammer operates in a domain that seems to require considerably more flexibility and shorter life cycles.

A recent study presented how Java mobile games could be re-engineered using aspect-oriented techniques [24]. However, this study was not an industrial case study. It merely showed that it is technically feasible to construct a software product family from Java mobile games. In contrast, our case shows the feasibility, technically, organisationally and business-wise, of creating a software product family in such a domain.

6 Discussion and Lessons Learned

Based on the results of the case study (Section 4) and the comparison with related research (Section 5), we identify lessons that can be learned from the Fathammer case.

Software product family engineering can be applied to small companies without matured engineering practices and to domains that are fast-changing and require short development cycles. Fathammer has successfully developed its products in a software product family. The benefits of this model of development have not been directly measured, but one indicator of the success is that Fathammer is currently building a similar game framework for Java mobile games. The drawback of this approach is that Fathammer games are not optimised for certain devices. However, Fathammer has made the strategic decision to serve many devices instead of focusing on a few devices only.

A software product family development can cross company borders. Game title development is outsourced to geographically distributed partners. Outsourcing is combined with selling licenses to the X-Forge platform. To cope with the overhead involved in distributed development, Fathammer has organised its reuse hierarchy (see Fig. 4) to match the outsourced development. The inter-organisational approach has been feasible even in a relatively small software product family. In fact, outsourcing is seen as a way of increasing volume without increasing company size.

Outsourcing the development of software that is reusable and variable brings new challenges. The challenge lies in specifying the required variability for the outsourced software. Extending existing variation points with new variants is easier than creating new variation points from scratch. It is not necessary to specify all possible variants, but it is essential to ensure that the outsourced software implements some mechanisms for all variation points. To address this issue, Fathammer is applying the verification configurations model to its development.

Hierarchical software product families can be applied to small companies operating in fast-changing domains. Fathammer has regarded the hierarchical model to be very well suited for organising its software product family engineering. The affecting factors were the need for distributed development, the large amount of variability, and the differences in the life spans of the X-Forge platform, game titles and SKUs. The hierarchical model provides a separation of concerns that suits these needs well.

Hardware enforced variability can be a challenge to rapid variant production. Although X-Forge as a multi-device platform abstracts the hardware away, there are inevitably some device-related issues that set challenges to rapid game title and SKU production. This is especially apparent when dealing with many device manufacturers whose products differ from each other considerably. To conclude, a multi-device platform may not be enough to cover all hardware enforced variability.

Quality attribute variability can be resolved by transforming it into varying functionality. 3D games are very performance-intensive. When this kind of software has to be adapted to varying hardware, it is necessary to vary the performance level of the software. Fathammer does this by transforming varying quality into varying functionality. However, there is one factor, namely graphics, which largely determines the quality. Without such a factor, it is possible that architectural means are needed. Further research on this topic is required.

A short life span of reusable assets requires light-weight variability mechanisms. A software product family with a very short life span requires light-weight yet effective ways of realising and implementing variability mechanisms. One Fathammer game title forms a disposable software product family. Therefore, effective variability implementation is far more important than effective variability management.

7 Validity and Reliability

We tried to ensure validity by using multiple sources of data, i.e., the interviews, document analysis, and validation session; by allowing an interviewee at Fathammer to review the report; and by establishing a chain of evidence, i.e., storing the data and using as accurate data as possible, such as transcripts. The study also contains threats to validity. The data collection took a relatively short time, although the use of the methods was relatively efficient, since the researchers were familiar with the research methods, had used the method in several studies before, and the method, in particular the interview questions, was tailored to Fathammer on the basis of an initial understanding. Further, only two persons were interviewed; no game developer partners were interviewed; and the study lacked longitudinal observations in which Fathammer would have been observed over a period of time. Reliability was improved by following the publicly documented CASFIS framework [7].

We aimed to show that software product family practices can be applied in several different contexts. Consequently, research results on software product families may need refinement for applicability. While practices similar to those at Fathammer could be applied with similar success, there are several context factors that should be taken into account. However, this requires further research.

8 Conclusions

We have presented a case study of Fathammer, a company developing a software product family of 3D mobile games. The domain requires flexibility, creativity and a short time-to-market; yet it has been feasible to build a software product family in such a domain.

The Fathammer case exemplifies the following. Firstly, software product family organisations can cross company borders. Secondly, software product family development can be outsourced, but this kind of outsourcing raises new challenges related to variability implementation. Thirdly, even small, immature companies requiring flexibility can develop a hierarchical software product family. Fourthly, a multi-device platform is not always enough per se; hardware enforced variability needs to be taken care of during variant production, thus delaying the release. Finally, if the artifacts of the software product family are short-lived, light-weight variability management is required.

Although software product family development has helped Fathammer to produce games more efficiently, several challenges remain to be solved. Therefore, areas that need further research are identified. Firstly, outsourcing combined with software product families requires further research. What are the situations in which outsourcing is applicable? How can one successfully outsource the development of software that should support variability? Secondly, hardware enforced variability has not gained much research attention. At Fathammer, varying hardware creates a need for quality attribute variability. If quality attribute variability cannot be easily transformed into functional variability, there is a need for other, possibly architectural, means.

Acknowledgements

The authors thank Ville Vatén and others at Fathammer who participated in and aided our case study. The financial support of the 100-year Foundation of Technology Industries of Finland is acknowledged.

References

1. Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns. Addison-Wesley (2001)
2. Bosch, J.: Design and Use of Software Architectures: Adopting and Evolving a Product-Line Approach. Addison-Wesley (2000)
3. van der Linden, F., Bosch, J., Kamsties, E., Känsälä, K., Obbink, H.: Software product family evaluation. In: Proc. of Software Product Line Conference (2004)
4. Bosch, J.: Software product families in Nokia. In: Proc. of Software Product Line Conference (2005)
5. Cohen, S.: Product line state of the practice report. Technical Report CMU/SEI-2002-TN-017, Software Engineering Institute (2002)
6. Yin, R.K.: Case Study Research. 2nd edn. Sage: Thousand Oaks (1994)
7. Raatikainen, M., Männistö, T., Soininen, T.: CASFIS: approach for studying software product families in industry. In: Proc. of the 2nd Groningen Workshop on Software Variability Management (2004)
8. Raatikainen, M., Männistö, T., Soininen, T.: Case study questions for studying industrial software product families. Technical Report HUT-SoberIT-C10, Helsinki University of Technology (2004)
9. Strauss, A., Corbin, J.: Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park, CA: Sage Publications (1990)
10. ATLAS.ti: User's manual and reference, version 4.2 (2004)
11. van der Linden, F.: Software product families in Europe: The Esaps and Cafe projects. IEEE Software 19(4) (2002) 41-49
12. Mannion, M.: Organizing for software product line engineering. In: Proc. of Workshop on Software Technology and Engineering Practice (2002)
13. van Ommering, R., Bosch, J.: Widening the scope of software product lines: from variation to composition. In: Proc. of Software Product Line Conference (2002)
14. Maccari, A., Heie, A.: Managing infinite variability in mobile terminal software. Software: Practice and Experience 35(6) (2005) 513-537
15. Herbsleb, J., Moitra, D.: Global software development. IEEE Software 18(2) (2001) 16-20
16. Bosch, J.: Software product lines: Organizational alternatives. In: Proc. of International Conference on Software Engineering (2001)
17. Jaring, M., Bosch, J.: A taxonomy and hierarchy of variability dependencies in software product family engineering. In: Proc. of Computer Software and Applications Conference (2004)
18. Halmans, G., Pohl, K.: Communicating the variability of a software-product family to customers. Software and Systems Modeling 2(1) (2003) 15-36
19. Hallsteinsen, S., Fægri, T.E., Syrstad, M.: Patterns in product family architecture design. In: Proc. of Workshop on Software Product-Family Engineering (2003)
20. Bass, L., Clements, P., Kazman, R.: Software Architecture in Practice. Addison-Wesley (1998)
21. Svahnberg, M., van Gurp, J., Bosch, J.: A taxonomy of variability realization techniques. Software: Practice and Experience 35(8) (2005) 705-754
22. Gacek, C., Knauber, P., Schmid, K., Clements, P.: Successful software product line development in a small organisation. Technical Report IESE-Report No. 013.01/E, Fraunhofer IESE (2001)
23. Clements, P., Northrop, L.: Salion, Inc.: A software product line case study. Technical Report CMU/SEI-2002-TR-038, Software Engineering Institute (2002)
24. Alves, V., Matos, P.J., Cole, L., Borba, P., Ramalho, G.: Extracting and evolving mobile games product lines. In: Proc. of Software Product Line Conference (2005)

Publication IV

Myllärniemi, Raatikainen, Männistö. Representing and Configuring Security Variability in Software Product Lines. In Conference on the Quality of Software Architectures (QoSA), Canada, pp. 1-10, May 2015.

© 2015 ACM.

Reprinted with permission.


Representing and Configuring Security Variability in Software Product Lines

Varvana Myllärniemi
Aalto University, Finland

Mikko Raatikainen
Aalto University, Finland

Tomi Männistö
University of Helsinki, Finland

ABSTRACT

In a software product line, security may need to be varied. Consequently, security variability must be managed both from the customer and the product line architecture point of view. We utilize design science to build an artifact and a generalized design theory for representing and configuring security and functional variability from the requirements to the architecture in a configurable software product line. An open source web shop product line, Magento, is used as a case example to instantiate and evaluate the contribution. The results indicate that security variability can be represented and distinguished as countermeasures, and that a configurator tool is able to find consistent products as stable models of answer set programs.

Categories and Subject Descriptors

D.2.13 [Software Engineering]: Reusable Software

Keywords

Security; Variability; Software product line; Software architecture

1. INTRODUCTION

A software product line (SPL) [6, 8] enables the efficient development of a set of varying but related products. A SPL must be able to efficiently handle commonality and variability. Variability represents the planned differences between the products; that is, variability is the ability of a system to be efficiently extended, changed, customized or configured for use [39]. Variability manifests itself at many different levels [40]: in the requirements, in the architecture, and in the implementation. To represent externally visible variability, feature modeling [20] has become the de facto standard in the research community; however, variability in the product line architecture must also be managed [40].

SPL engineering makes a conceptual distinction between domain and application engineering. Domain engineering develops reusable assets with commonality and variability, whereas application engineering reuses these assets to develop product instances in which all variability has been resolved. The amount of development effort between domain and application engineering varies [7]: in configurable software product lines, the application engineering becomes a configuration task, which requires very little or no implementation effort [7], and is supported by automated tools herein called configurators. Configurators are needed particularly when the variability has grown large; when there are complex dependencies between the variants; or when the configuration task is not done by the product line engineers.

Security is of utmost importance in many systems; for example, a web shop has to guarantee the confidentiality and integrity of its customer information. To complicate matters in SPLs, there may be a need to purposefully vary security: security variability may be introduced to resolve the trade-offs between security and other quality attributes, for example, to vary the encryption to mitigate the increase in the response time [3]. Also, differences in the legislation may restrict the available security and privacy options [41].

Security variability in SPLs is addressed in a few studies that mostly focus on requirements [21] and goal models [13]. However, variability must be managed also in the design: security solutions, such as authentication, authorization and input validation, tend to crosscut both structures and views in the architecture [31, 22]. When managing both requirement-level and design-level variability, the configuration task can be supported from the customer needs to the architecture and implementation; the dependencies between the selections can be managed to configure a consistent and complete product; and the available selections on security and functionality can be presented in a manner that enables even a non-technical person to perform the configuration task. As a concrete example, the web shop owner or administrator can configure the web shop herself by selecting the desired security and functionality, and the configurator finds the architecture that is consistent with those selections.

We aim to study how security variability can be represented and configured in software product lines. The research questions are as follows:
RQ1: How to represent the differences in security so that the security variants can be distinguished to customers?
RQ2: How to represent the design of functional and security variability in the product line architecture?
RQ3: How to configure consistent product variants to meet given security and functional needs?

The research was conducted following the design science methodology [16, 14]. We built artifacts and a design theory as the generalization of the artifacts [14]. The design theory includes a modeling conceptualization and the principles of supporting the configuration task with a configurator tool operating on answer set programs [34]. As the concrete artifacts, we built a prototype implementation of the configurator tool; we also represented and configured a case example, Magento, which is a configurable web shop product line. By selecting a commercially successful case example, we were able to both evaluate the theory and argue the relevance of the artifact [16].

Regarding our previous work, KumbangSec is based on Kumbang [2, 25], which is a conceptualization and a tool set for modeling and configuring variability through functional features and components; this study extends Kumbang with security-specific modeling concepts. Preliminary results on KumbangSec have been reported earlier [23, 24, 26]; however, this paper presents a completely revised conceptualization. As a novel contribution over all previous work, this study describes the design theory and the principles of supporting the configuration task, and provides a case study that validates the artifact and motivates its industrial relevance.

The rest of the paper is organized as follows. Section 2 outlines previous work on security and security variability. Section 3 presents the method and describes the Magento case. Section 4 describes the design theory and the instantiated artifacts. Section 5 evaluates the testable propositions of the theory. Section 6 discusses and Section 7 draws conclusions.

2. BACKGROUND

Security is the capability of the software to protect information and data so that unauthorized persons or systems cannot read or modify them and authorized persons or systems are not denied access to them [19]. Further, security can be characterized through assets, threats, countermeasures, and vulnerabilities [18]. Security is about protecting assets from threats: assets are sensitive information or resources [18]. A threat exploits a vulnerability in the software and materializes as an attack [18]. Countermeasures are imposed in the software to reduce vulnerabilities [18].

Given the complexity of the software security definition, it is not easy to elicit or define security requirements [9, 11]. In particular, care must be taken to avoid making architectural decisions prematurely [11]. A comparison of methods for defining security requirements is presented in [9]; examples include misuse cases [1] and Common Criteria [18]. The definition of the security requirements and the design are heavily intertwined. On one hand, security requirements, including high-level security goals and system-level specifications, need to be realized in the design. On the other hand, security requirement definition relies on having the design in place: without information about the intended solutions, the threat analysis cannot fully identify possible targets of an attack [9]. Only after all kinds of requirements have been fixed can threats against assets be identified and countermeasures be designed [9].

From the design point of view, countermeasures, also known as security controls or mechanisms [9], are the link between security requirements and architecture. A countermeasure is a technique that meets or opposes a threat, a vulnerability, or an attack [17]. Countermeasures can be defined at different levels of abstraction [9]. At the requirement level, countermeasures correspond to the requirements on preventing, detecting or reacting to security incidents [12]. Similarly, the security use cases that are added to mitigate misuse cases [35] are one form of requirements-level countermeasures. At the design level, countermeasures correspond to security tactics and security patterns. A security tactic is a design decision that influences how the system responds to a security attack; tactics can be classified into resisting attacks, detecting attacks, or recovering from an attack [5]. A security pattern packages one or more security tactics in a concrete manner [5]; example security patterns include Authenticator, Defense in Depth, and Policy Enforcement Point [15].

The security variability in SPLs has been studied to some degree. Firstly, some studies address quality attribute variability in general and utilize an example of security [27]. In particular, studies about goal and softgoal variability in SPLs often use security as an example. Softgoals, such as User authenticity, represent system intentions whose satisfaction cannot be unambiguously evaluated [13]. In contrast, goals can be operationalized into varying tasks, for example, Login into the alternative tasks Fingerprint and Password [13]. Thereafter, the qualitative impact of tasks on softgoals can be captured, for example, Fingerprint is good for assuring User authenticity [13]. However, goal models focus on the problem space, and do not address the design or solution space.

Secondly, there are studies that specifically address security variability in SPLs. A process for SPL security requirements definition has been proposed [21]: the varying security requirements are related to Common Criteria [18] concepts, such as threats, assets, countermeasures, and objectives. However, the process is focused on the requirements engineering, and omits how the varying security requirements are realized in the design. A conceptual model of SPL architectures with varying security requirements has also been proposed [10]. Risks are assessed through threats, assets, vulnerabilities, and unwanted incidents; product variants use countermeasures to mitigate risks; and scenarios support the decision process [10]. Complementary to our work, the model [10] focuses on the architecture design activities: the conceptual model [10] can be used to elicit, define and design varying countermeasures, which can then be represented and configured with our contribution.

There also exists work that focuses specifically on privacy variability in web personalization [41]. Due to varying user preferences and differences in the privacy legislation, the web site personalization methods may need to be varied. The user's preferences and country are represented and composed to enable dynamic configuration of web personalization.

In component-based software engineering, there have been attempts to predict security from the properties of components, but the challenge has been to come up with a suitable component-level property. One such proposal is the use of a vulnerability index [32], which is the probability that the vulnerabilities of a component are exposed in a single execution. However, it requires considerable effort to enumerate all vulnerabilities, and to construct, execute and measure all corresponding attacks.

3. RESEARCH METHOD

Design science as a research paradigm aims to solve real-world problems in using, managing or implementing software systems [16]. This is achieved by building and evaluating an artifact [16] and by contributing to the design theory [14]. The relevance in design science is ensured by focusing on business needs that are heretofore unsolved or on solving them in a more effective way, and by evaluating the utility of the artifact against the needs of the application environment [16]. The rigor in design science is ensured by utilizing existing scientific knowledge [16] and by generalizing knowledge from the concrete artifacts into design theories [14]. A design theory lays out the generalized constructs, relations and scope of the design knowledge codified in and needed to instantiate the concrete artifact [14].

Following the design science paradigm, we built the theory of the artifact and the concrete artifacts (Table 1). The theory of the artifact included a modeling conceptualization and several principles. The concrete artifacts included a prototype implementation of the configurator tool and the concrete configuration model and configuration tasks for the case example, thus serving as expository instantiations of the theory [14]. A commercially successful open source web shop, Magento, served as the case example. The data needed in the case study included the following: the Magento source code¹; the Magento documentation²; and the demo version of the Magento Admin Panel. The modeling, configuration and evaluation tasks were performed by the authors.

¹ https://github.com/magento/
² http://www.magentocommerce.com/knowledge-base/

The study was conducted following the steps in [30]. Firstly, the problem was identified and motivated based on the existing scientific knowledge (Section 2). Also the example case as an industrial product line served to motivate the business needs. As a result, the theory purpose and scope as well as the justificatory knowledge were identified (Table 1). Secondly, the objectives of the solution were set in the form of research questions. Thereafter, the research questions were formulated into testable propositions [14] that characterize the effect of the artifact (Table 1). The testable propositions were formulated in the suggested form "if you want to achieve Y in situation Z, the use of X helps" [14]. Thirdly, the theory of the artifact and the concrete artifacts were designed and developed iteratively. Fourthly, the evaluation was performed against the testable propositions (Table 1). The evaluation methods utilized the feasibility of the instantiation and informed arguments about the concepts (cf. [16, 14]). Also the applicability to a case example that represented a "slice of life" [33] as an industrial configurable product line was used in the evaluation. Finally, the results were communicated through this and earlier publications.

3.1 Case Description

To show the relevance of our artifact [16] and to instantiate our proposed design theory into a concrete artifact [14], we utilized the case example of Magento. Magento is an open-source framework and a configurable product line that can be used to create and host different kinds of web shops. Magento is currently operated by Magento Inc. and is used by many commercially successful web shop sites. Magento is offered in several versions, including the Enterprise Edition, the open-sourced Community Edition and the hosted Magento Go edition that provides the deployment environment for the customer.

The Magento architecture follows a modular, configuration-based MVC (Model-View-Controller) style; the organization is somewhat complex to allow separation of MVC concerns, configurability and extensions. Firstly, the static code structure is organized along Magento modules that implement certain functionality (e.g., module Payment): each module is mapped to a certain source code directory. Secondly, the runtime structure is organized with a variant of MVC; each runtime element (e.g., Model) is composed of a number of module-specific runtime elements (e.g., Payment Model). This also means that all modules have to implement their own versions of models, views, and controllers. Thirdly, to enable configuration and customization, Magento utilizes several levels of configuration files: a module-specific file config.xml is used to configure the functionality of each module, whereas the file modules.xml indicates the modules and the dependencies in a certain Magento configuration.

Overall, Magento can be considered a variability-rich, highly configurable product line. Magento web shops can be configured and tailored in three ways: core modules can be configured; external community modules can be purchased from the marketplace; and custom modules can be developed from scratch. Even the core variability is quite large: there are in total 81 core modules, each of which contains several configuration options. Magento can be configured when setting up the web shop and reconfigured during operation, for example, to add new payment methods. Typically, the configuration activity is performed by a technical customer, a web shop administrator, or even a consultant that sells services for web shop hosting. For a hosted Magento web shop, the configuration takes place via a web-based Admin Panel. The Admin Panel options more or less directly correspond to the config.xml options in each module; the correct usage of the options is then explained extensively in the Magento documentation. At the moment, the Admin Panel does not check any inconsistencies between the selections. Most of the Admin Panel options are related to configuring functional variability; only a handful of the options configure web shop security. Additional security features are available as external community modules.

4. RESULTS

The main contribution of this study is the theory and the concrete artifacts [14] for representing and configuring security and functional variability in configurable software product lines (Table 1).

4.1 Representing and Distinguishing Security Variants as Countermeasures

Research question RQ1 addressed the representation of security variability to distinguish the variants to customers. Qualitative "low, medium, high security" levels are sometimes used [27], but this is too coarse for most purposes. The difficulty of defining security requirements through assets, threats, vulnerabilities and countermeasures was discussed in Section 2. However, representing security variability from the viewpoint of all security concepts (cf. [21]) would become too complicated in the configuration task. Therefore, we propose that, for the purpose of distinguishing and selecting among the product variants, security variability is modeled as countermeasures (Table 1).

Theory purpose

The theory aims at representing and configuring software product lines with security and functional variability. Theory The theory is applicable to configurable softscope ware product lines with varying functional and security requirements and composable or parameterizable architectural entities. Constructs The principle of using countermeasures to repand prin- resent and distinguish security variants to cusciples tomers (Section 4.1). KumbangSec modeling conceptualization; the principle of separating the concepts for configuration model and configuration (Section 4.2). The principles of building and using a configurator operating on stable model semantics; the principle of separating configuration knowledge from configurator implementation; the principles of translating the modeling concepts into answer set programs (Section 4.3). Testable In the situation described by the theory scope: propositions P1 for RQ1: to represent and distinguish security variants to customers, the use of countermeasures helps; P2 for RQ2: to represent the design of security and functional variability, the KumbangSec conceptualization helps; P3 for RQ3: to configure consistent products to meet given security and functional needs, the KumbangSec configurator helps. Justificatory Countermeasures as a characterization of softknowlware security [18]. edge Kumbang [2, 25]. Using stable models and answer set programs for product configuration [34, 36, 37]. Expository KumbangSec configurator. artifact Configuration model for Magento. instantia- Configuration task with Magento configuration tions model. Instantiated A configurator can be implemented in different artifact ways to implement the principles of the theory. mutabilDifferent configuration models for different ity product lines can be instantiated within the limits of the modeling conceptualization.

The different reasons to vary security in a SPL can all be handled with varying countermeasures. Firstly, security variability may be due to the need to balance trade-offs: countermeasures that enhance security may impact other quality attributes negatively. In Magento, using encryption for all authenticated users instead of just at checkout may impose a performance penalty for casual web shop browsing. By varying the encryption countermeasure, Magento web shops can be configured to have better security but lower performance and vice versa. As another example, using two-way authentication for the web shop administrators, that is, requiring both passwords and identification tokens generated by Google Authenticator, increases security but decreases user efficiency. Secondly, countermeasures often impose additional development or operation costs; thus countermeasure variability can be introduced to balance the trade-off between security and cost. The use of encryption requires a SSL (Secure Socket Layer) certificate from the Magento web shop owner, which costs up to several hundreds of dollars per year; the installation of the SSL certificate may impose additional costs. Thirdly, security variability may be due to the variability of assets, threats, attacks, vulnerabilities. For example, assets may vary: one web shop instance handles and stores credit card information, while the other does not. Further, different vulnerabilities may exist in the products, depending on the software components used. Also threats, attacks, and their probabilities may vary: code injection attacks become less relevant if the product variant does not allow users to enter data to web site forms. All this variability can be addressed with countermeasure variability. By the very definition, varying threats, vulnerabilities, and attacks can be met or opposed with varying countermeasures. Varying assets can be protected with varying countermeasures: for example, the variability of whether to handle and store credit card information is related to the variability of using encryption in the checkout process. Finally, there may be differences in the user identification needs that call for different authentication mechanisms. For example, using the needed Google application for the twoway authentication may not be possible or preferable for all web shop administrators.

4.2

Table 1: The design theory [14] codified in the instantiated artifacts.

Modeling Conceptualization

KumbangSec modeling conceptualization (Figure 1) defines the necessary domain concepts to represent security and functional variability in a software product line. Similarly to traditional product configuration [37], we adopt the principle of separating domain engineering models, i.e., models with variability, from the application engineering models where all variability has been bound. Therefore, our modeling conceptualization distinguishes the configuration model, which contains types, and the configuration, which contains instances of types (Figure 1). A KumbangSec configuration describes one particular software product variant, while a KumbangSec configuration model describes the structure, the variability, and the rules upon which valid product variants can be constructed from the product line. The modeling conceptualization defines the elements and their relations in the models. In addition, a textual modeling language is provided so that the configuration models can be defined in a machine-readable form. Finally, a visual


[Figure 1: Design theory: KumbangSec conceptualization. Illustrates the principle of distinguishing between the configuration model and configuration. The domain engineering modelling concepts (the KumbangSec configuration model) comprise KumbangSec types, specialized into attribute types (with values), interface types (with methods), and composable types (with part definitions, attribute definitions, and constraints); composable types are further specialized into component types (with interface definitions), feature types (with implementation constraints), and countermeasure types (with an optional description). The application engineering modelling concepts (the KumbangSec configuration) comprise the corresponding attribute, interface, and composable instances, the latter specialized into component, feature, and countermeasure instances, with connections between provided and required interface instances.]

[Figure 2: Instantiated artifact: the Magento configuration model for countermeasure and feature types (parts omitted for brevity), following the visual notation of [2]. The WebShopCountermeasures hierarchy contains EncryptCommunication (with encryptCustomers alternatives EncryptAfterAuthentication and EncryptOnlyInCheckout and attribute encryptAdmin: { yes, no }), NoEncryptedCommunication, BrowserSessionValidation (with attribute protectionLevel: { nothing, medium, high, custom } and an optional [0...1] part customSettings of type CustomBrowserSessionValidation with attributes checkRequestAgainstSessionIP, checkRequestAgainstSessionBrowser, and useSessionIDinURL), and AdminAuthentication (with optional [0...1] parts auth2Way of type TwoWayAdminAuthentication and restrictAccess of type RestrictAccessToAllowedIP). The WebShopFeatures hierarchy contains Checkout (with attributes allowGuestCheckout: { yes, no } and keepShoppingCartPersistent: { yes, no } and the constraint {value(keepShoppingCartPersistent) = yes => value(allowGuestCheckout) = no;}), Payment (with one to three different options CreditCardPayment, CashOnDelivery, and BankTransfer), and Shipping (with attribute allowedForAllCountries: { yes, no }, one or two different carriers Courier and Postal, and pricing options DestinationPriceRate, DestinationWeightRate, and FlatRate), together with the constraint {has_instances(CreditCardPayment) => has_instances(EncryptCommunication);}. An excerpt from the textual configuration model follows.]

    countermeasure type EncryptCommunication {
      contains
        (EncryptAfterAuthentication, EncryptOnlyInCheckout) encryptCustomers;
      attributes
        Boolean encryptAdmin;
      implementation
        value(component-root.static.core.core.config, web_secure_base_url) = https;
        has_instances(SSLCertificate);
        value(encryptAdmin) = yes =>
          value(component-root.static.core.core.config, web_secure_use_in_adminhtml) = 1;
        value(encryptAdmin) = no =>
          value(component-root.static.core.core.config, web_secure_use_in_adminhtml) = 0;
      description
        "Encrypts the traffic between the browser and the server."
        "Requires an installed and authorized SSL certificate, which may impose additional operational costs."
        "May impact the response time of the page requests negatively."
        "Attribute encryptAdmin indicates whether encryption is used when the administrator is logged in."
    }

4.2.1 Modeling security variability

Security, and consequently countermeasures, can vary in a SPL in different ways. Some countermeasures can be optional: for example, some web shops use encryption while some do not. Some countermeasures can be common among products: for example, all products require that the web shop administrators must be authenticated. Some countermeasures can be applied at different levels: for example, the minimum length of the administrator password can be varied. Further, countermeasures may depend on functional features: if credit cards are accepted as a payment method, the checkout must be protected with encryption. Finally, countermeasure variability has an impact, e.g., on other quality attributes. In order to represent all this, our modeling conceptualization defines countermeasure variability through composition, attributes, inheritance, constraints, and descriptive countermeasure impacts (Figure 1).

Firstly, countermeasures can be composed to organize them into hierarchies. To vary the composition, a countermeasure type can specify a part definition, where the number and types of constituent countermeasures can be varied with a cardinality of the form [n...m], n, m ∈ ℕ, and by specifying one or more possible types for composition. For example, part definition encryption defines two possible types that can be composed with cardinality [1...1]. Compared to the popular concepts in feature modeling [20], it is possible to represent mandatory countermeasures (cardinality [1...1]), optional countermeasures (cardinality [0...1]), and alternative countermeasures (more than one possible type). Secondly, a countermeasure type can define an attribute with varying values. For example, the variability of browser session validation is represented as attribute definition protectionLevel, which has the possible values nothing, medium, high, and custom. Attributes are a convenient way of specifying countermeasures that can be parameterized, like encryption key length, or countermeasures that need to be captured with qualitative levels. In the case example, the browser session parameters have been abstracted to qualitative levels in order to enable less experienced customers to select a variant. However, there is also the possibility to set custom session protection by setting individual session parameters under CustomBrowserSessionValidation. Thirdly, countermeasure types can be inherited from other countermeasure types, and countermeasure types can define constraints to represent complex relations that cannot be represented in other ways. For example, BrowserSessionValidation defines two constraints to ensure that the simple protection level (nothing, medium, high) is not used simultaneously with custom settings. The KumbangSec constraint language enables writing complex Boolean expressions; further details are given in [2]. Finally, countermeasure types may need to characterize their impact, e.g., on other quality attributes. However, it may be quite difficult to quantify or even qualitatively characterize the impacts in a way that is independent of the product instance and its context: for example, how much does two-way authentication decrease user experience, and how should this information be used to configure a product? But even a textual description of the countermeasure impact can be useful. For example, the description in Figure 2 tells that the encrypted page requests may impact response times negatively. Therefore, the conceptualization simply attaches textual descriptions to the countermeasure types to describe the countermeasure impact or any other information that is relevant during the configuration task.
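To make these mechanisms concrete, the varying BrowserSessionValidation countermeasure of Figure 2 could be sketched in the textual KumbangSec language roughly as follows. This is a sketch, not a verbatim excerpt: the part, attribute, and constraint contents come from Figure 2, while the section keywords and exact syntax are our assumptions modeled on the EncryptCommunication excerpt.

    countermeasure type BrowserSessionValidation {
      contains
        CustomBrowserSessionValidation customSettings [0...1];
      attributes
        protectionLevel : { nothing, medium, high, custom };
      constraints
        value(protectionLevel) = custom => present(customSettings);
        value(protectionLevel) != custom => not present(customSettings);
    }

The two constraints together ensure that the qualitative protection levels and the custom session settings are never selected simultaneously.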

4.2.2 Modeling functional variability

In the conceptualization, functional variability is represented as features. KumbangSec features are functional characteristics at the requirement or specification level; they can be used to distinguish a functional variant during configuration. As an example, feature Shipping indicates functionality related to shipping a web shop order. To capture feature variability, a conceptualization similar to [2] is used: features can be composed; the number and types of the composed features can be varied; feature types can define attributes; feature types can be inherited from other feature types; and feature types can define complex constraints (Figure 1). For example, feature type Payment in Figure 2 defines up to three different payment options. As another example, feature type Checkout defines variability through attribute definitions allowGuestCheckout and keepShoppingCartPersistent, with an additional constraint stating that when the persistent shopping cart is in use, both registered customers and guest shoppers are required to either log in to an existing account or create a new account before going through the checkout process. In general, feature variability may depend on security variability, and vice versa. As a concrete example, web shop merchants who accept credit cards are required by the Payment Card Industry to process transactions over a securely encrypted channel. Therefore, the conceptualization enables creating constraints between features and countermeasures. Figure 2 exemplifies how the constraint between the credit card payment feature and the encryption countermeasure can be defined.
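In the textual language, the Checkout feature type and its constraint could be sketched roughly as follows; the attribute and constraint names come from Figure 2, while the section keywords and the attribute notation are our assumptions modeled on the EncryptCommunication excerpt.

    feature type Checkout {
      attributes
        allowGuestCheckout : { yes, no };
        keepShoppingCartPersistent : { yes, no };
      constraints
        value(keepShoppingCartPersistent) = yes => value(allowGuestCheckout) = no;
    }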

4.2.3 Modeling design in the architecture

[Figure 3: Instantiated artifact: the Magento configuration model for component types (parts omitted for brevity).]

Research question RQ2 is about representing the design of security and functional variability. In order to do so, the product line architecture and its variability need to be modeled, and the countermeasures and features need to be related to the architectural elements that implement them. The conceptualization is applicable to both composition-based and parameterization-based software product line architectures (theory scope in Table 1). Therefore, the conceptualization has components, interfaces, attributes, and constraints (Figure 1). Consequently, the components can be composed, can have interfaces that can be connected, and can be parameterized with attributes; these variability mechanisms are similar to [2]. The conceptualization is purposefully agnostic about the semantics of a component. From the security design point of view, different kinds of architectural elements may need to be modeled as components: a source code module, an encryption key file, a third-party authentication web service, or even a configuration file. This is because security design tends to cross-cut several views in the architecture [31]. Moreover, hard-coding any specific views in a modeling conceptualization may not be reasonable: the choice of specific views depends on the needs of the stakeholders and on the product context [31]. Further, an architectural "component" or "element" is used as a generic term, the semantics of which vary between views and systems [31]. Figure 3 illustrates how the conceptualization is used to model the Magento architecture, and in particular, a static code view, a runtime view, and a deployment view. The components in the runtime view represent Magento runtime elements that are organized to follow the MVC style and interact with each other through interfaces. In contrast, the components under the static code view capture source code modules as well as configuration files. As an example, component type CoreConfig extends XMLConfigurationFile and represents one configuration file for module Core. The static code modules and configuration files are then used to implement the corresponding runtime logical components. The deployment view captures information related to, e.g., the Apache web server and SSL certificate files. To represent how countermeasures and features are implemented in the product line architecture, implementation constraints can be defined, as exemplified by EncryptCommunication in Figure 2. The syntax of the implementation constraints is discussed elsewhere [2].
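For illustration, the CoreConfig component type mentioned above could be written in the textual language roughly as follows. The extends clause and type names come from the text, while the section keyword and the attribute value sets are our assumptions, derived from the parameters referenced by the implementation constraints in the EncryptCommunication excerpt.

    component type XMLConfigurationFile { }

    component type CoreConfig extends XMLConfigurationFile {
      attributes
        web_secure_base_url : { http, https };
        web_secure_use_in_adminhtml : { 0, 1 };
    }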

4.3 Principles of Building and Using a Configurator

Research question RQ3 is about configuring consistent products to meet given security and functional needs. For this purpose, the design theory codifies the principles of building and utilizing a configurator operating on stable model semantics and the KumbangSec conceptualization (Figure 4). The expository instantiation of the configurator has been implemented by using and extending the existing tool set [25] and relying on the efficient inference engine smodels [34]. Since the configuration knowledge is separated from the configurator implementation (Table 1), the same configurator is reusable across many product lines. In general, configurator tools are needed in three cases. Firstly, with large and complex enough variability, humans cannot keep track of all dependencies between the variation points, and even ad hoc implemented configurators may have difficulties. Secondly, the configuration task may be done by a person who does not know the implementation, for example, web shop owners or administrators in the case of Magento. Thirdly, the configuration task may be done at runtime, or by a machine, as is exemplified in [23, 41].

[Figure 4: Design theory: the principles of building and using a configurator tool. The upper part of the figure depicts domain engineering: during product line scoping and requirements engineering, architecture design, and implementation, the KumbangSec configuration knowledge is captured by modeling functional variability as features, security variability as countermeasures, architecture variability as components, constraints between features and countermeasures, and the realization of features and countermeasures, as well as by refining features and countermeasures and mapping components to implementation artifacts. The configurator translates the resulting KumbangSec configuration model (in the KumbangSec modelling language) to weight constraint rules (CM), enumerates the possible instances and attributes (GF), and grounds both, yielding the answer set program CM ∪ GF. The lower part depicts application engineering, i.e., the configuration task: the configurator initializes the task; requirements are specified by selecting features, countermeasures, attribute values, and components, either iteratively or all at once; after each selection, the configurator adds the selection to the requirements R, calculates stable models C to check consistency and completeness, finds and visualizes the consequences of R, and cancels inconsistent selections; for a complete and consistent C, the resulting KumbangSec configuration is used to instantiate or reconfigure the product variant. The legend distinguishes activities performed by a human, activities performed by the tool, activities outside the scope, and artifacts.]


4.3.1 Building a configurator operating on stable model semantics

The main task of a configurator tool is to support the configuration task. To define and implement the configuration task, the configurator utilizes answer set programming, and in particular, weight constraint rules [36, 34]. The configurator translates the configuration model to an answer set program written as weight constraint rules, and invokes the inference engine smodels to find the configurations as answer sets to such programs. Additionally, the translation to answer set programs provides formal semantics to the KumbangSec modeling conceptualization. We define the KumbangSec configuration task as follows, based on [36]. Given a KumbangSec configuration model as a set of weight constraint rules CM, a set of ground facts GF representing the possible instances from the types in the configuration model, and a set of rules R representing the requirements, is there a KumbangSec configuration C, that is, a stable model of CM ∪ GF, such that C satisfies R?

Firstly, the configurator translates the configuration model to the set of weight constraint rules CM. For example, each countermeasure type is declared with a rule

    cmType(C) :- .

where C is replaced with the name of the countermeasure type. As another example, for each part definition in a countermeasure type, a rule of the following form is added:

    n { haspart(X1, X2, P) : ppart(X1, X2, P, I) } m :- C(X1), in(X1).

where C is replaced with the name of the countermeasure type, P with the name of the part definition, and n and m with the lower and upper bounds of the cardinality. In general, this mapping follows the translation of Kumbang to weight constraint rules [2]. Secondly, the configurator uses ground facts GF to describe the possible instances and the attribute values of instances that can exist in a KumbangSec configuration. For example, a ground fact

    cmAdminAuthentication(i).

indicates that the countermeasure instance with identifier i is of countermeasure type AdminAuthentication; each possible instance gets a unique identifier i in GF. Grounding here means that all variables used in the rules are removed; further details are given in [34]. Thirdly, the configurator treats the requirements R as the set of rules that a specific product instance must satisfy, stated as KumbangSec instances that must be present in the configuration, or as attribute values that these instances have. For example, a requirement

    hasattr(i, encryptAdmin, yes).

states that encryption for authenticated administrators must be enabled. Finally, the configurator calculates the configuration C as a stable model of the program CM ∪ GF; the configuration consists of a set of positive and negative atoms. Positive atoms represent the instances and attribute values that are in the configuration. Due to the characteristics of stable models [34], the instances and attribute values in the configuration C both satisfy the configuration model and the requirements, and are justified by them. Consequently, the configuration C is both consistent and complete. Informally, a consistent configuration is one in which no rules of the configuration model are violated; a complete configuration is one in which all the necessary selections have been made.
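As a concrete illustration of this translation, the encryptCustomers part definition of countermeasure type EncryptCommunication (Figure 2), which composes one of EncryptAfterAuthentication and EncryptOnlyInCheckout, would yield roughly the following rules. This is a sketch instantiating the rule formats above: the cardinality [1...1] and the instance identifiers are our assumptions, since the excerpt does not state them explicitly.

    % Part definition encryptCustomers of EncryptCommunication,
    % assuming the cardinality [1...1]:
    1 { haspart(X1, X2, encryptCustomers) : ppart(X1, X2, encryptCustomers, I) } 1 :-
        cmEncryptCommunication(X1), in(X1).

    % Ground facts in GF enumerating possible countermeasure instances:
    cmEncryptCommunication(i1).
    cmEncryptAfterAuthentication(i2).
    cmEncryptOnlyInCheckout(i3).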

4.3.2 Preparing for the configuration task

In order to utilize the configurator in the configuration task, the configuration knowledge must be represented in a configuration model (the top part of Figure 4). Firstly, during the product line scoping and requirements engineering phase, the product line engineers can model the initial understanding of the varying functional requirements as features. Similarly, any identified security requirements can be analyzed to find initial countermeasures; the countermeasures can first be captured at a relatively high abstraction level. If any relations between features and countermeasures are identified, they can be modeled as constraints. Secondly, during the product line architecture design, the product line engineers need to model the architecture and its variability as components; the exact semantics of the components must also be decided. Further, the architecture design may reveal the need to refine countermeasures and features: for example, countermeasure BrowserSessionValidation may be discovered or concretized during the architecture design. After features and countermeasures have been refined, their design within the components needs to be captured with implementation constraints. Thirdly, in order to support automated product instantiation, the product line engineers need to map the architectural entities to concrete implementation artifacts, that is, to configuration files, build files, source code bundles, and so forth. For the Magento case, the components in the static code view are mapped directly to directories and resources in the source code hierarchy; an illustrative sketch of such a mapping is given below. However, this activity is outside the scope of our artifact. After the KumbangSec configuration model represents all the necessary configuration knowledge, the configurator tool translates it into a form that enables stable model calculation, that is, into the answer set program CM ∪ GF.
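For example, the static code components could be bound to the Magento source tree roughly as follows; the paths below follow Magento's conventional directory layout, but the concrete mapping is our assumption rather than taken from the configuration model.

    Core       ->  app/code/core/Mage/Core/
    CoreConfig ->  app/code/core/Mage/Core/etc/config.xml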

4.3.3 Performing the configuration task

The lower part of Figure 4 illustrates how the configurator is utilized during the configuration task. The goal is to find a product configuration C that matches the requirements R for one customer or customer segment; this product configuration can then be used to instantiate or reconfigure a product instance. The requirements R can be entered iteratively or all at once, and they can be entered by a sales person or by a customer representative. In addition, the requirements R can be specified by another machine. The configurator initializes the configuration task based on the configuration model and the corresponding answer set program (Figure 4). During the configuration task, requirements can be selected iteratively, and the selections are added to the set R. For example, if the user selects feature CreditCardPayment, a requirement featCreditCardPayment(i) is added as a positive atom to the compute statement that represents R. After each selection, the configurator tool checks the configuration for consistency and completeness by calculating the stable models. If there are no stable models that satisfy R, the previous selection is removed from R and its inconsistency is reported to the user. Otherwise, the configurator calculates and visualizes the consequences of the requirements R by computing an approximation of the set of facts that must and cannot be true for the configurations satisfying R. For example, the selection of CreditCardPayment implies that EncryptCommunication must also be selected as a consequence. As another option, the requirements R can be entered all at once. When the resulting configuration is complete and corresponds to the needs of the customer, the configurator tool produces a configuration, which is a description of the features, countermeasures, and components; the configuration can be used to instantiate or reconfigure the product instance. For example, the web shop description can be used to load the necessary code modules and set the configuration file parameters. Other additional tasks may also be needed: when installing the Magento web shop for the first time, the user interface and web page layouts need to be designed. However, the exact details of the instantiation and installation are outside the scope of our artifact. The configurator that supports the configuration task in Figure 4 can be implemented in many different ways: as a standalone tool with a graphical user interface (cf. [25]) or as a service integrated with other systems (cf. [29]). For example, the Magento Admin Panel could be integrated with a service-based KumbangSec configurator: whenever the user makes a selection in the Admin Panel, the configurator is invoked to check consistency, completeness, and consequences.
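As an illustration, if the user has selected CreditCardPayment and enabled administrator-side encryption, the requirements R could be passed to smodels roughly as the following compute statement. The featCreditCardPayment and hasattr atoms appear in the text above, while the instance identifiers are illustrative assumptions.

    compute 1 { featCreditCardPayment(i1), hasattr(i2, encryptAdmin, yes) }.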

5. EVALUATION

In the following, we evaluate the design theory (Table 1) by using the testable propositions P1, P2, and P3 as evaluation criteria. As the first level of evaluation, the theory was instantiated as concrete artifacts that included the KumbangSec configurator and the configuration model and task for Magento. A realistic instantiation tests potential problems in the theorized design and demonstrates that the design is worth considering [14]. Further, the instantiation represents a "slice of life", which as such "is most likely to be convincing" [33]. Given the concrete artifacts, the testable propositions were evaluated descriptively, and in particular, by using an informed argument [16] to compare to the current state in the Magento case.

The testable proposition P1 claims that countermeasures help in representing and distinguishing security variants to customers. It was possible to represent all configurable security variability in Magento as countermeasures. As an observation, the core Magento does not have much security variability: most of the configuration options in Magento focus on setting the web shop functionality. Compared to the current state in Magento, advocating a specific modeling construct, the countermeasure, makes it easier to elicit security variability. Moreover, since the Magento Admin Panel does not make a separation between functional and security options, identifying countermeasures as specific configuration options may make security variability more visible to the customers. However, the Magento case also pointed out a challenge: countermeasures can be described at different levels of abstraction [9]. Within the Magento Admin Panel, the countermeasures were described at a relatively technical level, for example, as option Use Secure URLs in Frontend. Within the configuration model, we modeled Magento countermeasures at the requirement or specification level, for example, as EncryptCommunication. Thus, the requirement and specification countermeasures in KumbangSec may help in better communicating the security variants to the customers. Nevertheless, some of the varying countermeasures in Magento, such as BrowserSessionValidation, were challenging in this respect.

The testable proposition P2 states that the KumbangSec conceptualization helps in representing the design of security and functional variability in the product line architecture. Within the instantiation, it was possible to use the conceptualization to represent the Magento product line architecture and its security and functional variability. As a varying product line architecture, Magento is more geared towards parameterization than composition. Most of the core variability is in the configuration files; such configuration files were represented through KumbangSec components and attributes. Compared to the current state in Magento, the conceptualization was used to record implementation constraints from features and countermeasures to the architectural entities. Thus, all inconsistencies between the selections could be checked; this aspect is currently lacking in Magento. Further, the current options in the Magento Admin Panel follow more or less the options in the configuration file, such as Use Secure URLs in Frontend; separating the representation of the customer-visible selections from the configuration file options makes it possible to organize and name the selections in a more customer-friendly way. Compared to the current state in Magento, KumbangSec has one deficiency. The conceptualization does not support all inputs in the Magento Admin Panel, for example, the domain name that is connected to the SSL certificate, or the IP address mask that restricts admin access. However, such inputs are not relevant in calculating the consistency of the configuration, and thus this simplification retains the essence of the problem being solved [33].

The testable proposition P3 states that the KumbangSec configurator helps in deriving consistent products. The configurator instantiated from the theory was able to find a consistent and complete configuration. This improves the current situation, since Magento does not check any dependencies or inconsistencies during the configuration task, which means it cannot help in finding a consistent configuration. Although the configuration task has been shown to be NP-hard [36], the configurator operated without noticeable delay.

6. DISCUSSION

Our contribution uses countermeasures to distinguish the security differences to customers. Currently in the literature, there are no established concepts to represent security variability, although there are several examples of representing security variability through entities that resemble countermeasures [38, 41, 13]. The challenge is that countermeasures can be modeled at different levels of abstraction [9], some of which may not be understandable to the customer. Therefore, KumbangSec countermeasures were defined as requirement or specification level techniques or actions. Nevertheless, challenges may remain, as exemplified by the Magento countermeasure BrowserSessionValidation. As a complementary approach, security variability could also be represented as a varying security policy [4]; a security policy states which assets each user group is allowed to handle. Thereafter, varying countermeasures could be used to handle all violations of the security policy.

Our contribution utilized textual descriptions to capture the impact of countermeasures on threats, or on other quality attributes. This approach could be refined to enable automated reasoning on countermeasure impacts. For example, by extending the modeling conceptualization with softgoals, one could represent and reason about the impact of countermeasures on these goals. However, as discussed in Section 4.2, the challenge lies in representing the countermeasure impacts in a way that is applicable to all contexts.

As a threat to validity, our evaluation focused mostly on feasibility, that is, on "the use of X works" for the instantiation. Additionally, we argued the utility of the artifact [16] by comparing to the current state in Magento. Properly arguing the utility of the artifact is difficult without extensive empirical studies in real environments and with real stakeholders. Thus, further empirical research is needed to evaluate our contribution.

Finally, it is possible to extend KumbangSec to cover also other quality attributes than security. The KumbangSec artifact assumes that security variability is implemented as a number of purposefully introduced design tactics. That is, a security variant consists of a number of countermeasures, which are implemented as design-level tactics in the product line architecture. The use of a tactic-based approach for quality attribute variability has been reported for performance as well [28].

7. CONCLUSIONS

We presented a design theory and artifacts for representing and configuring software product lines with varying security and functionality from the requirements to the product line architecture. The theory included a modeling conceptualization and the principles of building and using a configurator operating on stable model semantics. The concrete artifacts included a prototype configurator tool as well as the configuration model and configuration task for Magento, which is an open-source web shop product line.

To distinguish security variability to customers (RQ1), security variability can be represented as countermeasures, which describe what the system must do to prevent, detect or react to threats and attacks. To represent the design of security and functional variability (RQ2), the product line architecture can be represented as components and interfaces, even potentially from several architectural views; the implementation constraints then represent how countermeasures and features are implemented. To configure consistent products to meet given security and functional needs (RQ3), the configuration models are translated to answer set programs; the consistent and complete configurations are then calculated as stable models of such programs.

As future work, further evaluation is needed from the utility point of view, for example, by conducting surveys or interviews among Magento consultants and vendors. Further, systematic performance testing of the configurator tool is needed to study the efficiency of the configuration task.


Acknowledgements
We acknowledge the Digile N4S Program funded by Tekes.

8. REFERENCES

[1] I. Alexander. Misuse cases: use cases with hostile intent. IEEE Software, 20(1), 2003.
[2] T. Asikainen, T. Männistö, and T. Soininen. Kumbang: A domain ontology for modelling variability in software product families. Advanced Engineering Informatics, 21(1), 2007.
[3] J. Bartholdt, M. Medak, and R. Oberhauser. Integrating quality modeling with feature modeling in software product lines. In ICSEA, 2009.
[4] D. Basin, J. Doser, and T. Lodderstedt. Model driven security: From UML models to access control infrastructures. ACM Trans. Softw. Eng. Methodol., 15(1), 2006.
[5] L. Bass, P. Clements, and R. Kazman. Software Architecture in Practice. Addison-Wesley, 2nd edition, 2003.
[6] J. Bosch. Design and Use of Software Architectures: Adopting and Evolving a Product-Line Approach. Addison-Wesley, 2000.
[7] J. Bosch. Maturity and evolution in software product lines: Approaches, artefacts and organization. In SPLC, 2002.
[8] P. Clements and L. Northrop. Software Product Lines — Practices and Patterns. Addison-Wesley, 2001.
[9] B. Fabian, S. Gurses, M. Heisel, T. Santen, and H. Schmidt. A comparison of security requirements engineering methods. Requirements Engineering, 15(1):7–40, 2010.
[10] T. E. Fægri and S. Hallsteinsen. A software product line reference architecture for security. In T. Käkölä and J. C. Dueñas, editors, Software Product Lines — Research Issues in Engineering and Management. Springer, 2006.
[11] D. Firesmith. Engineering security requirements. Journal of Object Technology, 2(1), 2003.
[12] D. Firesmith. A taxonomy of security-related requirements. In International Workshop on High Assurance Systems (RHAS'05), 2005.
[13] B. Gonzales-Baixauli, J. Prado Leite, and J. Mylopoulos. Visual variability analysis for goal models. In RE, 2004.
[14] S. Gregor and D. Jones. The anatomy of a design theory. Journal of the Association for Information Systems, 8(5), 2007.
[15] M. Hafiz, P. Adamczyk, and R. Johnson. Organizing security patterns. IEEE Software, 24(4), 2007.
[16] A. R. Hevner, S. T. March, J. Park, and S. Ram. Design science in IS research. MIS Quarterly, 28(1), 2004.
[17] IETF RFC 4949. Internet security glossary, version 2, 2007.
[18] ISO/IEC 15408-1. Information technology — Security techniques — Evaluation criteria for IT security — Part 1: Introduction and general model, 1999.
[19] ISO/IEC 9126-1. Software engineering — Product quality — Part 1: Quality model, 2001.
[20] K. Kang, J. Lee, and P. Donohoe. Feature-oriented product line engineering. IEEE Software, 19(4), 2002.
[21] D. Mellado, E. Fernández-Medina, and M. Piattini. Towards security requirements management for software product lines: A security domain requirements engineering process. Comput. Stand. Interfaces, 30(6):361–371, 2008.
[22] Microsoft. Microsoft Application Architecture Guide. Microsoft Press, 2nd edition, 2009.
[23] V. Myllärniemi, C. Prehofer, M. Raatikainen, J. van Gurp, and T. Männistö. Approach for dynamically composing decentralised service architectures with cross-cutting constraints. In ECSA, 2008.
[24] V. Myllärniemi, M. Raatikainen, and T. Männistö. KumbangSec: An approach for modelling functional and security variability in software architectures. In VaMoS, 2007.
[25] V. Myllärniemi, M. Raatikainen, and T. Männistö. KumbangTools. In SPLC, vol. 2, 2007.
[26] V. Myllärniemi, M. Raatikainen, and T. Männistö. Using a configurator for predictable component composition. In EUROMICRO SEAA, 2007.
[27] V. Myllärniemi, M. Raatikainen, and T. Männistö. A systematically conducted literature review: Quality attribute variability in software product lines. In SPLC, 2012.
[28] V. Myllärniemi, J. Savolainen, M. Raatikainen, and T. Männistö. Performance variability in software product lines: proposing theories from a case study. Empirical Software Engineering, to appear.
[29] V. Myllärniemi, M. Ylikangas, M. Raatikainen, J. Pääkkö, T. Männistö, and T. Aaltonen. Configurator-as-a-service: tool support for deriving software architectures at runtime. In WICSA/ECSA Companion Volume, 2012.
[30] K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee. A design science research methodology for information systems research. Journal of Management Information Systems, 24(3):45–77, 2007.
[31] N. Rozanski and E. Woods. Software Systems Architecture. Addison-Wesley, 2005.
[32] V. S. Sharma and K. S. Trivedi. Architecture based analysis of performance, reliability and security of software systems. In Workshop on Software and Performance, 2005.
[33] M. Shaw. Writing good software engineering research papers. In ICSE, 2003.
[34] P. Simons, I. Niemelä, and T. Soininen. Extending and implementing the stable model semantics. Artificial Intelligence, 138(1–2), 2002.
[35] G. Sindre and A. L. Opdahl. Eliciting security requirements with misuse cases. Requirements Engineering, 10(1):34–44, 2005.
[36] T. Soininen, I. Niemelä, J. Tiihonen, and R. Sulonen. Representing configuration knowledge with weight constraint rules. In Answer Set Programming, 2001.
[37] T. Soininen, J. Tiihonen, T. Männistö, and R. Sulonen. Towards a general ontology of configuration. AI EDAM, 12(4):357–372, 1998.
[38] H. Sun, R. Lutz, and S. Basu. Product-line-based requirements customization for web service compositions. In SPLC, 2009.
[39] M. Svahnberg, J. van Gurp, and J. Bosch. A taxonomy of variability realization techniques. Software — Practice and Experience, 35(8), 2005.
[40] J. van Gurp, J. Bosch, and M. Svahnberg. On the notion of variability in software product lines. In WICSA, 2001.
[41] Y. Wang, A. Kobsa, A. van der Hoek, and J. White. PLA-based runtime dynamism in support of privacy-enhanced web personalization. In SPLC, 2006.


Publication V

Myllärniemi, Prehofer, Raatikainen, van Gurp, Männistö. Approach for Dynamically Composing Decentralised Service Architectures with Cross-Cutting Constraints. In European Conference on Software Architecture (ECSA), Cyprus, pp. 180–195, September 2008.

© 2008 Springer.

Reprinted with permission.


Approach for Dynamically Composing Decentralised Service Architectures with Cross-Cutting Constraints

Varvana Myllärniemi¹, Christian Prehofer², Mikko Raatikainen¹, Jilles van Gurp², and Tomi Männistö¹

¹ Helsinki University of Technology, P.O. Box 9210, 02015 TKK, Finland
² Nokia Research Center, P.O. Box 407, 00045 NOKIA GROUP, Finland
{varvana.myllarniemi,mikko.raatikainen,tomi.mannisto}@tkk.fi,
{christian.prehofer,jilles.vangurp}@nokia.com

Abstract. The emergence of open, composable Internet services and mashups means that services cannot be composed in a centralised manner. Despite this, cross-cutting constraints might exist between services, stemming from, e.g., security. Especially when used with mobile devices, these service compositions need to be constructed at runtime. This paper proposes a knowledge-based approach for dynamically finding and validating decentralised service compositions while taking into account cross-cutting constraints. The approach is exemplified with a case of a shopping mall portal.

1 Introduction

The emergence of various second-generation Web technologies enables the creation of increasingly complex new services by composing multiple services from multiple Internet locations [1]. For example, mashups combine data or services from multiple sources into one integrated user experience. At the same time, the emergence of personal mobile devices for Web browsing sets new requirements for service compositions. On the one hand, adapting to the user's context and personal preferences requires that compositions be changed dynamically. On the other hand, security and privacy issues create constraints on how services can be composed. Thus service compositions should be able to address dynamism, decentralisation and cross-cutting constraints.

The first requirement, dynamism, means that service compositions cannot be predefined, but must be created and recomposed at runtime. In some cases, even the number or the identities of the services cannot be predefined prior to runtime. The second requirement, decentralisation, stems from the fact that services participating in the composition are distributed in the Internet. However, decentralisation is not only about distribution; it implies that there is no central, trusted party that can manage and govern the composition. The third requirement, the existence of cross-cutting constraints, means that there are dependencies between services that must be taken into account in order to achieve a meaningful composition. Often such constraints are related to non-functional properties, especially security.

Together, dynamism, decentralisation, and cross-cutting constraints make the composition much more difficult. Because of the decentralisation, security constraints cannot be decided by one centralised party. Because of the dynamism, relationships between decentralised services cannot be established beforehand. Despite this, existing literature typically addresses these issues separately.

This paper addresses service compositions with dynamism, decentralisation, and cross-cutting security constraints. We describe a knowledge-based approach that enables finding and validating decentralised service compositions at runtime. Validating a composition means ensuring that the architecture or constraints set by different parties are not violated. The approach describes required knowledge, activities, and responsibilities; however, going through these in detail is beyond the scope of the paper. Instead, this paper lays out requirements and first solutions for required knowledge, activities, and responsibilities. We exemplify this approach using a case of a personalised search service inside a shopping mall.

The rest of the paper is organised as follows: Section 2 presents the case; Section 3 presents the approach; Section 4 compares the approach to previous work; Section 5 discusses the results; and Section 6 concludes the paper.

2 Case

The example case in this paper is a shopping mall that provides a Web-based portal through which the customers inside the mall premises can access and personalise the services via mobile devices. Using a browser for consuming mobile services is argued to offer superior portability and scalability [2]. Further, customers are familiar with various Web-based portals, which can be personalised by the user and can encompass services provided by third parties. Popular examples of such Web sites include Facebook and MySpace. However, such service portals can be enhanced by tying them to particular physical places and augmenting them with information on the user's location, for example.

To concretise a small slice of the service composition in the shopping mall portal, this paper concentrates on the following running example inspired by the search feature in [3]. The running example is a personalised search over available shops in the mall portal. Using the search service, users can search for, e.g., campaigns, offers, and information. The search can be performed over all available shops, or over a restricted set of shops. The search service is composed of several services (Fig. 1): mall search interacts with the user interface and collects search results from shop search services. In the following, we discuss the dynamism, decentralisation and cross-cutting constraints in the case.

[Fig. 1. Running example consists of a composed search service]

Firstly, service compositions in the case must be formed dynamically. In the shopping mall, each user may wish to personalise the portal, subscribe to services, and have different device capabilities. Naturally, completely new services may be published and old services removed. Consequently, service compositions cannot be decided beforehand; instead, one service composition represents those services that can be accessed by one user during one session. In the running example, there are several factors affecting the composition that cannot be decided before runtime (see informal explanations in Fig. 1). The number and identity of the shop searches depend on the available services, on the possible restriction to cover only certain shops, as well as on the authentication mechanism that the user used for logging in.

Secondly, the case illustrates how service compositions can be decentralised. Some of the services may be provided by the mall itself, while some may be provided by the shops or third parties; in some cases, even customers can act as service providers. In the running example, the decentralisation stems from the fact that some of the shop search services are provided by the shops themselves, not by the mall. This also means that the shop search services may reside on completely different hosts. To utilise, e.g., past purchase history for personalising the search, shop search services may require that users authenticate to the services provided by the shops (see Fig. 1).

Thirdly, there are several cross-cutting constraints that are mostly due to security and privacy considerations. Providing any kind of personalised service inherently involves handling sensitive information, such as a customer's location or personal preferences. Therefore, there may be a need to authenticate users of the mall portal. There are several mechanisms for authentication: customers can use an anonymous login, traditional passwords, or OpenID [4] as a decentralised, single sign-on (SSO) digital identity framework. However, in the running example, not all authentication mechanisms are supported by all shop search services. The shopping mall has decided to set a constraint that all participating shop search services must share the authentication mechanism used by the customer. However, this shared authentication mechanism cannot be decided before runtime, since it is established when the user logs in to the mall portal (see Fig. 1).

To summarise, the case as well as the running example portray a decentralised service composition in which there are cross-cutting security dependencies in how services can be composed. Further, the compositions for one particular user session cannot be decided beforehand, but must be constructed at runtime. This calls for support in finding and validating a service composition dynamically. Our solution for tackling this particular problem is described next.

3 Approach

The overall goal of our approach involves finding a valid service composition. A valid composition adheres to the preferences and constraints of service providers, service consumers, and service aggregators, from structural, functional, and security points of view. The task of finding and validating the composition should be performed dynamically, as services and requirements for the service compositions evolve. Further, the approach should not assume any centralised party that can govern the composition. Finding a valid service composition can be accomplished in several ways. Compared to composition by trial and error, or to composition through autonomous interacting agents, our approach relies on capturing architectural knowledge based upon which service compositions can be found and validated. In this section, we describe how our approach accomplishes this overall goal in terms of architectural knowledge, activities that produce and process the knowledge, and responsibilities and roles related to the knowledge.

3.1 Architectural Knowledge

In general, the architectural knowledge for service compositions should satisfy the following requirements. The knowledge should:

1. support automated finding and validation of service compositions;
2. be captured as models of Web-based services and their interfaces;
3. support modelling cross-cutting constraints and structural rules in how services can be composed;
4. support dynamically changing services; and
5. support distribution of the services as well as the knowledge itself.

To capture the knowledge, we propose a technology-independent conceptualisation that borrows concepts from WSDL [5] and many architectural description languages, for example, Koala [6] and UML 2.0. Fig. 2 illustrates a graphical representation of the running example. Besides this graphical representation, an XML-based language has been defined for capturing the knowledge shown in Fig. 2; due to space limitations, this XML language is not described here. In the following, we discuss each of the above requirements. Our focus here is on bringing all of these aspects together in a consistent and simple way. As some of the above items are very general problems, we cannot cover each of them in all possible ways, but rather propose basic concepts to model these aspects.

The first requirement states that the service composition should be found and validated with automated tools. In order for this to be possible, our approach distinguishes between two kinds of knowledge: the composition model and the composition configuration. The composition model depicts the participating service types, their interfaces and characteristics; it specifies the structural rules and cross-cutting constraints that govern how services can be composed. In contrast, the architecture of one particular service composition is called a composition configuration. If the composition model specifies the rules, the composition configuration is the target that is composed from and checked against those rules. In order to support automated finding and validation, our composition model conceptualisation has been built to be compatible with the metamodelling language Nivel, which has formal semantics [7]. This enables the use of smodels [8], which is a general-purpose inference tool based on the stable model semantics of logic programs. Using smodels, one can check the composition model for validity, as well as check the composition configuration for consistency and completeness and deduce consequences.

Running example. For the search scenario, the composition model defines rules on how valid service compositions can be formed for different customer sessions within the shopping mall. The composition model, as well as an example composition configuration, is shown in Fig. 2. The composition model captures the service architecture and constraints, illustrated informally in Fig. 1. Further, the composition model states session parameters as options; in this case, an option is the method that the user used for logging in to the portal. In contrast, the composition configuration represents those services that are available for one particular user session at a time.

The second requirement states that the knowledge should be captured as models of Web-based services and their interfaces; thus the models represent the architectural knowledge. For this purpose, our approach uses the concept of a service type for the composition model and a service instance for the composition configuration. A service type describes a set of service instances with similar properties. Similarly to, e.g., WSDL [5], a service interacts with other services through its interfaces only; hence a service exists only as a placeholder of interfaces. An interface type consists of a number of operations, whereas an interface is an instantiated interface type in a particular service instance. To attach an interface to a service, a service type can define an interface definition. Similarly to, e.g., Koala [6], an interface definition is either required or provided; a provided interface definition means that the instantiated service provides the operations for others to access, while required means that the service instance depends upon other services for providing these operations. To support varying interactions for the service, the interface definition can define several possible interface types as well as the minimum and maximum number of instantiated interfaces.

Running example. The composition model in Fig. 2(a) defines a service type called MallSearchService with a provided interface definition search of type DoSearch. Although not illustrated here, DoSearch consists of one operation that takes the search term as an input and returns the results as an output. The example composition configuration in Fig. 2(b) shows the corresponding MallSearchService service instance with the provided DoSearch interface. To illustrate the possibility of defining varying capabilities for services, service type MallSearchService defines a provided interface definition named login, which can be either of type LoginOpenID or LoginPasswd, or it can optionally be left out altogether. This means that all service instances of MallSearchService must implement these interfaces, and when the composition is constructed, the interface type is selected to match the needed authentication mechanism. In Fig. 2(b), the interface has been instantiated as a LoginOpenID interface.

[Fig. 2. Graphical representation of the knowledge describing the service compositions of the running example: (a) composition model, (b) composition configuration, (c) legend.]
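Although the paper leaves its XML-based language undefined, a service type declaration could plausibly take a shape such as the following. This is a purely hypothetical sketch based on the concepts just described (service types, provided and required interface definitions, varying interface types, and cardinalities); none of the element or attribute names come from the paper.

    <!-- Hypothetical XML rendering of the MallSearchService type. -->
    <serviceType name="MallSearchService">
      <interfaceDefinition name="search" direction="provided" min="1" max="1">
        <interfaceType name="DoSearch"/>
      </interfaceDefinition>
      <!-- The login interface may be LoginOpenID or LoginPasswd, or omitted. -->
      <interfaceDefinition name="login" direction="provided" min="0" max="1">
        <interfaceType name="LoginOpenID"/>
        <interfaceType name="LoginPasswd"/>
      </interfaceDefinition>
    </serviceType>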

Table 1. Constraint language in the running example

    present(ref)            true if an instance referenced by ref is in the composition
    instance of(ref, type)  true if ref is an instance of type
    value(ref, attr)        the set of values that instances referenced by ref have by the name attr
    for all(X:ref)          universal quantifier ∀
    and, or, not            ∧, ∨, ¬
    <=>, =>                 equivalence ⇔, implication ⇒
    =, !=                   equals, does not equal

The third requirement states that the knowledge should support adding cross-cutting constraints and structural rules for how services can be composed. The proposed conceptualisation provides three mechanisms for this purpose: composite services, constraints, and options. A simple mechanism for stating how services can be composed is to model composite services. For this purpose, a service type can define the number and types of services that it is composed of using a construction called a part definition. A composite service can then delegate calls to some of its interfaces to its constituent services.

Running example. The composition model in Fig. 2(a) defines one composite service type known as ComposedSearchService. Through part definition mallSearch, it states that any ComposedSearchService instance contains exactly one MallSearchService service instance. Since the number of the participating shop search services can vary, part definition shopSearch states that any ComposedSearchService instance contains from zero to N ShopSearchService instances. The corresponding composite service instance ComposedSearchService is shown in Fig. 2(b).

Besides composite services, more fine-grained and cross-cutting constraints can be defined by composition models stating constraints that restrict the instances in composition configurations. The supported constraints can consist of references to service instances, predicates on these references, Boolean conjunctions, and comparison operators. Due to space limitations, the entire constraint language is not shown, but Table 1 lists those constructs that are used in the running example.

Running example. ComposedSearchService defines a constraint that states that all shop search service instances must have the same authentication mechanism. Further, YMartSearchService defines a constraint that denotes it cannot support OpenID as an authentication mechanism, whereas GadgetsRUsSearchService defines a constraint which denotes that it must always use some authentication mechanism, either OpenID or traditional password.
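Using the constructs in Table 1, the same-authentication constraint of ComposedSearchService could be sketched roughly as follows; the part names (shopSearch, mallSearch, login) come from the model, but the reference syntax and the way of comparing interface types are our assumptions.

    for all(X : shopSearch)
      (instance of(X.login, LoginOpenID) <=> instance of(mallSearch.login, LoginOpenID)) and
      (instance of(X.login, LoginPasswd) <=> instance of(mallSearch.login, LoginPasswd))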


There might also be constraints between services and other options of the session. These options can relate to the user’s device, or to the context or the session itself. Running example. The authentication method that the user used to log in to the portal affects the service composition. Fig. 2(a) illustrates how the composition model specifies an option type LoginOption. LoginOption has one attribute definition portalLogin, with possible types defined by an attribute value type AuthMethod. Further, the LoginOption type defines a constraint that relates the login mechanism to service composition. It states that authentication mechanism provided by the composite search service must be the same as the user used for logging in.   The fourth requirement states that the knowledge should support dynamically changing and discoverable services. Since service types are defined independently of each other in the composition model, newly discovered services can be added to the model. However, it should be possible to state constraints on these dynamically changing services, even if their identities are not yet known. This is supported by providing abstract service types, which can be inherited by other service types. An abstract service type can be used as a representative of a set of concrete service types; all interfaces and constraints defined in an abstract service type are applicable to the inherited service types as well. Running example. Since the identities of the participating shop search services may evolve over time, they are represented with an abstract ShopSearchService type in the composition model in Fig. 2(a). This way, it is possible to state constraints on the interfaces of all shop search services without knowing their identities beforehand. Inherited shop search service types can then state further constraints: for example, YMartSearchService adds a further constraint that excludes OpenID.   The fifth requirement states that the knowledge should support distribution of the services as well as the knowledge itself. The distribution of services can be captured in the composition models by stating the location of concrete service types. The distribution of the knowledge itself is again supported by the possibility of defining service types independently and then combining this knowledge to derive composition configurations. Running example. Concrete shop search service types, such as YMartSearchService, are attributed with location information including protocol, address and port. Further, concrete search service types can be defined independently of the model, excluding information on abstract service type SearchService. The detailed process of how this definition is conducted in described in Section 3.2.   3.2

3.2 Activities

Fig. 3 illustrates the activities that create and process the architectural knowledge described in Section 3.1. Activities 1 to 4 create and process the composition model, while activities 5 and 6 create and process the composition configuration.


Fig. 3. Activities related to the approach

The first activity involves capturing the overall service architecture, architectural cross-cutting constraints, and global options. This activity utilises abstract service types to group together services with known interfaces and constraints; thus it can be performed without referring to the identities of concrete service types. Typically, this activity requires understanding the domain, and hence cannot be fully automated. As a result of the first activity, an initial composition model is created. Already at this stage, it is possible to check whether there are any inconsistencies in the initial composition model, for example, due to conflicting constraints; this checking comprises the second activity. Running example. The first activity involves defining the service types ComposedSearchService, MallSearchService, and ShopSearchService; the interface types; and the relevant constraints, as well as the option type LoginOption with its constraints. Since neither the number nor the identity of the participating shop search service types is known before runtime, they can be grouped together as an abstract ShopSearchService service type.

The third activity in Fig. 3 continues from the initial composition model by listing concrete service types that can participate in the composition; new constraints can also be added. The concrete service types may fill the roles in the service architecture by inheriting the abstract service types defined in the initial composition model. This activity may be performed automatically, as part of service discovery or service registration. Depending on the decentralisation, there may be several such registries (see Section 3.3). Again, the fourth activity checks for inconsistencies in the model. Running example. The third activity involves registering the concrete shop search service types ZMartSearchService, ShopALotSearchService, GadgetsRUsSearchService, and YMartSearchService. They are marked to inherit the abstract ShopSearchService type and all its interface definitions. They can also specify additional constraints; e.g., YMartSearchService can specify that it does not support OpenID authentication. The third and fourth activities are done automatically when registering new services to the mall.
After the composition model has been constructed, composition configurations can be found and validated. This begins by stating those session options and requirements for the composition that are known; this is activity number five in Fig. 3. These known requirements and options are captured in an initial composition configuration. The final activity in Fig. 3 involves validating the initial composition configuration against the composition model and filling in the consequences to obtain the final service composition configuration. A valid composition configuration is one that does not violate the rules or constraints specified in the composition model. Since activities five and six rely on an existing composition model, they can be fully automated. Typically, for one composition model, activities five and six are repeated whenever there is a need to find or validate compositions. Running example. The fifth activity starts by identifying the authentication mechanism that the user used for logging in, as well as any user-set restrictions on the shops participating in the search. If the user had logged in using his or her OpenID identity, without any restrictions on the shops, the sixth activity would involve finding and validating a composition configuration that aggregates all shops providing an OpenID authentication mechanism.

Again, the activities in Fig. 3 can be evaluated from the points of view of cross-cutting constraints, dynamism, and decentralisation. The first aspect, support for cross-cutting constraints, is realised by providing the possibility to model and check such constraints in activities one through four. Inevitably, the division between the first and the third activity depends on the availability of top-down architectural information. In a fully bottom-up service composition, the first activity can be omitted altogether by relying on modelling concrete services in the third activity. Thus, the division aims at balancing bottom-up composition of services against top-down, typically cross-cutting, architectural constraints.

The second aspect, the level of dynamism, is mainly determined by the time when the activities are performed: the more activities are performed at runtime, the higher the level of dynamism. Therefore, dynamism is not just about composing services dynamically, as suggested in the taxonomy of composing adaptive software [9]; a separate question is whether the composition model knowledge is also created and processed dynamically. In the simplest dynamic case, the initial and final composition configurations are created at runtime (activities five and six in Fig. 3), but the composition models are created before runtime. In a highly dynamic case, both the composition model and the composition configurations are created at runtime. Running example. The first and second activities in Fig. 3, which model and check the service architecture, are performed at design time. In contrast, available services and related constraints can be added and removed at runtime as part of service registration; this implies that the third and fourth activities are performed dynamically. Finally, the fifth and sixth activities are performed dynamically when the user starts a new session, subscribes to new services, sets personal preferences, or otherwise changes the options or requirements affecting the required service composition.
The third aspect, the support for decentralisation, depends on how the different parties participate in performing the activities, and how the resulting knowledge is managed. This is discussed further in the next section, which covers the roles and responsibilities related to the activities.
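Before moving on, activities five and six can be made concrete with a minimal sketch that builds on the hypothetical class sketch from Section 3.1. The simplified filtering below stands in for the full constraint validation; it is our own illustration, not the paper's validation algorithm.

    # Activities five and six in miniature: the known session options form an
    # initial configuration, from which a valid composition is derived. Shop
    # types are assumed to expose an auth_allowed() check, as in the earlier
    # sketch; this filtering simplifies the actual constraint validation.
    def find_composition(shop_types, portal_login, allowed_shops=None):
        selected = []
        for shop in shop_types:
            name = type(shop).__name__
            if allowed_shops is not None and name not in allowed_shops:
                continue  # user-set restriction on participating shops
            if shop.auth_allowed(portal_login):
                selected.append(shop)
        return selected

    # If the user logged in with OpenID and set no shop restrictions,
    # YMartSearchService is excluded and GadgetsRUsSearchService included:
    shops = [YMartSearchService(), GadgetsRUsSearchService()]
    composition = find_composition(shops, portal_login="OpenID")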

3.3 Responsibilities and Roles

Fig. 4. Some example ways of allocating responsibilities for creating and managing knowledge. Each rounded rectangle represents one realm of responsibility for one actor.

In addition to the architectural knowledge (Section 3.1) and the activities (Section 3.2), it should be established who is responsible for managing the knowledge and performing the activities. Fig. 4 illustrates four different ways of allocating responsibilities between different actors. Fig. 4(a) corresponds to a fully centralised situation, in which both the composition model and the composition configurations are governed by one party. Fig. 4(b) corresponds to a situation in which some services are governed by separate parties, but the composition models and compositions are collected by one actor. Fig. 4(c) depicts a situation in which several parties govern parts of the composition model knowledge, but this knowledge is integrated into a single model in order to find and validate composition configurations. Finally, Fig. 4(d) illustrates a situation in which the different parties do not trust each other enough to share any composition knowledge, and a composition configuration is found and validated against fragments of composition models. Running example. The search scenario corresponds to Fig. 4(c): some services and their composition model knowledge are created and managed by separate shops.


However, because of the existence of the central shopping mall, it makes sense to collect an integrated composition model at the shopping mall. The benefit of such centralised knowledge is that all composition configurations can be validated against this one model. The composition model is collected on the portal server: when new shop services are registered to the mall, composition model fragments are registered as well, describing the registered service types, their interfaces, and the relevant constraints.

The division of responsibilities can be evaluated against dynamism, decentralisation, and cross-cutting constraints. Firstly, dynamism affects how relationships between the different roles can be established. If new actors can emerge dynamically, there must be a mechanism for discovering those actors and consolidating their possible composition model fragments; in our running example, this is implemented by registering new shop services to the mall portal. Secondly, the level of decentralisation mainly determines how responsibilities are divided: the more the responsibility for performing the activities and managing the knowledge is distributed, the higher the level of decentralisation. Finally, the division of responsibilities affects how cross-cutting constraints are managed. If most cross-cutting constraints can be defined by one party, as the mall portal does in our running example, it is easier to manage them as part of the composition model. In contrast, handling cross-cutting constraints in Fig. 4(d) can only rely on specifying the properties of other services through their interfaces.
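As a rough sketch of the Fig. 4(c) arrangement in the running example, shops could register composition model fragments with a registry at the mall portal, which merges them into the integrated model and re-checks consistency (activities three and four). The registry API below is our own assumption, not part of the approach.

    # Illustrative registry for the Fig. 4(c) setup: fragments are merged into
    # one integrated composition model, and consistency is re-checked on each
    # registration.
    class ModelRegistry:
        def __init__(self, initial_model):
            self.model = initial_model   # top-down architecture and constraints
            self.fragments = {}          # shop name -> registered fragment

        def register(self, shop_name, fragment):
            """Register a fragment describing a shop's service types,
            interfaces, and constraints; then re-validate the merged model."""
            self.fragments[shop_name] = fragment
            if not self.check_consistency():
                del self.fragments[shop_name]  # reject inconsistent fragments
                raise ValueError(f"fragment from {shop_name} conflicts with the model")

        def check_consistency(self):
            # Placeholder: a real implementation would run a constraint checker
            # (e.g., an inference engine) over the merged model.
            return True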

4 Comparison to Previous Work

This section compares the approach described in Section 3 to the existing literature. The comparison evaluates the literature from the points of view of dynamism, decentralisation, and cross-cutting (especially security) constraints.

There is a wealth of studies on runtime architectural adaptation and adaptability. In general, software adaptation can be categorised as either parameterised or composed [9]; architectural adaptation typically addresses the latter. Studies on composed adaptation stem from architectural description languages [10], software product families and software variability [11,12,13], or adaptive software in general [14,15]. Typically, these approaches adapt an existing architecture based on an adaptation model that has been defined before runtime; hence they do not address whether the adaptation models themselves are also adapted dynamically. In some cases, e.g., in [12], the dynamism is limited to selecting among predefined compositions resolved before runtime. Some of these dynamic approaches also address decentralisation. For example, [16] provides an overview of the dynamic evolution of distributed, component-based systems. However, decentralisation typically means only that the composed elements are distributed; some kind of centralised adaptation model still exists to govern the composition. In many respects, [17] is close to our approach: it addresses dynamic adaptation and distributed systems, and it provides a mechanism for constraining compositions using utility functions that are matched to the adaptation needs of the system, as opposed to the explicit constraints in our approach.


However, this approach is oriented more towards adapting single systems for one user with possibly distributed resources than towards adapting truly decentralised systems. Further, such component-based approaches require special middleware to be present in the adapting system.

A study that applies software product family techniques to dynamic Web personalisation with varying privacy constraints has been presented in [18]. Similar to our approach, it addresses personalising the Web experience while using simple boolean constraints, in its case to specify privacy concerns. However, its aim is to find an architecture consisting of User Modeling Components, which encapsulate personalisation methods, e.g., the recommendation algorithms used. In contrast, our aim is to find and validate an architecture consisting of Web-based services.

Essentially, the principles of service-oriented architecture (SOA) and service-oriented computing (SOC) promise to deliver distributed, independent services that can be discovered and composed dynamically [19]. However, most approaches do not address truly dynamic, adaptive compositions; many dynamic and adaptive aspects of service composition remain research challenges [20]. Regarding decentralisation, SOC does not assume anything about the location of the services participating in the composition. Nevertheless, many facets of SOC still rely on centralised knowledge, based on which compositions can be formed. For example, business process notations such as BPEL, which orchestrate a single service composition, are typically created statically and governed by a centralised engine. An example of a more dynamic approach for composing services using process descriptions is proposed in [21]; however, this approach discusses neither the decentralisation of the required knowledge nor the possibility of cross-cutting constraints.

The artifacts in our approach have been described using a notation that resembles WSDL [5]. At the abstract level, WSDL describes message types, port types, and operations; port types roughly correspond to the interface types in our solution. However, there are several differences. WSDL is oriented more towards describing the properties of a single service; therefore, it cannot be used to express constraints that cross-cut many services. Further, our approach is more explicit in describing varying rules for combining elements in the services. Moreover, Web Services are much more complicated than the simple request-response operation mode of the Web-based services in our case example. Finally, WSDL does not distinguish between composition model and composition configuration, but describes services only at the level of a composition model.

The tenets of SOA highlight the importance of flexibility and autonomous services; therefore, imposing cross-cutting security constraints becomes a challenge. Security solutions for SOA have been categorised into message-level security, security as a service, and policy-driven security [22]. Of these, policy-driven security, such as WS-Policy and WS-SecurityPolicy for Web Services, resembles our approach of declaratively specifying constraints. Although implementing security as a service could encapsulate security constraints across many services, it is limited to certain scenarios.


Message-level security, with the WS-Security extension for Web Services, is mainly concerned with protecting service-to-service communication.

Finally, current composition support in SOA is complicated for end users building their own applications [23]. Consequently, mashups have emerged as a lightweight method of composing Web-based services, and several composer tools aimed at easy composition of Web-based services have appeared, including Marmite [24] and YahooPipes [25]. Both support a data-flow architecture, in which data is processed by a series of operators in a manner similar to Unix pipes; thus both are suitable for manipulating and combining, e.g., Web feeds. However, while visual mashup composers provide a more user-friendly approach to service composition than SOA techniques, they still require the user to construct and validate the composition herself. In our approach, finding and validating a composition is made completely invisible to the end user, since these activities can rely on the rules set in the composition model. Finally, current mashup composers do not address security, nor do they provide means for specifying cross-cutting security constraints.

5 Discussion

The following discusses the approach and lays out future work items.

Section 3 described the knowledge as well as the activities required for automating compositions. A tool suite for this automation should consist of three tools. Firstly, a graphical modelling tool is needed to produce and check initial composition models in XML. Secondly, a tool is needed to register concrete service types at runtime and consequently check the validity of the resulting composition model; depending on the decentralisation, several instances of such a tool can be deployed at a time. Thirdly, a tool is needed to find and validate composition configurations at runtime; this tool should utilise the smodels inference engine [8].

Another issue that is not yet addressed is instantiating and executing the service composition after the composition configuration has been found. In general, this requires integration between the composition process and the technologies used. However, compared to dynamically adaptable component-based approaches, there is no need to shut down or start up components in order to deploy the composition, since the services are already deployed and running. On the downside, executing the service composition must not rely on the availability of the services in the composition, since previously available services may no longer be available at the time of execution. Thus, in an extreme case, dynamism covers not only the service composition itself: the services in the composition can also appear and disappear dynamically.
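Returning to the third tool, the paper does not show how constraints would be encoded for the smodels-based validation. As a purely hypothetical illustration, the same-authentication constraint of the running example might be generated as lparse/smodels-style stable-model rules; the atom names and the encoding below are our own assumptions, not the paper's actual translation.

    # Hypothetical generation of lparse/smodels-style rules from model data.
    shops = {
        "ymart": ["password"],                 # YMartSearchService excludes OpenID
        "gadgetsrus": ["openid", "password"],  # must use some authentication
    }

    rules = []
    for shop, mechanisms in shops.items():
        rules.append(f"shop({shop}).")
        for m in mechanisms:
            rules.append(f"supports({shop}, {m}).")

    rules.append("login(openid).")               # session option from activity five
    rules.append("{ included(S) } :- shop(S).")  # a shop may join the composition
    # An included shop must support the login mechanism of the session:
    rules.append(":- included(S), login(M), not supports(S, M).")

    program = "\n".join(rules)  # fed to the solver; ymart cannot be included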


Activity six in Fig. 3 could be elaborated further to take into account a situation in which there are several competing services, and thus several competing composition configurations that match the options and requirements stated in activity five. The running example in this paper was such that all search services that did not violate the authentication constraints could be included in the composition. However, there could be several mutually exclusive alternative services, among which a selection should be made. If activity six results in several possible composition configurations, the approach can be augmented with a selection or optimisation algorithm.

This paper addressed security as a non-functional property to illustrate cross-cutting concerns. Security constraints often cross-cut many services in the architecture, and therefore they must be considered during the composition. The constraints in this paper were rather functional; this is typical for security. However, the approach could be extended to address quality properties expressed as numeric metrics, such as availability or performance. Firstly, the conceptualisation should be able to capture numerical quality properties of interfaces and services. Secondly, the approach could include a means of evaluating the overall quality property of the composition based on the quality properties of the constituent services; this can utilise existing methods available for, e.g., predictable assembly [26].

Finally, our approach does not address the semantics of modelling constructs. The more knowledge is decentralised, the more semantic issues are bound to arise. Semantic issues have been studied within the field of service-oriented computing, but such considerations are outside the scope of this paper.
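As an illustration of augmenting activity six with a selection step, consider the sketch below: when several composition configurations are valid, one is picked by a simple utility. Both the scoring and the availability aggregation (the product of constituent availabilities, assuming every service in the composition is needed) are our own assumptions, not part of the paper's approach.

    # Illustrative selection among competing valid composition configurations.
    import math

    def availability(configuration):
        """Overall availability, assuming all services must be available."""
        return math.prod(service["availability"] for service in configuration)

    def select_best(configurations):
        """Prefer high availability; break ties by the number of shops."""
        return max(configurations, key=lambda c: (availability(c), len(c)))

    candidates = [
        [{"name": "ymart", "availability": 0.99}],
        [{"name": "gadgetsrus", "availability": 0.95},
         {"name": "shopalot", "availability": 0.97}],
    ]
    best = select_best(candidates)  # the single-shop configuration wins here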

6 Conclusions

In this paper, we presented an approach for dynamically finding and validating decentralised service compositions with cross-cutting security constraints. The approach is knowledge-based in the sense that it relies on capturing the architecture and the rules of the compositions, and then utilises the collected knowledge to find and validate compositions. For our approach, we presented the knowledge that needs to be captured, the activities that create and manage this knowledge, and the responsibilities related to the knowledge and the activities. Although there has been considerable work on distribution, dynamism, and cross-cutting constraints in architecture modelling, in our view their combination has not been sufficiently covered. As our related work survey shows, most related works address only one of these aspects; for instance, our running example could not be fully realised with state-of-the-art approaches. However, the approach still lacks tool support as well as integration with service implementation technologies. As future work, we aim to build tool support that utilises the existing smodels inference engine [8] for validating the compositions.

References

1. Murugesan, S.: Understanding Web 2.0. IT Professional 9(4) (2007)
2. Bosch, J.: Service orientation in the enterprise: Towards mobile services. IEEE Computer 40(11) (2007)
3. van Gurp, J., Prehofer, C., di Flora, C.: Experiences with realizing smart space Web service applications. In: Proc. of Consumer Communications and Networking Conference (CCNC) (2008)


4. OpenID: http://openid.net/
5. WSDL: http://www.w3.org/tr/wsdl
6. van Ommering, R., van der Linden, F., Kramer, J., Magee, J.: The Koala component model for consumer electronics software. IEEE Computer 33(3) (2000)
7. Asikainen, T., Männistö, T.: Nivel: A metamodelling language with a formal semantics. Software and Systems Modeling (to appear)
8. Simons, P., Niemelä, I., Soininen, T.: Extending and implementing the stable model semantics. Artificial Intelligence 138(1–2) (2002)
9. McKinley, P.K., Sadjadi, S.M., Kasten, E.P., Cheng, B.H.: Composing adaptive software. IEEE Computer 37(7) (2004)
10. Magee, J., Kramer, J.: Dynamic structure in software architectures. SIGSOFT Software Engineering Notes 21(6) (1996)
11. Lee, J., Kang, K.: A feature-oriented approach to developing dynamically reconfigurable products in product line engineering. In: Proc. of Software Product Line Engineering Conference (SPLC) (2006)
12. Gomaa, H., Saleh, M.: Feature driven dynamic customization of software product lines. In: Morisio, M. (ed.) ICSR 2006. LNCS, vol. 4039, pp. 58–72. Springer, Heidelberg (2006)
13. van der Hoek, A.: Design-time product line architectures for any-time variability. Science of Computer Programming 53(3) (2004)
14. Ye, J., Loyall, J., Shapiro, R., Neema, S., Abdelwahed, S., Mahadevan, N., Koets, M., Varner, D.: A model-based approach to designing QoS adaptive applications. In: Proc. of Real-Time Systems Symposium (RTSS) (2004)
15. Floch, J., Hallsteinsen, S., Stav, E., Eliassen, F., Lund, K., Gjørven, E.: Using architecture models for runtime adaptability. IEEE Software 23(2) (2006)
16. Fung, K.H., Low, G., Ray, P.K.: Embracing dynamic evolution in distributed systems. IEEE Software 21(2) (2004)
17. Alia, M., Hallsteinsen, S., Paspallis, N., Eliassen, F.: Managing distributed adaptation of mobile applications. In: Indulska, J., Raymond, K. (eds.) DAIS 2007. LNCS, vol. 4531, pp. 104–118. Springer, Heidelberg (2007)
18. Wang, Y., Kobsa, A., van der Hoek, A., White, J.: PLA-based runtime dynamism in support of privacy-enhanced Web personalization. In: Proc. of Software Product Line Engineering Conference (SPLC) (2006)
19. Erl, T.: Service-Oriented Architecture: Concepts, Technology, and Design. Prentice-Hall, Englewood Cliffs (2005)
20. Papazoglou, M., Traverso, P., Dustdar, S., Leymann, F.: Service-oriented computing: State of the art and research challenges. IEEE Computer 40(11) (2007)
21. Vuković, M., Kotsovinos, E., Robinson, P.: An architecture for rapid, on-demand service composition. Service Oriented Computing and Applications 1(4) (2007)
22. Kanneganti, R., Chodavarapu, P.A.: SOA and Security. Manning Publications (2007)
23. Xuanzhe, L., Yi, H., Wei, S., Haiqi, L.: Towards service composition based on mashup. In: Proceedings of IEEE Congress of Services (2007)
24. Wong, J., Hong, J.: Making mashups with Marmite: Towards end-user programming for the Web. In: Proc. of Computer/Human Interaction Conference (2007)
25. Trevor, J.: Doing the mobile mash. IEEE Computer 41(2) (2008)
26. Crnkovic, I., Schmidt, H., Stafford, J., Wallnau, K.: Anatomy of a research project in predictable assembly. In: Proc. of 5th Workshop on Component-Based Software Engineering (2002)
