Purposeful Performance Variability in Software Product Lines: A Comparison of Two Case Studies

Varvana Myllärniemi, Mikko Raatikainen (Aalto University, Finland)
Juha Savolainen (Roche Diagnostics, Switzerland)
Tomi Männistö (University of Helsinki, Finland)
ABSTRACT

Within software product lines, customers may have different quality needs. To produce products with purposefully different quality attributes, several challenges must be addressed. First, one must be able to distinguish product quality attributes to the customers in a meaningful way. Second, one must create the desired quality attribute differences during product-line architecture design and derivation. To study how performance is varied purposefully in software product lines, we conducted a comparison and re-analysis of two industrial case studies in the telecommunication and mobile game domains. The results show that performance variants must be communicated to the customer in a way that links to customer value and her role. When performance or its adaptation is crucial for the customer, performance differences must be explicitly "designed in" with software or hardware means. Due to the emergent nature of performance, it is important to test performance and to manage how other variability affects performance.
CCS Concepts: Software and its engineering → Software product lines
1. INTRODUCTION
Quality attributes, such as performance, security, reliability and maintainability, affect how the stakeholders perceive the product quality. As one quality attribute, performance is important in industrial practice [3]. Performance covers such aspects as latency, throughput, resource consumption and capacity [15]. Typically, software product lines differentiate the products by their functional capabilities. However, customers may have different quality needs: a weather station targeting the general consumer market has less stringent data reliability requirements than a military weather station [23]. To satisfy such needs, software product lines can be built to produce products with purposefully different quality attributes. Quality attribute variability is the ability to create
[email protected].
SPLC ’16, September 16 - 23, 2016, Beijing, China c 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4050-2/16/09. . . $15.00 DOI: http://dx.doi.org/10.1145/2934466.2934474
product variants with differences in quality attributes (cf. the definition of variability [44]).

Quality attribute variability in software product lines has been studied mostly from the modeling and derivation point of view. Empirical evidence on how quality attribute variability is handled in industrial software product lines is scarce [31]. Several open challenges remain. First, quality attributes may be difficult to distinguish so that the customer understands the differences and is able to select a product variant. Even if there are quantitative metrics, such as uptime percentage for availability, the customer may not be able to understand the measures or relate them to her needs. Second, product-line architecture is an important part of software product lines [4]. Given that quality attributes often have a cross-cutting, architectural nature [1], designing quality attribute differences into the product-line architecture and then deriving products with the desired quality attributes is a real challenge.

Our aim is to study how to vary performance purposefully in software product lines. Hence, we focus on the variability of one specific quality attribute, performance. The research questions are set as follows:

RQ1: How to distinguish performance differences to the customers?
RQ2: How to design performance differences in the product-line architecture?
RQ3: How to derive a product variant and to ensure desired performance?

Towards this aim, we conducted a descriptive, multiple-case case study [47] in two different domains: telecommunication and mobile phone games. Our contribution is to compare evidence-based knowledge from different industrial contexts and to propose one synthesized model for each research question. The first take-away is that maximum throughput, resource consumption or other quantitative measures are not always used to communicate performance differences to the customers: another option is to describe the operating environments to which products are adapted. The second take-away is that when performance or its adaptation is important for the customers, performance differences should be purposefully "designed in" from the start. To create performance differences, purposefully introduced software design tactics and hardware scaling are viable means. Third, one needs to ensure the desired performance, that is, to manage how other software variability impacts performance and to conduct performance testing; these activities can be done at the product-line or product level. Finally, model-based derivation and impact
Table 1: The characteristics of the selected case study cases.

Case Nokia
Company: Telecommunication infrastructure; more than 50,000 employees.
Product line: IP-BTS, a customizable and configurable base station in 3G mobile radio access networks. Customers: operators.
Characteristics: The product line contains both software and dedicated hardware. Long-lived investment products; variability management focusing on reconfiguration.
Status: The product line was designed and evaluated, but was canceled due to market reasons before production was started; data was collected approximately ten years later.

Case Fathammer
Company: Mobile games; fewer than 50 employees.
Product line: X-Forge 3D game platform and several game titles built on top of it. Customers: operators, device manufacturers and game portal users.
Characteristics: The product line is dependent on the hardware of the target mobile devices, which differs drastically. No evolution of product variants; lightweight variability handling.
Status: At the time of data collection: 80 licensees of X-Forge, 15 game titles shipped. Years later, Fathammer was acquired and merged.
management as proposed in the current research may not be needed in all cases.

Preliminary results from both cases have been reported earlier [29, 30, 31]. For this study, we did not collect new data but revisited existing data to conduct re-analysis and synthesis. One case [29] had not been previously analyzed for any of our research questions, whereas the other case [30, 31] had been previously analyzed only for RQ2. This study contributes by proposing two synthesized models for RQ1 and RQ3 that are not visible in our previous work. This study also contributes by testing and refining our previously proposed theory [31] for RQ2. Hence, the majority of the results are novel and would not have existed without revisiting the data and conducting cross-case analysis and synthesis.

The rest of this paper is organized as follows. Section 2 describes how the case study was conducted. Section 3 outlines the related work. Section 4 describes the results for each research question. The results and validity are discussed in Sections 5 and 6, while Section 7 concludes.
2. RESEARCH METHOD
A case study investigates a non-manipulable phenomenon in its real-life context [47] and is one form of empirical software engineering research [35]. Thus, case studies are suitable for studying performance variability in industrial product lines.
2.1 Case Selection
For this study, two cases were selected: Case Nokia and Case Fathammer (Table 1). The case selection combined criterion sampling and convenience sampling [33]: both cases exhibited performance variability; both were easily accessible through existing personal connections; and there were no confidentiality issues in collecting the data and publishing the results. The product line in Case Nokia was discontinued before
Table 2: The data collection in the cases.

Case Nokia:
- Approximately 300 pages of internal documents, including the product-line software architecture document, a detailed subsystem architecture document, and a product specification document
- First-hand experience of the third author; informal discussions recorded through notes
- Publicly available information about the domain, e.g., [12]
- Two validation sessions, in which four product-line chief architects reviewed the case account
- Written clarifying questions exchanged by e-mail and answered by one chief architect

Case Fathammer:
- A 3-hour joint interview with the process manager and the derivation manager; recorded and transcribed
- Public and non-public documentation
- One validation session with the interviewees, where clarifying questions and uncertain issues were resolved
it reached production (Table 1); the reasons were not related to the findings of this study. The post mortem nature affected study validity, but it also enabled us to access confidential documentation. Hence, selecting another case to improve validity would have decreased the amount of available data. Moreover, similar performance variability was a common phenomenon in subsequent, successful product lines at Nokia.
2.2 Data Collection
Several sources of existing data were used (Table 2); hence, we did not collect any new data for this study. For Case Nokia, there were two main data sources: internal technical documents about the product line and observational first-hand experience of one co-author, who had participated in the product-line architecture evaluation. These data sources were triangulated through two validation sessions and personal contact with the chief architects who had designed the case product line. The validation also enabled us to collect further data. In addition, publicly available information about the domain was used to augment the internal documents. For Case Fathammer, the main data collection method was a 3-hour interview with two product line managers. The interview questions were adapted from an existing research framework. The interview results were triangulated against public and internal documents about the product line: we checked that the findings were consistent and added further information from the documents to the analysis. The results were later validated with the interviewees.
2.3 Data Analysis
Earlier, both cases were analyzed and the results reported: Case Nokia in [30, 31] and Case Fathammer in [29]. For both cases, the data analysis practices of grounded theory [45, 43] were adopted. For Case Nokia, the earlier analysis involved open coding; building a theory from the data without any existing hypotheses; utilizing data comparison to identify and saturate concepts and relations; and using additional slices of data to guide the analysis [45]. For Case Fathammer, the earlier analysis involved deductive
coding of data [43] and utilized data comparison to identify and saturate concepts and relations [45]. For both cases, the coding was guided partly by the data and partly by pre-existing theoretical knowledge. For this study, a qualitative cross-case analysis [13] was conducted by comparing and synthesizing the cases. First, the data and analysis results were revisited. Since the analysis for Case Fathammer was more than 10 years old, it was extended by identifying new concepts related to RQ1, RQ2 and RQ3. The analysis for Case Nokia was also revisited to identify concepts related to RQ1 and RQ3. Example concepts included creating performance differences and describing performance. Thereafter, we cross-analyzed the cases by identifying concepts that revealed differences and similarities; such emerging concepts included, for example, impact of other variability and testing. We then compared the cases across these emerging concepts through comparison tables. The results of the comparison were further analyzed to identify relations between the cases, and the main similarities and differences were synthesized and classified into three models, one for each research question.
3. RELATED WORK
Quality attribute variability is mostly studied from the modeling point of view in the current research. In particular, feature modeling dominates the research: varying quality attributes can be represented as features [26, 25] or by embedding quality attribute information into features [38, 46]. Varying quality attributes can also be represented as softgoals [6, 24, 48] or as requirements [23, 28]. However, it is not always clear whether the proposed representations are meant to be communicated to the customers directly (RQ1) or whether they are meant for internal product line management. In general, when communicating with the customer about product line variability, one should focus on the customer-relevant, essential variability and not on the technical variability [11]. It is yet unknown what part of quality attribute variability is essential.

When describing quality attribute differences between products, the literature seems to make a division between two kinds of attributes: soft and hard. Soft quality attributes have no clear-cut satisfaction criteria and only impose restrictions on how functional requirements should be met [17]. Such soft quality attributes are typically described qualitatively [38], with a measure on the ordinal scale [42], for example, as "high performance". In contrast, hard quality attributes are quantifiable [38]: they can be characterized on the interval or ratio scale [42] and measured unambiguously, for example, as "latency less than 100 ms". In the previous work, performance seems to be treated more often as a hard attribute, that is, characterized through quantitative, measurable properties [38].

The previous studies do not often explicitly cover the architecture design of quality attribute variability (RQ2) or the derivation of product variants from that design (RQ3). Instead, both aspects are typically addressed through feature models. This is because features can represent not only problem domain entities but also solution domain entities, down to the level of design decisions and technologies [25, 17, 20]. Therefore, many studies on feature models and quality attribute variability actually describe design. As a concrete example, Linux Ubuntu packages, i.e., architectural entities, are modeled as varying features with a certain
impact on memory footprint [34]. As another example, redundancy controls active and standby, i.e., design tactics for availability, are modeled as alternative implementation technique features [20]. Additionally, there are studies that focus on product-line architecture design activities [21, 22]. The link from feature models to product-line architecture has also been studied [18, 19, 20].

The architectural, emergent nature of quality attributes makes design and derivation more challenging. It has been argued that quality attributes cannot be directly derived, that is, derived by simply selecting single features that represent product quality attributes in the feature models [39]. This is because quality attributes are the result of, and impacted by, many functional features [39]. In other words, quality attributes are indirectly affected by other variability in the software product line [32]. This impact of other variability must be taken into account in the design and derivation approach.

To address the impact of other variability, most studies use model-based, externalized approaches. In particular, the impact of other variability is explicitly represented in feature models as feature impacts. A feature impact characterizes how a particular feature contributes to a specific quality attribute: for example, selecting feature Credit Card adds 50 ms to the overall response time [41]. There can be two kinds of feature impacts, depending on the nature of the specific quality attribute. First, there can be qualitative feature impacts, for example, feature Verification improves reliability [38]; these can be used as guidelines during product derivation. Second, there can be quantitative feature impacts that can be either directly measured or inferred from other measurable properties: for example, one can compute to what extent a feature influences the memory footprint of an application [38]. However, there are also quality attributes that are quantifiable but not measurable per feature: for example, it has been claimed that response time can only be measured per product variant [38].

To complicate matters further, the impact of one feature may depend on the presence of other features [37, 36, 39, 5]. A feature interaction means that combining features may have an unexpected effect on quality attributes compared with having them in isolation. For example, when both features Replication and Cryptography are selected, the overall memory footprint is 32 KB higher than the sum of the footprints of each feature when used separately [38]. Feature interactions may occur when the same code unit participates in implementing multiple features, when a certain combination of features requires additional code, or when two features share the same resource [38, 37]. Managing feature impacts and interactions may require explicit representation and dedicated tool support, as manifested by the Intrada case [40]. It may be possible to manage feature impacts manually by trying to codify tacit knowledge into heuristics or by comparing with predefined reference configurations [40]. However, when Intrada needed to create a high-performance product variant, the complications of manual impact management caused the derivation to take up to several months, compared with the few hours that derivation normally took [40]. Even with a tool that was able to evaluate the performance of a given configuration, only a few experts were capable of conducting a directed optimization towards a high-performance configuration [40].
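To make the feature-impact idea concrete, the following minimal Python sketch (our illustration, not tooling from the cited studies) predicts a quality attribute for a configuration from additive per-feature impacts plus pairwise interaction corrections. The 50 ms and 32 KB values echo the examples quoted above; all other names and numbers are invented.

# Minimal sketch of a feature-impact model with pairwise interactions.
FOOTPRINT_KB = {            # additive impact on memory footprint (KB)
    "Base": 512,
    "Replication": 128,
    "Cryptography": 96,
}
RESPONSE_MS = {             # additive impact on response time (ms)
    "Base": 120,
    "CreditCard": 50,       # "selecting Credit Card adds 50 ms" [41]
}
# Replication and Cryptography together cost 32 KB more than the sum
# of their isolated footprints [38].
INTERACTIONS_KB = {frozenset({"Replication", "Cryptography"}): 32}

def predict(config, impacts, interactions=None):
    """Sum per-feature impacts, then add interaction corrections."""
    total = sum(impacts.get(f, 0) for f in config)
    for pair, delta in (interactions or {}).items():
        if pair <= config:  # all interacting features selected
            total += delta
    return total

variant = {"Base", "Replication", "Cryptography"}
print(predict(variant, FOOTPRINT_KB, INTERACTIONS_KB))  # 768 KB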
Perhaps surprisingly, the previous work does not often
describe how quality attribute differences are designed in the product-line architecture (RQ2). It has been proposed that quality attribute differences can be designed by introducing several variants of an architectural pattern [9, 27] or tactic [21, 22, 14]. The challenge with varying patterns and tactics is that they may crosscut the architecture [10] and thus may be costly to implement, test, manage and derive. Also, besides the purposeful differences created with design patterns and tactics, the impact of other variability may create additional differences between product variants.

In contrast, there has been considerable research attention on deriving products with different quality attributes (RQ3). The derivation can be about finding a variant that meets specific quality attributes, for example, a product with 64 MB or smaller memory usage [40], or about optimizing over one or more quality attributes, for example, finding the most accurate face recognition system that can be constructed with a given budget [46]. Most of the studies use feature models and explicitly represented feature impacts as the basis for derivation. The algorithms needed for finding and optimizing variants from feature models are computationally very expensive. Earlier solvers based on constraint satisfaction problems exhibited solution times exponential in the size of the problem [2]. Finding an optimal variant that adheres to feature model and system resource constraints is an NP-hard problem [46]. Therefore, several approximation algorithms have been proposed to find partially optimized feature configurations [8, 46]. Other proposals utilize hierarchical task network planning [41].
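To illustrate why derivation becomes a search problem, the sketch below (ours, over an invented toy feature model) exhaustively enumerates configurations and returns the best variant that fits a memory budget, such as the 64 MB bound mentioned above. Real feature models require the cited solvers and approximation algorithms, because exhaustive search does not scale.

from itertools import combinations

# Invented toy feature model: optional features with a benefit score,
# a memory cost (MB), and one cross-tree "excludes" constraint.
OPTIONAL = {
    "Caching":    {"benefit": 5, "mem": 20},
    "Encryption": {"benefit": 4, "mem": 12},
    "Logging":    {"benefit": 2, "mem": 8},
    "HiResUI":    {"benefit": 6, "mem": 40},
}
BASE_MEM = 16                          # mandatory core
EXCLUDES = [("Caching", "HiResUI")]    # mutually exclusive features

def valid(config):
    return not any(a in config and b in config for a, b in EXCLUDES)

def derive(budget_mb):
    """Exhaustively find the valid, budget-respecting configuration
    with the highest total benefit (NP-hard in general)."""
    best, best_score = None, -1
    names = list(OPTIONAL)
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            config = set(combo)
            mem = BASE_MEM + sum(OPTIONAL[f]["mem"] for f in config)
            score = sum(OPTIONAL[f]["benefit"] for f in config)
            if valid(config) and mem <= budget_mb and score > best_score:
                best, best_score = config, score
    return best, best_score

print(derive(64))  # best variant within a 64 MB memory budget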
4. RESULTS
The following describes the results. Cases are compared in Tables 3, 4 and 5. Synthesized models are presented in Figures 1, 2 and 3.
4.1 Distinguishing to Customers (RQ1)
We describe and compare Case Nokia and Case Fathammer as follows: what the varying performance attribute was, and how performance was distinguished to the customers when describing products (Table 3).

In Case Nokia, the base stations differed in their phone call capacity (Table 3). Phone call capacity was defined as the maximum throughput of phone calls, that is, using an established, externally observable performance measure. This definition was dictated by the reason to purposefully vary performance in the first place. The base station capacity was one of the most valuable aspects to the operators, which led Nokia to differentiate capacity in pricing. Since serving phone calls was the main functionality of base stations at that time, capacity was set to measure this main functionality. The phone call capacity differences were communicated to the customers in two ways: directly as phone call capacity to acquirers, or as the number of channel elements to administrators (Table 3). Since the ability to serve as many phone calls as possible was one of the most valuable aspects to the operators, phone call capacity was used as a selling point when acquiring network elements. However, when configuring base stations, the base station capacity was communicated to the operator administrators as channel elements. A channel element was an abstraction of the internal resources needed to deliver certain phone call capacity.
Figure 1: A synthesized model of RQ1: Distinguishing performance differences to the customers. The model covers three approaches: describing externally visible performance (Nokia), describing the product environment (Fathammer), and describing internally visible resources (Nokia).
Channel elements dictated phone call capacity independently of other network planning parameters, such as base station interference and power; hence, channel elements were easier to configure separately. To summarize, the varying phone call capacity was communicated either directly, as externally observable capacity, or as the internally observable resources needed to deliver that capacity.

In Case Fathammer, the 3D mobile games differed in their resource consumption and refresh rate (Table 3). Resource consumption was measured as the maximum runtime heap consumption of a game; the size of the compressed game file when downloading the application; and the size of the installed game on the device, including all required data files and libraries. Refresh rate was varied as the game graphics and game action refresh rate on a given device. The reason for this variability was to maximize the use of varying device capabilities. Game playability and graphics attractiveness were valuable to market success, yet were in conflict with resource consumption and game refresh rate. Since the device capabilities differed so much from each other, a single solution would either have been too heavy for the lower-end devices or have had too poor graphics for the high-end devices. These differences in resource consumption and refresh rate were not explicated as such to the customers, that is, to end-users and game distributors. Instead, product performance was communicated to the customers only as the target device to which resource consumption and refresh rates were adapted (Table 3), for example, by stating that the game was optimized for the Nokia 6600 mobile phone. Hence, varying resource consumption and refresh rates were communicated indirectly as the target operating environment. This was because customer value was not in performance per se, but in having performance, playability and attractiveness adapted to maximize the use of limited device resources.

To synthesize and compare the results in Table 3, the varying performance attributes included capacity, resource consumption and refresh rate. These are all established, externally observable product quality attributes that have clearly defined quantitative measures. However, these attributes were not necessarily used to distinguish variants to the customers. Instead, performance differences were communicated to the customers in three different ways (Figure 1): as the externally observable product performance, as the internally observable resources needed to deliver that performance, and as the target operating environment to which the product performance was adapted.
Table 3: A comparison of RQ1: Distinguishing performance differences to the customers.

The varying performance attribute
- Case Nokia: Phone call capacity: the maximum number of phone calls the base station can process per time unit, that is, the maximum throughput of phone calls.
- Case Fathammer: Resource consumption: the game heap memory consumption, the application size when downloading, and the application size when installed on the device. Refresh rate: the game action and game graphics refresh rate.

Communicated to the customers when describing products
- Case Nokia: As externally observable performance: as phone call capacity to acquirers. As internally observable resources: as channel elements to administrators; these measure the internal resources needed for a certain phone call capacity and are independent of other (re)configuration parameters.
- Case Fathammer: As target operating environment: as the target mobile device to end-users and game distributors; game resource consumption and refresh rate were not explicitly communicated but were adapted to each device.
The cases selected their approach to distinguish performance (Figure 1) based on the customer value and her role. If performance was differentiating, that is, if the customer got more value from better performance, products were distinguished through external performance characteristics that linked directly to customer value. For example, base stations were distinguished through the maximum throughput of phone calls, since serving more phone calls directly brought monetary value to the customers. If the customer value came from how well performance was adapted to resources, products were distinguished through the environments that dictated those resources. This helped the customer to understand that a particular variant was optimized for her needs. Finally, when the customer administrator needed to configure the performance of the product, products were distinguished as internal resources that could be set independently of other product parameters.
4.2 Designing Performance Variability (RQ2)
We describe and compare Case Nokia and Case Fathammer from the following viewpoints: how performance differences were designed and how the impact of other variability was handled (Table 4). We also discuss the rationale for these approaches.

Within Case Nokia, there were two approaches for designing capacity differences in the base station architecture. First, major differences in capacity were achieved by scaling the base station hardware out and up. Since there was dedicated hardware for transferring phone calls in the base station, the maximum number of phone calls could be altered by adding more CPUs, upgrading to more powerful CPUs, or adding more memory to this dedicated hardware. Second, differences in capacity were achieved through software means, in particular by downgrading the maximum system capacity achieved with the full hardware configuration. Capacity downgrading was implemented by limiting the number of channel elements visible to the software components. Changing the available channel elements also involved programmatically disabling or enabling the dedicated hardware resources. A software component, Resource Manager, monitored and controlled how many channel elements were used by the software. Since other software components were unaware of the actual hardware resources, the available channel elements could be changed at runtime without affecting other operations in the base station.
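The downgrading mechanism can be pictured with the following Python sketch. It is our reading of the account, with hypothetical names and numbers; the actual Resource Manager implementation is not public.

class ResourceManager:
    """Hypothetical sketch: only the licensed number of channel
    elements is visible to the rest of the software, so capacity can
    be rebound at runtime without touching other components."""

    def __init__(self, installed, licensed):
        self._installed = installed          # fixed by the hardware
        self._licensed = min(licensed, installed)

    def visible_channel_elements(self):
        # Other software components query this; they never see the
        # actual hardware resources.
        return self._licensed

    def upgrade_license(self, licensed):
        if licensed > self._installed:
            raise ValueError("license exceeds installed hardware")
        # Enabling more channel elements would also programmatically
        # enable the corresponding dedicated hardware resources.
        self._licensed = licensed

rm = ResourceManager(installed=64, licensed=32)   # downgraded variant
rm.upgrade_license(48)                            # runtime capacity upgrade
print(rm.visible_channel_elements())              # 48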
A key characteristic of Case Nokia was that the impact of other software variability on channel capacity was minimized (Table 4). That is, other software variability in the base station mostly did not impact how the channel elements were used to serve phone calls. This was because there was dedicated hardware for call handling: varying management and control functionality did not change the use of these hardware resources, with the exception of the choice of the radio access standard.

The rationale for the design approach in Case Nokia was as follows. Since hardware costs were a major driver in the base station products, creating capacity differences with hardware saved production costs. Additionally, scaling base station hardware was an established practice in the domain. When the hardware configuration was fixed and capacity was varied through downgrading, there were no cost differences between the variants, and the lower-capacity variants had "too good" hardware. Instead, the downgrading approach was motivated by price differentiation: the customers valued different price points and the possibility to upgrade capacity without affecting other product characteristics. Also, the capacity needed to be reconfigurable at runtime, which supported the decision to minimize the impact of other variability.

In Case Fathammer, software design was purposefully varied to adapt game resource consumption and refresh rate to different devices. The design varied game content, level-of-detail, visuals and sounds. For example, resource consumption was varied by changing 3D materials, textures, object models and game levels. Graphics refresh rate was varied by changing level-of-detail, such as the number of polygons, materials, and rendering algorithms. By optimizing game content and lowering visual quality, a speed gain as high as 200% was possible; this required significant content changes. Since all these design tactics improved game visual attractiveness and playability at the expense of performance, Case Fathammer traded off several conflicting quality attributes in software design (Table 4). Both resource consumption and game refresh rate were affected by other variability in the game. For example, adding new content to the game affected resource consumption. However, no attempts were made to minimize, handle or represent this impact in the product-line design. The design approach in Case Fathammer was motivated partly by the drastic differences in mobile devices, partly by the inherent trade-offs between game graphics, content and performance.
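The trade-off tactic can be summarized as per-device content profiles, as in the Python sketch below. This is our illustration with invented numbers, not Fathammer's X-Forge API.

# Invented per-device profiles trading visual quality against
# resource consumption and refresh rate.
PROFILES = {
    "low_end":  {"texture_px": 128, "max_polygons": 1500,
                 "game_levels": 6,  "target_fps": 15},
    "high_end": {"texture_px": 512, "max_polygons": 8000,
                 "game_levels": 12, "target_fps": 30},
}

def bind_content(device_class):
    """Bind content and level-of-detail parameters for one device
    class; compiling and asset packaging are omitted."""
    p = PROFILES[device_class]
    return {
        "textures": f"{p['texture_px']}px",
        "lod_polygon_budget": p["max_polygons"],
        "levels_included": p["game_levels"],
        "frame_rate_cap": p["target_fps"],
    }

print(bind_content("low_end"))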
Table 4: A comparison of RQ2: Designing performance differences in the product-line architecture.

Differences created by
- Case Nokia: Scaling hardware: varying the product hardware that was dedicated to transferring phone calls. Downgrading with software design tactics: limiting the number of channel elements visible to the software components to decrease capacity without affecting other quality attributes.
- Case Fathammer: Trading off with software design tactics: varying several aspects of game graphics and content that improve visual attractiveness or playability but worsen resource consumption or refresh rate.

Impact of other variability
- Case Nokia: Minimized: the impact of other software variability on channel capacity was minimized in the design by having dedicated hardware.
- Case Fathammer: Ignored: resource consumption and refresh rate were affected by other game variability, but this was not managed explicitly.

Rationale
- Case Nokia: Hardware cost: hardware scaling justified by expensive hardware and the ability to develop scalable software. Capacity differentiation and rebinding: downgrading with minimal impacts justified by the need to vary capacity separately from other quality attributes and to support controlled runtime rebinding.
- Case Fathammer: Inherent trade-offs: game graphics and content inherently in conflict with performance; known practices of scaling graphics and content.
An example of such an inherent trade-off was that having more game levels increased the game download size. Therefore, it was straightforward to adapt game performance to different devices by varying these inherent trade-offs. The domain also has many known practices for scaling level-of-detail and game content.

As a comparison and synthesis, both cases utilized purposefully introduced software design tactics to create performance differences (Figure 2). In Case Fathammer, the software design traded off performance with other quality attributes. In Case Nokia, the software design simply downgraded the resources that were visible to the software components without trying to affect other quality attributes. In addition, Case Nokia also scaled hardware to create differences in performance. Finally, the cases were very different in handling the impact of other software variability in the product-line design. In Case Nokia, the impact of other software variability was kept to a minimum to enable controlled capacity differentiation and reconfiguration at runtime; this relied on having dedicated hardware. In Case Fathammer, the impact of other software variability was ignored in the product-line design. The synthesis in Figure 2 is in line with the theory proposed in our previous work [31].
Figure 2: A synthesized model of RQ2: Designing performance differences in the product-line architecture; slightly adapted from [31]. Approaches: scaling hardware (Nokia) and varying a software design tactic, the latter either downgrading (Nokia) or trading off (Fathammer).
Hence, we contribute by testing and refining the theory of creating performance differences in the product-line architecture design. According to this theory, performance differences can be designed through software design tactics or by scaling hardware; software design tactics can either downgrade performance without affecting other quality attributes or trade off performance against other quality attributes. The impact of other software variability can be minimized in the design if dedicated hardware is a viable option.
4.3 Deriving Performance Variants (RQ3)
We describe and compare Case Nokia and Case Fathammer from the following viewpoints: how derivation was conducted, how the impact of other variability was managed, and how performance was tested (Table 5).

In Case Nokia, since base stations were expensive, long-lived products and capacity needs increased over time, upgrading base stations after deployment was essential. Therefore, derivation could mean either configuration or reconfiguration (Table 5). An operator administrator was responsible for both tasks. Reconfiguration involved either purchasing a new license or upgrading the base station hardware: a new license upgraded capacity at runtime by changing the number of available channel elements. During derivation, there was no need to check how other variability affected performance: the product-line design was built to minimize impacts. In contrast, systematic performance testing was crucial, since capacity was guaranteed and valuable to the customers. However, it was not possible to do any testing when the customer wanted to upgrade capacity at runtime; hence, performance testing was done beforehand, at the product-line level. Instead of testing all capacity variants against all configurations, it was sufficient to test only the maximum, the minimum and some downgraded variants. This was because the impact of software variability was minimized in the product-line design and hence could be ignored in testing. In summary, Case Nokia focused on supporting automated, customer-conducted reconfiguration of guaranteed capacity, which meant that all performance validation activities (testing, impact management) needed to be done systematically and at the product-line level.
Table 5: A comparison of RQ3: Deriving product variants and ensuring desired performance.

Derivation task
- Case Nokia: System-supported (re)configuration: installing or upgrading base stations, either remotely at runtime or manually by an operator administrator. Variability bound by setting software parameters or installing hardware.
- Case Fathammer: Manual porting and optimization: manually adapting a game to a specific mobile device and sales channel by a product-line engineer. Variability bound by tuning code parameters and compiling, but also by creating new implementation or game content.

Impact of other variability
- Case Nokia: Ignored: impacts could be ignored during (re)configuration, since they were minimized in the product-line design.
- Case Fathammer: Checked by testing: how other game variability affected performance was manually tested and tuned during derivation.

Performance testing
- Case Nokia: Testing at the product-line level: performance testing had been systematically done at the product-line level to guarantee the promised capacity; not all combinations were tested.
- Case Fathammer: Testing when deriving a product: the performance, graphics and playability of each variant were tested iteratively, and final decisions on the variant were based on these tests.
In Case Fathammer, a product-line engineer derived a game to a specific mobile device and sales channel by adapting, testing and tuning (Table 5). No maintenance of games took place. The desired balance between frame rate, memory consumption, playability, and visual appeal was achieved by tuning game content and level-of-detail. Concretely, this consisted of setting appropriate parameter and configuration values, with potentially a small amount of development and graphics design effort, and recompiling the code. Derivation involved only a few hard constraints: the device CPU had to be able to refresh both graphics and game logic at the same time, and the game had to fit the device memory. Otherwise, game playability, attractiveness and performance were not fixed requirements but mostly subjective, soft goals for the game. Hence, the product-line engineer could freely modify the graphics, level-of-detail and content until a satisfactory compromise was found, relying on his domain knowledge and judgment. Testing the game continuously was a central part of the tuning process. To summarize, Case Fathammer focused on lightweight, manual performance adaptation, and the performance validation activities (testing, impact management) were based on subjectively tuning and testing the derived product.
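The derivation process just described can be condensed into a tune-and-test loop. The Python sketch below is our reading of the account; the measurement and acceptance functions are hypothetical placeholders for running the game on the device and for the engineer's judgment.

def derive_variant(device, params, measure, acceptable, max_rounds=20):
    """Hypothetical sketch of tune-and-test derivation: iteratively
    adjust content and level-of-detail parameters, re-testing until
    the playability/performance compromise is judged acceptable."""
    for _ in range(max_rounds):
        metrics = measure(device, params)      # performance testing
        if acceptable(metrics):                # hard constraints plus
            return params                      # the engineer's judgment
        if metrics["heap_kb"] > metrics["device_heap_kb"]:
            params["texture_px"] //= 2         # cut resource consumption
        if metrics["fps"] < metrics["target_fps"]:
            params["max_polygons"] = int(params["max_polygons"] * 0.8)
    raise RuntimeError("no satisfactory compromise found")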
To compare and synthesize Case Nokia and Case Fathammer, similarities can be identified. First, ensuring that performance was met was important in both cases. Both cases needed to handle the impact of other variability, that is, how the decisions on other variation points affected performance. Also, performance testing, either before or during derivation, was an essential activity. These activities are synthesized in Figure 3. However, the derivation tasks as well as the approaches for ensuring performance were very different (Figure 3). In Case Nokia, upgrading performance at runtime meant that all performance validation activities had to be handled at the product-line level, before deployment. Moreover, performance requirements were hard and guaranteed, which called for a systematic approach. Minimizing the impact of other variability in the product-line design was crucial: this made it possible to test performance only at the product-line level, without having to test all possible combinations. In Case Fathammer, performance validation activities were conducted during derivation in a more ad hoc manner. Since most game characteristics were soft goals and could be freely adjusted, the impact of other variability could be manually checked and tuned during derivation. Performance testing was a crucial part of this tuning. Despite the differences, the cases also shared a common characteristic: model-based impact management was not used in either case. Hence, our results show that derivation can in some cases be conducted without models that explicitly represent the impact of other variability.
Figure 3: A synthesized model of RQ3: Ensuring desired performance. Two activities, managing how other software variability affects performance and testing performance, are each conducted either during product-line engineering (Nokia) or during product derivation (Fathammer).
5. DISCUSSION
In general, quality attributes can be described and communicated at many different levels. Some quality attributes describe the product itself, that is, they are product quality attributes [15]. Some product quality attributes are externally visible in the product, some are internally visible [16]. Our contribution is to show that a similar variety of approaches exists when communicating quality attribute differences: some differences can be communicated by describing the product itself, some by describing the environment. The product differences can be either externally or internally visible. The decision to vary performance in the first place requires a careful analysis of the customer and the design [31]. The decision on how to communicate these differences needs to be considered as well, as indicated by our study. One lesson learned is that determining the essential variability [11] for performance requires understanding the customer's role, what brings her value and utility, and why she chooses a particular variant over another. Even if established quantitative measures exist, they are not necessarily the best way to communicate to the customer. In some cases, it may
be more informative to state that the product is optimized for a certain environment. In some other cases, internal resources help the customer to configure the system more precisely. The latter is similar to the product line requirement "Weather station shall be a single chip system" [23]: instead of describing product reliability, the requirement describes internal resources that affect reliability. Our results showed that even the same product can be distinguished differently to different customer roles: for example, performance can be described to a system administrator in quite technical terms.

The current research rarely discusses the design that creates quality attribute differences, focusing instead on analyzing how other software variability creates differences in quality attributes. Consequently, quality attribute differences seem to be emergent in the product-line architecture rather than purposefully introduced in the design. In Case Nokia and Case Fathammer, performance differences were explicitly "designed in" from the start instead of being emergent in the architecture. Hence, this study reasserts our previous lesson learned [31] to the research community: when quality attributes or their adaptation are a key selling point to the customer and a crucial part of the business model, quality differences must be purposefully created through software or hardware design. In particular, when the customers pay more for better quality, downgrading offers a controlled way to design performance differences. Similar differentiation could happen in music or video streaming services: freemium customers get a downgraded service with a lower bit stream.

The majority of current research focuses on derivation, feature models and explicitly represented impacts. We contribute by identifying that testing performance and checking the impact of other variability are important activities in industrial contexts, yet can be conducted in different ways. In contrast to many existing studies, Case Nokia and Case Fathammer did not use models at all to derive or to reason about the impact of other variability. Although knowledge of impacts was not externalized, the previously identified challenges of manual performance configuration [40] were not encountered. Several characteristics made these cases different from [40]. In Case Nokia, the impact of software variability on capacity was minimized in the product-line design; hence, one could systematically test beforehand how each variation point contributed to performance. In Case Fathammer, most game characteristics were soft goals, which meant impacts could be tested and all variability tuned for each game variant. It is not known whether a model-based approach would have brought added benefit relative to the effort.

How can the proposed models and findings be generalized to other companies? In other words, how can the results be analytically generalized to theory [47, 35], and what is the theory's scope [7]? Several factors affect the scope and generalizability of the proposed models. One obvious factor is whether hardware is part of the product-line scope: if hardware is external to the system, as it was in Case Fathammer, creating performance differences by scaling hardware is not possible. Similarly, minimizing the impact of other variability by having dedicated hardware for performance-critical processing is not possible. Another factor that affects the scope and generalizability is the role of performance variability in the product-line portfolio.
Is performance variability differentiated in pricing, as in Case Nokia? Is it valuable for the customer that product performance is adapted to resources, as in Case Fathammer? Are performance requirements hard constraints or very difficult to achieve, or can performance and other variability be tuned and adjusted more or less freely, as in Case Fathammer? Is performance variability bound at runtime, or by a product-line engineer at development time?

How can these results be generalized to other quality attributes? Performance has many established, quantifiable measures, which can be used directly to communicate performance differences to the customers. Some other quality attributes, for example security, may be more difficult to distinguish to customers. As another difference from other quality attributes, performance is highly emergent and hence impacted by other variability. For example, all included code modules may alter resource consumption, application size and memory footprint. If any of these code modules change because of other variability, resource consumption may also change. Similarly, response time and throughput are both affected by the execution path as well as by contention for resources. If any step within the execution path changes because of other variability, or the contention for resources changes because of other variability, response time and throughput may change. Because performance is highly emergent in the architecture, it may be more challenging to create controlled differences in performance than for some other quality attributes. For performance, it is not enough to just add purposeful design, since the impact of other variability may drastically change the resulting performance. For some other quality attributes, the impact of other variability may be less severe. For security, one can create relatively controlled, focused differences by varying countermeasures, such as authentication, authorization, encryption, and validation. Similarly, safety can be varied by varying safety functions, such as safe stop. Further, the role of hardware may be more predominant for performance variability than for some other quality attributes. After all, response time, throughput and capacity are directly affected by hardware resources and thus can be varied by scaling hardware. Whether such an approach is applicable to other quality attributes remains open: for example, is varying hardware design a reasonable way to vary usability?
6. VALIDITY
We identified the following threats to construct validity. First, Case Nokia was conducted post mortem: the product line was discontinued before it was taken into production. Therefore, do the data collection measures correctly represent concepts related to operation, customers, selling, or other characteristics of successful software product lines? This threat was mitigated by contrasting the results with successful products in the case company's product portfolio and by focusing the analysis on matters that were established before, and were not the reason for, the discontinuation. As another threat to construct validity, Case Nokia did not use interviews at all to collect data, whereas Case Fathammer had only one interview. Interviews are typically a good source of rich qualitative data. The lack of such richness in Case Nokia was alleviated by having an involved participant as an author; by validating the results twice with key informants [47]; and by triangulating the documents, participant observations
and comments from the key informants against each other. This corresponds to having multiple sources of evidence [47]. Case Fathammer also validated the results with key informants and used multiple sources of evidence. Finally, the data analysis in Case Nokia utilized only lightweight coding: a large part of the analysis was conducted through writing and informal discussions. Thus, are the operationalized high-level constructs grounded in the data [45]? To alleviate this threat, the data was revisited during analysis to check against the newly formed concepts, and the results were validated.

To the best of our knowledge, no threats to internal validity were identified in the case studies. The results from the case studies did not involve any purely causal inferences of the form "if X, then Y", but were mostly descriptive.

External validity is about generalizing the findings of a case study to other cases; however, the generalization should be done analytically to theories and not to populations [47, 35]. The model for RQ2 and its scope of generalization [7] have been described earlier [31]. For the other proposed models, we did not provide a thorough analysis but merely identified several factors that affect generalization.
7. CONCLUSIONS

We studied how performance can be varied purposefully in software product lines. Towards this aim, we conducted a comparison of two case studies on industrial software product lines in different domains. Thus, we contributed to the scarce reported evidence on industrial software product lines with quality attribute variability. Based on the results, performance differences can be distinguished to the customer as externally visible product performance, as internal resources, or as target operating environments; the selected means must link to the customer value and her role. Within product-line design and derivation, two challenges must be addressed: how to create the desired differences between the products and how to ensure product performance. First, purposefully "designing in" performance differences is crucial when performance or its adaptation is important for the customer. Performance differences can be created by scaling hardware, by downgrading in software, or by trading off in software. Second, due to the emergent nature of performance, testing performance and handling the impact of other software variability are crucial to ensure product performance; these activities can be done at the product-line or product level. Model-based derivation and impact management as proposed in the current research may not always be necessary.

The implications for the state of the practice are as follows. When organizations plan quality attribute variability, they need to carefully think about how to communicate quality attribute differences to the customers. The communication should be done in a way that links to the value of the quality attribute differences and to the customer's role. For example, if performance is varied for price differentiation, it makes sense to describe the most valuable aspect of performance directly. If performance is varied because of adaptation to the environment, it makes sense to describe the environment. Additionally, organizations need to decide how to create differences in quality attributes, how to test, and how to handle the impact of other variability in an efficient way. We identified different ways to do this. For example, downgrading is a suitable way to design performance variability for price differentiation, and minimizing the impact of other variability makes derivation and testing easier. As another example, if the product characteristics that affect performance are not fixed requirements but merely soft goals, it may be possible to simply test and tune each product variant manually.

In the future, the dual nature of design and derivation should be studied further. The tension between explicit design tactics and emergent, indirect variability in the product-line architecture calls for future work, in particular within real-life contexts. For example, when is externalized impact management needed? Further, given the complications of testing even functional variability, testing quality attribute variability is a real challenge. What kind of testing can or needs to be conducted as part of domain engineering, and how should the impact of other variability be taken into account? In which cases is it sufficient to resort to testing and tuning during derivation?

8. ACKNOWLEDGMENTS

We acknowledge the Digile N4S Program, funded by Tekes.

9. REFERENCES
[1] L. Bass, P. Clements, and R. Kazman. Software Architecture in Practice. Addison-Wesley, 2003.
[2] D. Benavides, P. T. Martín-Arroyo, and A. R. Cortés. Automated reasoning on feature models. In CAiSE, 2005.
[3] R. Berntsson Svensson, T. Gorschek, B. Regnell, R. Torkar, A. Shahrokni, and R. Feldt. Quality requirements in industrial practice: an extended interview study at eleven companies. IEEE Trans. Softw. Eng., 38(4):923–935, 2012.
[4] P. Clements and L. Northrop. Software Product Lines: Practices and Patterns. Addison-Wesley, 2001.
[5] L. Etxeberria and G. Sagardui. Variability driven quality evaluation in software product lines. In SPLC, 2008.
[6] B. González-Baixauli, J. Prado Leite, and J. Mylopoulos. Visual variability analysis for goal models. In RE, 2004.
[7] S. Gregor. The nature of theory in information systems. MIS Q., 30(3):611–642, 2006.
[8] J. Guo, J. White, G. Wang, J. Li, and Y. Wang. A genetic algorithm for optimized feature selection with resource constraints in software product lines. J. Syst. Softw., 84(12), 2011.
[9] S. Hallsteinsen, T. E. Fægri, and M. Syrstad. Patterns in product family architecture design. In PFE, 2003.
[10] S. Hallsteinsen, G. Schouten, G. Boot, and T. Fægri. Dealing with architectural variation in product populations. In T. Käkölä and J. C. Dueñas, editors, Software Product Lines: Research Issues in Engineering and Management. Springer, 2006.
[11] G. Halmans and K. Pohl. Communicating the variability of a software-product family to customers. Softw. Syst. Modeling, 2(1), 2003.
[12] H. Holma and A. Toskala, editors. WCDMA for UMTS: Radio Access for Third Generation Mobile Communications. Wiley, 2000.
[13] A. Huberman and M. Miles. Qualitative Data Analysis. Sage Publications, 2nd edition, 1994.
[14] Y. Ishida. Software product lines approach in enterprise system development. In SPLC, 2007.
[15] ISO/IEC 25010. Systems and software engineering: systems and software quality requirements and evaluation (SQuaRE): system and software quality models, 2011.
[16] ISO/IEC 9126-1. Software engineering: product quality: part 1: quality model, 2001.
[17] S. Jarzabek, B. Yang, and S. Yoeun. Addressing quality attributes in domain analysis for product lines. IEE Proc.-Softw., 153(2), 2006.
[18] K. Kang, S. Cohen, J. Hess, W. Novak, and A. Peterson. Feature-oriented domain analysis (FODA) feasibility study. Technical Report CMU/SEI-90-TR-21, SEI, 1990.
[19] K. Kang, S. Kim, J. Lee, K. Kim, E. Shin, and M. Huh. FORM: A feature-oriented reuse method with domain-specific reference architectures. Annals Softw. Eng., 5(1):143–168, 1998.
[20] K. Kang, J. Lee, and P. Donohoe. Feature-oriented product line engineering. IEEE Softw., 19(4):58–65, 2002.
[21] T. Kishi and N. Noda. Aspect-oriented analysis of product line architecture. In SPLC, 2000.
[22] T. Kishi, N. Noda, and T. Katayama. A method for product line scoping based on a decision-making framework. In SPLC, 2002.
[23] J. Kuusela and J. Savolainen. Requirements engineering for product families. In ICSE, 2000.
[24] M. A. Laguna and B. González-Baixauli. Product line requirements: Multi-paradigm variability models. In Workshop on RE, 2008.
[25] J. Lee, K. Kang, P. Sawyer, and H. Lee. A holistic approach to feature modeling for product line requirements engineering. Requirements Eng. J., 19(4):377–395, 2014.
[26] K. Lee and K. C. Kang. Using context as key driver for feature selection. In SPLC, 2010.
[27] M. Matinlassi. Quality-driven software architecture model transformation. In WICSA, 2005.
[28] D. Mellado, E. Fernández-Medina, and M. Piattini. Security requirements engineering framework for software product lines. Information Softw. Techn., 52(10):1094–1117, 2010.
[29] V. Myllärniemi, M. Raatikainen, and T. Männistö. Inter-organisational approach in rapid software product family development: a case study. In ICSR, 2006.
[30] V. Myllärniemi, J. Savolainen, and T. Männistö. Performance variability in software product lines: A case study in the telecommunication domain. In SPLC, 2013.
[31] V. Myllärniemi, J. Savolainen, M. Raatikainen, and T. Männistö. Performance variability in software product lines: proposing theories from a case study. Empirical Softw. Eng., online Feb 2015.
[32] E. Niemelä and A. Immonen. Capturing quality requirements of product family architecture. Information Softw. Techn., 49(11-12), 2007.
[33] M. Q. Patton. Qualitative Evaluation and Research Methods. Sage Publications, 2nd edition, 1990.
[34] C. Quinton, R. Rouvoy, and L. Duchien. Leveraging feature models to configure virtual appliances. In CloudCP, 2012.
[35] P. Runeson and M. Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Softw. Eng., 14(2):131–164, 2009.
[36] N. Siegmund, S. Kolesnikov, C. Kästner, S. Apel, D. Batory, M. Rosenmüller, and G. Saake. Predicting performance via automated feature-interaction detection. In ICSE, 2012.
[37] N. Siegmund, M. Rosenmüller, C. Kästner, P. G. Giarrusso, S. Apel, and S. S. Kolesnikov. Scalable prediction of non-functional properties in software product lines: Footprint and memory consumption. Information Softw. Techn., 55(3):491–507, 2013.
[38] N. Siegmund, M. Rosenmüller, M. Kuhlemann, C. Kästner, S. Apel, and G. Saake. SPL Conqueror: Toward optimization of non-functional properties in software. Softw. Quality J., 20(3-4), 2012.
[39] J. Sincero, W. Schröder-Preikschat, and O. Spinczyk. Approaching non-functional properties of software product lines: Learning from products. In APSEC, 2010.
[40] M. Sinnema, S. Deelstra, J. Nijhuis, and J. Bosch. Modeling dependencies in product families with COVAMOF. In ECBS, 2006.
[41] S. Soltani, M. Asadi, D. Gasevic, M. Hatala, and E. Bagheri. Automated planning for feature model configuration based on functional and non-functional requirements. In SPLC, 2012.
[42] S. S. Stevens. On the theory of scales of measurement. Science, 103(2684):677–680, 1946.
[43] A. Strauss and J. Corbin. Basics of Qualitative Research. Sage, 2nd edition, 1998.
[44] M. Svahnberg, J. van Gurp, and J. Bosch. A taxonomy of variability realization techniques. Softw. Pract. Exper., 35(8):705–754, 2005.
[45] C. Urquhart, H. Lehmann, and M. D. Myers. Putting the theory back into grounded theory: guidelines for grounded theory studies in information systems. Inf. Syst. J., 20(4):357–381, 2010.
[46] J. White, B. Dougherty, and D. C. Schmidt. Selecting highly optimal architectural feature sets with filtered Cartesian flattening. J. Syst. Softw., 82(8), 2009.
[47] R. K. Yin. Case Study Research. Sage: Thousand Oaks, 2nd edition, 1994.
[48] Y. Yu, J. C. S. do Prado Leite, A. Lapouchnian, and J. Mylopoulos. Configuring features with stakeholder goals. In SAC, 2008.