Prototyping and Development Frameworks
EICS'14, June 17–20, 2014, Rome, Italy
Towards a Measurement Framework for Tools’ Ceiling and Threshold

Rui Alves, Claudio Teixeira, Mónica Nascimento, Amanda Marinho, Nuno Jardim Nunes
Madeira-ITI, University of Madeira, Polo Científico e Tecnológico da Madeira, floor -2, Funchal, Portugal
rui.alves@m-iti.org, claudioteixeira7@gmail.com, monica.nascimento@m-iti.org, amanda.zacarias@gmail.com, [email protected]
ABSTRACT
Software development tools are not catching up with the requirements of increasingly complex interactive software products and services. Successful tools are claimed to be either low-threshold/low-ceiling or high-threshold/high-ceiling; however, no research to date has addressed how to define and measure these concepts. This is increasingly important as these tools undergo an evaluation and adoption process by end-users.

Here we hypothesized that the evaluation and adoption of tools is associated with the threshold (learnability). To assess this we conducted a learnability and usability study using three commercial Platform-as-a-Service tools. In this study we used a think-aloud protocol augmented with question-asking, in which ten subjects were asked to create a simple web application. Our data shows that most learnability issues fall into two categories: understanding or locating. No evidence was found that usability defects correlate with the tools' learnability scores, though we did find an inverse correlation between the number of issues and the learnability score.

Author Keywords
CASE tools; threshold; ceiling; learnability; PaaS.

ACM Classification Keywords
H.5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

INTRODUCTION
Software development tools are key in increasing the productivity and manageability of complex interactive software products and services. Applications and services today are built using a myriad of tools, from text editors to integrated development environments (IDEs), including all sorts of modeling, editing, debugging and testing tools. Despite this high level of sophistication, interactive software development tools are not very different from those of 25 years ago, and many feel that opportunities for better tools are being lost to stagnation.

In a classical paper about the past, present and future of user interface software tools, Myers et al. postulated about the themes that seem to be important in determining which tools were successful [11]. A highly relevant theme was how the threshold and ceiling of tools were important in evaluating them. The threshold is how difficult it is to learn how to use the tool, and the ceiling is how much can be done using the tool. In the late 90s the authors suggested that the most successful tools seem to be either low-threshold and low-ceiling, or high-threshold and high-ceiling [11]. However, we are still struggling with some of the basic challenges of 25 years ago. We still need tools that support developers in acquiring and sharing HCI and software engineering best practices. They ought to help refine and evolve basic methods to make them fit into particular project contexts [13].

In this paper we describe our efforts to increase our understanding of the usability of interactive software development tools. Despite the amount of research on learnability and usability, we found little work on defining and evaluating techniques to measure the threshold and ceiling of tools. For this purpose we decided to focus specifically on Platform-as-a-Service (PaaS) tools, because they fall into an important and emerging market segment of low-threshold tools. PaaS tools are becoming highly popular in cloud computing. Since their business model relies on adoption to make money (as opposed to units sold for shrink-wrapped tools), adoption as a function of learnability is crucial to PaaS vendors.

The next section discusses the state of the art in terms of tool adoption, and in particular ceiling and threshold. The following section presents the hybrid research protocol we devised for our case study, which is the focus of the subsequent section. We then present the results in detail and provide a discussion before the conclusion.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from
[email protected]. EICS 2014, June 17–20, 2014, Rome, Italy. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2725-1/14/06..$15.00.
STATE OF THE ART
Computer-aided software engineering (CASE) refers to tools that provide automated assistance to software development [1]. The goal of CASE tools is to reduce the time and cost of software development and to enhance the quality of the systems developed [1]. Nevertheless, both software engineers and designers often complain that their tools are unsupportive and unusable [11]. Despite the evidence that technology reasonably improves product quality and consistency, the relationship between practitioners and their tools has always been troublesome. While many studies have analyzed and tried to better support general software development practices [4,13], qualitative studies about user interface related practices in software development are relatively rare. Seffah and Kline showed a gap between how tools represent and manipulate programs and software developers' actual experiences [13]. Their work quantitatively measured developers' experiences using heuristic and psychometric evaluation. However, they did not specifically address threshold and ceiling related issues, which are the focus of this research.

In the last two decades several authors have addressed CASE tool adoption issues [2,4,5,8,11]. This prior research suggests that: (i) few organizations use CASE tools; (ii) many organizations abandon the use of the tools; and (iii) countless developers, working for organizations that own CASE tools, do not actually use them [8]. Moreover, back in the late 90s, Jarzabek and Huang argued that CASE tools should be more user-oriented and support the creative problem-solving aspects of software development, as well as rigorous modeling, in order to better blend into software development practice [5]. Furthermore, CASE tools were expected to be based on sound models of the software process and of user behavior [5]. Yet, figures on the adoption of these tools seem to contradict the goals driving their development. Nevertheless, a new breed of tools has emerged, in particular PaaS, which claims to bridge previous gaps and promises easier and faster development, even for non-technical users. Intrigued by these apparently conflicting forces, we found room to contribute with research that could advance the state of the art in CASE tool adoption. Here we investigate the evidence on the association between PaaS tool adoption and the ceiling and threshold classification levels of these tools. In the following subsection we further detail these two central concepts.

Ceiling and Threshold
In the previously mentioned study about UI design tools, Myers et al. stress the dichotomy between the sophistication of what can be created with a tool (its usefulness) and its ease of use (its learnability). This is directly related to the threshold and ceiling of tools. The threshold is the difficulty of learning how to use a tool, whereas the ceiling is how much can be accomplished with it [11]. The optimal approach is to build tools that provide both a low threshold and a high ceiling; however, this remains a grand challenge for modern software engineering [12].

RESEARCH QUESTION
In this work we hypothesized that threshold is associated with a tool's learnability and, through it, with the tool's adoption. If our hypothesis is confirmed, we could reposition learnability as a cornerstone success factor for low-threshold and high-ceiling tools such as PaaS. As a matter of fact, we claim that the success of a PaaS tool is directly related to effective tool adoption (in the sense of actual usage), not to commercial success or the number of issued licenses. This initial effort focuses on learnability. A longitudinal study would be required in order to assess adoption; due to time constraints, such a study was not performed and is therefore not covered by the work described in this paper.

TEST PREPARATION AND SAMPLE
In our study we exposed ten subjects to three distinct PaaS tools, namely Knack, Mendix Business Modeler 4.7.0 and OutSystems Studio 8.0. These tools were chosen among several alternatives surveyed by our team. Since we are aiming at studying initial learnability, the subjects were first-time users (they had never seen any of these three tools before). The test duration was around two and a half hours per subject, during which they were challenged to create a simple web application for a small store (an inventory of products and prices).

Subjects had to create this web application three times, once per tool. To accomplish this, we handed out a set of high-level tasks, as exemplified in Table 1, to guide them in what to do, but not in how to do it. As such, they were free to complete the tasks in the way they deemed most appropriate.

Task  Description
T1    Create a database and populate it with the existing spreadsheet.
T2    Create pages for products with and without price. Add buttons to set price and add new product.
T3    Create the page to add new products.
T4    Create the page to set the price.
T5    Protect the set price page with a login.
Table 1: Scenario tasks description.

Such a long test (almost three hours) is demanding, namely on subjects' engagement and motivation. Upon surveying the literature, we decided to augment the think-aloud protocol with the question-asking protocol [7]. The rationale for this decision was to keep subjects highly motivated and truly engaged. The fact that they could preview the result of their work as they progressed also contributed to this. All these factors combined proved effective in reducing the risk of subjects dropping out of the test.

A coach and an observer supported this process. All test sessions were video and audio recorded, upon subjects' consent. Screen casts were also recorded in order to aid the process of identifying both learnability issues and usability defects.
Sample
Our sample of ten subjects, six of them male, was divided into two groups of five, each comprising both genders. The first group consists of people who run small businesses. The other group was composed of IT users (software engineering master's students). Their ages ranged from 22 to 38 years, with an average of 28. Regarding professional experience, 50% had worked for less than one year, 30% had four to nine years of experience, while the remaining 20% had worked for more than 10 years. In terms of academic background, 10% held no degree and the same percentage held a master's degree, while 80% had a bachelor's degree.

We also inquired subjects about their experience in creating simple web sites: 50% had experience building web applications. Among these subjects, the tools used to develop such applications were IDEs such as NetBeans (60%) and Dreamweaver (40%), while 20% used content management systems (CMS). The most popular language for creating these applications was PHP, known by 80% of these users.

METHODOLOGY
We used a repeated measures design, since individual scores in one condition can be paired with scores in the other conditions (using one or another tool). The study procedure included four major activities:
1. Record subjects' activity while being tested.
2. Collect pre-test and post-test surveys per subject.
3. Elicit learnability and usability issues from the recordings.
4. Process and analyze the data.

We performed three pilot tests in order to iteratively improve the participant and coach instructions as well as to calibrate the test duration. The tests were done in a meeting room where subjects used a 15" laptop with built-in video, screen and audio recording software. These devices are not intrusive, so subjects did not feel observed. Subjects gave informed consent for recording the test.

Subjects were asked to perform the same tasks sequentially on the three tools. In order to neutralize the bias of accumulated experience we shuffled the order of execution across our sample. This means that, for instance, our first subject was presented with PaaS1, PaaS3 and PaaS2, whereas the second participant used PaaS2, PaaS1 and PaaS3, and so on. All subjects used the same operating system and computer. Both the operating system and the tools were set up in English.
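As a concrete illustration (our own sketch, not the authors' assignment procedure; the cyclic rotation below does not necessarily reproduce the exact orders used in the study), tool orderings can be counterbalanced by rotating the tool list per subject:

```python
# Minimal sketch of counterbalancing the tool order across subjects (illustrative;
# assumes a simple cyclic rotation rather than the study's actual assignment).
TOOLS = ["PaaS1", "PaaS2", "PaaS3"]

def counterbalanced_order(subject_index, tools=TOOLS):
    """Return the tool order for a 0-based subject index by cyclic rotation,
    so that tool positions are balanced across the sample."""
    shift = subject_index % len(tools)
    return tools[shift:] + tools[:shift]

for subject in range(10):
    print(subject + 1, counterbalanced_order(subject))
```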
This approach provided equal control conditions and well-defined tasks, thus reducing bias. For each tool, each participant went through a tutorial (around fifteen minutes per tool) followed by approximately 30 minutes to create the application proposed in the scenario. These timings were derived from the three pilot tests we conducted before deploying the actual study. Still, users were not forced to complete the test scenario within 30 minutes, as we did not set any time limit. We scheduled five-minute breaks between distinct PaaS tools, so that subjects could relax.

Scenario and Procedure
Upon arriving at the test room, subjects were briefed to act as shop owners through a scenario. The shop owner wants to replace the spreadsheet he uses to manage products with a simple web application. This application uses a database and two major pages (one to display products without price and another for products with price). Finally, subjects should configure the security settings of the set price page, since only the owner can set prices.

The coach sat side-by-side with the participant, to see the user's actions and provide help if needed. An HCI researcher acted as observer, sitting in a location where he did not interfere and was invisible to the participant. The observer controlled timings and breaks and took notes regarding usability defects and learnability issues.

At the beginning of the test the protocol rules (Table 2 and Table 3) were explained verbally to the subject. Then, printed copies of the high-level scenario instructions were handed out. From this point on, the coach talked with the subjects in order to follow their progress in the scenario tasks, and they reported what they were thinking during the execution of the tasks. Whenever users got blocked or tried to accomplish an action without success and needed help, the coach encouraged them to ask questions in order to avoid any stress, which could lead to a decrease in motivation to finish the task. The coach was authorized to answer questions with minimum instructions, only to allow subjects to proceed.

a) Moderately verbalize the rationale behind your main actions, intentions and thoughts.
b) Focus on completing your tasks as if you were creating this for your own business project.
c) Do not attempt to use built-in application support or online help; instead ask the coach.
d) Ask for help whenever you feel that you cannot progress, but only after trying to do it yourself.
e) Ask only specific questions related to the tasks.
f) Off-topic conversations are not allowed.
Table 2: Protocol instructions for the participant.

a) Whenever the participant is completely stuck, ask him if he wants help to accomplish the task.
b) Whenever the participant lapses into silence, remind him to talk about the main actions he performs.
c) Do not make the participant nervous or stressed.
d) Avoid forcing the participant to talk all the time.
e) Do not help if the participant does not ask for help.
f) If a task is incomplete or incorrect but is required to accomplish the next tasks, complete it quickly.
g) Do not tutor or explain in detail.
h) Keep the participant motivated.
i) Answer with minimum procedure actions.
j) Keep focus on the task completion flow and avoid distracting the user.
k) Do not encourage off-topic conversations.
Table 3: Protocol instructions for the coach.
RESULTS
Four datasets are analyzed in this study, namely (1) learnability scores, (2) learnability issues, (3) performance and (4) usability defects. Two groups are presented: G1, which refers to the IT subjects, and G2, the business subjects. Additionally, due to privacy issues, the data is anonymized, both for the subjects and for the tools involved, which we refer to as PaaS1, PaaS2 and PaaS3 from here on.

Learnability Score
Among several learnability metrics [3], we selected two: (M1) the percentage of users who complete a task without any help, and (M2) the percentage of users who complete a task optimally. In M1 users cannot ask any kind of task-flow related questions, whereas in M2 users must complete the task in the most direct way, without help [10]. After gathering the M1 and M2 percentages, the scores were averaged over all users in both groups, with each metric weighing 0.5. This formula maps the combined M1 and M2 percentages onto a 0–100% scale. For instance, if a group of n users completes all tasks without help but not optimally, the system will have a learnability score of 50%. Conversely, if the group completes all tasks optimally the tool will have a score of 100%, and if half of the users complete the tasks optimally and the other half without help, the system ends up with a 75% score. We classified the tools' learnability scores into four levels: (i) extremely difficult to use [0 to 24%], (ii) difficult to use [25 to 49%], (iii) easy to use [50 to 74%] and (iv) extremely easy to use [75 to 100%].
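A minimal sketch of this scoring scheme (our illustration, not the authors' analysis scripts) is shown below; it assumes that M1 (completed without help) and M2 (completed optimally) are recorded as booleans per user and task:

```python
# Minimal sketch of the learnability score described above (illustrative only).
# Each observation is an (m1, m2) pair per user and task:
#   m1 - task completed without any help, m2 - task completed optimally.
# Both metrics weigh 0.5 and the result is expressed on a 0-100% scale.

def learnability_score(observations):
    total = sum(0.5 * m1 + 0.5 * m2 for m1, m2 in observations)
    return 100.0 * total / len(observations)

def learnability_class(score):
    """Four-level scale used in the paper."""
    if score < 25:
        return "extremely difficult to use"
    if score < 50:
        return "difficult to use"
    if score < 75:
        return "easy to use"
    return "extremely easy to use"

# Example from the text: every task completed without help but none optimally -> 50%.
demo = [(True, False)] * 25  # e.g., 5 users x 5 tasks
print(learnability_score(demo), learnability_class(learnability_score(demo)))
```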
Table 4 summarizes the computed data for all three PaaS tools. Remarkable is the fact that for PaaS1 the results do not vary across groups or when aggregating all subjects. The opposite happens with PaaS2, where the G1 results position the tool as easy to learn, whereas for G2 it is extremely difficult to learn. PaaS3 presents a blend of results that follows the PaaS2 pattern, but with less extreme values.

Group  PaaS1      PaaS2                PaaS3
G1     Easy 66%   Easy 62%             Easy 62%
G2     Easy 68%   Ext. Difficult 20%   Difficult 42%
All    Easy 67%   Difficult 41%        Easy 52%
Table 4: Learnability scores with classification.

Based on these results, empirical evidence seems to indicate that for first-time users PaaS1 and PaaS3 are easy to learn, whereas PaaS2 is difficult to learn.

Learnability Issues
Along with the learnability scores we also identified learnability issues. We reused Grossman et al.'s classification schema, where issues are classified into five categories: (i) understanding, (ii) locating, (iii) awareness, (iv) transition and (v) task-flow [3]. Understanding issues occur when a user is aware of some functionality but cannot understand how to use it. Locating issues occur when a user is aware of some feature available in the tool but cannot locate it in the interface. Awareness issues occur when a user is not aware of a feature that is available to complete some action. Transition covers issues where the user is aware of some feature but chooses other features to complete the task (often more time consuming and difficult). Finally, a task-flow issue occurs when a user knows the task at a high level but does not know how to start it or the sequence of actions to achieve it [3]. In Table 5 we summarize the weight of each identified learnability category, per tool and group.

Category       PaaS1 G1  PaaS1 G2  PaaS2 G1  PaaS2 G2  PaaS3 G1  PaaS3 G2
Understanding  22%       52%       17%       11%       33%       52%
Locating       33%       22%       45%       38%       17%       9%
Awareness      22%       0%        14%       6%        17%       16%
Transition     17%       9%        3%        9%        0%        0%
Task Flow      6%        17%       21%       36%       33%       23%
Total          18        23        29        47        24        44
Table 5: Learnability issues.

Aggregating all values per category, we found that transition issues are the least frequent. By computing the ratio between each category and transition, we found which types of learnability issues hinder users the most (see Table 6). Likewise we also computed a weighted average, where G1 accounts for 25%, G2 for another 25% and all subjects together for 50%; because we have five subjects in each group and ten in total, the average is evenly weighted.

Category       G1   G2   All  Average
Understanding  3.6  6.7  5.0  1.7
Locating       4.7  4.0  4.4  1.5
Awareness      2.6  1.3  2.0  0.7
Transition     1.0  1.0  1.0  0.3
Task Flow      3.0  4.4  3.6  1.2
Table 6: Learnability issues frequency.

Given these results, the most frequent learnability issues are related to understanding, closely followed by locating; task-flow is third, followed by awareness and transition issues. In G2, understanding issues are 6.7 times more frequent than transition issues, whereas in G1 locating issues are 4.7 times more frequent than transition-related problems.

Performance
Within the scope of our test, performance stands for the elapsed time to complete a given task. The first finding is that IT users (G1) performed faster in all cases. In rough numbers, the difference between the two groups ranged from 25% up to 50%. G1 performed better than G2 subjects in 66.6% of all tasks of the complete test.
Usability Defects
Table 7 summarizes the total (∑) and average per user (x̄) usability defects, for all evaluated tools.

Group  PaaS1 ∑  PaaS1 x̄  PaaS2 ∑  PaaS2 x̄  PaaS3 ∑  PaaS3 x̄
G1     27       5         30       6         40       8
G2     35       7         50       10        70       14
Total  62       6         80       8         110      10
Table 7: Usability defects encountered per user.

The figures show that PaaS3 has the most usability defects, followed by PaaS2 and PaaS1. We clustered the usability defects according to their relation to the interface.

Category        PaaS1 G1  PaaS1 G2  PaaS2 G1  PaaS2 G2  PaaS3 G1  PaaS3 G2
Icons           7%        0%        13%       26%       15%       21%
Bars/Windows    4%        9%        13%       12%       20%       16%
Canvas          0%        0%        0%        0%        10%       14%
Menus/Commands  22%       26%       33%       30%       15%       14%
Interaction     41%       37%       7%        12%       15%       17%
Text/Feedback   26%       29%       33%       20%       25%       17%
Total           27        35        30        50        40        70
Table 8: Usability defects.

Most usability defects in PaaS2 are related to commands, while in PaaS3 the most common fall into the icons and text categories. In PaaS1, interaction aspects stand out (Table 8). Usability defects in the icons/graphics category are related to graphical design issues (e.g., similar icons). Under the bars/windows category we clustered defects directly related to using bars or high-level commands belonging to the tool's window (e.g., the positioning of the properties box). Defects classified as canvas refer to the design area of the interface (e.g., the user cannot drag items to place them above the table). When the defect was related to buttons or input fields it was categorized as menus/commands, whereas defects classified as interaction relate to the interaction paradigm (e.g., double-clicking an object creates something without asking the user). Finally, the text/feedback category refers to issues with the textual terminology and the textual feedback from the interface. In addition we also measured each tool's ceiling.

Ceiling
Ceiling is inherently a function of features: "how much can be done using the system" [11]. In order to classify and compare the three tools under analysis, we propose a simple method to determine the ceiling. We extract the contextual features (e.g., centralized app governance, service-oriented architecture refactoring tools) from the three tools and identify an extra tool, which is supposed to be one of the most complete in the PaaS context. From this extra tool we also extracted the contextual features. After merging the four previously extracted sets of features, we created a checklist in order to match each tool's features against the ones in the merged list (the highest number of features).

We applied this technique in our study, which provided the following results: from a total of 31 features, PaaS1 matches 29, PaaS2 matches seven and PaaS3 matches 31. Therefore PaaS1 gets a ceiling score of 0.94, PaaS2 a score of 0.23 and PaaS3 a score of 1.0. A four-level scale was used to classify the tools, namely: (i) extremely low ceiling [0 to 24%], (ii) low ceiling [25 to 49%], (iii) high ceiling [50 to 74%] and (iv) extremely high ceiling [75 to 100%]. On this scale PaaS2 is extremely low ceiling, while PaaS1 and PaaS3 are both extremely high ceiling. Upon analyzing all this data we concluded that our results could be threatened by our sample size; as such, we further investigated the adequacy of our sample size.
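To make the checklist matching concrete, the sketch below (our own illustration; the inputs are placeholders rather than the actual surveyed feature lists) computes a ceiling score and maps it onto the four-level scale above:

```python
# Minimal sketch of the ceiling computation described above (illustrative only;
# real feature names would come from the survey of the three tools plus the
# extra reference tool).

def ceiling_score(tool_features, merged_checklist):
    """Fraction of checklist features matched by the tool, in [0, 1]."""
    matched = sum(1 for feature in merged_checklist if feature in tool_features)
    return matched / len(merged_checklist)

def ceiling_class(score):
    """Four-level scale used in the paper."""
    if score < 0.25:
        return "extremely low ceiling"
    if score < 0.50:
        return "low ceiling"
    if score < 0.75:
        return "high ceiling"
    return "extremely high ceiling"

merged = [f"feature_{i}" for i in range(1, 32)]  # 31-item merged checklist (placeholder names)
tool = set(merged[:29])                          # a hypothetical tool matching 29 of them
score = ceiling_score(tool, merged)
print(round(score, 2), ceiling_class(score))     # 0.94 extremely high ceiling
```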
Threshold and Ceiling Classification
Table 9 summarizes our test results, namely the threshold and ceiling classification. Threshold is obtained by inverting the learnability score, i.e., a 100% learnability score means a low threshold, as the threshold is a measure of how difficult it is to learn a tool. Ceiling is determined directly, i.e., matching 100% of the features means a high ceiling. We classified the tools in terms of ceiling and threshold for each group under evaluation. It is important to remark that this classification is only valid in the context of this experiment, i.e., for first-time usage of the tools (initial learnability).

Group  PaaS1                      PaaS2                        PaaS3
G1     Low-Ceiling Low-Threshold  High-Ceiling Low-Threshold   High-Ceiling Low-Threshold
G2     Low-Ceiling Low-Threshold  High-Ceiling High-Threshold  High-Ceiling High-Threshold
All    Low-Ceiling Low-Threshold  High-Ceiling High-Threshold  High-Ceiling Low-Threshold
Table 9: Ceiling and threshold classification matrix.

Given these classifications, one cannot argue that PaaS2 is a high-ceiling and high-threshold tool as a generic statement, since, as stated before, this classification only applies to initial learnability. Additionally, we would like to remark that, according to the literature [12], an optimal software tool should have a high ceiling and a low threshold, which can be observed in Table 9 several times for G1 but not for G2. The target audience of the tools under study includes both G1 and G2, so any conclusions drawn from this table should take this into consideration.
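As a small sketch of the inversion just described (our reading of the classification rule, not code provided by the authors; the 50% cut-off is taken from the boundary between "difficult" and "easy" in the learnability scale):

```python
# Our interpretation of the threshold labelling: a tool that is easy or extremely
# easy to learn (learnability score >= 50%) has a low threshold, otherwise a high one.

def threshold_label(learnability_score):
    return "Low-Threshold" if learnability_score >= 50 else "High-Threshold"

# e.g., PaaS1 scored 67% for all subjects (easy to learn) -> Low-Threshold
print(threshold_label(67))
```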
DISCUSSION
We found that the most frequent learnability issues are related to understanding. PaaS providers should therefore pay special attention to this aspect in order to increase their tools' learnability and thereby their chances of adoption. Moreover, locating issues are also common, which suggests that too little interaction design effort is being devoted to building these PaaS tools. Furthermore, by relating learnability scores to learnability issues, we observed that, in general, the greater the number of issues, the lower the learnability score.

Regarding the ceiling/threshold discussion, our results are in line with Myers et al., who claim that high-threshold tools also have a high ceiling [11]. This reinforces the problem both academia and industry are still struggling with, and failing at: building tools that provide both a low threshold and a high ceiling [12]. Having a low threshold and a high ceiling basically means that users can carry out development quite easily. This goal is proving to be a demanding challenge, and further research is required to properly address it.

As thoroughly discussed by Jeng, there is no consensus on the attributes that characterize usability, but the most relevant attribute is learnability (53% of the surveyed authors support it) [6]. Yet, according to our data, no patterns were identified when it comes to usability defects. This points towards the preliminary conclusion that, despite the intrinsic relationship between learnability and usability, in our experiment we found no evidence correlating usability defects with the tools' learnability scores, learnability issues or performance. Further research will be needed to investigate the reasons behind these findings.

Concerning users' performance, we found that the IT group (G1) performed faster in all cases. The difference between G1 and G2 users ranged from 25% in PaaS3 to 34% in PaaS1 and 48% in PaaS2. The post-test survey suggests that users from G1 have experience with look-alike tools, such as IDEs. Another possibility is that G1 may be more familiar with the terminology and the domain, yet we did not manage to gather evidence on this hypothesis. In our study we had no access to users' performance standards (real users' performance). Jeng claims that learnability is inferred from the amount of time required to achieve user performance standards [6]. In order to have standard performance figures we would need to perform an extended test with more tools and users. Nevertheless, our goal for this phase was to measure initial learnability, not to run a longitudinal study.

One possible weakness of the presented work is the small sample size. It was constrained by the available time, but it is in line with the finding that the first four to five subjects in a usability study discover around 80% of all usability defects [14], including the most severe ones. Our sample size is adequate for covering 70% of learnability issues, according to Lewis [9].
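For context (this estimate is our addition, based on the standard problem-discovery model underlying both Virzi's and Lewis's analyses), the expected proportion of problems uncovered by n subjects is 1 − (1 − p)^n, where p is the average probability that a single subject encounters a given problem; with p ≈ 0.3, five subjects are expected to uncover about 1 − 0.7^5 ≈ 83% of the problems, consistent with the 80% figure cited above.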
CONCLUSION
Our ongoing work aims to improve the state of the art in assessing tools' ceiling and threshold. We found that most learnability issues fall into two categories: understanding or locating. Additionally, we found an inverse correlation between the number of issues and the learnability score. We plan to expand the sample size and conduct a longitudinal study to validate the identified trends and to gather statistical evidence on the existing correlations. We believe this work is also valid for other tools, thus increasing our understanding of how to design the tools of the future.

REFERENCES
1. Banker, R. and Kauffman, R. Reuse and productivity in integrated computer aided software engineering. Information Systems Working Papers Series (1992).
2. Campos, P. and Nunes, N.J. Practitioner tools and workstyles for user-interface design. IEEE Software 24, 1 (2007), 73–80.
3. Grossman, T., Fitzmaurice, G., and Attar, R. A survey of software learnability: metrics, methodologies and guidelines. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2009), 649–658.
4. Iivari, J. Why are CASE tools not used? Communications of the ACM 39, 10 (1996), 94–103.
5. Jarzabek, S. and Huang, R. The case for user-centered CASE tools. Communications of the ACM 41, 8 (1998), 93–99.
6. Jeng, J. Usability assessment of academic digital libraries: effectiveness, efficiency, satisfaction, and learnability. Libri 55, 2–3 (2005), 96–121.
7. Kato, T. What "question-asking protocols" can say about the user interface. International Journal of Man-Machine Studies 25, 6 (1986), 659–673.
8. Lending, D. and Chervany, N.L. The use of CASE tools. Proceedings of the 1998 ACM SIGCPR Conference on Computer Personnel Research (1998), 49–58.
9. Lewis, J.R. Evaluation of procedures for adjusting problem-discovery rates estimated from small samples. International Journal of Human-Computer Interaction 13, 4 (2001), 445–479.
10. Linja-aho, M. Evaluating and Improving the Learnability of a Building Modeling System. Helsinki University of Technology (2005).
11. Myers, B., Hudson, S.E., and Pausch, R. Past, present, and future of user interface software tools. ACM Transactions on Computer-Human Interaction 7, 1 (2000), 3–28.
12. Myers, B.A. and Rosson, M.B. Survey on user interface programming. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM (1992), 195–202.
13. Seffah, A. and Metzker, E. The obstacles and myths of usability and software engineering. Communications of the ACM 47, 12 (2004), 71–76.
14. Virzi, R.A. Refining the test phase of usability evaluation: how many subjects is enough? Human Factors: The Journal of the Human Factors and Ergonomics Society 34, 4 (1992), 457–468.